It also runs sudo for you when needed.
The "wrapper" part is only in the package name, because the plain name was already taken on PyPI.
apt search somepackage
apt-cache search somepackage
apt-cache search somepackage | grep '^somepackage'
apt-cache policy somepackage
Instead, think of all the millions of satisfied users of the software, none of whom bothered to fix up this problem until now.
One byte at a time could've been a quick hack that was "good enough" in the original author's use case, because he wasn't reading much data, or was only calling it once, or something. Somebody else saw "readline", decided it was exactly what he needed, and used it everywhere.
In my experience, "self documenting" code is brittle and hard to change because nobody knows (or remembers) which implementation details are important and which aren't, so they're afraid to change it. Is it reading a byte at a time for a reason? Or does it not matter, and it just happens to be done that way?
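To make the question concrete, here are the two shapes such a function usually takes (a sketch, not the actual code under discussion):

    #include <cstdio>
    #include <string>
    #include <unistd.h>

    // The "quick hack": correct and simple, but one read() syscall per byte.
    std::string ReadLineUnbuffered(int fd) {
        std::string line;
        char c;
        while (read(fd, &c, 1) == 1 && c != '\n')
            line.push_back(c);
        return line;
    }

    // The buffered variant: stdio amortizes the syscalls behind fgetc().
    std::string ReadLineBuffered(FILE *f) {
        std::string line;
        int c;
        while ((c = std::fgetc(f)) != EOF && c != '\n')
            line.push_back(static_cast<char>(c));
        return line;
    }

Both return the same line; only a comment can tell the next reader whether the unbuffered version was a deliberate choice or an accident of history.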
There's more to software development than just churning out code, and being disciplined about keeping comments up to date is one of those things that just needs to be done, IMO.
In addition to good naming, strong static types and functional purity help a great deal (but I won't try to evangelize Haskell here :))
> ...hidden (because if someone is reading code that calls myVeryWellDocumentedFunction, they won't see the documentation, especially if the function is just incompletely named)
what is your proposed alternative? Ideally a function would do the only possible sensible thing imaginable, but that is frequently not attainable.
> ...inaccurate (because the only accurate thing is the code)
Well when the code doesn't match the spec in the comments, we've found a bug in the code, or a bug in the spec. Either way we can find someone to blame. This is vastly better than finding a bug and not knowing whether the function or the caller should be fixed.
It can take a long time to get to step 3 with something as safety-critical as an update tool.
To their credit, they started really early. And burning it all down seems to be an increasingly better option for purely extrinsic reasons; the late-mover advantage keeps increasing, and better and better options keep emerging. An LMDB-based refactor, perhaps? This is probably a pipe dream, alas: Debian developers must have a hundred different programs that all build on and extend the original "make it work" technology, and any replacement would have to survive all of them. And that's really my final counter to your cute linear notion of ongoing improvement: the mold is set, and there are dozens of things bound up in what has happened.
RPM systems have the advantage, IIRC, that they can basically just download a bunch of sqlite3 databases, which means they do not have to parse the package metadata themselves.
We have so many non- and semi-skilled people using APT that if you Google many APT errors, you actually tend to come across people using Cydia as the primary people discussing the error condition ;P. Our package churn and update rate is faster than Debian's, and we have run into all of the arbitrary limits in various versions of APT (total number of packages known about, total number of deleted versions, total number of bytes in the cache): really, we use APT a lot.
1) Despite APT supporting cumulative diffs (where the client gets a single diff from the server to bring them up to date rather than downloading a ton of tiny incremental updates and applying them in sequence), Debian's core repositories are not configured to generate these. I can tell you from experience that providing cumulative diffs is seriously important.
So, while a 20x speed-up applying a diff is cool and all, users of Debian's servers are doing this 20x more often than they need to, applying diff after diff after diff to get the final file. This is an example of an optimization at a low level that may or may not be useful, as the real issue is at the higher level, in the algorithm design.
What is extra-confusing is that the most popular repository management tool, reprepro, can build cumulative diffs automatically, and I think it does so by default. Debian really should switch to using this feature: I keep seeing Debian users complain on forums and blog posts that APT diff updates are dumb as you end up downloading 30 files... no: the real issue is that Debian isn't using their own tool well :(.
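To make the difference concrete, here is the shape of the two update schemes (pseudocode with invented names, not APT's real update loop):

    #include <string>

    // Invented stand-ins so the sketch is self-contained.
    struct File {};
    struct Patch {};
    std::string PatchName(int from, int to) { return std::to_string(from) + ".." + std::to_string(to); }
    Patch Download(const std::string &) { return {}; }
    void ApplyPatch(File &, const Patch &) {}

    // Incremental-only repository: one download and one patch
    // application per published revision since the client's snapshot.
    void UpdateIncremental(File &index, int have, int latest) {
        for (int v = have; v < latest; ++v)
            ApplyPatch(index, Download(PatchName(v, v + 1)));
    }

    // Cumulative diffs: the server also publishes a single patch from
    // each older revision straight to the newest, so the client applies one.
    void UpdateCumulative(File &index, int have, int latest) {
        ApplyPatch(index, Download(PatchName(have, latest)));
    }

However fast ApplyPatch is, the incremental scheme multiplies it (and a network round trip) by however far behind the client is.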
2) The #1 performance issue I deal with while using APT, even on my server, is the amount of time it takes to build the cache file every time there is a package update (it sits there showing you a percentage on the console as it does this). On older iPhones this step was absolutely brutal, taking some of my users minutes; but again: it is the step I most notice even on my server.
I spent a week working on this years ago and made drastic improvements. I determined most of the time was spent in "paper cuts": tiny memory allocations and copies distributed through the entire project, which over the course of a run hemorrhaged time.
The culprit (of course ;P) was std::string. As a 20-year user of C++ who spent five years in the PC gaming industry, I hate std::string (and most of the STL, really: std::map is downright idiotic... it allocates memory even if you never put anything into the map, and I can tell you from writing my own C++ red-black tree tools that there is no good reason for this).
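(Whether a default-constructed std::map allocates is implementation-specific; if you want to check your own standard library, a counting allocator makes it a one-minute experiment:)

    #include <cstdio>
    #include <cstdlib>
    #include <map>

    static int allocs = 0;

    // Minimal allocator that counts every call into allocate().
    template <class T>
    struct CountingAlloc {
        using value_type = T;
        CountingAlloc() = default;
        template <class U> CountingAlloc(const CountingAlloc<U> &) {}
        T *allocate(std::size_t n) { ++allocs; return static_cast<T *>(std::malloc(n * sizeof(T))); }
        void deallocate(T *p, std::size_t) { std::free(p); }
    };
    template <class T, class U> bool operator==(const CountingAlloc<T> &, const CountingAlloc<U> &) { return true; }
    template <class T, class U> bool operator!=(const CountingAlloc<T> &, const CountingAlloc<U> &) { return false; }

    int main() {
        std::map<int, int, std::less<int>, CountingAlloc<std::pair<const int, int>>> m;
        std::printf("allocations after default construction: %d\n", allocs);
    }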
Sure, maybe APT is using C++11 by now and has a bunch of move constructors all over the place that mitigate the issue somewhat (I haven't looked recently), but it still feels "weirdly slow" to do this step on my insanely fast server (where by all rights it should be instantaneous), and frankly: APT's C++ code, when I was last seriously looking at the codebase, was abysmal. It was essentially written against one of the very first available versions of C++ by someone who didn't really know much about the language (meaning it uses all the bad parts and none of the good; this happens when Java programmers try to use C++98, for example, but APT is much, much worse) and has no rhyme or reason to a lot of the design. It reminds me a little of WebKit's slapped-together "hell of random classes and pointers that constantly leads to use-after-free bugs".
Regardless, I rewrote almost every single usage of std::string in the update path to use a bare pointer and a size, passing around fragments of what had been memory-mapped from the original file whenever possible without making any copies. I got it to be at least twice if not four times faster (I don't remember). I made the code entirely unmaintainable while doing this, though, and so I have never felt my patches were worth even trying to merge back (though it also took me years to ever find the version control repository where APT was developed anyway... ;P). To this day I ship some older version of APT that I forked rather than updating to something newer, due to a combination of this and the gratuitous-and-unnecessary ABI breakage in APT (they blame using C++, but that isn't quite right: the primary culprit is their memory-mapped cache format, and rather than use tricks when possible to maintain the ABI for it they just break it with abandon; but even so, the C++ is buying me as a user absolutely nothing: they should give me a thin C API to their C++ core).
If I were to do this again "for real" I would spend the time to build some epic string class designed especially for APT, but I just haven't needed to, as my problem is now "sort of solved well enough": I have almost never cared about the new features that have been added to APT, and I have backported the few major bug fixes I needed (and frankly my copy now has much better error correction, though it has drifted so unmaintainably due to this performance patch as to not be easily mergeable back :/ but we really really need APT to never just give up entirely or crash if a repository is corrupt, and so those fixes are also critical for us in a way they aren't for Debian or Ubuntu).
If anyone is curious what these miserable patches look like, here you go... check out "tornado" in particular. (Patches are applied by my build system in alphabetical order.) (Actually, reading back through my tornado patch, I did at some point build a tiny custom string class to help abstract the fix, but I assuredly didn't do it well or anything. I only point out this maintainability issue at all, by the way, because I don't want people to assume that performance fundamentally comes at the price of unmaintainable implementations.)
(2) The cache generation is somewhat slow; it currently takes about 2 seconds on a two-generations-old laptop.
I sped it up a bit now, and replaced a map in that path with an unordered_map in the process.
Profiling it does not show anything obvious that can be optimized, though; most of the time is spent parsing files.
Most importantly, as you can see in https://people.debian.org/~jak/gencache.svg, only about 10% of the time is spent in the string code.
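The map -> unordered_map swap mentioned above is roughly of this shape (identifiers invented for illustration, not APT's):

    #include <string>
    #include <unordered_map>

    struct CacheEntry { /* parsed package data */ };

    // before: std::map<std::string, CacheEntry> index;
    std::unordered_map<std::string, CacheEntry> index;

    // A hot path keyed by package name now does one hash plus (usually)
    // one string compare, instead of O(log n) string compares while
    // walking a red-black tree.
    CacheEntry &Lookup(const std::string &name) {
        return index[name];
    }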
(3) We don't want to recover from errors. Error recovery masks problems.
(4) ABI breaks: We try hard to avoid them. These days we have added tons of d-pointers everywhere, so we can extend things as needed without breaking ABI.
Sometimes you do need to break ABI. For example, some offsets were 32-bit, which caused large files not to work correctly.
But fixing that requires an ABI break, so it has to wait for a chance, like in a year or something.
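For anyone unfamiliar with the idiom, a d-pointer looks roughly like this (a generic sketch, not APT's actual headers):

    // Public header: the class holds one opaque pointer, so fields can
    // be added to Private in a later release without changing the
    // class's size or layout, i.e. without breaking ABI.
    class PackageIndex {
    public:
        PackageIndex();
        ~PackageIndex();
    private:
        struct Private;  // defined only in the .cc file
        Private *d;      // the "d-pointer"
    };

    // Implementation file: new members can be appended freely here.
    struct PackageIndex::Private {
        unsigned long long offset;  // e.g., widened from 32 bits someday
    };

    PackageIndex::PackageIndex() : d(new Private()) {}
    PackageIndex::~PackageIndex() { delete d; }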
The web page for raptorial seems to be offline and the project is two years stale. I spent a few minutes glancing through the code and came across a "fix me" comment next to the version comparison function (one of the most important parts of anything that even sort of works with APT, and something I have now written many implementations of in various languages) saying it doesn't work correctly for numbers, so the project has earned a "vote of no confidence" from me (sorry :().
Here's the commit:
Also a further improvement:
And finally, dropping strings out of StoreString(), so it's now almost entirely gone:
... or you could instead rework it to use what is now called string_view (and used to be called string_ref).
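The "bare pointer plus size into the mapped file" approach described upthread is exactly what string_view packages up; a sketch of the zero-copy style (hypothetical helper, not APT's code):

    #include <cstddef>
    #include <string_view>
    #include <vector>

    // Split one stanza of a memory-mapped Packages file into lines
    // without copying: every string_view points into the mapped buffer,
    // so no per-field allocations happen at all.
    std::vector<std::string_view> SplitLines(std::string_view stanza) {
        std::vector<std::string_view> lines;
        while (!stanza.empty()) {
            std::size_t eol = stanza.find('\n');
            if (eol == std::string_view::npos) {
                lines.push_back(stanza);
                break;
            }
            lines.push_back(stanza.substr(0, eol));
            stanza.remove_prefix(eol + 1);
        }
        return lines;
    }

The caveat is the same one the pointer-and-size rewrite had: the views are only valid as long as the underlying mapping stays alive.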
I'd be happy to try to configure/host such a thing if there are no difficulties.
I think you are one of the few people who truly understand the ecosystem, and probably the right person to influence this.
Incidentally, I had a pretty interesting thread today on the question of whether we will see a single package-manager format on Linux. Wonder what your thoughts are... and, in general, on the state of package management on Linux. Would you build Cydia on something else today?
The ftpmasters and the apt maintainers would be the appropriate pair of contacts.
The reality here is that I don't have the months required to really fix the one place here you could say I have "bagged on" APT, and it is not clear to me that the real problem (which is that the APT community is using C++ in weird or even simply "too many" ways) is fixable.
That said, I will also ask you a question: have you ever really cared how fast APT is? I notice it being slow on my server, but I know why it is slow and where it is slow... it is certainly much faster than Ruby bundler, for example. Is there any good reason at all for anyone but me to change all this code?
The OP complained about some performance issue applying diffs, but as I explained, the real issue with diff updates is that Debian isn't sending cumulative diffs: it doesn't really matter that that is slow either.
So I don't know what you really think I should be doing here. Do you want me to try to train the APT developers in some way? That won't work and is frankly kind of arrogant in a way that commenting from the sidelines isn't. The truly asshole "Web 2.0 GitHub generation" solution at this point would be to fork APT and "compete" with it, but I really really think people who push public forks of key projects are assholes: I am not going to do that.
But if we are going to talk about performance issues in this tool, I am going to try to provide some background on what is going on and why: you will essentially never see me sit around and complain about APT in a context outside of this (and in fact you will generally find me defending APT against developers who are quick to judge something by quality of detail rather than quality of design: reddit makes it difficult to find old comments, but if you really cared you would find me constantly pointing out that APT can be improved rather than thrown out).
Try it out, we're always listening and around in #debian-apt on OFTC.
I once found a similar thing in a commercial setting. A program had been written which could set one bit in a shared memory region for use as a feature flag. It was slowly expanded in use until it was being run thousands of times a day on hundreds of very expensive computers. I wrote a replacement program which read commands from a file and set all the bits in one invocation. This saved well over 100k USD per year if you just added up the cost of computer time without staffing. I imagine this APT fix is worth more than that.
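The replacement amounted to something like this (names invented; the original was proprietary):

    #include <cstddef>
    #include <fstream>
    #include <string>

    // Read "bit-index value" pairs from a command file and apply them
    // all against the shared region in a single invocation, instead of
    // launching one process per flag.
    void ApplyFlagFile(unsigned char *shared, const std::string &path) {
        std::ifstream in(path);
        std::size_t bit;
        int value;
        while (in >> bit >> value) {
            unsigned char mask = static_cast<unsigned char>(1u << (bit % 8));
            if (value)
                shared[bit / 8] |= mask;
            else
                shared[bit / 8] &= static_cast<unsigned char>(~mask);
        }
    }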
No, most of the time is spent fsync'ing, and it is a problem at a lower layer (related to dpkg), not apt. Too many bug reports to list, but see  for the latest one.
You can use apt with eatmydata (e.g., sudo eatmydata apt-get upgrade) as a workaround.
Why not use memory mapped files and let the operating system deal with caching in a way that is most efficient for that specific case?
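Something like this (minimal POSIX sketch; error handling trimmed to the essentials):

    #include <cstddef>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    // Map a package index read-only and let the page cache decide what
    // stays resident; parsing happens in place, with no read() copies.
    const char *MapIndex(const char *path, size_t *size_out) {
        int fd = open(path, O_RDONLY);
        if (fd < 0) return nullptr;
        struct stat st;
        if (fstat(fd, &st) != 0) { close(fd); return nullptr; }
        void *base = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);  // the mapping keeps the file contents alive
        if (base == MAP_FAILED) return nullptr;
        *size_out = static_cast<size_t>(st.st_size);
        return static_cast<const char *>(base);
    }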
Even more so because after gem, or pip, or something (can't remember) had a similar issue a while ago (I think they had an O(n²) algorithm), a lot of people jumped on it as bad computer science. There were all sorts of claims that web people were not real computer scientists.
Either way, good useful products were made and they've been further optimised. That's great. More of that more of the time.