> easily half the code I was writing in Multics was error recovery code.
And it can be worse. IBM, I think it was, back in the 1970s or 80s, said that 80% of the lines of production-ready code dealt with errors; only 20% implemented the actual program functionality.
> I've yet to see numbers on the probability of losing access to my email on Gmail vs another provider.
This is a great point. People often act as if the other providers were 100 percent reliable, without any numbers to back it up. The grass is always greener on the other side. To be fair, Google’s customer service is non-existent, though.
Thing is, with other providers, all I'm getting is email. With Google, I'm getting a bunch of services, all interconnected, and any of them could potentially get my entire Google account banned. One of the fuck-ups I can recall is a bunch of people getting their Google accounts banned because they typed in the chat of a YouTube livestream and some algorithm falsely flagged it, cutting them off from everything.
I own the domain, I control DNS, I pay a provider for email, and my phone and laptop have full downloads via IMAP. The last step aside, I don't think that's an uncommon setup. Uptime might be worse, but I don't see a real risk of losing access in that arrangement.
I feel copyleft licenses look more favourable at this point in time. What’s the value of more free/business-friendly licenses if you can’t guarantee that the same license will apply to all future releases? It looks more like a bait-and-switch policy.
Am I right in understanding that the relicensing was possible because of the CLA, not just because of the BSD license? Would a permissively licensed project that didn't use a CLA be vulnerable in the same way?
A key concern is that BSD isn't viral, so anyone can take BSD Redis and fork it into a commercial offering. If you want to, you can. The Redis trademark prevents anyone but Redis the company from calling their fork "Redis".
A CLA may impact relicensing; it depends on the terms. A simple CLA may only say "I am the owner of the code and I release it under $LICENSE". The current Redis CLA also has a copyright grant, which gives Redis the company greater rights.
“Viral” just means that the license has a “no additional restrictions” clause, not that you can’t make a commercial offering out of it. That’s why GPL and AGPL don’t really solve the problem.
And the problem with the trademark model is that AWS, and especially Microsoft, already have established brand recognition with the people who sign the big SaaS and support contracts. The people who know what a Redis is are just nerds with no money, the real big shots do everything in Microsoft Excel.
A permissively licensed project without a CLA would be similarly vulnerable, because the BSD license allows them to make releases that include your code under a stricter license. To prevent them from relicensing you would need both a strong copyleft license and no CLA/copyright assignment (like e.g. Linux, which couldn't move to GPLv3 even if they wanted to, because it would be simply impossible to get every contributor's permission).
No, since you can include BSD-licensed code in non-free software with just an attribution. The only difference between relicensing Redis from BSD+CLA to SSPL and from plain BSD to SSPL is that the former would've had a more detailed REDISCONTRIBUTIONS.txt.
The copyright owners of GPL software can do whatever they want with future versions, even going proprietary. The problem is that all the owners must agree on that. That's why some GPL software only accepts contributions from people who assign copyright to a single maintainer entity. An example is the FSF's copyright transfer, which to be fair is more nuanced than that and also serves other purposes.
The big outlier not listed here is Apple. Quick overview from someone who's written binary analysis tools targeting most of these:
Mach-O is the format they use, going back to NeXTSTEP. They use it everywhere, including their kernels and the internal-only L4 variant they run on the Secure Enclave. Instead of being structured as a declarative table of the expected address space when loaded (like PE and ELF), Mach-O is built around a command list that has to be run to load the file. So instead of an entry point address in a table somewhere, Mach-O has a 'start a thread with this address' command in the command list. Really composable, which means binaries can do a lot of nasty things with that command list.
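For a sense of what that looks like, here's a minimal sketch (my own illustration, not from any particular tool) that walks the command list of a 64-bit image already mapped into memory, using the structs from <mach-o/loader.h>. Real code would validate ncmds and each cmdsize against the file bounds:

    #include <mach-o/loader.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Walk the load-command list of a 64-bit Mach-O image at `base`.
     * The commands start right after the header; each one carries its
     * own size, so you just hop from command to command. */
    void walk_load_commands(const uint8_t *base) {
        const struct mach_header_64 *hdr = (const void *)base;
        if (hdr->magic != MH_MAGIC_64) return;

        const struct load_command *lc = (const void *)(base + sizeof(*hdr));
        for (uint32_t i = 0; i < hdr->ncmds; i++) {
            switch (lc->cmd) {
            case LC_SEGMENT_64: puts("map a segment");                   break;
            case LC_MAIN:       puts("entry point (used by dyld)");      break;
            case LC_UNIXTHREAD: puts("legacy 'start a thread' command"); break;
            default:            printf("cmd 0x%x\n", lc->cmd);           break;
            }
            lc = (const void *)((const uint8_t *)lc + lc->cmdsize);
        }
    }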
They also very heavily embrace the dyld shared cache, to the point that they don't bother including a lot of the system libraries on the actual root filesystem anymore. The kernel is ultimately a cache as well: the minimal kernel image plus at least the drivers needed to boot, all in one file (and on iOS I think it's all of the drivers, with loading anything outside the cache disabled?).
There's a neat wrapper format for Mach-O called "fat binaries" that lets you hold multiple Mach-O images in one file, tagged by architecture. This is what lets current macOS ship the same application as both a native arm64 binary and a native x86_64 binary; the loader just picks the right one based on the current architecture and settings.
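The fat wrapper is simple enough to parse by hand. A minimal sketch, assuming a 32-bit fat header mapped at `base` (the fields are always stored big-endian, and there's a 64-bit FAT_MAGIC_64 variant this ignores):

    #include <mach-o/fat.h>
    #include <libkern/OSByteOrder.h>
    #include <stdint.h>
    #include <stdio.h>

    /* List the architecture slices in a fat binary. Each fat_arch entry
     * records which CPU the slice targets and where its Mach-O image
     * lives inside the file. */
    void list_fat_slices(const uint8_t *base) {
        const struct fat_header *fh = (const void *)base;
        if (OSSwapBigToHostInt32(fh->magic) != FAT_MAGIC) return;

        const struct fat_arch *arch = (const void *)(fh + 1);
        uint32_t n = OSSwapBigToHostInt32(fh->nfat_arch);
        for (uint32_t i = 0; i < n; i++) {
            printf("cputype 0x%x: offset %u, size %u\n",
                   OSSwapBigToHostInt32(arch[i].cputype),
                   OSSwapBigToHostInt32(arch[i].offset),
                   OSSwapBigToHostInt32(arch[i].size));
        }
    }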
I think those are the main points, but I might have missed something; this was pretty off the cuff.
> Mach-O is built around a command list that has to be run to load the file. So instead of an entry point address in a table somewhere, Mach-O has a 'start a thread with this address' command in the command list. Really composable, which means binaries can do a lot of nasty things with that command list.
Conceptually not much has changed since the book was written, but in practice there has been a lot of advancement. For example, ASLR and the increase in the number of libraries have greatly increased the pressure to make relocations efficient, modern architectures' PC-relative load/store and branch instructions have greatly reduced the cost of PIC code, and code signing has made mutating program text to apply relocations problematic.
On Darwin we redesigned our fixup format so it can be efficiently applied during page-in. That did include adding a few new load commands to describe the new data, as well as a change in how we store pointers in the data segments, but those are not really properties of Mach-O so much as of the runtime.
I generally find that a lot of things attributed to the file format are actually more about how it is used than about what it supports. Back when Mac OS X first shipped, people argued about PEF vs Mach-O, but what all the arguments boiled down to was the calling convention (TOC vs GOT), either of which could have been supported by either format.
Another example is symbol lookup. Darwin uses two-level namespaces (where binds include both the symbol name and the library it is expected to be resolved from), and Linux uses flat namespaces (where binds only include the symbol name, which is then searched for in all the available libraries). People often act as though that is a file format difference, but Mach-O supports both (though the higher-level parts of the Darwin OS stack depend on two-level namespaces, the low-level parts can work with a flat namespace, which is important since a lot of CLI tools that are primarily developed on Linux depend on that). Likewise, ELF also supports both: Solaris uses two-level namespaces (they call it ELF Direct Binding).
I don’t disagree about the nature of load commands but Apple has been moving away from, say, LC_UNIXTHREAD for years at this point. For the most part load commands are becoming more and more similar to what ELF has, with a list of segments, some extra metadata, etc.
The mechanisms for Windows DLLs have changed a lot (like how thread-local vars are handled). Besides, this book could not cover C++11's magic statics, or Windows' ARM64X format, or Apple's universal2, because these things are very new. Windows now has the API set concept, which is quite unique; on top of it there are direct forwarders and reverse forwarders.
I think for C/C++ programmers it is more practical to know that:
1. The construction/destruction order for global vars in DLLs (shared libs) is very different between Linux and Windows. It means the same code might work fine on one platform but not on the other, which imposes challenges on writing portable code.
2. On Linux it is hard to get a shared lib cleanly unloaded, which can affect how global vars are destructed and might cause unexpected crashes at exit.
3. Since Windows holds a loader lock while DLLs initialize, there are a lot of things you cannot do in C++ constructors/destructors if the class could be used to define a global variable. For example, no thread synchronization is allowed (see the sketch after this list).
4. It is difficult to clean up a thread-local variable if the variable lives in a DLL and its destructor depends on another global object.
5. When one static lib depends on another, the linker doesn't use this information to sort the initialization order of global vars. That means A.lib can depend on B.lib and yet A.lib gets initialized first. The best way to avoid this problem is to use dynamic linking.
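To make point 3 concrete, here's a minimal C sketch of the classic loader-lock deadlock (a global C++ object's constructor in a DLL runs in the same loader-lock context as this DllMain body):

    #include <windows.h>

    static DWORD WINAPI worker(LPVOID arg) {
        (void)arg;
        return 0;
    }

    /* DllMain runs with the loader lock held. A freshly created thread
     * must itself take the loader lock to deliver DLL_THREAD_ATTACH
     * notifications before `worker` runs, so waiting on it here can
     * never finish: a deadlock. */
    BOOL WINAPI DllMain(HINSTANCE inst, DWORD reason, LPVOID reserved) {
        (void)inst; (void)reserved;
        if (reason == DLL_PROCESS_ATTACH) {
            HANDLE t = CreateThread(NULL, 0, worker, NULL, 0, NULL);
            WaitForSingleObject(t, INFINITE); /* DON'T: deadlocks */
            CloseHandle(t);
        }
        return TRUE;
    }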
For Windows related topics I highly recommend the "Windows Internals" book.
I have a hard copy of this book, and it seems like the PDF isn't the final version, judging by the hand-drawn illustrations at least.
The book does dive into some old and arcane object file formats, but that's interesting in its own way. Nice to get a comparison of how different systems did things.
After reading that book and other sources, I examined how people typically create libraries in the UNIX/Linux world. Some of the common practices seem to be lacking, so I wrote an article about how they might be improved: https://begriffs.com/posts/2021-07-04-shared-libraries.html
Edit: That covers the dynamic linking side of things at least.
Still pretty much relevant in terms of introductory understanding. Some specific details seem slightly anachronistic for the general population (segmented memory for example, which still exists in modern PC hardware but is of little import to the great majority).
The concepts are still relevant, but the specifics are mostly outdated. If you read this and then read the relevant standards documents for your platform, you should have a good grounding. I don't know of any other books that cover the topic well.
The syntax and also the semantics. For instance, you can take the sizeof() of a variably-modified type, or the offsetof() of one of its fields, and the compiler has to do all the layout calculations implied by the type declaration at runtime. These features are part of what motivated the mandatory support. The only part that is still optional is using such types by value as stack variables (i.e., variables with automatic storage duration).
Snarky, or do you just know a lot? It changes quite a bit how the compiler works. It has to make sure malloc gets the array element size argument multiplied at runtime by n. To a mere user, it broke my mental shorthand of how a C compiler works.
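A minimal sketch of that runtime multiplication, assuming nothing beyond C99 (*p has a variably-modified type, so sizeof *p is computed at runtime):

    #include <stdio.h>
    #include <stdlib.h>

    /* sizeof on a variably-modified type is evaluated at runtime, so
     * malloc really does receive n * sizeof(int) with n multiplied in
     * on each call. */
    void demo(size_t n) {
        int (*p)[n] = malloc(sizeof *p); /* array of n ints on the heap */
        if (!p) return;
        printf("sizeof *p = %zu\n", sizeof *p); /* n * sizeof(int) */
        free(p);
    }

    int main(void) {
        demo(3);   /* typically prints 12 */
        demo(100); /* typically prints 400 */
        return 0;
    }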
so you can have "char *y = malloc(sizeof(x)); memcpy(y, x, sizeof(x));" and it must work, since C89 at least. The main problem with VLAs is that they make the exact stack frame size unknown until runtime, which complicates function prologues/epilogues, but that's a problem in the codegen part of the backend; the semantics machinery is mostly in place already.
P.S. And yes, uecker is a member of the ISO C WG14 and a GCC contributor, according to his profile.
I think there is a real difference: in the static case, the compiler can just recurse into the type definition at any point, compute fully-baked sizes and offsets, and cache them for the rest of the compilation. But in the dynamic case, you end up with the novel dataflow of types that depend on runtime values, and more machinery is necessary to track those dependencies.
Of course, this runtime tracking has always been necessary for C99 VLA support, but I can easily see how it would be surprising for someone not deeply familiar with VLAs, especially given how the naive mental model of "T array[n]; is just syntax sugar for T *array = alloca(n * sizeof(T));" is sufficient for almost all of their uses in existing code. (In any case, it's obviously not "creating an object on the heap" that's the unusual part here!)
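For instance, here's a small sketch (my own example, not from the thread) of where the alloca shorthand stops being enough: a two-dimensional VLA parameter, where every index expression needs stride arithmetic derived from a runtime value:

    #include <stdio.h>

    /* The compiler derives the row stride from `cols` at runtime:
     * grid[i][j] addresses element i * cols + j, arithmetic that no
     * fixed alloca-style layout could express. */
    void fill(size_t rows, size_t cols, int grid[rows][cols]) {
        for (size_t i = 0; i < rows; i++)
            for (size_t j = 0; j < cols; j++)
                grid[i][j] = (int)(i * cols + j);
    }

    int main(void) {
        int g[3][4];
        fill(3, 4, g);
        printf("%d\n", g[2][3]); /* 11 */
        return 0;
    }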