I chased that rabbit hole briefly and it's not very clear that the hashed value is required to be <= UINT32_MAX. Closest is a claim by the same author as this post:
> It seems obvious that on 32-bit and 64-bit systems, the function should not give different results
and a commit to mask off the low bits in an implementation elsewhere.
Well, maybe that would be convenient, but overall it seems unimportant. It's necessary for the tool writing the table and the tool reading it to agree but cross compilation is absolutely full of hazards like this anyway.
The code looks fine to me for what that's worth. I can see the assignment in the if being contentious.
The behavior is obviously an oversight. I'd bet 10 to 1 that if you chased down the original author, he would agree.
No sensible engineer would design a hash function that populates the lower 28 bits of the hash code, ALWAYS leaves bits 28 through 31 clear, and then SOMETIMES sets bit 32, but only rarely and only on certain architectures.
It makes no sense as a conscious design. The logical conclusion is that the intent was to create a 28-bit hash function, and the fact that the provided code sometimes sets bit 32 is clearly a bug.
I think the issue is that, when written, a `long` was 32 bits. I would guess the author was familiar with the concept of a carry bit, but they didn't care because the carry bit was discarded by their architecture.
Back when ELF was designed, architectures wider than 32 bits were extremely uncommon: either obsolete (36- and 40-bit) or expensive and exotic (Cray), so in neither case part of the ELF design space. So not a huge surprise.
I remember thinking at the time that it was an oversight but it took more than another decade for that to even matter.
I have a question: what should I read for an introduction to the implementation/internals/design of hash functions?
I would like to go beyond my current understanding, which is basically “they’re effectively one-way functions”, and be able to participate in discussions of articles such as this one.
Cryptographic hash functions are designed to resist existing attacks, so you'll want an understanding of differential & linear cryptanalysis, as well as a variety of algebraic attacks.
I don't know of a good textbook on the subject, so you might find yourself searching keywords on https://eprint.iacr.org/
If someone checked in that code, it would definitely fail my code review. I understand back in the day it was different, but today there should be a lot of named intermediates. Additionally, `long` and any such keywords should not make it into any commit unless the commit explains
1) why it's needed and
2) how, with any standard-conforming implementation, it couldn't possibly cause a bug.
As always in C programming, the bugs arise from people doing stuff that any sane guideline tells them to not do.
Named intermediates for what? This function munges a single hash value, h, and then folds its bits 28 to 31, g, into bits 4 to 7. It implements a mathematical formula, basically, and given it’s a hash, there isn’t much meaning to the contents of any pair of parens in that formula. Perhaps *name++, but any C programmer just thinks “next character of name” when they see that, don’t they?
(I’ve seen people code with a lot of intermediates with two- and three-word names, and I just don’t see why, in general. But here especially—hashes are not exactly oases of meaning.)
The 'unsigned long' point is correct in a cross-platform context, of course. If the code is from the SunOS linker, then its authors defined the ABI, so could guarantee 'long' was 32 bits. It’s the Glibc port that was careless.
(Although if I heard the phrase “any standard-conforming implementation” in this context, I’d be tempted to point out that an implementation doesn’t have to provide uint32_t at all, though a POSIX 2008 one does; and that it may have 33-bit ints, in which case uint32_t automatically promotes to signed int and overflow immediately hits UB. Either use C23 _BitInt(32), which is exempt from promotions for precisely this reason, or add + 0U as needed to force the hypothetical signed int back to unsigned.)
There is a lovely piece of text in the third version of PNG specification: "PNG four-byte unsigned integers are limited to the range 0 to 2^31-1 to accommodate languages that have difficulty with unsigned four-byte values" [0]. Gee, I wonder what languages those may be?
C has no problem dealing with a uint32_t. Not sure what you are getting at.
This is more of an issue with languages like Java that abstract away integer widths and signedness, which is convenient if you're only doing arithmetic but becomes a huge pain when dealing with binary data.
But the rest of the code? I guess you could make it so the dereference and pointer-increment happen separately. And for someone unfamiliar you could expand out the loop condition.
But ultimately it's a hash function. How would you write it?
I don't understand what it is you expect the compiler to stop. There's nothing actually wrong with the code. It's just written in a platform-dependent manner.
The base integer types have been open-ended since C89. It has always been wrong to assume an exact size when writing code that depends on modular wraparound. The code is wrong.
Prior to stdint.h there was no portable way to address this but new code should be written using the type system as intended. Open ended when the minimum is sufficient and larger storage is inconsequential. Exact size when the algorithm requires it.
There's actually a fairly big difference with the context of the language preventing the error.
If the code is wrong as in undefined behavior, then the compiler can and maybe should try to prevent it. If the programmer is wrong about the code, then the compiler can't and definitely shouldn't try to prevent it.
It's not the same. The compiler can protect against code which is incorrect. The compiler cannot do anything to help with code that is correct but does the wrong thing.
But if the code was written with the intention of being platform-independent, and turned out to be platform-dependent... then it's the programmer who wrote it that is wrong, not the code itself? Or do I misunderstand you completely?
Let's say I know Swedish, in which "öl" means "beer", and I travel to Germany to partake in the Oktoberfest celebrations. I arrive in Germany and immediately try to order a beer by saying "One Beer Please", but, not mastering the language, I say "Ein Öl Bitte" (meaning "one oil please"). I don't think that's a bug in the German language, and the sentence itself is perfectly good German. It just doesn't say what I think it does.
ELF is way too complex and no longer well adapted.
We should start to deprecate DT_NEEDED and make dlopen/dlsym/dlclose (maybe, dlvsym) hard symbols in the loader.
And game devs should stop using main(), since some genius glibc dev added a new __libc_start_main version in 2.34. Namely, any game executable linked against glibc 2.34 or later will refuse to load on a system with an earlier glibc.
Actually, game binaries should be pure ELF64 binaries (not using main()) which "libdl" (dlopen/dlsym/dlclose) everything they need from the system. And of course, as much as possible should be statically linked (I think this is what Unity is doing, but Unreal/Godot have a big issue: the static libstdc++, which, as of late, does not libdl anything from the system).
Use rpath and ship your dependencies. Even if you dlopen every shared library explicitly you have the same problem.
The glibc version problem is not new: you have never been able to rely (safely) on a suitable version of glibc existing for your compiled program. This is an example of why containers exist, but unfortunately the problem isn't ELF - it's glibc!
And for that matter, glibc also ships the loader. This is the part that's mildly insane.
Yep, the issue is mostly the glibc, but ELF does not help. Additionally, the static libstdc++ is also an issue, as it does not seem to "libdl" any of its system dependencies (I did not check whether the GCC C++ devs have fixed that already).
That does not mean ELF is not overkill nowadays. In the case of dynamic linking, deprecating DT_NEEDED and relying on hardcoded (probably with specific relocations) and simplified dlopen/dlsym/dlclose in the ELF interpreter (which would also deprecate __tls_get_addr(), as it would become redundant with dlsym) seems to be a sane cleanup: explicitly split dynamic linking from static linking.
Nowadays, the mitigation is: game devs should go pure ELF64 (no main()) for their binaries, fully libdl-ized. The hard part is forking GCC's static libstdc++ to libdl-ize its system dependencies (I don't know if that has been done).
I think what you're advocating for is getting rid of automatic dynamic loading, which is certainly a take.
> go pure ELF64 (no main())
What does "pure" ELF64 even mean? No dependency on libc?
ELF is just an object file format. It doesn't imply anything about how the loader behaves or what features an ELF loader must have, or how that object relates to other ELF objects.
Yep, services from the libc would be libdl-ed too. You would have to decide: either you use the machine code in a shared object, or you statically link that machine code; the idea is to make those explicit and different, not the current mess.
It means using the SysV x86-64 ABI entry point (which is basically a main()...).
The file format alone is useless; you need the ELF and ABI specs to know how to properly use the information defined by this file format.
There is a huge amount of tooling relying on DT_NEEDED for dependency detection. I am not so sure about general purpose Linux, but in the embedded Linux world this would be a disaster. The Yocto system for example would no longer be able to determine the runtime dependencies of generated binaries.
For the static library part, this is such a beaten-down argument I just will not argue. I hope you enjoy re-installing your OS every time there is a security update for a library like openssl.
You missed the point: using "shared objects" would have to be explicit with dlopen/dlsym/dlclose.
Mixing static linking with dynamic linking was not a good idea in the first place, and I mean it.
ELF should be "fixed" about this, but to be sincere and honest, I think a lot could be removed from ELF on modern systems.
Maybe it is not worth fixing ELF; instead, go to something like an "NGELF" which would be excruciatingly simpler and cleaner than ELF, namely real and disruptive innovation.
> You missed the point: using "shared objects" would have to be explicit with dlopen/dlsym/dlclose.
dlopen and friends are function calls that you cannot evaluate at build time. Actually, not even at load time, as they are dynamic by nature, and conditional dlopen is a thing. Any shared-object dependency tracking would be impossible, or a new standard would be required.
Also, dlopen is a POSIX standard, and ELF is used in many non-POSIX places.
> Mixing static linking with dynamic linking was not a good idea in the first place, and I mean it.
Why was it not a good idea? This happens all the time; in particular, the code at the very first executable address of the ELF, which runs until some libc prepares things, is arguably statically linked.
> [...] but to go something like NGELF [...]
Sounds interesting. Could you paste a link? I could not find it on Google.