I chased that rabbit hole briefly and it's not very clear that the hashed value is required to be <= UINT32_MAX. Closest is a claim by the same author as this post:
> It seems obvious that on 32-bit and 64-bit systems, the function should not give different results
and a commit to mask off the low bits in an implementation elsewhere.
Well, maybe that would be convenient, but overall it seems unimportant. It's necessary for the tool writing the table and the tool reading it to agree but cross compilation is absolutely full of hazards like this anyway.
The code looks fine to me for what that's worth. I can see the assignment in the if being contentious.
The behavior is obviously an oversight. I'd bet 10 to 1 that if you chased down the original author, he would agree.
No sensible engineer would design a hash function that populates the lower 28 bits of the hash code, ALWAYS leaves bits 28 through 31 clear, and then SOMETIMES sets bit 32, but only rarely and only on certain architectures.
It makes no sense as a conscious design. The logical conclusion is that the intent was to create a 28-bit hash function, and the fact that the provided code sometimes sets bit 32 is clearly a bug.
I think the issue is that, when written, a `long` was 32 bits. I would guess the author was familiar with the concept of a carry bit, but they didn't care because the carry bit was discarded by their architecture.
Back when ELF was designed, architectures wider than 32 bits were extremely uncommon: either obsolete (36- and 40-bit) or expensive and exotic (Cray), so in neither case part of the ELF design space. So not a huge surprise.
I remember thinking at the time that it was an oversight but it took more than another decade for that to even matter.
I have a question: what should I read for an introduction to the implementation/internals/design of hash functions?
I would like to go beyond my current understanding, which is basically “they’re effectively one-way functions”, and be able to participate in discussions of articles such as this one.
Cryptographic hash functions are designed to resist existing attacks, so you'll want an understanding of differential & linear cryptanalysis, as well as a variety of algebraic attacks.
I don't know of a good textbook on the subject, so you might find yourself searching keywords on https://eprint.iacr.org/
If someone checked in that code, it would definitely fail my code review. I understand back in the day it was different, but today there should be a lot of named intermediates. Additionally, `long` and any such keywords should not make it into any commit unless the commit explains
1) why it's needed and
2) how, with any standard-conforming implementation, it couldn't possibly cause a bug.
As always in C programming, the bugs arise from people doing stuff that any sane guideline tells them to not do.
Named intermediates for what? This function munges a single hash value, h, and then folds its bits 28 to 31, g, into bits 4 to 7. It implements a mathematical formula, basically, and given it’s a hash, there isn’t much meaning to the contents of any pair of parens in that formula. Perhaps *name++, but any C programmer just thinks “next character of name” when they see that, don’t they?
(I’ve seen people code with a lot of intermediates with two- and three-word names, and I just don’t see why, in general. But here especially—hashes are not exactly oases of meaning.)
The 'unsigned long' point is correct in a cross-platform context, of course. If the code is from the SunOS linker, then its authors defined the ABI, so could guarantee 'long' was 32 bits. It’s the Glibc port that was careless.
(Although if I heard the phrase “any standard-conforming implementation” in this context, I’d be tempted to point out that an implementation doesn’t have to provide uint32_t at all, though a POSIX 2008 one does; and that it may have 33-bit ints, in which case uint32_t automatically promotes to signed int and overflow immediately hits UB. Either use C23 _BitInt(32), which is exempt from promotions for precisely this reason, or add + 0U as needed to force the hypothetical signed int back to unsigned.)
There is a lovely piece of text in the third version of PNG specification: "PNG four-byte unsigned integers are limited to the range 0 to 2^31-1 to accommodate languages that have difficulty with unsigned four-byte values" [0]. Gee, I wonder what languages those may be?
C has no problem dealing with a uint32_t. Not sure what you are getting at.
This is more of an issue with languages like Java that abstract away integer widths and signedness, which is convenient if you're only doing arithmetic but becomes a huge pain when dealing with binary data.
But the rest of the code? I guess you could make it so the dereference and pointer-increment happen separately. And for someone unfamiliar you could expand out the loop condition.
But ultimately it's a hash function. How would you write it?
I don't understand what it is you expect the compiler to stop. There's nothing actually wrong with the code. It's just written in a platform-dependent manner.
The base integer types have been open-ended since C89. It has always been wrong to assume an exact size when writing code that depends on modular wraparound. The code is wrong.
Prior to stdint.h there was no portable way to address this but new code should be written using the type system as intended. Open ended when the minimum is sufficient and larger storage is inconsequential. Exact size when the algorithm requires it.
There's actually a fairly big difference with the context of the language preventing the error.
If the code is wrong as in undefined behavior, then the compiler can and maybe should try to prevent it. If the programmer is wrong about the code, then the compiler can't and definitely shouldn't try to prevent it.
It's not the same. The compiler can protect against code which is incorrect. The compiler cannot do anything to help with code that is correct but does the wrong thing.
But if the code was written with the intention of being platform-independent, and turned out to be platform-dependent... then it's the programmer who wrote it that is wrong, not the code itself? Or do I misunderstand you completely?
Let's say I know Swedish, in which "öl" means "beer", and I travel to Germany to partake in the Oktoberfest celebrations. I arrive in Germany and immediately try to order a beer by saying "One Beer Please", but, not mastering the language, I say "Ein Öl Bitte" (meaning "one oil please"). I don't think that's a bug in the German language, and the sentence itself is perfectly good German. It just doesn't say what I think it does.
ELF is way too complex and no longer well adapted.
We should start to deprecate DT_NEEDED and make dlopen/dlsym/dlclose (maybe, dlvsym) hard symbols in the loader.
And game devs should stop using main(), since some genius glibc dev added a new __libc_start_main version in 2.34. Namely, any game executable linked against glibc 2.34 or later will refuse to load on a system with an earlier glibc.
Actually, game binaries should be pure ELF64 binaries (not using main()) which "libdl" (dlopen/dlsym/dlclose) everything they need from the system. And of course, as much as possible should be statically linked (I think this is what Unity is doing, but Unreal/Godot have a big issue: the static libstdc++, which, as of late, does not libdl anything from the system).
Use rpath and ship your dependencies. Even if you dlopen every shared library explicitly you have the same problem.
The glibc version problem is not new: you have never been able to rely (safely) on a suitable version of glibc existing for your compiled program. This is an example of why containers exist, but unfortunately the problem isn't ELF - it's glibc!
And for that matter, glibc also ships the loader. This is the part that's mildly insane.
Yep, the issue is mostly the glibc, but ELF does not help. Additionally, the static libstdc++ is also an issue, as it does not seem to "libdl" any of its system dependencies (I did not check whether the GCC C++ devs have fixed that already).
That does not mean ELF is not overkill nowadays. In the case of dynamic linking, deprecating DT_NEEDED and relying on hardcoded (probably with specific relocations) and simplified dlopen/dlsym/dlclose in the ELF interpreter (which would also deprecate __tls_get_addr(), as it would become redundant with dlsym) seems to be a sane cleanup: explicitly split dynamic linking from static linking.
Nowadays, the mitigation is: game devs should go pure ELF64 (no main()) for their binaries, fully libdl-ized. The hard part is forking GCC's static libstdc++ to libdl-ize its system dependencies (I don't know if that has been done).
I think what you're advocating for is getting rid of automatic dynamic loading, which is certainly a take.
> go pure ELF64 (no main())
What does "pure" ELF64 even mean? No dependency on libc?
ELF is just an object file format. It doesn't imply anything about how the loader behaves or what features an ELF loader must have, or how that object relates to other ELF objects.
Yep, services from the libc would be libdl-ed too. You would have to decide: either you use the machine code in a shared object, or you statically link that machine code; the idea is to make those explicit and different, not the current mess.
It means using the SysV x86-64 ABI entry point (which is basically a main()...).
The file format alone is useless; you need the ELF and ABI specs to know how to properly use the information defined by this file format.
There is a huge amount of tooling relying on DT_NEEDED for dependency detection. I am not so sure about general purpose Linux, but in the embedded Linux world this would be a disaster. The Yocto system for example would no longer be able to determine the runtime dependencies of generated binaries.
For the static library part, this is such a beaten-down argument I just will not argue. I hope you enjoy re-installing your OS every time there is a security update for a library like openssl.
You missed the point: using "shared objects" would have to be explicit with dlopen/dlsym/dlclose.
Mixing static linking with dynamic linking was not a good idea in the first place, and I mean it.
ELF should be "fixed" about this, but to be sincere and honest, I think a lot could be removed from ELF on modern systems.
Maybe it is not worth fixing ELF; instead, go to something like an "NGELF" which would be excruciatingly simpler and cleaner than ELF, namely real and disruptive innovation.
> You missed the point: using "shared objects" would have to be explicit with dlopen/dlsym/dlclose.
dlopen and friends are function calls that you cannot evaluate at build time. Actually, not even at load time, as they are dynamic by nature, and conditional dlopen is a thing. Any shared-object dependency tracking would be impossible, or a new standard would be required.
Also, dlopen is a POSIX standard, and ELF is used in many non-POSIX places.
> Mixing static linking with dynamic linking was not a good idea in the first place, and I mean it.
Why was it not a good idea? This happens all the time; in particular, the code at the very first executable address of the ELF, which runs until some libc prepares things, is arguably statically linked.
> [...] but to go something like NGELF [...]
Sounds interesting. Could you paste a link? I could not find it on Google.