An alternative C version (Ted's is in Go) is described in the last link of the article [1]. It's a rather interesting tutorial on how to write high-performance tools of this kind; it covers things like string interning/buffering, pre-allocating "objects" to avoid having to implement dynamically growing data structures, and so on.
It's pretty cool how the author uses LLVM as an example of a big codebase, then uses "10X that" as the requirement for how many directories and so on to support.
OMG that is so ... I don't even know. Is it over-engineering to use a static 16-entry perfect hash of file name extensions to filter out the five supported languages?
It's at least somewhat obscure, in the sense that adding support for another language becomes quite involved.
It would be interesting to micro-bench this code against something more naïve that just strcmp()s the extension against a list of known strings, or something.
I guess technically Kolmogorov complexity is an answer to their question, since they said "describe", but I expect Waterluvian was thinking "measure", and you can't really measure Kolmogorov complexity for real programs. It is mostly useful for proofs or thought experiments.
You can try to measure Cyclomatic complexity, so that's more useful in practice.
Kolmogorov complexity is borderline useless if the idea is to describe code size or code complexity for humans, though. It only describes programs in terms of "what is the smallest (fully qualified) program that would generate something that ultimately leads to the original result", completely removing both the human, as well as the original code in question, from the equation.
And to make matters worse, the true Kolmogorov complexity of a given string is uncomputable in general, and humans are notoriously bad at figuring out what "the smallest program" actually is, so it's a great device for reasoning about complexity, but a near-useless device for determining actual complexity.
Agree with this, but the OpenBSD codebase tends to be pretty consistent in style conventions and layout.
Fortunately, the word "complexity" was not used there, and it's nice to see whether some subsystem, protocol, driver, etc. takes two orders of magnitude more lines than the others.
Use a programming language that is decoupled from stateful manipulation and control flow?
Programming does not need to involve itself with the hardware, that’s just a tool for evaluation. A program is an idea expressed in a language. It’s some formal statement of intent. Sometimes we can evaluate it to produce an output. Sometimes these statements are muddied by underlying representations.
The length of a declarative program seems like a pretty good rough measure of complexity in the sense you are talking about. Operational complexity (the number of steps it takes to actually evaluate it) is of course a bit different, and depends on what you are using.
The vast bulk of it is a dump of MMIO register addresses that have been cleared for public disclosure. This is driven by some kind of legal/IP requirement. The vast majority are unused, and they are expressed in a rather inefficient way, e.g. XXX_0 through XXX_31 where anywhere else it would have been parameterized as XXX(N). This was all copied from the Linux kernel, btw.
Rather disappointing, but what are you gonna do? Make your own high-performance graphics hardware? No. AMD makes the highest-performance graphics hardware with an open-source driver on the planet, by a wide margin. Compromises have to be made. Just accept that the hardware division gives you this crazy register address file and don't touch it.
Picking at random the longest file, ./include/asic_reg/nbio/nbio_7_2_0_sh_mask.h, I think it uses #define to alias some hex addresses and values to human-readable, easy-to-understand names like BIFPLR1_PCIE_ADV_ERR_CAP_CNTL__COMPLETION_TIMEOUT_LOG_CAPABLE_MASK, which is an alias for 0x00001000L.
This file contains 134349 defines.
From what I can see at a glance, it's mostly loads upon loads of graphics code / graphics card stuff. It's possible they have upstreamed entire drivers into their code base.
I spotted some stuff that hints at low-level protocols as well (I2C, for example), so probably it's just a whole lot of work to support AMD graphics cards. Potentially they upstreamed a driver or something.
Side question: how do you _get_ the code from this site? It _seems_ like it's a git repo of some sort, but I can't figure out what to clone. Is this just a ... representation of a git repo somewhere else? What am I missing?
Another commenter noted that this appears to be a Mercurial repo, not a git repo. Does `go get` support Mercurial repos? If yes, what else does it support, besides git and Mercurial?
That only prints file and directory sizes, not lines of code. Counting lines of code correctly is the hard part; Ted's utility just uses a simple file line count as a proxy, which is probably good enough for this purpose.
[1]: https://nullprogram.com/blog/2022/05/22/