Single-file cross-platform C/C++ headers implementing self-contained libraries (github.com)
71 points by ingve 8 days ago | 44 comments

I write C for a living, but I'm not terribly experienced.

I was wondering: is there a reason C and C++ code favours comments over descriptive names? I'm almost certain I'm not making this up, but go into any random file here and look at the names. Structs will have members called p with a comment saying "position". It seems very silly.

It's mostly a matter of personal taste IMHO, has nothing to do with being C or C++, much like brace- or tab-style.

My own rule of thumb is: short names for 'unimportant' variables where the meaning is clear from the context. More descriptive names for the more non-obvious cases, but still keep it short and concise (e.g. I would typically use "pos" instead of "position").

Having said that, some early C dialects limited the number of significant characters in identifiers to 6 or 8. I heard that this is the reason why the C runtime library function names are short (like strcpy() instead of string_copy()).

> Structs will have members called p with a comment saying "position". It seems very silly.

It does not seem silly to me. Coming from mathematics, single-letter variable names are the only acceptable possibility, especially in numerical code. I tend to read contiguous letters in a formula as products.

Besides, descriptive variable names violate the principle of "don't repeat yourself". It is better to describe the variable only once, in a comment next to its declaration, instead of several times.

Programmers should understand that not everybody who programs has a background in computer science, and certainly the traditional conventions of computer science are not universal and not generally reasonable. Many of us have a scientific background, where descriptive variable names are silly. We do not want our numbers to have meaning, that's the whole point of abstraction. Thus, they are named x, y, z, and so on.

Assuming you have been writing code with single letter variables for long enough, try opening a medium to large sized project you wrote a few years ago and no longer use or maintain. See if you still understand your own code.

Descriptive variables are about maintainability. And when it comes to the latter, the guiding principle is:

> Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.

^ https://stackoverflow.com/q/876089/417194

I have had success using long names to good effect in non-mathematical code, but in mathematical code, I have never gotten them to work very well.

I've tried many times over the years, but I just never seem to be able to squeeze all the relevant information into the names without making them excessively long, and even then I usually find upon rediscovery that I have omitted an important detail. Mathematics seems to thrive on a kaleidoscopic abundance of subtle but important variations on similar themes and this really frustrates the ability to encode information in the names. Also, the necessity of building an abstract-mathematical mental picture separate from the code obviates the need. The value proposition of verbose naming conventions entirely falls apart under these conditions, and once it does, the benefits of concision rule the day.

Programming languages are not mathematics, better to put your mathematics in the comments, than the reverse.

force = (g_constant * mass[0] * mass[1])/(radius^2)

Communicates far more than the algebraic alternative.

Programming is about solving a particular problem, not providing a general formalism. The art of it, and mastery of it, is how to imbue the program with the problem domain. So that by speaking in its concepts you solve the problem.

> violate the principle of "don't repeat yourself"

That isn't the principle, that's a literal reading of each word conjoined into something totally different.

The principle is "solve the problem once", ie., if you encounter code re-solving a problem already solved before, abstract such that you solve it only once.

The principle only makes sense with respect to a problem which reoccurs, it is either incoherent or false when applied to literal fragments of code text.

One would then be moved to define replacement macros for anything more than two letters.

> better to put your mathematics in the comments, than the reverse

I respectfully, and strongly, disagree. Try to implement the conjugate gradient algorithm (or a variation of it) https://en.wikipedia.org/wiki/Conjugate_gradient_method Using your style, you will obtain an unreadable mess. Better make your code isomorphic to your math, and then explain the meaning of everything in the comments.

> force = (g_constant * mass[0] * mass[1])/(radius^2)
> Communicates far more than the algebraic alternative.

I don't see how this is clearer than something like this

    // Newton's law of gravitation
    // m : mass of first body
    // M : mass of second body
    // G : universal gravitational constant 
    // r : distance between the two bodies 
    // F : force between the two bodies
    F = G * m * M / (r^2)

The problem arises at the 1000-line program level and above.

If you are doing engineering, then it's better to follow engineering best practices, with descriptive names. If you are doing science, it's better to follow science best practices.

For example, we have two competing coordinate systems: the first is x // width, y // height, z // depth, and the second is x // width, y // depth, z // height. With descriptive names we would have no such problem.

However, it's much easier to load a large and complex algorithm into your head if it's written concisely: one letter per variable or function, a minimum of unnecessary separators. When I need to understand someone else's codebase, I will often concatenate all the files, delete all comments and obvious functions (getters, setters, eq, hash, etc.), abbreviate long names via search and replace, remove unnecessary symbols, and convert functions and classes into one-liners, until I can see the whole algorithm at once. However, reading such code requires some additional training.

To summarize: it's possible, and easy, to convert descriptive names into abbreviations, so it's better to stick with descriptive naming in public interfaces.

The original compiler ran on a platform that was heavily resource-constrained (very little memory, slow terminals).

If all you have is a 300 baud terminal (a luxury for some early users), writing a 30-letter statement takes about a second. Doubling that speed by using shorter identifiers is worth it (certainly as long as all the people reading the code either wrote it, or have the writer sitting next to them).

That created a culture that persists to today.

There are different cultural streams, such as Apple’s, which originated in Pascal (for the original Mac OS). Consequently (I think), Apple’s C headers later on weren’t afraid to use identifiers with over 40 characters. NeXTSTEP/Cocoa similarly aren’t afraid of long names (https://mackuba.eu/2010/10/31/the-longest-names-in-cocoa/)

Where did you see that exactly?

Looked at several headers at random just now (including file, net) and they seem OK. Not terribly verbose, but not exactly cryptic.

In terms of 'p' vs 'position' - 'p' is an extreme and it's usually used for pointers, but the driving factor for using shorter names is usually to keep the line length in check when these vars are used in the code. Ditto for type names.

"Position" is quite often shortened to "pos", "offset" to "off", "length" to "len", etc. This is very common. Sometimes the original name is too long (e.g. "position of the last error in the first block") and lends itself to a not-so-obvious abbreviation, in which case a comment is added to clarify what it is.

It's basically about striking a balance between succinctness and verbosity, and conventionally C tends to lean towards the former.

The C++ standard library continued this culture of "making names unreasonably short and/or adding modifiers that are anything but obvious", e.g. rdbuf, setf, seekg/p, tellg/p, beg (instead of begin[ing]), ends etc.

I prefer that instead of boost-like std::chrono::time_point<std::chrono::high_resolution_clock>

std::datetime::timepoint<std::datetime::highres> ? Assuming you'd want to keep a separate datetime namespace.

Neither ::highres nor ::high_resolution_clock really is a descriptive name though, because it doesn't really describe what it means. What is a "high resolution clock" here? Does it mean µs resolution? Nanoseconds (as some "newer" Linux APIs support)? What about accuracy?

"chrono" is one of these "sounds cool but is kind of a weird word to use" things.

I think a lot of C/C++ programmers use text editors such as Vim/Emacs, and might not have code completion support, so the shorter names are easier to type.

'dabbrev-expand works in the context of the files that are open in all of the buffers in the buffer list too.

Most people that use either Emacs or Vim for serious programming use extensions that implement code completion.

Some old C code was written on monochrome 80x25 ttys. Or written by mathematicians who would be confused why you haven't simplified "position" down to "-p(o2)stn", because clearly "position" is multiplying 8 variables together. Or typed on a broken keyboard, hence "creat", I assume. Or written by programmers who remember, or were still using, punchcards. And who maybe have some RSI and no IDE autocompletion to help.

And of course any new C code would look out of place if it didn't follow these old shibboleths... better make new C code an unreadable mess too, no? ;)

Not all of math is done in commutative rings.

It's a sad truth of the state of C/C++ build systems that header-only and single-file libraries (which take very long to compile) are so sought after.

I highly recommend you give Bazel a look if you haven't. I'm coming from a biased perspective, since I work at Google where we have a nice build setup. But, I like Bazel so much I use it and hack on it in spare time.

There's really only two things that bother me:

- Windows support is alright but it's a bit weird and wants some Unixy stuff to exist. It's being worked on in small bits but it could be better. If you don't care about this you'll have a good time likely.

- Sadly it isn't widely used outside of Google, so you're mostly on your own. The only upshot is that the Starlark language makes it really easy to compose BUILD files, so I don't find it too difficult to manage a few BUILD files for things like SDL2. Unlike a lot of other build systems, like CMake for example, it would be much less hassle to consume other Bazel projects directly, since it's designed to work that way, so if it ever did gain widespread adoption it would be a really nice solution.

There's definitely some things that would turn off the average user. Like the BUILD files, in order to be predictable, contain dependency information, which either needs to be updated by hand or tools (you can guess which is preferred.) On the other hand, builds are very predictable, and it works such that you can easily express dependencies across different programming languages and even autogenerated code from another target.

Nothing is perfect, but I find Bazel is super nice for C/C++ compared to the alternatives I've used. It's a different way of approaching the problem of building software, one that I've found very novel despite the shortcomings. I'm worried it's too easy to miss what makes it special behind some of the turnoffs people initially hit.

How does this solve the build time problem the parent post was lamenting?

I’ve looked a little bit at Bazel but haven’t spent long on it. It seems to really, really want you to do things in its specific googly way (which seems to stem from monorepo style and always knowing the exact build path and never needing to distribute to users), which is hard if e.g. trying to convert an existing project that doesn’t perfectly fit (us: combined C++/Python with some legacy Fortran generation).

Or a project that has external dependencies currently being chain-built but isn’t itself built with Bazel (e.g. most of the planet) - we don’t all have dedicated build teams to accurately and bug-free reproduce other libraries’ build procedures (also, getting away from a buggy reproduced build is part of my system-rewrite intention).

I seem to remember the documentation also had the usual problem that the easy examples were well documented but it was hard to tell how to do anything more complex, but maybe I didn’t dig enough. CMake has a similarish problem, except there the problem is finding more modern ways of doing things that don’t suck (I’m aware of all the talks about this; many of them are big on ambition but low on details, and usually just teach transitive dependencies).

Ignore the first sentence, I think I misread what you were specifically commenting on for that.

It improves the build time problem by offering a way to do fast, hermetic builds. Caching is very accurate, targets only rebuild exactly when they need to, and you can distribute just about any kind of work over a build farm. It may not sound like much, but having the distribution at the build system level, versus say, distcc, makes a big difference, especially if you have long build steps that aren't just compilation. You literally can't accidentally have dependencies that aren't explicitly defined. When you go to compile it will not find headers from targets that are not depended on, for example.
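That explicitness looks roughly like this minimal, hypothetical BUILD sketch (target and file names are invented): a binary only sees headers from targets it lists in deps.

```python
# BUILD -- hypothetical targets; names are invented for illustration
cc_library(
    name = "geometry",
    srcs = ["geometry.cc"],
    hdrs = ["geometry.h"],
)

cc_binary(
    name = "app",
    srcs = ["main.cc"],
    # main.cc can #include "geometry.h" only because of this edge;
    # headers from undeclared targets are simply not visible
    deps = [":geometry"],
)
```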

>I’ve looked a little bit at bazel but haven’t spent long on it. It seems to really, really want you to do things in its specific googly way ( that seems to stem from monorepo style and always knowing the exact build path and never needing to distribute to users) which is hard if e.g. trying to to convert an existing project that doesn’t perfectly fit (us: combined C++/Python with some legacy fortran generation)

I don't really think this is the case. There are definitely certain aspects that can be challenging, but I do not feel it has much to do with monorepos. It works well with a monorepo using the workspaces system, but it also should work fairly well with modular git repos; each repo is just treated as its own workspace. It's even possible to have Bazel go and get other workspaces, and you can refer to targets inside of other workspaces.

One of my favorite parts about Bazel is combining different languages. If Starlark rules for the language you are trying to build for don't exist, you can write them. Starlark is a fair bit like Python, it's relatively easy to learn and write your own targets.

I acknowledge that most people don't want to maintain BUILD files for external stuff. There are some options here. Some people already maintain this, like here: https://github.com/bazelregistry - and there are experimental rules for dealing with external build systems.

I use it outside of Google on personal projects and find it perfectly manageable. You don't really need a large build team unless you're doing interesting things. There are a few aspects of Bazel that really are tough, like how you must explicitly specify dependencies on every target, and how actual builds can't even see files not specified in dependencies, and when you get fairly deep you'll probably run into trying to understand what's going on with runfiles. It's definitely a bit more complicated than a Makefile generator at this point. But in exchange, it delivers on its promise of being fast and correct.

I'm actually super familiar with Bazel :) And Buck and Pants.

Been following it since the presentation at 2015 @Scale conference, attended 2017 Bazel Conf, introduced it at Lucid Software, helped write a bunch of OSS rules.

Bazel is mostly great. Besides a few pre-1.0 polish items, I'm mostly frustrated by the expectation that users write build rules in Skylark, while the real rules (including C++) have to be written and compiled into Bazel itself to take advantage of features that Skylark lacks and aren't even on the roadmap. [1]

So while I'm disappointed Bazel's not quite a general-purpose build system a la Make, it's awesome for C++ and Java.

[1] "Why Skylark is a 2nd class citizen" https://groups.google.com/forum/#!msg/bazel-discuss/8eAMN3Wh...

(I work on Bazel)

Starlark (Skylark) is used for a huge number of rules. The community has written and used in production lots of rules, such as Closure, Docker, Go, Haskell, Kotlin, Kubernetes, NodeJS, Rust, Scala, Swift, Typescript.

We stopped creating native rules many years ago and we are actively migrating them to Starlark. I'd replace the phrase "real rules" with "legacy rules".

If you hit critical limitations when writing rules, feel free to ask on the mailing-list (or on a GitHub issue).

Thanks for your work Laurent!

> The community has written and used in production lots of rules

I know. I've written them. [1] Github pauldraper

> If you hit critical limitations when writing rules, feel free to ask on the mailing-list

I have, and you've responded. [2]

And a year later it is still unclear to me whether Starlark is going to get tree artifact actions, or dependency discovery (pruning), which C++ uses to great effect.

I proposed a solution to both problems. [3] Oscar from Stripe presented a proposal at the first Bazel Conf.

I've concluded that either (1) I'm using the wrong channels or (2) Bazel isn't that aware/interested in Starlark parity.

TBH, the number of responses that are ignorant of the dependency discovery disparity alone shows me how unaware Bazel is of the Skylark-only deficiencies.

Great project (seriously), but when your bread-and-butter compiled languages* (Java, C++, Objective C) aren't using the same system as everyone else, it's only natural.

Again, great project.

[1] https://github.com/higherkindness/rules_scala

[2] https://groups.google.com/forum/#!msg/bazel-discuss/8eAMN3Wh...

[3] https://docs.google.com/document/d/16iogGwUlISoN2WLha2TAaUdp...

* Go is in Starlark but has essentially required the complex Gazelle build tool to be layered on top of Bazel.

It's mostly a prioritization issue (we can't work on everything we'd like to have).

But there's a very recent discussion that seems related: https://groups.google.com/d/msg/bazel-dev/oFRdGdrm8DM/Gr0Yz3...


I'm wondering if there's still a good reason. Cmake with either vcpkg or conan makes for a very reliable system. The syntax still isn't the prettiest, but if done well ('modern' cmake, using targets), cmake scripts are very short, so it's still worth it.

The trouble is not everyone uses CMake or vcpkg or Conan. As a library author I'm not going to provide build scripts for every C++ build system in existence.

The beauty of a single header include is that there is no build system, so I don't have to provide any scripts at all.

It's an unfortunate solution but I can see why it is so popular.

I agree with you.

Actually, we've been having a lot of success developing an in-house tool to solve this problem and it's helped us build agile modular systems using C++.


It's a one-stop shop for building modular C++ applications, with an opinionated set of packages here: https://www.github.com/kurocha

That being said, `teapot` doesn't have any default build rules, it relies on packages to provide them, so you could build an entirely different eco-system of packages with minimal effort.

It can be mitigated a bit by using precompiled header files.

These are kinda misleading - "single-file" is a technically accurate term, but if you think that implies you can just #include them, it's not that (i.e. not what is normally called "header-only"). It's basically a header and implementation combined into a single file, with the implementation part #ifdef'd. So you need to manually instantiate that implementation in one of your existing translation units, or add a new one specifically for that purpose. At that point, I don't see how this is preferable to just having an .h/.cpp as usual and zipping it for distribution purposes.

They're also known as STB-style headers, but IMHO all those names are correct.

One advantage is that you don't need to mess with build systems for configuration, instead put the configuration defines before the implementation right in the source code.

Another advantage is potentially faster compilation if you put many implementations into the same implementation source. Same idea as unity-builds basically. Even though you're using dozens of header-only-"modules", the compiler only builds a single file.

I disagree it's as simple as you suggest. Trouble seems to be that in a perfect world, you're right. It would be the same.

In the nasty real world, these build systems introduce more issues for more people. Then linking the shared object or DLL adds management issues, naming issues, arch etc. But that's for separate dependency building as a binary.

Has every one of these side compilation steps gone perfectly for you? No. At some point they wasted your time.

This concatenation build step is done reliably beforehand; even if it's simple, it can be assembled and tested by the releasing party. Saves the user some trouble.

A pre-built binary is about as easy if done sensibly. Boost is mainly header only but has some libs still requiring linking, which requires you digging into that library's particulars.

All of those are steps that have tradeoffs in time, attention, flexibility and risk.

Careful of that word "... just". Just do this! I can just do that! The full hassle/cost requires pulling out of just the compile step, in this instance.

I'm not talking about prebuilt binary libraries, much less DLLs. I'm comparing it to just dropping an .h and a .cpp file directly into your project. This one requires dropping an .h file, and then either adding a .cpp file (which you have to write manually, even if it's just one line of code), or adding an implementation #include to one of the existing .cpp files.

I guess my point is that I just don't see the benefit of concatenation here.

It looks like every change will also lead to recompilation of all dependent files. It seems quite inefficient.

Such library-style headers that you drop into your own project rarely ever change.

You can also group the stable headers into a different implementation source file from the headers that change frequently.

less messing with build systems.

I don't see why tho - either way you have an implementation file to add, so what's the difference?

You could still #include the .cpp file even if they were separate files. You wouldn't even need the define.

https://github.com/nothings/stb is another set of good header only libraries.

(I'm not the author and don't know them, I'm just a happy user.)

Don't C++ 20 modules solve this problem?

