Considering C99 for Curl (haxx.se)
223 points by mariuz on Nov 22, 2022 | 178 comments



The reasons given are reasons to stick with C89 forever. Change has risk, and there are advantages and disadvantages. If there were no advantages gained from C99, it wouldn't exist (people don't release language updates for no reason).

A much more interesting question is if you were writing curl today, what would you do. If the answer is still 'C89' then we as a profession have to wonder why - did we get it exactly right, and there are no lessons from the last 30 years, or is the fact there are no better alternatives deeply depressing.


For systems programming, C89 is definitely a "sweet spot". It's relatively easy to write a compiler for and was dominant when system variety was at its highest, so it is best supported across the widest variety of platforms. Later C standards are harder to write compilers for and have questionable features like VLAs and more complicated macro syntax. C++ is a hideous mess. Rust is promising, and would probably be my choice personally, but it's also still fairly new and will limit your deployment platform options.

C89 is still a reasonable choice today. I don't think it's depressing. It's a good language, and hitting the sweet spot of both language design and implementation is really hard, so you'd expect to have very few options.


Re: Rust, Curl is modular enough that you don't need to rewrite it in Rust in order to enjoy some of its benefits, you can just tell Curl to use Hyper (Rust's de-facto HTTP lib) as a backend. For the past few years they've been working on getting the Hyper backend to pass the Curl test suite and they're down to five remaining tests, so perfect support looks to be imminent: https://github.com/orgs/hyperium/projects/2/views/1 (seanmonstar occasionally streams on Twitch if you'd like to watch him work on these).


Also, and this sort of blows my mind, but Rust is almost 10 years old. It is a pretty darn stable language, especially for greenfield projects like a new HTTP library. The better Rust gets at interop the more it will begin to eat the systems programming world IMO, and we all benefit, even if it is quietly doing it without much fanfare.


I was with you right up until “quietly doing it without much fanfare”.

Personal opinions of Rust aside, it has gained so much fanfare that “rewrite it in Rust” is now practically a meme.


The other side of that is the back end of what Rust folks are doing. There is a vocal segment that are doing a lot of surface level things. But there are also people quietly building the language and toolchain up to be something that you can do true low level embedded work with while maintaining (most of) the guarantees of Rust. That is what I was alluding to. I don't think "rewrite it in Rust" is always smart or even productive.

edit: It is also worth exploring why and how a systems programming language has generated this much excitement in folks that are "rewriting it in Rust". These people are also in their way making it that much easier for everyone else to transition to Rust, proving these projects work just fine in Rust. I do agree it is a meme, but it is a good one for us all. As an infosec practitioner nothing could make me happier than seeing people excited about a language that eradicates one of the worst and most pernicious classes of C/C++ bugs.


The sad part is that this was inflicted by the industry themselves,

https://www.schneier.com/blog/archives/2007/09/the_multics_o...

> The combination of BASED and REFER leaves the compiler to do the error prone pointer arithmetic while having the same innate efficiency as the clumsy equivalent in C. Add to this that PL/1 (like most contemporary languages) included bounds checking and the result is significantly superior to C.


Thanks for the link!

One more data point backing my theory that we're in the middle(?) of the "computing dark ages", where the biggest crap and nonsense dominates everything (and people don't even know how crappy everything is).

Do you think we will ever leave the dark age?

I mean before our AI overlords get in charge and kill and replace all the nonsense we've built, of course.


Not before our lifetimes.

This is a matter of quality, and like everything in computing, quality only matters when money or law is involved. So in a way, returning digital goods for a refund is one way to make companies take quality more seriously; another is stricter liability laws for when exploits occur in the wild.


It's not fair to judge an entire ecosystem full of extremely talented people by the vocal (and insufferable) 1%. Every group has them. What has become a meme has zero relationship to the quality of the thing.


I wasn’t judging anyone. I was just saying Rust isn’t exactly flying under the radar.


Rust is over 12 years old at this point as a publicly-available project (its development started internally sometime in the late 00s; it was publicized in summer of 2010).


Periodization is hard, especially with Rust, but if we're talking about reliability, counting time before 1.0 in 2015 doesn't feel right to me. Seven and a half years is still a long time :)


Agree, time since v1.0 is a very reasonable measure.


10 years and still an okish IDE experience… This is going to stay with Rust. The problem is not the tooling, but the language.


I don't get what you mean.

There are languages out there with much more advanced features than Rust, like e.g. Scala. But the IDE experience in Scala is not worse than with Java.

There is no fundamental problem with IDE support. It just takes some work with an advanced language.

(The only issue are languages which you need to write "backwards", like Haskell. But that's another story.)


I am talking about compilation speed.

I recently started working with Rust to contribute to projects like Rome/tools [1] and deno_lint [2]. My first impression of Rust is a bit frustrating: compilation takes tens of seconds or minutes. I wait in front of my IDE to get type hints / go-to def (and often fall back to a text search). When I launch unit tests, I wait for rust-analyzer to finish its indexing, and then I wait again for the tests to compile…

The tools are now mature, and a lot of engineering work is done on both rust compiler and rust-analyzer. I am afraid that the slow compilation of Rust is rooted to its inherent complexity.

[1] https://github.com/rome/tools

[2] https://github.com/denoland/deno_lint


> I am afraid that the slow compilation of Rust is rooted to its inherent complexity.

AFAIK that's not the case.

The problem here is that Rust has "issues" with separate compilation due to some language design decisions (which then obviously affect incremental compilation). It's not built for that, and it is, and will continue to be, quite difficult to make this work somehow.

But that's less a problem with the complexity of the language as such.

My Scala example stands: Scala is also quite complex and not the fastest to compile. But after the build system and the compiler crunched the sources once (which may take many minutes on a larger code base) the IDE is very responsive. Things like type hints or go-to def are more or less instant. Code completion is fast enough to be used fluently. Edit-compile-test cycles are fast thanks to the fact that separate compilation considerations were part of the language design decisions. (That's for example why Scala has orphan type-class instances; which are a feature and a wart at the same time).

As I understand it, Rust's "compilation units" are actually crates. This is not very fine-grained and, I guess, the source of the issues.

I would guess splitting code into a few (more) crates (which then need to depend on each other) may improve the incremental build times. Also things like not building optimized code during development of course apply, but I think cargo does this automatically anyway.

But I'm not an expert on this. Would need to look things up myself.

Maybe someone else has some proven tricks to share?

OK, a quick search yielded some useful results, so I share:

https://www.pingcap.com/blog/rust-huge-compilation-units/

https://fasterthanli.me/articles/why-is-my-rust-build-so-slo...

https://news.ycombinator.com/item?id=29742694


Thanks for the detailed answer and the pointed resources!

In fact, I included the lack of compilation locality in "inherent complexity of Rust". However, I agree that this could be considered apart.

In my experience with TypeScript (quite different, I admit), splitting into distinct compilation units may help. However, this does not solve the issue.

This could be great if Rust could deprecate some features in order to improve its compilation speed. I am not sure if it is feasible…


I am unsure what you mean. Care to elaborate?


I think I followed. One aspect of a programming language is how easy it is to build a useful IDE for with code prediction, navigation, refactoring, etc. Java is relatively easy, Lua is very hard. Rust is somewhere in the middle, with macros being a complicating factor. https://rust-analyzer.github.io/blog/2021/11/21/ides-and-mac... discusses the problems specific to rust much better than I could.


Go has one of the best compiled language IDE experiences out there. GoLand makes it feel easier than Python code :)


Can it find all implementations of an interface, or which interfaces a type implements?


Yes to both. It is great :)


I'm unsure exactly what the above poster is trying to say, I generally find Rust development very pleasant with nothing but vscode and Rust-analyzer.

But... I'll admit there is one major stumbling block so far. Debugging iterator chains can be cumbersome because of the disconnect between the language and the compiled code. I've found myself stepping in and out of assembly more than I'd like. I assume this is the kind of problem that can be overcome with a nicer debugger though.


> I assume this is the kind of problem that can be overcome with a nicer debugger though.

I think this would be something that modern debuggers need to solve somehow in general.

There are more and more languages with high amount of syntax sugar, where the output to be debugged doesn't have much in common anymore with the code written.

Debuggers need to be aware of desugarings somehow.

But it makes no sense to implement this on a case by case basis for every language. We need next generation debuggers! (But I have no clue how "a sugar aware debugger" could be implemented; something in the direction of "source maps" maybe?)


See my other answer [1] to get more context :)

[1] https://news.ycombinator.com/item?id=33729406


So you're arguing C89 is better because it's easier to write compilers for it? How is that a relevant point in this context? We're talking about whether migrating to C99 is better for the end user, not for a compiler writer.


They are arguing that C89 is better because it's supported on more platforms (curl is used on all sorts of oddball embedded systems), and that it's easier to get a new platform going with C89


The requirements for writing curl are different from the requirements for writing other software. Just because C89 is a good choice for curl doesn't mean that C99 isn't a better choice for other things. The failure isn't in having revised a language, it's in thinking that all projects using the older version must upgrade. The idea that progress is linear is an illusion.


The question is more complicated than that: curl is so popular it is used on systems where c99 is not available. The question is how many of those exist, and at which point it's no longer worth supporting them.


Just a terminology rant: it is used in -builds- where c99 is not available.

Particularly the MSVC ecosystem is identified in TFA as being a late adopter.

Once built c99 code can run where it wants.


> Once built c99 code can run where it wants.

Is that true on the MSVC ecosystem?

Don't you have to compile separately for each `msvcrt` environment, as I thought they aren't binary compatible? And would a non-C99 msvcrt necessarily have an `snprintf()` implementation in its libc-equivalent dll?


You can call code compiled with one msvcrXX from code compiled with a different msvcrXX, provided that you don't try passing things from one to the other (that is, no passing a pointer to a FILE structure, or even a file descriptor, since the file descriptor table lives in the msvcrXX instead of the kernel), and always free or reallocate memory using the same msvcrXX (that is, don't allocate memory and expect your caller to call free() on it; always provide a custom deallocation function for your objects).

This is possible because, unlike on Linux where function names are global, on Windows function names are scoped to the DLL, so you can have MSVCR71.DLL and MSVCR81.DLL loaded at the same time in the same process and they won't interfere with each other.
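A minimal sketch of the "always provide a custom deallocation function" rule, for anyone who hasn't seen the pattern. The `widget` type and its function names are hypothetical, not from any real library; the point is only that allocation and release both happen inside the same module, so they always go through the same runtime's heap.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical library object; the name `widget` is illustrative only. */
typedef struct widget {
    char *label;
} widget;

/* Allocates with the allocator of the module this code is compiled into. */
widget *widget_create(const char *label)
{
    widget *w = malloc(sizeof *w);
    if (w == NULL)
        return NULL;
    size_t n = strlen(label) + 1;
    w->label = malloc(n);
    if (w->label == NULL) {
        free(w);
        return NULL;
    }
    memcpy(w->label, label, n);
    return w;
}

const char *widget_label(const widget *w)
{
    return w->label;
}

/* Releases with the same allocator that widget_create used.
   Callers never call free() on a widget themselves. */
void widget_destroy(widget *w)
{
    if (w == NULL)
        return;
    free(w->label);
    free(w);
}
```

A caller built against a different runtime only ever holds the opaque pointer and hands it back to `widget_destroy()`, so it never matters which heap the memory came from.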


OK, but isn't the point of building a program that links to (say) MSVCR71.DLL that you're expecting to run it in an environment where (say) MSVCR81.DLL isn't available?

I don't see how that fixes the problem of possibly not having an snprintf() implementation on a system that doesn't have a C99-compatible MSVC runtime environment.

Did I miss an implication of your comment somehow?


If you compile your program against some MSVCRT then it's your job to make sure that MSVCRT is available on the machine where your program is installed, by delegating to its installer.

All supported MSVCRTs are installable on all supported Windows SKUs.


Yes, that's the thing I always forget, the way Windows deals with multiple incompatible versions of msvcrt is that every application ships its own copy of libc, and hopefully the installer is well-written enough to only copy it into place if it's newer than the newest release of the same major version that's already there, lest a random app re-introduces a bunch of security issues that should have been closed by the last security update for every other application that uses the same msvcrt.

...and by "forget", I mean "block out due to trauma, because surely it can't be that stupid".


Supported is really doing a lot of heavy lifting there, isn't it?

Windows 7 and 8 haven't been supported in years, but are still pretty common in the wild.


> Windows 7 and 8 haven't been supported in years

Windows 7 hasn’t been supported for two years now, but Windows 8 EOS isn’t until January 2023.


> This is possible because, unlike on Linux where function names are global, on Windows function names are scoped to the DLL

This is also possible on Linux with linker scripts AFAIK.


Hmm, not really in the same way -- to do this with linker scripts you would need to rename some of the symbols in the library being consumed.

What you can use, to a limited extent, is dlmopen().

Shared objects in Linux are just really late linked static objects, with fix-ups (hence PIC requirements).

In macOS and Windows, there are hierarchies.


Linker scripts change whether or not symbols are added to a global symbol table for subsequent requests (i.e. "exported"). Though, you don't even need a linker script to affect visibility, as both GCC and clang provide a visibility function attribute, and you can change the default visibility through a simple compiler command switch.

dlopen permits you to control whether exported (externally visible) functions in a module become available to satisfy link dependencies in the application, such as subsequent module loads. See the dlopen flags RTLD_GLOBAL and RTLD_LOCAL.

dlmopen is for controlling the visibility of shared library dependencies pulled in by dlopen'd modules, whether RTLD_GLOBAL or RTLD_LOCAL, which only affect the immediate symbols in the module and not symbols from automatically loaded shared library dependencies. If you link the main application with OpenSSL (-lssl -lcrypto), or a prior module you dlopen'd pulled in OpenSSL as a dependency, then those OpenSSL symbols become available to satisfy requirements for subsequent dlopen'd modules. dlmopen allows you to create an entirely different symbol namespace for a module or modules, where symbol dependencies are only ever satisfied from that namespace, and exported (global) symbols, whether pulled in by dlopen or transitively via a shared library, are never visible outside that namespace.

None of these options directly map to the behavior of DLLs. DLLs fundamentally use different semantics, AFAIU. The closest behavior to DLLs might be DT_RUNPATH + dlmopen, but dlmopen use is explicit so not really the same thing. You could use ELF symbol versioning (maybe in combination with DT_SONAME and DT_RUNPATH) to accomplish the same thing as DLLs by effectively renaming all the symbols in a library (e.g. attaching a version component), but there aren't any tools around to help automate that, AFAIK; you'd have to generate linker scripts and it'd be a complex build. Much easier to just static link at that point.
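For reference, the visibility function attribute mentioned above looks roughly like this. It is a sketch that assumes GCC or Clang, and both function names are made up for illustration.

```c
/* Assumes GCC or Clang; the function names are illustrative. */

/* Hidden: callable from inside this shared object, but not placed in its
   dynamic symbol table, so other modules cannot bind to it. */
__attribute__((visibility("hidden")))
int internal_scale(int x)
{
    return x * 2;
}

/* Default visibility: exported and available to other modules. */
__attribute__((visibility("default")))
int public_scale_plus_one(int x)
{
    return internal_scale(x) + 1;
}
```

Building with `-fvisibility=hidden` flips the default, so only functions explicitly marked `default` are exported. That narrows what a library leaks into the global namespace, but it still isn't the per-DLL scoping Windows gives you.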


For C, Windows has had a stable CRT (libc in Unix speak) for several years now, since Win10. There are still cross-runtime compatibility concerns with C++, but those shouldn't apply here.


> curl is so popular it is used on systems where c99 is not available.

What systems don't support a version gcc that compiles c99 at this point?


Old systems in ports and airports, military systems, anything that has a 50-year shelf life...


How many old systems like that are connected to the internet and could actually use curl?


More than you would think, but curl is used to talk to the local network too, or inside VPN as well.

Some hacks are quite crazy.

E.g.: there is this very old navy broadcasting protocol, NMEA (https://en.wikipedia.org/wiki/NMEA_0183), that was designed so it could be transmitted through old-fashioned radio waves. You'll find it in some sonars, water sensors or AIS beacons. For this reason, even though it looks more like a layer 4 protocol, it embeds its own packet format and checksum, all in ASCII, that clients are expected to parse.
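To give an idea of what "clients are expected to parse" means in practice, here is a rough sketch of the checksum check. It relies on the NMEA 0183 convention that the checksum is the XOR of every character between `$` and `*`, written as two hex digits after the `*`. The function names are mine, and real receivers do a lot more validation than this.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* XOR of every byte between '$' and '*'.
   Returns -1 if the sentence framing is missing. */
int nmea_checksum(const char *sentence)
{
    if (sentence == NULL || sentence[0] != '$')
        return -1;
    const char *star = strchr(sentence, '*');
    if (star == NULL)
        return -1;
    unsigned char sum = 0;
    for (const char *p = sentence + 1; p < star; p++)
        sum ^= (unsigned char)*p;
    return sum;
}

/* Returns 1 if the two hex digits after '*' match the computed checksum,
   0 otherwise. */
int nmea_is_valid(const char *sentence)
{
    int expected = nmea_checksum(sentence);
    if (expected < 0)
        return 0;
    const char *star = strchr(sentence, '*');
    unsigned int stated;
    if (sscanf(star + 1, "%2x", &stated) != 1)
        return 0;
    return (unsigned int)expected == stated;
}
```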

Now of course, a lot of devices are still emitting their data in NMEA, and it's not uncommon to be able to just telnet or netcat (if UDP) into one to see the data flowing.

But after a while, people started to aggregate those data from their numerous sources into one single router, and expose this router for convenience through... HTTP over TCP/IP.

And now you have all those old computer towers (some still rocking a CRT screen or Windows XP) doing long polling to get broadcast data over a protocol that was made for request/response, to read a payload that is another protocol that was meant for radio equipment and hence requires manual consistency checks, that is transported by yet another protocol that is doing its best to preserve packets.

And they say the spirit of hacking is dead :)

(sometimes I feel IoT or domotic stacks look the same honestly)

Of course, somewhere in there, there is a curl call. The question is therefore does curl author want to support a potential upgrade path for such twisted use case or not. I would say "nahhhh", but maybe curl had precisely the success it did because the author was ready to support it in crazy settings.


But couldn't you just cross compile to such targets?

I think nobody really does any serious development on such old machines.


Maybe. Or maybe they have the only known working toolchain installed on this single machine somewhere in the basement (seen in a healthcare corp), or they have a chain of trust that needs way too much effort to verify again (seen in the army), or they are not competent enough to do such a thing (seen in airports), or their whole stack is so old it can only run on this stuff (seen in ports), or their target is exotic and you can't cross compile to it easily (seen in aerospace).

Again, not sure that it means curl should endorse such niche situations, but the modes of failure are numerous.

TL;DR: the world is complicated


I read this as: If there would be real need in almost all cases you could cross compile (the exotic target being the exception).

Whether this is feasible in an economic sense is another question. But technically it should be possible.


Quite many!


> If there were no advantages gained from C99, it wouldn't exist (people don't release language updates for no reason).

Why isn't it possible that the updates aren't as good as the people who wrote them thought they were?


VLAs were certainly proven to be a very bad idea, to the extent that Google paid for the effort to remove all of them from the Linux kernel as a security measure.


> VLAs were certainly proven to be a very bad idea

VLAs aren't a problem. VLA in C are.

But that's not because there is any issue with VLAs. The issue is that C concepts are stupid, but nobody fixes the roots of the issues.

If you put something of variable length (a VLA) into something with limited static length (the "stack") it will explode. That's nothing new and nothing special or exclusive to VLAs. Actually, exactly this is one of the main issues with the bad C design since inception: it does not do bounds checks (especially no static ones, as this would require proper dependent typing for safe VLAs). Out-of-bounds access will just "explode" as always in C (likely leaving a nice security crater).
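To make the failure mode concrete, here is one common way C code avoids an unbounded VLA: a small fixed stack buffer with a heap fallback. This is only a sketch; the 256-byte cap and the function name are arbitrary assumptions, not taken from curl or the kernel.

```c
#include <stdlib.h>
#include <string.h>

enum { SCRATCH_ON_STACK = 256 };  /* assumed cap; tune per platform */

/* Reverses `n` bytes of `src` into `dst` via scratch space.
   Returns 0 on success, -1 if allocation fails. */
int reverse_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    unsigned char small[SCRATCH_ON_STACK];
    unsigned char *scratch = small;

    /* A VLA here (unsigned char scratch[n];) would grow the stack by a
       caller-controlled amount with no way to detect failure. */
    if (n > sizeof small) {
        scratch = malloc(n);
        if (scratch == NULL)
            return -1;
    }

    for (size_t i = 0; i < n; i++)
        scratch[i] = src[n - 1 - i];
    memcpy(dst, scratch, n);

    if (scratch != small)
        free(scratch);
    return 0;
}
```

The heap path can report failure through its return value; a VLA that overruns the stack cannot.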

To be honest I don't get why we're still stuck with the stack / heap nonsense. There is no stack (or heap). There is only memory.

What would be much more interesting would be direct control over the caches… Instead we still use the pure fantasy products "stack" and "heap" which are actually irrelevant (as they don't exist in the end).

There is nothing "special" about "stack" memory. That's just a very primitive region-based automatic memory allocator baked into the C runtime!

The whole "using registers" thingy in context of "stack" is also just fake by now. You don't use HW registers—but some virtualization of them presented to you by the VM that runs inside the CPU. So you don't control register allocation anyway! So this could be made completely transparent without any impact. (The VM inside the CPU does the actual register allocation fully automatic. Presenting the "faked virtual ISA registers" to the outside world just to make "legacy" code happy).


As someone tangentially involved in this, I think this was misguided. But Linus also was not happy about code generation with VLAs.


> As someone tangentially involved in this, I think this was misguided.

Do you mean the removal of VLAs from Linux? Why do you think that was misguided?


> If there were no advantages gained from C99, it wouldn't exist (people don't release language updates for no reason).

That's not what the post said. He didn't say that C99 offered no advantages to anyone. He said no one could come up with benefits to the curl project that would be gained by moving to C99, therefore the risk introduced by doing so was not worth it for now (my paraphrase, obviously).

It sounds to me like a perfectly good reason to stay with the current standard for that project.

[edited to fix a typo]


Writing curl today for your own use, on a platform/OS/tech stack you control, or to target all the places where curl runs right now?

It's still deeply depressing how costly/impractical it is to apply improved technologies in the long tail of environments that isn't "linux on amd64" and similar, but it's not really a language design question in my opinion. We didn't get it "exactly right", we got it "good enough", and upgrade costs are prohibitively high for the general case.


"people don't release language updates for no reason", indeed, many reasons are in the end "planned obsolescence" or ways to make even a naive compiler so much complex that only few remains, and of course in control of very few groups of ppl, and it is near impossible to implement reasonably a real life alternative.

My opinion is C is already way too rich and complex. I would stick to c89 with benign bits of c99 and c11. The benchmark being "one average system developer coding a naive and real life C compiler in a reasonable amount of time and effort".

That said, I know that my "next" C compiler will probably be a RISC-V assembler with a very conservative usage of a macro preprocessor.


some things actually just work


The tipping point for me would be `snprintf()` and related functions. I've found it generally more useful/memorable/readable than strncpy()/strlcpy() and other updates to the dangerous and deprecated strcpy() just for copying strings safely, never mind its other formatting abilities.

If Curl already has its own "decent and functional replacement" for `snprintf()` that's used extensively throughout the codebase, or if they just don't need that functionality (I haven't checked) then I guess that's not an issue. But that would be the big selling point as far as I'm concerned.
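For anyone who hasn't used it this way, a bounded copy with `snprintf()` can be sketched as below. `copy_string` is a hypothetical helper name, not something from curl. C99 specifies that `snprintf()` returns the length it would have written, which is what makes truncation detectable.

```c
#include <stddef.h>
#include <stdio.h>

/* Copies src into dst (capacity dst_size), always NUL-terminating.
   Returns 0 if it fit, 1 if the copy was truncated, -1 on error. */
int copy_string(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || dst_size == 0 || src == NULL)
        return -1;
    /* "%s" keeps src from being interpreted as a format string. */
    int needed = snprintf(dst, dst_size, "%s", src);
    if (needed < 0)
        return -1;
    return (size_t)needed >= dst_size ? 1 : 0;
}
```

Unlike strncpy(), the destination is always terminated and nothing past the terminator is written.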


Note that strncpy() is not intended for safety. The purpose of strncpy() is to write to fixed size data structures such as part of the filesystem where you don't want to store NUL termination on strings.

Like 1980s Internet protocol features the rationale for weird things in C is more often "That's how Unix works" than "This is actually a clever safety feature".


> write to fixed size data structures such as part of the filesystem where you don't want to store NUL termination on strings

... AND where you want to pad the remaining space with zero bytes, so that you don't leak uninitialized memory onto the disk, or network.

The null byte padding behavior of strncpy makes it clear what the intended use was.

Also, the way C initializes character arrays from literals has strncpy-like behavior, because the entire aggregate is initialized, so the extra bytes are all zero:

   char a[4] = "a";      // like strncpy(a, "a", 4);
   char b[4] = "abcd";   // like strncpy(b, "abcd", 4);
the compiler could literally emit strncpy calls to do these initializations, so we might say that strncpy is a primitive that is directly relevant for run-time support for a C declaration feature.
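A quick way to convince yourself of the equivalence is to compare the bytes directly. The sketch below poisons the destination first so that the zero padding is actually observable. The function names are illustrative, and some compilers will warn about the exact-fit initializer.

```c
#include <string.h>

/* Returns 1 when aggregate initialization and strncpy produce the same
   four bytes for a short literal (zero padding in both cases). */
int short_literal_matches(void)
{
    char a[4] = "a";
    char copy[4];
    memset(copy, 0x7f, sizeof copy);   /* poison to prove padding is written */
    strncpy(copy, "a", sizeof copy);
    return memcmp(a, copy, sizeof a) == 0;
}

/* Same check for a literal that exactly fills the array: neither form
   stores a terminating NUL, so neither array holds a C string. */
int exact_fit_matches(void)
{
    char b[4] = "abcd";
    char copy[4];
    memset(copy, 0x7f, sizeof copy);
    strncpy(copy, "abcd", sizeof copy);
    return memcmp(b, copy, sizeof b) == 0;
}
```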


Is the original intent of strncpy() germane to the GPs comment? Explicitly stating the max length to copy is an effective tool for avoiding buffer overruns, regardless of whether the designers imagined that important use case.


But the thing it does (fill out a fixed sized buffer without caring about NUL-termination) is not at all what you'd want from a safety feature.

If you look at this function assuming it's a safety feature, that's a huge surprise, and indeed if you were skimming you might miss what it does because (in the context of "it's a safety feature") this is an insane choice. "Why would you do that?". Well, because it's not a safety feature.

The perf cost isn't what you'd expect from a safety feature either. Suppose we have a 1MB buffer, and we strncpy "THIS" into it using n = 1024. That's just four bytes right? Nope. strncpy() will write "THIS" and then 1020 zero bytes.


Except strncpy is broken for C strings, because it doesn't guarantee nul termination. So if you forget to force a termination on every use, you get buffer overruns.

Not only that, but (because of its actual purpose) it also fills the buffer with nuls, which is a complete waste of resources.

So yes, the original intent of strncpy() is germane to the GP's comment, because it makes strncpy actively dangerous and complete shit when working with C strings.


The issue is that strncpy() isn't a str___() function, despite its name. It's a 0x00-padded memcpy.


The output side of strncpy() might not be a str___() function, but AFAICS the input side of strncpy() is clearly a str___() function, since it stops reading (but not writing) at the first NUL byte.


But it's not an str* function, it's an strn* function. And most (though not all, that would be too easy) work on fixed-size (hence the n) nul-padded strings.


No, it isn't a string function of any kind. "A string is a contiguous sequence of characters terminated by and including the first null character." § 7.1.1.

Calling a bespoke byte-sequence data structure a "string" is inaccurate. Treating strncpy() as a string function is erroneous and can easily lead to memory corruption.


[flagged]


> Have you considered giving reading comprehension a try?

Don't do this.


Then don’t demand it by wilfully misunderstanding comments in order to “well actually” them.


Do we have a source for what the intended purpose is? I think you speak well to the effective purpose, but I'm not sure if it was that clear when it was introduced.


> I think you speak well to the effective purpose, but I'm not sure if it was that clear when it was introduced.

It was completely clear, and can easily be inferred from its specified behaviour.

It's just completely useless nowadays, because its purpose is essentially obsolete, because the data type it works with is almost never used anymore.

strncpy works with fixed-size nul-padded fields as you'd find in e.g. mainframe-type software. That is why it:

- fills the destination buffer with NULs if the source is shorter

- does not nul-terminate if the source is the same size or longer than the destination

strncpy is essentially equivalent to zeroing a buffer of size `n`, then copying the first `n` bytes of src (up to the first nul) into the target.
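Spelled out as code, the behaviour described above is roughly the following. This is an illustrative re-implementation under a made-up name; real libc versions are typically optimized quite differently.

```c
#include <stddef.h>

/* Illustrative re-implementation; `my_strncpy` is a hypothetical name. */
char *my_strncpy(char *dst, const char *src, size_t n)
{
    size_t i = 0;

    /* Copy up to n bytes, stopping at the source's NUL. */
    for (; i < n && src[i] != '\0'; i++)
        dst[i] = src[i];

    /* Pad whatever is left with NULs. Nothing is written here if the
       source had n or more characters, so dst may end up unterminated. */
    for (; i < n; i++)
        dst[i] = '\0';

    return dst;
}
```

Both bullet points fall straight out of the two loops: the second loop is the padding, and when it runs zero times you get the missing terminator.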


That's your understanding now. Fine, but not evidence of what someone else thought about this some decades ago.

(I don't use strncpy and don't defend its functionality, just want to know what was intended when it was introduced.)


I found an actual quote of a source which verifies the intention for fixed-length fields: https://softwareengineering.stackexchange.com/a/438090


On early Unix systems, the directory structure was a simple 16-byte record; 14 bytes for the name, and 2 bytes for the inode. [1] strncpy() was used to simply record the file name into this structure.

[1] "UNIX Implementation" by Ken Thompson, _The Bell System Technical Journal_, July-August 1978, Vol 57, No 6, Part 2, pg 1942.
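A sketch of what such a record looks like in C, and why strncpy() fits it. The struct layout, field names and helper functions here are my own assumptions for illustration (including a 2-byte `unsigned short`), not a copy of the historical source.

```c
#include <string.h>

enum { ENTRY_NAME_LEN = 14 };

/* 16-byte record on typical platforms: a 2-byte inode number plus a
   14-byte name field that is NUL-padded but not necessarily
   NUL-terminated. */
struct dir_entry {
    unsigned short ino;
    char name[ENTRY_NAME_LEN];
};

void dir_entry_set(struct dir_entry *e, unsigned short ino, const char *name)
{
    e->ino = ino;
    /* Pads short names with NULs, truncates long ones, never overruns. */
    strncpy(e->name, name, ENTRY_NAME_LEN);
}

/* Reading the name back needs an explicit bound, because a 14-character
   name fills the whole field. `out` must hold ENTRY_NAME_LEN + 1 bytes. */
void dir_entry_name(const struct dir_entry *e, char *out)
{
    memcpy(out, e->name, ENTRY_NAME_LEN);
    out[ENTRY_NAME_LEN] = '\0';
}
```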


Many protocols still relevant today make use of that structure, it still has widespread use.


For the average protocol you're going to init the entire message then set into it, you don't need to fully zero fields.

This is mostly relevant to write into existing memory or memory-mapped records.


Sometimes yes, but you must also not null-terminate the strings in those cases.

And on embedded devices you are often memory constrained so you might reuse an existing structure.



Doesn't say anything about the intention as understood when it was introduced


Get strlcpy and strlcat from OpenBSD.


Any version that still relies on separate parameters is unsafe, no matter what.

Some typo on the buffer limits and the same hazards as always.

Only fix is hardware memory tagging.


C needs to bite the bullet and adopt slices as first-class types, so that they can be optimized on ABI level.


How would you optimize slices at the ABI level? Supporting them in function calling conventions should be easy. But figuring out storage representations would, I suppose, be a huge can of worms. There are too many ways of encoding slices depending on the use case. It only starts with the choice of a length field type (8, 16, 32, 64 bit; signed or unsigned). There are also other conceivable representations, like sentinel values (NUL terminator) or more implicit storage of the size. Supporting them all in the compiler is not possible in practice.


"Slice" is, by now, a fairly established term in PL design which implies a tuple of (start, end) or (start, length), so it specifically excludes prefixed length, null termination etc - because experience has shown that slices are the only sane choice.

What I meant by optimization is not treating them same as other structs, but e.g. guaranteeing pass-by-register like other primitive types, spelled out explicitly in the ABI. The choice of length field type would be size_t, obviously.
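For concreteness, this is roughly what people hand-roll in C today in place of a first-class slice. The type and function names are hypothetical, and how the struct gets passed (in registers or through memory) is up to each platform's ABI, which is exactly the part a language-level slice could pin down.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical slice type: a (start, length) pair, length in bytes. */
typedef struct {
    const char *ptr;
    size_t len;
} str_slice;

str_slice slice_from_cstr(const char *s)
{
    str_slice out = { s, strlen(s) };
    return out;
}

/* Sub-slice [start, end), with both bounds clamped to the parent. */
str_slice slice_sub(str_slice s, size_t start, size_t end)
{
    if (end > s.len)
        end = s.len;
    if (start > end)
        start = end;
    str_slice out = { s.ptr + start, end - start };
    return out;
}

int slice_equals(str_slice a, str_slice b)
{
    return a.len == b.len && memcmp(a.ptr, b.ptr, a.len) == 0;
}
```

Because the length travels with the pointer, taking a sub-slice never copies and never needs a terminator.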


Like every other systems programming language with a slice like feature, including those that predate C.


Please show, don't tell. Your sibling explained that they meant only function call optimization. And I would say it's debatable that this is an "optimization" since slices would be a new concept that is distinct from structs. I agree though that the obvious naive choice would be to pass them in the same way that structs of { ptr, len } are passed in the ABI.


Infosec people have been showing the C folks for decades. Showing alone isn't enough when people refuse to change their habits.


Multics?


I don’t think Multics did hardware memory tagging.

Systems which do/did include Burroughs Large Systems (now Unisys ClearPath MCP), IBM System/38 and AS/400 and IBM i (the RISC versions of which used PowerPC AS Tagged Memory Extensions), ARM MTE, SPARC ADI, and CHERI/ARM Morello.


Multics didn't need it, because PL/I does bounds checking by default, it has proper string and array data structures.


then why not just use memmove() instead of strncpy() if no-NUL is the goal? not to mention memmove() is overlap-safe.


> then why not just use memmove() instead of strncpy() if no-NUL is the goal?

no-nul is not the goal of strncpy, it's the effect of strncpy.

strncpy is designed to work on fixed-size, nul-padded strings. That's why it fills the destination buffer with nuls if the source is too short, and it doesn't guarantee nul-termination (if the source is exactly the size of the destination).
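A small sketch of that fixed-size-field behavior, in case it helps. `NAME_FIELD`, `field_set` and `field_len` are hypothetical names for this example; the only library calls are `strncpy` and `memchr`.

```c
#include <assert.h>
#include <string.h>

#define NAME_FIELD 8  /* hypothetical fixed-width record field */

/* Store a name into a fixed-size, NUL-padded field.
   Short names are padded with NULs; a name of NAME_FIELD bytes or more
   fills the field completely and leaves no terminator. */
static void field_set(char field[NAME_FIELD], const char *name)
{
    assert(name != NULL);
    strncpy(field, name, NAME_FIELD);
}

/* Length of the stored name: stop at the first NUL or at the field end. */
static size_t field_len(const char field[NAME_FIELD])
{
    const char *nul = memchr(field, '\0', NAME_FIELD);
    return nul ? (size_t)(nul - field) : (size_t)NAME_FIELD;
}
```

Reading such a field back with `strlen()` or `printf("%s")` would be a bug whenever the name fills the field, which is why the reader side has to be bounded as well.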


If anyone reading this feels uncertain about how this is done, DON'T do this:

    snprintf(dest, sizeof dest, source);   // BAD code do not repeat

That looks great at first sight, just another size-checked way of copying strings, but remember that the third argument to `snprintf()` [1] is of course a `printf()`-style formatting string. So if that `source` argument contains any percent symbols, there's gonna be a party in your computer and both Undefined and Behavior are going to show up. You don't want that.

Instead, if you want to use `snprintf()` for this, remember to do:

    snprintf(dest, sizeof dest, "%s", source);

[1]: https://linux.die.net/man/3/snprintf
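Building on the `"%s"` form, a possible way to also notice truncation is to check the return value. This is only a sketch: `copy_checked` is a made-up helper name, and it assumes a C99-conforming `snprintf()` that returns the length the full output would have had.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copy src into dest (capacity cap), always terminated.
   Returns 0 on success, -1 if the copy was truncated or formatting failed. */
static int copy_checked(char *dest, size_t cap, const char *src)
{
    int n;
    assert(dest != NULL && src != NULL && cap > 0);
    n = snprintf(dest, cap, "%s", src);   /* src is data, never a format */
    if (n < 0 || (size_t)n >= cap)
        return -1;
    return 0;
}
```

Because the source is passed as an argument rather than as the format, a string such as "100%" is copied verbatim.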


Thanks for mentioning this. I didn't include it in my comment because I originally thought it was too obvious, but considering it now, it probably was worth making explicit.


I blame Stack Overflow for conditioning me into seeing more of the possible ways things could break, when it comes to C code. :)


`-Wformat-security` [0] to the rescue!

[0]: https://fedoraproject.org/wiki/Format-Security-FAQ


Curl uses its own implementations, curl_msnprintf and curl_mvsnprintf.


The printf() family is much slower than direct string operations. snprintf() is not a good substitution when string copying is frequent.


I tend to stay away from the entire C style strings approach in general whenever I write C.


I always use antirez's SDS string Library. The fact that they are compatible C strings with the only price to pay being a call to sdsfree() instead of free I like a lot. Simple and yet super useful. Check it out.


strlcpy and strlcat from OpenBSD work.


IMO you should store the size of your strings. If you know the sizes already then you can just memcpy/memmove.
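A rough sketch of what storing the size can look like, assuming a hypothetical fixed-capacity `struct lstr` (the name and the 32-byte capacity are arbitrary choices for the example):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-capacity string that tracks its own length. */
struct lstr {
    char data[32];
    size_t len;
};

/* Append n bytes; returns 0 on success, -1 (changing nothing) if it won't fit. */
static int lstr_append(struct lstr *s, const char *src, size_t n)
{
    assert(s != NULL && src != NULL && s->len < sizeof s->data);
    if (n > sizeof s->data - 1 - s->len)
        return -1;
    memcpy(s->data + s->len, src, n);   /* lengths are known: no scanning */
    s->len += n;
    s->data[s->len] = '\0';             /* keep it usable as a C string too */
    return 0;
}
```

The caller still has to pass a correct `n`, so this does not remove the parameter-passing hazard; it only avoids rescanning for the terminator on every append.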


Only for those who never make mistakes with parameter passing.


> But that would be the big selling point as far as I'm concerned.

I mean, it's a trivially replicated function, you can just copy it over from an existing codebase. So it's not exactly a major feature.


strlcpy() doesn't fix the problem of getting the length parameter wrong. The right solution is this:

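    /* Copy s into d without writing past e, where e points one past the
       end of the buffer. Returns a pointer to the terminating NUL, so
       calls can be chained. */
    char *stecpy(char *d, const char *s, const char *e)
    {
      if (e) e--;                /* reserve room for the terminator */
      while (d < e && *s)
        *d++ = *s++;
      if (d)
        *d = '\0';
      return d;
    }

    int main(void) {
      char buf[64];
      char *ptr, *end = buf+sizeof(buf);

      ptr = stecpy(buf, "hello", end);
      ptr = stecpy(ptr, " world", end);
      return 0;
    }

As discussed here https://twitter.com/hyc_symas/status/1382298601641152513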

The point is that the end of the buffer is invariant, there's no reason to screw around recalculating the length of remaining space after each copy into the buffer. This also fixes the nonsense of strcpy/strcat returning the same dst pointer that was passed in. By returning the pointer to where copying ended, you don't need a separate strcat function any more, nor do you have the Shlemiel The Painter problem with strcat.


> The tipping point for me would be `snprintf()` and related functions

Functions can always be implemented with additional header files.

They aren't actual extensions to the language.


But snprintf can be used in C90. You just detect if it's available and use it, as if it were any other platform-specific function.


Hm, why do people choose to do that instead of using the fallback case unconditionally? Expectation that platform-provided snprintf is faster?


The platform-provided snprintf is smaller; it takes up zero bytes in your program.


> I think there are still much better things to do and much more worthwhile efforts to spend our energy on that could actually improve the project and bring it forward.

> Like improving the test suite, increasing test coverage, making sure more code is exercised by the fuzzers.

I wonder if using a more recent version of C might draw in more developers willing to contribute (in a similar vein to Linux kernel introducing Rust).


It seems unlikely. If you're capable of & interested in hacking on something like curl, having to put your variable declarations at the top of the block probably isn't going to be a deal-killer for you.


In C99 there are some nice things over C89 that aren't new syntax, like initializing aggregates in automatic storage ("stack") with non-constant expressions:

    struct point p = { getx(obj), gety(obj) };  /* C90 error, C99 OK */

GCC had this as an extension before C99. Coding around that one can get ugly, and it's just a syntactic limitation.

Also, the C99 preprocessor is more powerful, with variadic support.
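As a small illustration of the variadic support (unrelated to cppawk itself), here is a sketch of a C99 variadic macro. `FORMAT_TAGGED` and `demo_format` are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical tagged-format macro; needs C99 variadic macros.
   fmt must be a string literal (it is concatenated with the prefix),
   and at least one argument must follow it, since C99 cannot drop the
   trailing comma when __VA_ARGS__ is empty. */
#define FORMAT_TAGGED(buf, cap, tag, fmt, ...) \
    snprintf((buf), (cap), "[%s] " fmt, (tag), __VA_ARGS__)

/* Example use: writes something like "[curl] exit code 7" into buf. */
static int demo_format(char *buf, size_t cap, int code)
{
    assert(buf != NULL && cap > 0);
    return FORMAT_TAGGED(buf, cap, "curl", "exit code %d", code);
}
```

In C89 the usual workarounds were a real variadic function or the double-parenthesis trick, both of which are clumsier at the call site.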

The macros I implemented in the cppawk project (awk with C preprocessor) would be impossible without C99. I was able to make a multi-clause loop macro, with user-definable clauses.

https://www.kylheku.com/cgit/cppawk/about/


In 2007-2009, I worked a job that mostly consisted of maintaining a C89 codebase. We were using OpenWatcom, which at the time did not have complete C99 support.

But in retrospect I'm surprised to see how many features we used liberally would have been unavailable in pure C89. snprintf first and foremost. But also // comments, __func__, and stdint.h


AFAIK almost all of C99 functionality was basically about standardizing features found in existing compilers prior to it.


Won't older versions of curl source code be available forever in any case? I understand that people use very old machines/code sometimes, but I don't think it's that unreasonable to say "if you must use a compiler with capabilities stuck in the 20th century, you can't use curl source newer than late 2022".


For things that touch the network and maybe particularly use TLS, sometimes people are suddenly in a hurry not to be using an old version anymore. The people with the classic compilers would probably have to be prepared to backport security fixes on short notice. I figure the curl project might not want to put them into that awkward position.


openssl doesn't support C89. If you have TLS you probably already have access to a new enough compiler.


curl doesn't necessarily require OpenSSL for its TLS functions; in fact it's designed to be TLS-agnostic (to the point that it uses macOS Secure Transport API and Windows Schannel by default). You could either use wolfSSL or use dedicated TLS code connected to the embedded cryptographic co-processor.


If you are willing to download a new curl, why can't you download a new compiler?


I think this is entirely the right approach. The smaller the language footprint, the easier the code is to read, and you will have fewer issues running the code on a wide variety of platforms. Many C99 features (like Variable Length Arrays) have proven to be less than well designed and have since been made optional.


Yes, VLA is bad, but declare anywhere is very desirable. curl can't move due to old MSVC, but once that is solved, it would make a lot of sense to move to C99. The post itself says "It is not a no to C99 forever".


Caveat re VLAs: while C23 doesn’t return VLAs to mandatory status, it does require variably modified types; so no stack space footgun but still a significant complication in the type system:

    void foo(size_t n) { int a[n][n]; }

does not have to be supported outside C99 but

    void bar(size_t n, void *p) { int (*pa)[n][n] = p; }

has to be in both C99 and C23 (though not in C11 or C17).
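To make the distinction a bit more concrete, here is a sketch that uses a variably modified pointer type with heap storage, so no array of run-time size is ever placed on the stack. `diagonal_sum` is a hypothetical function for illustration, and it assumes a compiler that supports variably modified types (C99, or C23).

```c
#include <assert.h>
#include <stdlib.h>

/* Sum the diagonal of an n-by-n matrix stored in one heap block.
   The pointer type int (*)[n] is variably modified, but the storage
   comes from malloc, so an allocation failure can be reported.
   Overflow of n * sizeof *m is not checked in this sketch. */
static long diagonal_sum(size_t n)
{
    long sum = 0;
    size_t i, j;

    assert(n > 0);
    int (*m)[n] = malloc(n * sizeof *m);   /* n rows of n ints each */
    if (m == NULL)
        return -1;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            m[i][j] = (int)(i * n + j);
    for (i = 0; i < n; i++)
        sum += m[i][i];
    free(m);
    return sum;
}
```

The two-dimensional indexing `m[i][j]` is the part that variably modified types buy you; without them you would index a flat `int *` by hand.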


Is there any user value to changing? Probably not.

Is there risk associated with changing? Probably yes in terms of security and limiting compatibility.

It probably doesn't make sense to change now unless there are specific reason(s) that will lead to impactful improvements.


That's how software stagnates and dies.

> ... risk opening the flood gates for people rewriting things...

This sounds like "we feel that changes are needed but we will not be able control it".


I don't really get why Daniel decided not to go with C99. Doesn't a C99 compiler produce the same machine code for ANSI C source code? They could start writing new code in a newer standard and gradually fix old code to match, so most of the `curl` and `libcurl` source code would slowly but steadily transition to the newer standard. Or am I missing something?


> Doesn't a C99 compiler produce the same machine code for ANSI C source code?

Only if a C99 compiler is available. As the post points out:

> The slowest of the “big compilers” to adopt C99 was the Microsoft Visual C++ compiler, which did not adopt it properly until 2015 and added more compliance in 2019. A large number of our users/developers are still stuck on older MSVC versions so not even all users of this compiler suite can build C99 programs even today, in late 2022.

(emphasis mine)


And that's just the "big compilers" supporting the "big platforms". If you're shipping some bespoke embedded platform where the toolchain/OS integration was custom work, you might very well never upgrade the compiler and just pick a new one when you're forced to define a new platform due to the hardware going EOL.


I'm mixed on this personally. On one hand, I get it: having curl on the XYZ microcontroller you're using for a project is good. At the same time, the only reason MSVC added C99 support was community pressure, because major projects said "we don't care about MSVC anymore, we're moving on"[1]. So clearly there is a correlation between vendors and versions. If the vendors don't feel pressure to support newer versions, then they'll just continue to ship the garbage they've been shipping for years. I personally don't have a good answer for this. Daniel very clearly has made his decision and I'll respect that as it's not my project.

[1] Also because the C++ committee ignored Herb and just forced it into later C++ revs from a library perspective.


How hard would it be to do something like C99 (or whatever) > LLVM IR > C89 (auto-generated, doesn't have to be human readable) > niche compiler?


There is no reason to support C99, when C11 and C17 are supported.

Naturally without the stuff that got made optional in C11, no need to spend development effort on legacy features.


He addresses this point in the article:

> [...] we would have to go gently and open up for allowing new C99 features slowly

> A challenge with that approach, is that it is hard to verify which features that are allowed vs used as existing tooling normally don’t have that resolution.

> The question has also been asked that if we consider bumping the requirement, should we then not bump it to C11 at once instead of staying at C99?

The motivation ultimately boils down to:

> Ultimately, not a single person has yet been able to clearly articulate what benefits such a C flavor requirement bump would provide for the curl project


I mean, I am not a curl developer so I don't get to vote, but restricting variable scopes to loops alone sounds worthwhile to me.


You can still add a set of curly braces around the loop to scope variables.

The syntax isn't as nice, and you have to remember to do it (maybe a linter can help with that) but it ends up more of a "nice to have".
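For illustration, a minimal sketch of the extra-braces approach that still compiles as C89. `count_zeros` is a made-up function for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Count zero bytes; written so it also compiles as C89. */
static size_t count_zeros(const unsigned char *p, size_t n)
{
    size_t zeros = 0;
    assert(p != NULL || n == 0);
    {   /* extra block: i exists only for the loop below */
        size_t i;
        for (i = 0; i < n; i++)
            if (p[i] == 0)
                zeros++;
    }
    /* i is out of scope here; using it would be a compile error */
    return zeros;
}
```

In C99 the same scoping falls out of `for (size_t i = 0; ...)` without the extra block.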


The nuance from the article is that they are adopting C99, but incrementally, feature by feature, as needed.


More language isn't necessarily better. The more features a language has, the harder it becomes to read. When a language gives you a lot of options, you spend more time choosing between options than on the problem at hand.


Smaller language isn't necessarily better, either. Case in point: no booleans make a language "smaller"; result: each codebase will have their own, different, implementation.


There is no one way to do booleans because the choice of implementation depends on what your goals are. If you value space you should pack your booleans as bits in a larger type, but if you want fast access you should use int. bool has some nice properties because it doesn't have multiple states of true, unlike int. C gives you types that have properties but doesn't prescribe what they should be used for.


Packed boolean arrays are an entirely different thing; as a matter of fact they are arrays, not a primitive data type.

The boolean primitive data type is a standard in modern languages. Take any language: Rust, Go, Zig... and C99 (kind of).

In C89, a project may use enums. Another macros. Another ints.
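To show the two styles under discussion side by side, here is a C89-compatible sketch. `MY_TRUE`, `MY_FALSE`, `bit_set` and `bit_get` are hypothetical project-local names, which is rather the point: every codebase ends up inventing its own.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Style 1: the classic C89 int-as-boolean, one flag per int. */
#define MY_TRUE  1   /* hypothetical project-local names */
#define MY_FALSE 0

/* Style 2: packed flags, one bit per boolean. */
static void bit_set(unsigned char *bits, size_t i, int value)
{
    unsigned char mask;
    assert(bits != NULL);
    mask = (unsigned char)(1u << (i % CHAR_BIT));
    if (value)                       /* any nonzero int counts as true */
        bits[i / CHAR_BIT] |= mask;
    else
        bits[i / CHAR_BIT] &= (unsigned char)~mask;
}

/* Always returns exactly 0 or 1, unlike a raw int flag. */
static int bit_get(const unsigned char *bits, size_t i)
{
    assert(bits != NULL);
    return (bits[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1;
}
```

Storing 5 through `bit_set` and reading it back yields 1, which is the "single state of true" property that C99's `_Bool` gives you for a scalar.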


Yes, and I'm saying they may have good reason to use different types because they have different requirements.


I think I can see why, I never tried C or C++, but Zig was a pleasure to learn and play around with.


> Ultimately, not a single person has yet been able to clearly articulate what benefits such a C flavor requirement bump would provide for the curl project. We mostly see a risk that we all get caught in rather irrelevant discussions and changes that perhaps will not actually bring the project forward very much. Neither in features nor in quality/security.


Didn't Linux just switch for added safety against some speculative execution vulns? Keeping iterator variables scoped to their respective loops, it seems.

Isn't cURL susceptible to this too? (Not to mention I always find that style much nicer :)


Previously posted by /u/edent (not much discussion though):

https://news.ycombinator.com/item?id=33636732


> However, there is no longer any modern compiler around that does not support this.

What features of C99 isn't this true of today?


> What features of C99 isn't this true of today?

IIRC, MSVC stubbornly refuses to add support for variable-length arrays (which AFAIK are required for full C99 support). I don't know if there's anything else on that list of C99 features that MSVC doesn't support yet.


> IIRC, MSVC stubbornly refuses to add support for variable-length arrays (which AFAIK are required for full C99 support).

You are correct that VLAs are mandatory for C99, but they turned out to be such a bad idea that they are optional in C11 onwards.

> I don't know if there's anything else on that list of C99 features that MSVC doesn't support yet.

IIRC, MSVC's `snprintf` and friends are broken (returns incorrect values and/or interprets the size parameter incorrectly). I think all of the annexure K stuff is broken in MSVC.


> I think all of the annexure K stuff is broken in MSVC.

Doesn’t annex k only exist in MSVC because it’s a bunch of crap MS got the committee to add and no one else wanted to implement?

And isn’t it a C11 thing?


> Doesn’t annex k only exist in MSVC because it’s a bunch of crap MS got the committee to add and no one else wanted to implement?

Yes, but in a perverse twist of fate, Microsoft's implementation does not conform.

> And isn’t it a C11 thing?

I stand corrected, it is a C11 thing.


The MS implementations were given different semantics in annex K. They aren't standard compliant.


I think VLAs were a very good idea and they were not made optional because somebody thought they were a bad idea, but simply to make C11 easier to implement. VLAs can be dangerous when the size depends on external input and when implemented without stack probing.


No they are not.

Google has paid the development effort for the Linux kernel to get rid of all VLA occurrences.


The Linux kernel is a very different environment than user space; it has a very limited stack space (16 KiB IIRC), and the consequences of exceeding that are worse than in user space.

> Google has paid the development effort for the Linux kernel to get rid of all VLA occurrences.

IIRC, what they were really interested in getting rid of was not the normal VLA we're talking about, but something even more esoteric called VLAIS (variable length arrays in structures), which clang didn't support. See https://lwn.net/Articles/441018/ for a discussion about that.


I have seen all Linux Plumber talks on the subject.


I was tangentially involved in this and I think this was misguided.


> when implemented without stack probing.

I dunno how to probe the stack in standard C, and so I never used VLAs, nor allowed them because (I thought that) there was no way to determine if a VLA declaration would cause a stack overflow.


>MSVC stubbornly refuses to add support for variable-length arrays (which AFAIK are required for full C99 support).

They were made optional in C11. It was a good decision because it's a dangerous misfeature.


I find them very useful and they often make the code cleaner. A dynamic run-time bound for a buffer can also make the code safer. VLAs are only dangerous when implemented naively (allocated on the stack without stack probing).


>are only dangerous when implemented naively (allocated on the stack without stack probing).

That's how it's implemented in all the major compilers. Anyway, even a hypothetical heap-based implementation would be bad because there's no way to report allocation errors.
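For contrast, a common non-VLA pattern is a bounded stack buffer with a heap fallback, where the failure can be reported to the caller. This is only a sketch; `STACK_LIMIT` and `reverse_copy` are hypothetical names and the 256-byte cutoff is arbitrary.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define STACK_LIMIT 256  /* hypothetical cutoff for staying on the stack */

/* Reverse n bytes of src into dst using a scratch buffer.
   Returns 0 on success, -1 if the heap allocation failed. */
static int reverse_copy(char *dst, const char *src, size_t n)
{
    char small[STACK_LIMIT];
    char *tmp = small;
    size_t i;

    assert(dst != NULL && src != NULL);
    if (n > sizeof small) {
        tmp = malloc(n);
        if (tmp == NULL)
            return -1;   /* the failure a VLA has no way to report */
    }
    for (i = 0; i < n; i++)
        tmp[i] = src[n - 1 - i];
    memcpy(dst, tmp, n);
    if (tmp != small)
        free(tmp);
    return 0;
}
```

The stack usage here has a fixed upper bound regardless of `n`, at the cost of a branch and a possible `malloc` call.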


GCC has stack probing.


Interestingly, I've seen code that uses the flexible array member idiom with MSVC. I'm fairly certain I first saw this in the windows code base when I worked at MS, and I believe it may even be in some public headers somewhere. In order to compile on MSVC, the trick uses MemberName[0] or MemberName[1] rather than the standard MemberName[]. GCC and clang warn about this. I suspect it might have been a common idiom in pre-C99 days.
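For comparison with the `MemberName[1]` trick, this is roughly what the standard C99 form looks like. `struct packet` and `packet_new` are hypothetical names for the sketch; the pre-C99 variant would declare `data[1]` instead and adjust the size computation accordingly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* C99 flexible array member: the trailing array has no declared size. */
struct packet {
    size_t len;
    unsigned char data[];
};

/* Allocate a packet holding a copy of n payload bytes; NULL on failure. */
static struct packet *packet_new(const void *payload, size_t n)
{
    struct packet *p;
    assert(payload != NULL || n == 0);
    if (n > (size_t)-1 - sizeof *p)      /* guard the size addition */
        return NULL;
    p = malloc(sizeof *p + n);           /* header plus n trailing bytes */
    if (p == NULL)
        return NULL;
    p->len = n;
    if (n > 0)
        memcpy(p->data, payload, n);
    return p;
}
```

The caller releases the whole thing with a single `free(p)`, since header and payload live in one allocation.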


You are probably confusing flexible array members with VLAs. Those are two completely different features. You are thinking about the one where the last member of a struct is an array and you adjust its size based on the malloc size when allocating the struct. The VLA feature allows you to specify an arbitrary non-constant expression when declaring local stack variables with array type. Something like:

    // VLA
    int foo(int n) { int array[n]; }

    // flexible array member
    struct s { int n; double d[]; };
    struct s *s1 = malloc(sizeof (struct s) + (sizeof (double) * 8));


I'm not confusing them. That's why I said flexible array member and not VLA.

I don't need a lecture. I have actually lectured on C before.

I do have them bucketed similarly in my mind. They are both C99 features about variable-sized arrays. MS has not implemented either one IIRC.


>MS has not implemented either one iirc.

Flexible array members were implemented a long time ago. The oldest Visual C++ I have at hand is 2005 and it already has them (although only in C mode).


> I suspect it might have been a common idiom in pre-C99 days.

There is a blog post from Raymond on this[1]. The TL;DR is that zero-length arrays and FAMs weren't legal until C99. Not sure where the support from MSVC factors in. But that's the story and AFAIK he's sticking to it.

[1] https://devblogs.microsoft.com/oldnewthing/20040826-00/?p=38...


You don't even have to go to C99. Microsoft has only had a conforming C89 preprocessor implementation for 3 years. [0]

[0] https://learn.microsoft.com/en-us/cpp/preprocessor/preproces...


> tldr: we stick to C89 for now.

Cool.


[flagged]


The approaches C and Rust projects have to upgrades are so radically different, it's comical:

curl: let's not rush, 23 years is not enough time for people to update their compilers.

Rust: the compiler released last Thursday is the oldest supported version.



It's not really up to the curl people to rewrite it in rust.

It's for the proponents of rust.

Personally, I'm largely convinced that writing in rust will lead to safer implementations... now I'm just waiting for the full-featured, well-thought-out, stable, useful rust implementations to appear.

I'm not married to curl. Where's "rurl" or whatever? If it exists and does something useful as well as curl (or better), I'll happily use it.


Well, curl is a bit of a Swiss Army chainsaw for protocols, and in general there is rarely a need to have the same subsystem (for "downloading/sending something to a URL") support protocols from HTTP through IMAP all the way to LDAP.

So there is really no reason to use it in Rust; use smaller, protocol-specific libs with fewer security problems, and if you need multi-protocol support, just switch by protocol and maybe write some wrappers.

And for calling it from C, "Rewrite it in Rust but make the ABI C compatible" means you have to repeat a lot of Curl's idiosyncrasies, all while recreating all of the functionality. Looking at https://curl.se/docs/security.html there aren't that many of them that Rust safety would prevent, so the real gains from that probably aren't all that great.

Now OpenSSL on another hand...


That would be a major effort, without a very clear benefit. Curl is massive, just scroll through its man page.


I just scrolled through the man page at a speed of 15 pages per second. Aside from that, curl is an extremely important and impressive piece of software that I use daily.


Massive and written in C, you'd be mad to use it ... oh

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh



>Rustaceans at the back of the auditorium slowly lower their hands ...

And start a rewrite themselves afterwards.



