The reasons given are reasons to stick with C89 forever. Change has risk, and there are advantages and disadvantages. If there were no advantages gained from C99, it wouldn't exist (people don't release language updates for no reason).
A much more interesting question is: if you were writing curl today, what would you do? If the answer is still 'C89' then we as a profession have to wonder why - did we get it exactly right, and there are no lessons from the last 30 years, or is the fact that there are no better alternatives deeply depressing?
For systems programming, C89 is definitely a "sweet spot". It's relatively easy to write a compiler for and was dominant when system variety was at its highest, so it is best supported across the widest variety of platforms. Later C standards are harder to write compilers for and have questionable features like VLAs and more complicated macro syntax. C++ is a hideous mess. Rust is promising, and would probably be my choice personally, but it's also still fairly new and will limit your deployment platform options.
C89 is still a reasonable choice today. I don't think it's depressing. It's a good language, and hitting the sweet spot of both language design and implementation is really hard, so you'd expect to have very few options.
Re: Rust, Curl is modular enough that you don't need to rewrite it in Rust in order to enjoy some of its benefits, you can just tell Curl to use Hyper (Rust's de-facto HTTP lib) as a backend. For the past few years they've been working on getting the Hyper backend to pass the Curl test suite and they're down to five remaining tests, so perfect support looks to be imminent: https://github.com/orgs/hyperium/projects/2/views/1 (seanmonstar occasionally streams on Twitch if you'd like to watch him work on these).
Also, and this sort of blows my mind, but Rust is almost 10 years old. It is a pretty darn stable language, especially for greenfield projects like a new HTTP library. The better Rust gets at interop the more it will begin to eat the systems programming world IMO, and we all benefit, even if it is quietly doing it without much fanfare.
The other side of that is the back end of what Rust folks are doing. There is a vocal segment that are doing a lot of surface level things. But there are also people quietly building the language and toolchain up to be something that you can do true low level embedded work with while maintaining (most of) the guarantees of Rust. That is what I was alluding to. I don't think "rewrite it in Rust" is always smart or even productive.
edit: It is also worth exploring why and how a systems programming language has generated this much excitement in folks that are "rewriting it in Rust". These people are also in their way making it that much easier for everyone else to transition to Rust, proving these projects work just fine in Rust. I do agree it is a meme, but it is a good one for us all. As an infosec practitioner nothing could make me happier than seeing people excited about a language that eradicates one of the worst and most pernicious classes of C/C++ bugs.
> The combination of BASED and REFER leaves the compiler to do the error prone pointer arithmetic while having the same innate efficiency as the clumsy equivalent in C. Add to this that PL/1 (like most contemporary languages) included bounds checking and the result is significantly superior to C.
One more data point backing my theory that we're in the middle(?) of a "computing dark age", where the biggest crap and nonsense dominates everything (and people don't even know how crappy everything is).
Do you think we will ever leave the dark age?
I mean before our AI overlords get in charge and kill and replace all the nonsense we've built, of course.
This is a matter of quality, and like everything in computing, quality only matters when money or law is involved. So in a way, returning digital goods for a refund is one way to make companies take quality more seriously; others are stricter liability laws for when exploits occur in the wild.
It's not fair to judge an entire ecosystem full of extremely talented people by the vocal (and insufferable) 1%. Every group has them. What has become a meme has zero relationship to the quality of the thing.
Rust is over 12 years old at this point as a publicly-available project (its development started internally sometime in the late 00s; it was publicized in summer of 2010).
Periodization is hard, especially with Rust, but if we're talking about reliability, counting time before 1.0 in 2015 doesn't feel right to me. Seven and a half years is still a long time :)
There are languages out there with much more advanced features than Rust, like e.g. Scala. But the IDE experience in Scala is not worse than with Java.
There is no fundamental problem with IDE support. It just takes some work with an advanced language.
(The only issue is with languages which you need to write "backwards", like Haskell. But that's another story.)
I recently started working with Rust, contributing to projects like Rome/tools [1] and deno_lint [2]. My first impression of Rust is a bit frustrating: compilation takes tens of seconds or minutes. I sit in front of my IDE waiting to get type hints / go-to def (often falling back to a text search). When I launch unit tests, I wait for rust-analyzer to finish its indexing, and then I wait again for the tests to compile…
The tools are now mature, and a lot of engineering work has gone into both the Rust compiler and rust-analyzer. I am afraid that the slow compilation of Rust is rooted in its inherent complexity.
> I am afraid that the slow compilation of Rust is rooted in its inherent complexity.
AFAIK that's not the case.
The problem here is that Rust has "issues" with separate compilation due to some language design decisions (which then obviously affect incremental compilation). It's not built for that, and it is, and will continue to be, quite difficult to make this work somehow.
But that's less a problem with the complexity of the language as such.
My Scala example stands: Scala is also quite complex and not the fastest to compile. But after the build system and the compiler have crunched the sources once (which may take many minutes on a larger code base) the IDE is very responsive. Things like type hints or go-to def are more or less instant. Code completion is fast enough to be used fluently. Edit-compile-test cycles are fast thanks to the fact that separate compilation considerations were part of the language design decisions. (That's for example why Scala has orphan type-class instances, which are a feature and a wart at the same time.)
As I understand it, Rust's "compilation units" are actually crates. This is not very fine-grained, and I guess it's the source of the issues.
I would guess splitting code into a few (more) crates (which then need to depend on each other) may improve incremental build times. Things like not building optimized code during development of course also apply, but I think cargo does this automatically anyway.
But I'm not an expert on this. Would need to look things up myself.
Maybe someone else has some proven tricks to share?
…
OK, a quick search yielded some useful results, so I share:
Thanks for the detailed answer and the linked resources!
In fact, I included the lack of compilation locality in "inherent complexity of Rust". However, I agree that it could be considered separately.
In my experience with TypeScript (quite different, I admit), splitting into distinct compilation units may help. However, it does not solve the issue.
It would be great if Rust could deprecate some features in order to improve its compilation speed. I am not sure whether that is feasible…
I think I followed. One aspect of a programming language is how easy it is to build a useful IDE for with code prediction, navigation, refactoring, etc. Java is relatively easy, Lua is very hard. Rust is somewhere in the middle, with macros being a complicating factor. https://rust-analyzer.github.io/blog/2021/11/21/ides-and-mac... discusses the problems specific to rust much better than I could.
I'm unsure exactly what the above poster is trying to say; I generally find Rust development very pleasant with nothing but VS Code and rust-analyzer.
But... I'll admit there is one major stumbling block so far. Debugging iterator chains can be cumbersome because of the disconnect between the language and the compiled code. I've found myself stepping in and out of assembly more than I'd like. I assume this is the kind of problem that can be overcome with a nicer debugger though.
> I assume this is the kind of problem that can be overcome with a nicer debugger though.
I think this would be something that modern debuggers need to solve somehow in general.
There are more and more languages with a high amount of syntactic sugar, where the output being debugged no longer has much in common with the code as written.
Debuggers need to be aware of desugarings somehow.
But it makes no sense to implement this on a case-by-case basis for every language. We need next-generation debuggers! (But I have no clue how "a sugar-aware debugger" could be implemented; something in the direction of "source maps", maybe?)
So you're arguing C89 is better because it's easier to write compilers for? How is that a relevant point in this context? We're talking about whether migrating to C99 is better for the end user, not for the compiler writer.
They are arguing that C89 is better because it's supported on more platforms (curl is used on all sorts of oddball embedded systems), and that it's easier to get a new platform going with C89
The requirements for writing curl are different from the requirements for writing other software. Just because C89 is a good choice for curl doesn't mean that C99 isn't a better choice for other things. The failure isn't in having revised a language, it's in thinking that all projects using the older version must upgrade. The idea that progress is linear is an illusion.
The question is more complicated than that: curl is so popular that it is used on systems where C99 is not available. The question is how many of those exist, and at which point it's not worth supporting them anymore.
Don't you have to compile separately for each `msvcrt` environment, as I thought they aren't binary compatible? And would a non-C99 msvcrt necessarily have an `snprintf()` implementation in its libc-equivalent dll?
You can call code compiled with one msvcrXX from code compiled with a different msvcrXX, provided that you don't try passing things from one to the other (that is, no passing a pointer to a FILE structure, or even a file descriptor, since the file descriptor table is in the msvcrXX instead of the kernel), and that you always free or reallocate memory using the same msvcrXX (that is, don't allocate memory and expect your caller to call free() on it; always provide a custom deallocation function for your objects).
This is possible because, unlike on Linux where function names are global, on Windows function names are scoped to the DLL, so you can have MSVCR71.DLL and MSVCR81.DLL loaded at the same time in the same process and they won't interfere with each other.
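To make the "custom deallocation function" rule concrete, here is a minimal sketch (the `thing_*` names are invented for illustration):

#include <stdlib.h>

struct thing { int value; };

/* Exported by a DLL built against one particular msvcrXX. */
__declspec(dllexport) struct thing *thing_create(void)
{
    return malloc(sizeof(struct thing)); /* allocated by this DLL's CRT heap */
}

__declspec(dllexport) void thing_destroy(struct thing *t)
{
    free(t); /* freed by the same CRT that allocated it, never the caller's */
}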
OK, but isn't the point of building a program that links to (say) MSVCR71.DLL that you're expecting to run it in an environment where (say) MSVCR81.DLL isn't available?
I don't see how that fixes the problem of possibly not having an snprintf() implementation on a system that doesn't have a C99-compatible MSVC runtime environment.
Did I miss an implication of your comment somehow?
If you compile your program against some MSVCRT then it's your job to make sure that MSVCRT is available on the machine where your program is installed, by delegating to its installer.
All supported MSVCRTs are installable on all supported Windows SKUs.
Yes, that's the thing I always forget: the way Windows deals with multiple incompatible versions of msvcrt is that every application ships its own copy of libc, and hopefully the installer is well-written enough to only copy it into place if it's newer than the newest release of the same major version that's already there, lest a random app re-introduce a bunch of security issues that should have been closed by the last security update for every other application that uses the same msvcrt.
...and by "forget", I mean "block out due to trauma, because surely it can't be that stupid".
Linker scripts change whether or not symbols are added to a global symbol table for subsequent requests (i.e. "exported"). Though you don't even need a linker script to affect visibility, as both GCC and clang provide a visibility function attribute, and you can change the default visibility through a simple compiler command switch.
dlopen permits you to control whether exported (externally visible) functions in a module become available to satisfy link dependencies in the application, such as subsequent module loads. See the dlopen flags RTLD_GLOBAL and RTLD_LOCAL.
dlmopen is for controlling the visibility of shared library dependencies pulled in by dlopen'd modules, whether RTLD_GLOBAL or RTLD_LOCAL, which only affect the immediate symbols in the module and not symbols from automatically loaded shared library dependencies. If you link the main application with OpenSSL (-lssl -lcrypto), or a prior module you dlopen'd pulled in OpenSSL as a dependency, then those OpenSSL symbols become available to satisfy requirements for subsequent dlopen'd modules. dlmopen allows you to create an entirely different symbol namespace for a module or modules, where symbol dependencies are only ever satisfied from that namespace, and exported (global) symbols, whether pulled in by dlopen or transitively via a shared library, are never visible outside that namespace.
None of these options directly map to the behavior of DLLs. DLLs fundamentally use different semantics, AFAIU. The closest behavior to DLLs might be DT_RUNPATH + dlmopen, but dlmopen use is explicit so not really the same thing. You could use ELF symbol versioning (maybe in combination with DT_SONAME and DT_RUNPATH) to accomplish the same thing as DLLs by effectively renaming all the symbols in a library (e.g. attaching a version component), but there aren't any tools around to help automate that, AFAIK; you'd have to generate linker scripts and it'd be a complex build. Much easier to just static link at that point.
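For anyone who hasn't used it, a minimal dlmopen sketch (glibc-specific; "libplugin.so" and `plugin_entry` are hypothetical names):

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    /* LM_ID_NEWLM: load the module into a fresh link-map namespace,
       so its dependencies can't satisfy (or clash with) anyone else's */
    void *h = dlmopen(LM_ID_NEWLM, "libplugin.so", RTLD_NOW | RTLD_LOCAL);
    if (!h) {
        fprintf(stderr, "dlmopen: %s\n", dlerror());
        return 1;
    }
    void (*entry)(void) = (void (*)(void))dlsym(h, "plugin_entry");
    if (entry)
        entry();
    dlclose(h);
    return 0;
}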
For C, Windows has had a stable CRT (libc in Unix speak) for several years now, since Win10. There are still cross-runtime compatibility concerns with C++, but those shouldn't apply here.
More than you would think; remember curl is also used to talk to the local network, or inside VPNs.
Some hacks are quite crazy.
E.g.: there is this very old marine broadcasting protocol, NMEA (https://en.wikipedia.org/wiki/NMEA_0183), which was designed so it could be transmitted over old-fashioned radio waves. You'll find it in some sonars, water sensors, and AIS beacons. For this reason, despite looking more like a layer 4 protocol, it embeds its own packet format and checksum, all in ASCII, that clients are expected to parse.
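For a taste, here is the sample GGA sentence often given (e.g. on the Wikipedia page): comma-separated ASCII fields, with the XOR checksum after the `*`:

$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47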
Now of course, a lot of devices still emit their data in NMEA, and it's not uncommon to be able to just telnet or netcat (if UDP) into one and see the data flowing.
But after a while, people started to aggregate this data from its numerous sources into one single router, and expose this router for convenience through... HTTP over TCP/IP.
And now you have all those old computer towers (some still rocking CRT screens or Windows XP) doing long polling to get broadcast data over a protocol that was made for request/response, to read a payload in another protocol that was meant for radio equipment and hence requires manual consistency checks, all transported by yet another protocol that is doing its best to preserve packets.
And they say the spirit of hacking is dead :)
(sometimes I feel IoT or home-automation stacks look the same, honestly)
Of course, somewhere in there, there is a curl call. The question is therefore whether the curl author wants to support a potential upgrade path for such twisted use cases or not. I would say "nahhhh", but maybe curl had precisely the success it did because the author was ready to support it in crazy settings.
Maybe. Or maybe they have the only toolchain known to work installed on a single machine somewhere in the basement (seen in a healthcare corp), or they have a chain of trust that needs way too much effort to verify again (seen in the army), or they are not competent enough to do such a thing (seen in airports), or their whole stack is so old it can only run on this stuff (seen in ports), or their target is exotic and you can't cross-compile to it easily (seen in aerospace).
Again, not sure that it means curl should endorse such niche situations, but the modes of failure are numerous.
But that's not because there is any issue with VLAs. The issue is that C concepts are stupid, but nobody fixes the roots of the issues.
If you put something of variable length (a VLA) into something with limited static length (the "stack"), it will explode. That's nothing new and nothing special or exclusive to VLAs. Actually, exactly this is one of the main issues with the bad C design since inception: it does not do bounds checks (especially no static ones, as this would require proper dependent typing for safe VLAs). Out-of-bounds access will just "explode" as always in C (likely leaving a nice security crater).
To be honest I don't get why we're still stuck with the stack / heap nonsense. There is no stack (or heap). There is only memory.
What would be much more interesting would be direct control over the caches… Instead we still use the pure fantasy products "stack" and "heap" which are actually irrelevant (as they don't exist in the end).
There is nothing "special" about "stack" memory. That's just a very primitive region-based automatic memory allocator baked into the C runtime!
The whole "using registers" thingy in the context of the "stack" is also just fake by now. You don't use HW registers, but some virtualization of them presented to you by the VM that runs inside the CPU. So you don't control register allocation anyway! So this could be made completely transparent without any impact. (The VM inside the CPU does the actual register allocation fully automatically, presenting the "faked virtual ISA registers" to the outside world just to make "legacy" code happy.)
> If there were no advantages gained from C99, it wouldn't exist (people don't release language updates for no reason).
That's not what the post said. He didn't say that C99 offered no advantages to anyone. He said no one could come up with benefits to the curl project that would be gained by moving to C99, therefore the risk introduced by doing so was not worth it for now (my paraphrase, obviously).
It sounds to me like a perfectly good reason to stay with the current standard for that project.
Writing curl today for your own use, on a platform/OS/tech stack you control, or to target all the places where curl runs right now?
It's still deeply depressing how costly/impractical it is to apply improved technologies in the long tail of environments that isn't "linux on amd64" and similar, but it's not really a language design question in my opinion. We didn't get it "exactly right", we got it "good enough", and upgrade costs are prohibitively high for the general case.
"people don't release language updates for no reason", indeed, many reasons are in the end "planned obsolescence" or ways to make even a naive compiler so much complex that only few remains, and of course in control of very few groups of ppl, and it is near impossible to implement reasonably a real life alternative.
My opinion is that C is already way too rich and complex. I would stick to C89 with benign bits of C99 and C11. The benchmark being: one average system developer coding a naive, real-life C compiler in a reasonable amount of time and effort.
That said, I know that my "next" C compiler will probably be a RISC-V assembler with a very conservative usage of a macro preprocessor.
The tipping point for me would be `snprintf()` and related functions. I've found it generally more useful/memorable/readable than strncpy()/strlcpy() and other updates to the dangerous and deprecated strcpy(), just for copying strings safely, never mind its other formatting abilities.
If Curl already has its own "decent and functional replacement" for `snprintf()` that's used extensively throughout the codebase, or if they just don't need that functionality (I haven't checked) then I guess that's not an issue. But that would be the big selling point as far as I'm concerned.
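For what it's worth, on pre-C99 MSVC the usual (imperfect) workaround was to map to `_snprintf`, which has subtly different semantics; a sketch:

/* Sketch: common pre-C99 MSVC shim. _MSC_VER < 1900 means pre-VS2015,
   before MSVC gained a C99-conformant snprintf. Caveat: _snprintf does
   NOT guarantee NUL termination and returns -1 on truncation, so this
   mapping is only safe if callers force termination themselves. */
#if defined(_MSC_VER) && _MSC_VER < 1900
#define snprintf _snprintf
#endif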
Note that strncpy() is not intended for safety. The purpose of strncpy() is to write to fixed size data structures such as part of the filesystem where you don't want to store NUL termination on strings.
Like 1980s Internet protocol features the rationale for weird things in C is more often "That's how Unix works" than "This is actually a clever safety feature".
> write to fixed size data structures such as part of the filesystem where you don't want to store NUL termination on strings
... AND where you want to pad the remaining space with zero bytes, so that you don't leak uninitialized memory onto the disk, or network.
The null byte padding behavior of strncpy makes it clear what the intended use was.
Also, the way C initializes character arrays from literals has strncpy-like behavior, because the entire aggregate is initialized, so the extra bytes are all zero:
char a[4] = "a"; // like strncpy(a, "a", 4);
char b[4] = "abcd"; // like strncpy(b, "abcd", 4);
The compiler could literally emit strncpy calls to do these initializations, so we might say that strncpy is a primitive directly relevant to run-time support for a C declaration feature.
Is the original intent of strncpy() germane to the GP's comment? Explicitly stating the max length to copy is an effective tool for avoiding buffer overruns, regardless of whether the designers imagined that important use case.
But the thing it does (fill out a fixed sized buffer without caring about NUL-termination) is not at all what you'd want from a safety feature.
If you look at this function assuming it's a safety feature, that's a huge surprise, and indeed if you were skimming you might miss what it does because (in the context of "it's a safety feature") this is an insane choice. "Why would you do that?". Well, because it's not a safety feature.
The perf cost isn't what you'd expect from a safety feature either. Suppose we have a 1MB buffer, and we strncpy "THIS" into it using n = 1024. That's just four bytes right? Nope. strncpy() will write "THIS" and then 1020 zero bytes.
Except strncpy is broken for C strings, because it doesn't guarantee nul termination. So if you forget to force a termination on every use, you get buffer overruns.
Not only that, but (because of its actual purpose) it also fills the buffer with nuls, which is a complete waste of resources.
So yes, the original intent of strncpy() is germane to the GP's comment, because it makes strncpy actively dangerous and complete shit when working with C strings.
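Concretely, the defensive pattern every strncpy-on-C-strings call site needs is a sketch like this (`src` stands in for some incoming string):

char buf[16];
strncpy(buf, src, sizeof buf - 1);
buf[sizeof buf - 1] = '\0'; /* strncpy won't terminate on truncation */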
The output side of strncpy() might not be a str___() function, but AFAICS the input side of strncpy() is clearly a str___() function, since it stops reading (but not writing) at the first NUL byte.
But it's not an str* function, it's an strn* function. And most (though not all, that would be too easy) work on fixed-size (hence the n) nul-padded strings.
No, it isn't a string function of any kind. "A string is a contiguous sequence of characters terminated by and including the first null character." § 7.1.1.
Calling a bespoke byte-sequence data structure a "string" is inaccurate. Treating strncpy() as a string function is erroneous and can easily lead to memory corruption.
Do we have a source for what the intended purpose is? I think you speak well to the effective purpose, but I'm not sure if it was that clear when it was introduced.
On early Unix systems, the directory structure was a simple 16-byte record; 14 bytes for the name, and 2 bytes for the inode. [1] strncpy() was used to simply record the file name into this structure.
[1] "UNIX Implementation" by Ken Thompson, _The Bell System Technical Journal_, July-August 1978, Vol 57, No 6, Part 2, pg 1942.
How would you optimize slices at the ABI level? Supporting them in function calling conventions should be easy. But figuring out storage representations, I suppose, would be a huge can of worms. There are too many ways of encoding slices depending on the use case. It only starts with the choice of a length field type (8, 16, 32, 64 bit; signed or unsigned). There are also other representations thinkable, like sentinel values (NUL terminator) or more implicit storage of the size. Supporting them all in the compiler is not possible in practice.
"Slice" is, by now, a fairly established term in PL design which implies a tuple of (start, end) or (start, length), so it specifically excludes prefixed length, null termination etc - because experience has shown that slices are the only sane choice.
What I meant by optimization is not treating them same as other structs, but e.g. guaranteeing pass-by-register like other primitive types, spelled out explicitly in the ABI. The choice of length field type would be size_t, obviously.
Please show, don't tell. Your sibling explained that they meant only function call optimization. And I would say it's debatable that this is an "optimization" since slices would be a new concept that is distinct from structs. I agree though that the obvious naive choice would be to pass them in the same way that structs of { ptr, len } are passed in the ABI.
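To make that concrete, a sketch of the representation being discussed (the register guarantee is the ABI design point, not something C promises today):

#include <stddef.h>

/* A (pointer, length) slice as a plain C struct. Under e.g. the SysV
   x86-64 ABI a 16-byte struct like this is already passed in two
   registers; the proposal is to guarantee such treatment for slices. */
typedef struct {
    const char *ptr;
    size_t len;
} slice;

/* taking a sub-slice: no copy, just pointer arithmetic on the view */
static slice subslice(slice s, size_t start, size_t end)
{
    slice r = { s.ptr + start, end - start };
    return r;
}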
I don’t think Multics did hardware memory tagging.
Systems which do/did include Burroughs Large Systems (now Unisys ClearPath MCP), IBM System/38 and AS/400 and IBM i (the RISC versions of which used PowerPC AS Tagged Memory Extensions), ARM MTE, SPARC ADI, and CHERI/ARM Morello.
> then why not just use memmove() instead of strncpy() if no-NUL is the goal?
no-nul is not the goal of strncpy, it's the effect of strncpy.
strncpy is designed to work on fixed-size, nul-padded strings. That's why it fills the destination buffer with nuls if the source is too short, and it doesn't guarantee nul-termination (if the source is exactly the size of the destination).
If anyone reading this feels uncertain about how this is done, DON'T do this:
snprintf(dest, sizeof dest, source); // BAD code do not repeat
that looks great at first sight, just another size-checked way of copying strings, but remember that the third argument to `snprintf()` [1] is of course a `printf()`-style formatting string. So if that `source` argument contains any percent symbols, there's gonna be a party in your computer and both Undefined and Behavior are going to show up. You don't want that.
Instead, if you want to use `snprintf()` for this, remember to do:
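snprintf(dest, sizeof dest, "%s", source);

This way `source` is consumed as data by the `%s` conversion instead of being interpreted as a format string.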
Thanks for mentioning this. I didn't include it in my comment because I originally thought it was too obvious, but considering it now, it probably was worth making explicit.
I always use antirez's SDS string library. I really like the fact that they are compatible with C strings, the only price to pay being a call to sdsfree() instead of free(). Simple and yet super useful. Check it out.
The point is that the end of the buffer is invariant, there's no reason to screw around recalculating the length of remaining space after each copy into the buffer. This also fixes the nonsense of strcpy/strcat returning the same dst pointer that was passed in. By returning the pointer to where copying ended, you don't need a separate strcat function any more, nor do you have the Shlemiel The Painter problem with strcat.
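A sketch of that end-pointer idiom (`bounded_cpy` is a made-up helper; POSIX stpcpy() returns the end pointer in the same spirit, just without the bound):

#include <stdio.h>

static char *bounded_cpy(char *dst, char *end, const char *src)
{
    while (dst < end - 1 && *src)
        *dst++ = *src++;
    *dst = '\0';
    return dst;                         /* points at the new terminator */
}

int main(void)
{
    char buf[64];
    char *end = buf + sizeof buf;       /* the invariant: never recomputed */
    char *p = buf;
    p = bounded_cpy(p, end, "Hello, ");
    p = bounded_cpy(p, end, "world");   /* continues where the last copy ended */
    puts(buf);
    return 0;
}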
> I think there are still much better things to do and much more worthwhile efforts to spend our energy on that could actually improve the project and bring it forward.
> Like improving the test suite, increasing test coverage, making sure more code is exercised by the fuzzers.
I wonder if using a more recent version of C might draw in more developers willing to contribute (in a similar vein to Linux kernel introducing Rust).
It seems unlikely. If you're capable of & interested in hacking on something like curl, having to put your variable declarations at the top of the block probably isn't going to be a deal-killer for you.
In C99 there are some nice things over C89 that aren't new syntax, like initializing aggregates in automatic storage ("stack") with non-constant expressions:
struct point p = { getx(obj), gety(obj) }; /* C90 error, C99 OK */
GCC had this as an extension before C99. Coding around that one can get ugly, and it's just a syntactic limitation.
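The C89 workaround is member-by-member assignment, which is exactly the ugliness referred to (assuming `struct point` has `x` and `y` members):

/* C89: initializers for automatic aggregates must be constant expressions */
struct point p;
p.x = getx(obj);
p.y = gety(obj);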
Also, the C99 preprocessor is more powerful, with variadic support.
The macros I implemented in the cppawk project (awk with C preprocessor) would be impossible without C99. I was able to make a multi-clause loop macro, with user-definable clauses.
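For readers who haven't used them, a minimal C99 variadic-macro sketch (nothing to do with cppawk's actual macros):

#include <stdio.h>

/* __VA_ARGS__ expands to all trailing arguments; impossible in C89,
   whose macros take a fixed number of parameters */
#define LOG(...) fprintf(stderr, __VA_ARGS__)

int main(void)
{
    LOG("%s:%d: %s\n", "main.c", 42, "something happened");
    return 0;
}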
In 2007-2009, I worked a job that mostly consisted of maintaining a C89 codebase. We were using OpenWatcom, which at the time did not have complete C99 support.
But in retrospect I'm surprised to see how many features we used liberally would have been unavailable in pure C89. snprintf first and foremost. But also // comments, __func__, and stdint.h
Won't older versions of curl source code be available forever in any case? I understand that people use very old machines/code sometimes, but I don't think it's that unreasonable to say "if you must use a compiler with capabilities stuck in the 20th century, you can't use curl source newer than late 2022".
For things that touch the network and maybe particularly use TLS, sometimes people are suddenly in a hurry not to be using an old version anymore. The people with the classic compilers would probably have to be prepared to backport security fixes on short notice, I figure the curl project might not want to put them into that awkward position.
curl doesn't necessarily require OpenSSL for its TLS functions, in fact it's designed to be TLS-agnostic (to the point that it uses macOS Secure Transport API and Windows Schannel by default). You could either use wolfSSL or use a dedicated TLS code connected to the embedded cryptographic co-processor.
I think this is entirely the right approach. With a smaller language footprint, the code is easier to read and you will have fewer issues running it on a wide variety of platforms. Many C99 features (like variable-length arrays) have proven to be less than well designed and have since been made optional.
Yes, VLA is bad, but declare anywhere is very desirable. curl can't move due to old MSVC, but once that is solved, it would make a lot of sense to move to C99. The post itself says "It is not a no to C99 forever".
Caveat re VLAs: while C23 doesn’t return VLAs to mandatory status, it does require variably modified types; so no stack space footgun but still a significant complication in the type system:
void foo(size_t n) { int a[n][n]; }
does not have to be supported outside C99 but
void bar(size_t n, void *p) { int (*pa)[n][n] = p; }
has to be in both C99 and C23 (though not in C11 or C17).
I don't really get why Daniel decided not to go with C99.
Doesn't a C99 compiler produce the same machine code for ANSI C source code?
They could start writing new code in a newer standard and gradually fix old code to the newer standard, so most of the `curl` and `libcurl` source code would slowly but steadily transition to a newer standard.
Or am I missing something?
> Doesn't C99 compiler produce the same machine code for ANSI C source code?
Only if a C99 compiler is available. As the post points out:
> The slowest of the “big compilers” to adopt C99 was the Microsoft Visual C++ compiler, which did not adopt it properly until 2015 and added more compliance in 2019. A large number of our users/developers are still stuck on older MSVC versions so not even all users of this compiler suite can build C99 programs even today, in late 2022.
And that's just the "big compilers" supporting the "big platforms". If you're shipping some bespoke embedded platform where the toolchain/OS integration was custom work, you might very well never upgrade the compiler and just pick a new one when you're forced to define a new platform due to the hardware going EOL.
I'm mixed on this personally, on one hand... I get it. Having curl on the XYZ micro controller you're using for a project is good. At the same time the only reason MSVC added c99 support was community pressure because major projects said "we don't care about MSVC anymore we're moving on"[1]. So clearly there is a correlation between vendors and versions. If the vendors don't feel pressure to support newer versions then they'll just continue to ship the garbage they've been shipping for years. I personally don't have a good answer for this. Daniel very clearly has made his decision and I'll respect that as it's not my project.
[1] Also because the C++ committee ignored Herb and just forced it into later C++ revs from a library perspective.
> [...] we would have to go gently and open up for allowing new C99 features slowly
> A challenge with that approach, is that it is hard to verify which features that are allowed vs used as existing tooling normally don’t have that resolution.
> The question has also been asked that if we consider bumping the requirement, should we then not bump it to C11 at once instead of staying at C99?
The motivation ultimately boils down to:
> Ultimately, not a single person has yet been able to clearly articulate what benefits such a C flavor requirement bump would provide for the curl project
More language isn't necessarily better. The more features a language has, the harder it becomes to read. When a language gives you a lot of options, you spend more time choosing between options than on the problem at hand.
A smaller language isn't necessarily better, either. Case in point: having no booleans makes a language "smaller"; the result: each codebase will have its own, different, implementation.
There is no one way to do booleans, because the choice of implementation depends on what your goals are. If you value space you should pack your booleans as bits in a larger type, but if you want fast access you should use int. bool has some nice properties because it doesn't have multiple states of true, unlike int. C gives you types that have properties but doesn't prescribe what they should be used for.
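For illustration, the two home-grown representations being contrasted, as a pre-<stdbool.h> sketch:

struct flags_packed {              /* space: three flags in one unit */
    unsigned a : 1, b : 1, c : 1;
};

struct flags_fast {                /* speed: whole ints, no bit masking */
    int a, b, c;
};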
> Ultimately, not a single person has yet been able to clearly articulate what benefits such a C flavor requirement bump would provide for the curl project. We mostly see a risk that we all get caught in rather irrelevant discussions and changes that perhaps will not actually bring the project forward very much. Neither in features nor in quality/security.
Didn't Linux just switch for added safety against some speculative execution vulns? Keeping iterator variables scoped to their respective loops, it seems.
Isn't cURL susceptible to this too? (Not to mention i always find that style much nicer :)
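(The C99 style in question, for reference; `do_work` and `n` are placeholders:)

for (int i = 0; i < n; i++)  /* `i` is scoped to the loop; C89 rejects this */
    do_work(i);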
IIRC, MSVC stubbornly refuses to add support for variable-length arrays (which AFAIK are required for full C99 support). I don't know if there's anything else on that list of C99 features that MSVC doesn't support yet.
> IIRC, MSVC stubbornly refuses to add support for variable-length arrays (which AFAIK are required for full C99 support).
You are correct that VLAs are mandatory for C99, but they turned out to be such a bad idea that they are optional in C11 onwards.
> I don't know if there's anything else on that list of C99 features that MSVC doesn't support yet.
IIRC, MSVC's `snprintf` and friends are broken (returning incorrect values and/or interpreting the size parameter incorrectly). I think all of the Annex K stuff is broken in MSVC.
I think VLAs were a very good idea, and they were not made optional because somebody thought they were a bad idea, but simply to make C11 easier to implement. VLAs can be dangerous when the size depends on external input and when implemented without stack probing.
The Linux kernel is a very different environment than user space; it has a very limited stack space (16 KiB IIRC), and the consequences of exceeding that are worse than in user space.
> Google has paid the development effort for the Linux kernel to get rid of all VLA occurrences.
IIRC, what they were really interested in getting rid of was not the normal VLA we're talking about, but something even more esoteric called VLAIS (variable length arrays in structures), which clang didn't support. See https://lwn.net/Articles/441018/ for a discussion about that.
I dunno how to probe the stack in standard C, and so I never used VLAs, nor allowed them because (I thought that) there was no way to determine if a VLA declaration would cause a stack overflow.
I find them very useful and they often make the code cleaner. A dynamic run-time bound for a buffer can also make the code safer. VLAs are only dangerous when implemented naively (allocated on the stack without stack probing).
>are only dangerous when implemented naively (allocated on the stack without stack probing).
That's how it's implemented in all the major compilers. Anyway, even a hypothetical heap-based implementation would be bad because there's no way to report allocation errors.
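For contrast, the heap route at least has a reportable failure mode; a sketch:

#include <stdlib.h>

int process(size_t n)
{
    int *a = malloc(n * sizeof *a);
    if (a == NULL)
        return -1;        /* a VLA has no equivalent error path */
    /* ... use a[0..n-1] ... */
    free(a);
    return 0;
}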
Interestingly, I've seen code that uses the flexible array member idiom with MSVC. I'm fairly certain I first saw this in the windows code base when I worked at MS, and I believe it may even be in some public headers somewhere. In order to compile on MSVC, the trick uses MemberName[0] or MemberName[1] rather than the standard MemberName[]. GCC and clang warn about this. I suspect it might have been a common idiom in pre-C99 days.
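The pre-C99 trick mentioned here is a sketch along these lines (struct and field names invented for illustration):

#include <stddef.h>
#include <stdlib.h>

/* The "struct hack": a 1-element (or, as a compiler extension, 0-element)
   trailing array standing in for a flexible one; [] is C99-only. */
struct msg {
    int len;
    char data[1];   /* really variable-length */
};

struct msg *msg_alloc(int n)
{
    struct msg *m = malloc(offsetof(struct msg, data) + n);
    if (m)
        m->len = n;
    return m;
}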
You are probably confusing flexible array members with VLAs. Those are two completely different features. You are thinking of the one where the last member of a struct is an array and you adjust its size based on the malloc size when allocating the struct. The VLA feature allows specifying an arbitrary non-constant expression when declaring local stack variables of array type. Something like:
// VLA
void foo(int n) {
    int array[n];   /* size chosen at run time, allocated on the stack */
}

// flexible array member
struct s { int n; double d[]; };
struct s *s1 = malloc(sizeof (struct s) + sizeof (double) * 8);
Flexible array members were implemented a long time ago. The oldest Visual C++ I have at hand is 2005 and it already has them (although only in C mode).
> I suspect it might have been a common idiom in pre-C99 days.
There is a blog post from Raymond Chen on this [1]. The TL;DR is that zero-length arrays and FAMs weren't legal until C99. Not sure where the support from MSVC factors in, but that's the story and AFAIK he's sticking to it.
It's not really up to the curl people to rewrite it in rust.
It's for the proponents of rust.
Personally, I'm largely convinced that writing in rust will lead to safer implementations... now I'm just waiting for the full-featured, well-thought-out, stable, useful rust implementations to appear.
I'm not married to curl. Where's "rurl" or whatever? If it exists and does something useful as well as curl (or better), I'll happily use it.
Well, curl is a bit of a Swiss Army chainsaw for protocols, and in general there is rarely a need to have the same subsystem (for "downloading/sending something to a URL") support protocols from HTTP through IMAP all the way to LDAP.
So there is really no reason to redo it in Rust: use smaller, protocol-specific libs (fewer security problems), and if you need multi-protocol support just switch by protocol and maybe write some wrappers.
And for calling it from C, "rewrite it in Rust but make the ABI C compatible" means you have to repeat a lot of curl's idiosyncrasies while recreating all of the functionality. Looking at https://curl.se/docs/security.html there aren't that many of them that Rust safety would prevent, so the real gains from that probably aren't all that great.
I just scrolled through the man page at a speed of 15 pages per second. Aside from that, curl is an extremely important and impressive piece of software that I use daily.