Software like glibc is battle-tested---it is _widely_ used on hundreds of thousands of systems around the world, and has been used (though not at today's massive scale) for decades. I understand that glibc is under active development and there is a lot of new code, but let's keep this in perspective:
Writing a new implementation of a system is a huge opportunity to introduce bugs. There is focus on these specific problems, but in the broader scheme of things, glibc is remarkably stable, performant, and feature-rich. A new implementation will have bugs, and those bugs might be less likely to be caught simply because the system will not be as widely used for quite a long time. Even formally proven systems don't address flaws in the actual program specification. (See "The Limits of Correctness" by Brian Cantwell Smith for a good discussion). Also relevant (which I'm reminded of in part because of his recent death): Peter Naur's Programming as Theory Building.
So even _new_ code in glibc has the benefit of a huge community of both developers and users to eyeball it and test it out in production on a huge number of systems.
So rewriting glibc may solve certain problems, but it's bound to create a whole lot more, considering the narrow range of issues that are being focused on. New Rust code will have undiscovered issues too, even if they're not memory or stack related. I feel that this effort might be better spent fixing and finding problems with glibc---and continuing the development of tools to find those problems, to benefit _all_ of our old C libraries and programs---than rewriting for the sake of rewriting.
Memory issues have a tendency to be the most severe types of issues from a security perspective. (Not always, but with a high frequency.) Other issues that may occur with a rewrite may be security problems as well, but the security track records of network-facing software written in old-timey C speak for themselves: memory safety issues make up a huge fraction of their vulnerabilities.
> I feel that this effort might be better spent fixing and finding problems with glibc---and continuing the development of tools to find those problems, to benefit _all_ of our old C libraries and programs---than rewriting for the sake of rewriting.
So we've been trying that for the nearly 40 years since C was released in 1978, and we've completely failed to move beyond making the same basic memory management errors again and again and again. I think that if we couldn't do it in 40 years, we aren't likely to be able to by just trying a little harder this one time.
As long as the public interfaces are still unsafe, it doesn't seem to me that there is significant latitude to qualitatively improve the security of a libc implementation.
Of course there is. The glibc bug we're talking about would not have happened if libresolv were memory safe.
To be more precise: I claim that, for a library for an unsafe interface, if the ratio of library interface area over the library implementation volume is high enough, it is not worth re-implementing in a safe language. I also claim that a libc implementation is generally well above this threshold.
I agree that this might not be the case for selected subsets of the library like libresolv.
The vast majority of the memory safety issues here are not at the public interface boundaries.
This seems a bit counter-intuitive given that serious problems are still being found years or even decades later. Perhaps battle-tested is insufficient.
And while new code is likely to have bugs, assuming it's already based on the original C code it would be less than if it was re-written from scratch based on a spec. New code can take advantage of the battle-tested nature of the original code even if it's in a different language.
(It's a close cousin to the "don't rewrite it" argument we've all heard via Spolsky et al., even though they've ended up rewriting their shit several times over).
There is a fundamental shift forward in the types of guarantees you can make with the premises of a language like Rust vs. that of C that I have simply never heard one of these demagogues address.
Bugs will always exist. But that statement will apply equally to any software, regardless of language.
> New code can take advantage of the battle-tested nature of the original code even if it's in a different language.
That's why I referenced the Peter Naur paper on Programming as Theory Building---in practice, that may very well not be the case.
I'd be curious to see some research suggesting that the prevalence and severity of those bugs is identical regardless of development tool/language/runtime, but I'd be surprised if that's demonstrable.
The question at hand is not "will safer languages eliminate all bugs?" It's "will safer languages reduce the prevalence and severity of important classes of bugs?" I'd wager it's probably yes, but even if you disagree, I don't think that it's reasonable to suggest that because there will always be bugs we should never improve.
I intended to convey that software written in any language will have bugs, not that all languages will produce the same types of bugs.
My argument is based on the act of rewriting it---regardless of language. Many languages provide excellent guarantees, but that does not protect against bugs in the implementation itself (logic).
Despite being "battle tested", all of these C programs continue to have both memory and logic errors. I think a rewrite would have the same rate of new logic issues after the initial code review and testing. "bugs will always exist" -- sure, so if we have something that eliminates a class of bugs, why not use it? The other classes of bugs will be there (and probably in the same force) whether you rewrite or not.
A lot of these bugs get introduced due to cruft in old code as well. So there are a bunch of tradeoffs here.
This is the black and white security fallacy. Memory safety problems are, statistically, a huge quantity of security bugs. By eliminating them you drastically reduce the number of bugs.
1. Memory safety issues are the cause of a very large number of security vulnerabilities (often most of them for projects written in C or C++, depending on the software).
2. Memory safety-related issues have a relatively high probability of being turned into remote code execution, which is one of the most if not the most severe outcomes.
3. C and C++ projects have been empirically observed to have orders of magnitude more memory safety problems than projects written in other languages do.
4. The additional classes of security vulnerabilities that managed languages tend to foster do not have the combined prevalence and severity of memory safety problems.
So, we would be better served in security by moving to memory-safe languages.
Note that this includes projects in languages that are themselves written in C or C++, which shows that there's some value in confining the unsafe code to a small and well-tested core library (in this case, the language runtime). Honestly, it seems like 50% of the value just comes from not using C strings, since pretty much every other language has its own string library that does not use null-termination.
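To make the C-strings point concrete: a length-carrying string type makes "how long is this buffer" a property of the value rather than a convention callers must honor. A minimal sketch (Rust used for illustration):

```rust
fn main() {
    // A C string stops at the first NUL byte; a Rust String carries
    // its length explicitly, so an embedded NUL is just data.
    let s = String::from("abc\0def");
    assert_eq!(s.len(), 7); // the length is stored, not scanned for

    // strlen() on the same bytes would report 3, and any buffer sized
    // by that answer under-allocates for the real 7-byte payload.
    let c_style_len = s.bytes().position(|b| b == 0).unwrap();
    assert_eq!(c_style_len, 3);
}
```

That mismatch between the scanned length and the true length is the seed of a whole family of C string bugs, independent of the rest of the language.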
There are a few routes to memory safety, each with its own cost:
1) Runtime overhead for some form of GC (D, Lisp, etc.)
2) Rephrasing a program to satisfy a memory constraint checker (Rust)
3) Disciplined memory usage (e.g. the NASA C coding guidelines)
We don't have enough experience with option 2 to indicate whether it will create new classes of bugs. We also don't understand the knock-on effects of managing memory differently: will functionally identical programs require more or fewer resources, more or fewer programmer-hours, etc.?
Rust may very well be the future, but we don't know for sure yet.
One thing we do know: options 1 and 3 have been available for years, but not widely utilized. What lessons can we learn from this fact to apply to Rust?
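To give option 2 some shape: the canonical "rephrasing" is that the borrow checker forbids two simultaneous `&mut` borrows into one buffer, so code that would index freely in C gets restructured through APIs like `split_at_mut`. A hedged sketch:

```rust
fn main() {
    let mut buf = [1u8, 2, 3, 4, 5, 6];

    // Taking two overlapping `&mut buf[..]` borrows is rejected at
    // compile time, so the program is rephrased: split the buffer
    // into disjoint halves, each with its own exclusive borrow.
    let (left, right) = buf.split_at_mut(3);
    left.copy_from_slice(&[9, 9, 9]);
    right[0] += 10;

    assert_eq!(buf, [9, 9, 9, 14, 5, 6]);
}
```

The behavior is identical to the aliasing C version; what changes is that disjointness is now stated in the types rather than assumed in the programmer's head.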
What classes of security bugs could possibly arise from Rust's ownership discipline?
Not all security bugs are related to memory. Many are related to improperly written algorithms (most crypto attacks), or improperly designed requirements (TLSv1).
Even Heartbleed was primarily due to a logic bug (trusting tainted data) instead of an outright memory ownership bug.
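The two failure modes interact, though: even when the logic bug (trusting an attacker-supplied length) survives a rewrite, bounds-checked slicing turns the over-read into a recoverable error instead of a leak of adjacent memory. A hedged sketch, with a made-up `heartbeat_reply` function standing in for the real handler:

```rust
// Hypothetical heartbeat handler: echoes back `claimed_len` bytes of
// the payload. The Heartbleed logic bug (trusting the tainted length)
// is deliberately still here, but checked slicing via `get` refuses
// to read past the end of the buffer.
fn heartbeat_reply(payload: &[u8], claimed_len: usize) -> Option<Vec<u8>> {
    payload.get(..claimed_len).map(|bytes| bytes.to_vec())
}

fn main() {
    let payload = b"bird";

    // Honest request: claimed length matches the payload.
    assert_eq!(heartbeat_reply(payload, 4), Some(b"bird".to_vec()));

    // Heartbleed-style request: claims 64KB against a 4-byte payload.
    // In C the memcpy ran off the end of the buffer; here it's a None.
    assert_eq!(heartbeat_reply(payload, 65535), None);
}
```

So the logic bug remains a bug either way; what the safe language changes is its blast radius.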
Does Rust automatically zero out newly allocated memory? Honest question, I don't know the answer.
Oh, also: If you're implying that Rust's ownership discipline can create security bugs where there were none before, I consider that a real stretch. I'd need to see an actual bug, or at least a bug concept, that Rust's borrowing/ownership rules create before accepting this.
Nobody is saying that Rust eliminates all security bugs. Just a huge number of the most common ones.
> Does Rust automatically zero out newly allocated memory? Honest question, I don't know the answer.
This is a problem that will be there equally in all languages.
Perhaps less so in languages with a better type system, but that doesn't affect Rust since there aren't any _systems_ languages with a better type system.
The paper on the Limits of Correctness that I mentioned above does a good job of arguing my point. Even if you rewrote glibc in a language like Coq and formally proved its correctness against a specification, that doesn't mean it's "correct" in the sense of behaving the way glibc actually does---the specification itself can contain logic errors.
So you might gain confidence (or guarantees) in rewriting glibc in Rust, but in rewriting it you potentially introduce a host of new issues.
> Software like glibc is battle-tested