Less than a year after this was published, Tencent released the Magellan series of sqlite RCEs.
I think this is a fine page and it is eminently reasonable that sqlite remains a C codebase. In particular, I think he's right that rewriting sqlite in a memory-safe language would introduce a bunch of bugs and likely result in a couple of years of instability.
But the "security" paragraphs in this page do the rest of the argument a disservice. The fact is, C is a demonstrable security liability for sqlite. The real position of the project is that memory safety security vulnerabilities are an acceptable tradeoff for an otherwise reliable database engine; in practice, people will deal with the exposure either by treating it as an externality (ie: baking sqlite into products where it is directly exposed as part of attack surface, and then throwing up their hands and issuing patches when RCEs are discovered) or by carefully positioning sqlite so it isn't a meaningful part of the attack surface.
Both of these approaches are suboptimal --- that's why we call them "tradeoffs" --- and it is the case that if you held everything else equal (and you can't, but bear with me), sqlite would be a better piece of software written in, I guess, Rust; memory corruption wouldn't be one of the problems you need to consider (or blow off uncomfortably).
Again: the argument as a whole, and this page --- fine! I use and like sqlite.
I suppose that a version of SQLite written in a safe(r) language will eventually appear, and hopefully become popular.
But it will take a long time to mature. SQLite the project cannot switch languages, sadly. The only way to migrate is that another project would grow beside in the meantime, mature, and become a viable and compatible alternative.
I don't like C (correction: I love writing C; I don't like running code written in C), but I'd probably stick with sqlite for several years after the introduction of a competing Rust sqlike. But I also wouldn't expose sqlite directly to untrusted users (for instance, in the hellworld where I'm a designer on a major browser, I wouldn't make sqlite part of the Javascript interface of that browser).
But, tangentially, if not SQLite, what would you expose as a DB interface in a browser? SQL, while large and hairy, is both powerful and logical (in the relational part, not syntax). A plain KV store like dbm is no match to it. A Redis-like store is better but still more limited. Kdb's approach is sort of wonderful but geared towards time series and not much general-purpose. Is there an existing interface / language you would reuse to give the browser a rich database interface?
I don't think a rust version of sqlite would make sense, rewriting a very mature and well tested library rarely makes sense.
Only if either:
- there are some major changes coming anyway
- or major problems/improvements in maintainability and better external dev commitment/support
could rewriting it make sense IMHO.
Let's be honest: many (most?) of the "recent" (~3 years) security bugs sqlite had would most likely not have been prevented by using rust, go or similar.
EDIT:
I guess the biggest drawback of C is that it keeps contributors away, but it also might keep away some of the contributors that projects like sqlite might not want to have. So depending on the maintainer it might be seen as a benefit, not a drawback.
> Let's be honest: many (most?) of the "recent" (~3 years) security bugs sqlite had would most likely not have been prevented by using rust, go or similar.
Using SQLite's CVE list from 2020[1], we see 12 vulnerabilities:
* Two NULL pointer dereferences; it's impossible to produce a NULL reference in safe Rust.
* Two integer overflows; Rust makes these harder, but not impossible.
* Three UAFs; these are impossible in safe Rust.
* One uncategorized segfault; these are impossible in safe Rust barring environmental constraints (like a stack overflow from unchecked recursion).
* One segmentation fault from incorrect object initialization; this is impossible in safe Rust.
* 3 SQL and table-level bugs; Rust is unlikely to have helped with these.
By my count, that's 6/12 bugs that would be impossible in plain old safe Rust, and another 3/12 that would likely be prevented by normal best practices in Rust. I still don't think this necessarily means that we should drop everything and rewrite SQLite in Rust, but the raw numbers don't back up the claim that doing so wouldn't eliminate the actual security bugs that SQLite is seeing.
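To make the NULL-dereference point concrete, here is a minimal sketch. The `Table` type and the `find_table` / `table_name_len` helpers are hypothetical and have nothing to do with SQLite's real internals. A lookup that can fail returns `Option<&Table>`, and the compiler will not let the caller touch the table without handling the "absent" case first.

```rust
/// Hypothetical catalog entry, for illustration only.
pub struct Table {
    pub name: String,
}

/// Looks a table up by name. "Not found" is `None`, never a null reference.
pub fn find_table<'a>(tables: &'a [Table], name: &str) -> Option<&'a Table> {
    tables.iter().find(|t| t.name == name)
}

/// The caller has to handle both cases before it can read any field.
pub fn table_name_len(tables: &[Table], name: &str) -> usize {
    match find_table(tables, name) {
        Some(table) => table.name.len(),
        None => 0,
    }
}
```

In C the equivalent lookup would typically return a pointer that may be NULL, and nothing forces the caller to check it.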
But would a sqlite port limit itself to safe rust?
(As a side note wrt integer overflows, rust only makes it easier to detect them during tests and provides neat methods for integer overflow aware code, but just that.)
Anyway, I'm surprised that there were more "preventable" bugs than I expected.
> But would a sqlite port limit itself to safe rust?
I suppose they wouldn't have to, but why would they bother with unsafe? They proudly announce how few system APIs and syscalls they depend on, so they have no need for that (assuming they chose to not use any number of safe wrappers). Complicated self-referential data structures, perhaps, but that again is the kind of thing that could be exhaustively tested and tucked into a safe interface.
And yes, you're absolutely right about integers: Rust itself is not going to save you in release builds. But it does avoid a major source of overflows in C (implicit conversions and promotions), and has explicit, fallible APIs that are easy to enforce as a lint.
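For anyone who hasn't seen the explicit, fallible integer APIs, a minimal sketch follows. The function names and the "record size" / "rowid" framing are made up for illustration; only `checked_mul` and `checked_add` are real standard-library methods.

```rust
/// Byte size of `count` records of `record_len` bytes each.
/// Returns `None` if the product does not fit in a `u32`, instead of wrapping.
pub fn total_size(count: u32, record_len: u32) -> Option<u32> {
    count.checked_mul(record_len)
}

/// Next identifier after `current`, or `None` when the counter is exhausted.
pub fn next_id(current: i64) -> Option<i64> {
    current.checked_add(1)
}
```

Plain `+` and `*` still exist, of course. By default they panic on overflow in debug builds and wrap in release builds, which is why the checked variants have to be enforced by convention or by a lint.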
Wouldn't they need unsafe in order to interface with the outside world via their libsqlite persona? Which, I think, is how most of the world consumes their sqlite - embedded in some other language as a library?
You can export a C API in safe Rust. You might need a small amount of unsafe code at any boundary where the C API takes pointers instead of providing them, but that again is the kind of thing that can be safely encapsulated and tested.
You can for example have a C API accepting an `Option<&mut A>` if A is `repr(C)` and sized. This is possible due to the guarantees `&mut A` has about its representation (if A is..) and the guarantees `Option` has about its representation wrt. non-null values.
While such a pattern exposes an often fully safe Rust C-ABI API, it has some drawbacks, mainly that the C side is required to make sure the pointer it passes in is valid (either null, or aligned, allocated and initialized).
You can take this a step further and only accept a `&mut A`, BUT that is a bit dangerous, as now C calling Rust with a null pointer leads to unsound Rust, since Rust assumes `&mut A` is never null.
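A minimal sketch of the `Option<&mut A>` pattern described above. The `Counter` type and the `counter_increment` function are hypothetical, and the C prototype in the comment is an assumption about how a header would declare it.

```rust
/// Hypothetical type shared with C; `#[repr(C)]` gives it a C-compatible layout.
#[repr(C)]
pub struct Counter {
    pub value: u32,
}

/// Assumed C prototype: `bool counter_increment(Counter *c);`
///
/// `Option<&mut Counter>` has the same representation as a nullable pointer,
/// so a NULL argument arrives as `None` and the body needs no `unsafe`.
/// To export the symbol under this exact name you would also add the
/// no_mangle attribute, omitted here to keep the sketch edition-independent.
pub extern "C" fn counter_increment(c: Option<&mut Counter>) -> bool {
    match c {
        Some(counter) => {
            counter.value = counter.value.wrapping_add(1);
            true
        }
        None => false,
    }
}
```

The C caller is still responsible for passing either NULL or a pointer to a valid, initialized `Counter`. The function body being safe Rust does not change that obligation.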
Another gotcha is wrt. owned heap-allocated data, as you need to make sure the same allocator is used for allocating and de-allocating it. Again nothing new, but in C you often dodge that issue. (E.g. if you allocate and initialize an `A` in C and then pass the pointer to Rust as a `Box<A>`, you need to be very careful that you never drop that box in Rust.) Though Rust is currently getting support for keeping track of non-default/non-global allocators in the standard library, and you could always set up a global/default allocator which defers to libc's malloc/free.
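One common way to sidestep the allocator mismatch is to let Rust both allocate and free, and only hand C an opaque pointer. A minimal sketch, with a hypothetical `Handle` type and hypothetical `handle_new` / `handle_free` functions:

```rust
/// Hypothetical heap object owned by the Rust side.
#[repr(C)]
pub struct Handle {
    pub id: u64,
}

/// Allocates with Rust's allocator and gives ownership to the caller
/// as a raw pointer.
pub extern "C" fn handle_new(id: u64) -> *mut Handle {
    Box::into_raw(Box::new(Handle { id }))
}

/// Frees a pointer previously returned by `handle_new`.
///
/// # Safety
/// `ptr` must be null, or a pointer obtained from `handle_new` that has
/// not been freed yet. Passing memory from C's `malloc` here would be
/// exactly the allocator mismatch described above.
pub unsafe extern "C" fn handle_free(ptr: *mut Handle) {
    if !ptr.is_null() {
        // SAFETY: per the contract above, `ptr` came from `Box::into_raw`.
        drop(unsafe { Box::from_raw(ptr) });
    }
}
```

This is the one place in the sketch where `unsafe` is unavoidable, and it stays confined to a single documented function.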
The 3/12 that safe Rust would have prevented would likely have been prevented by enforcing the use of current tools for C. Since neither Rust safety nor lint-ish safety is enforced, I think this is a reasonable comparison.
Show me a project that claims to consistently use static analysis tools for C and doesn’t ignore them, and I’ll show you a liar!
But more seriously: Rust’s toolchain comes with linting built in, and the community as a whole is much better about applying and responding to static analysis results than the C ecosystem is. And that’s even before we get to false positives, which (subjectively) C static analysis tools seem to spit out a great deal more often.
And I say all of that as someone who’s currently doing whole-program static analysis of C and C++! It’s not that you can’t do it, it’s degrees of ease and a culture of stringency that’s lacking.
Certainly! Lots of compiler churn means lots of downstream churn. However, for what it’s worth: I’m currently maintaining around 50k lines of Rust (both public and private code), and my experience with compiler changes has been remarkably painless: I get a couple of new warnings (usually from clippy) in the CI every few months, fix them, and move on with my life. Nothing ever fundamentally breaks: I could ignore the warnings and roll them all up into a single change every 6 months if I chose to.
I have used rust since the 1.0 release and while there had been some churn in the earlier releases even that was fast to fix.
By now it's nearly always new or improved clippy lints/warnings, but clippy is not rustc but an (the) external linter, so this isn't really rust compiler churn IMHO.
Even when there was a bit of churn in the early stable releases, it was often remarkably easy/fast to fix.
When things were not trivial to fix, it was nearly always dependencies, not because they are bad but because if you have enough dependencies, sooner or later one of them is no longer maintained, or deprecates some functionality you need, or does a new major release you should upgrade to. But even there it's much less painful than the experiences I had with e.g. npm and the JS ecosystem.
The only really painful thing I did run into was ring/rustls during the time ring yanked all previous versions, no matter if they were broken or worked fine, and still frequently updated some of their core parts. That was no fun, but fully the fault of ring, and it's no longer doing this by now.
>I guess the biggest drawback of C is that it keeps contributors away
There's this idea floating around in some circles that if you could just adopt technology X, more people would contribute to a project.
I myself was guilty of this ... believing that if we added a web frontend to the DAW I've worked on for 21 years, all those JS/webtech developers would show up.
There's no platform or language you could choose for SQLite that would increase the number of contributors. C is already a wildly popular programming language, with far, far more practicing users than Rust. Rust may be The Cool New Thing at present, and in time it may possibly grow to something much bigger, bigger even than the C/C++ universe. But that's not true right now, and it also wouldn't be true even if SQLite was implemented using some impossibly performant JS or its cousin.
Yeah that sounded way off base to me. Regardless of relative merits, how many rust developers are there that don't know C? Probably not a huge number. How many C developers have never seen a line of rust? Probably an order of magnitude more than there are rust developers.
> SQLite the project cannot switch languages, sadly.
I'm sensitive to the issues of tradeoffs (e.g. platform support, which is a completely fair issue) and manpower requirements and whatnot, but I don't see why it cannot switch languages. Converting libraries "inside out" has been done. Adding Rust support inside sqlite's build system and migrating modules is technically feasible (again, not opining on whether it would be worth it).
Adding Rust support would reduce (presently) the number of architectures and platforms that SQLite can target. That would greatly reduce its utility for a lot of customers. Once Rust supports the same variety of architectures that C presently supports this will become a non-issue, but that's unlikely to happen in the near term.
Ok, yes. SQLite can switch languages if the objective is to cutoff a large number of customers. Until Rust is well-supported on the variety of architectures that C is supported on, then that will be the result.
For rustc with llvm maybe, but there's projects to build a GCC backend for rustc, as well as projects to build a Rust frontend for GCC, both of which could solve this.
It cannot "switch" languages in a way like "we stop development in C and switch to development in XYZ only; C-based code will only get security updates".
It can switch the officially endorsed "primary" implementation, but only after an alternative implementation has been around for a long time and been battle-tested, all the while the original C implementation continued to exist and develop.
> It cannot "switch" languages in a way like "we stop development in C and switch to development in XYZ only; C-based code will only get security updates".
At a purely technical level (ignoring issues of platform support and all) it pretty much can do that, actually: integrate XYZ into the build system, build new features in XYZ, start converting old features to XYZ, end up with an sqlite in XYZ.
It is popular not because it's a good idea but because go: using cgo is a huge imposition, so recoding everything in go so it can be used from go is basically the norm.
That is useless to sqlite itself, this exists only because of go's issues and is essentially unusable outside of the go ecosystem.
> Dislike for cgo, is similar to the old JNI displeasure, nowadays plenty of Java libraries don't have any issues making use of JNI.
The issue is that cgo doesn’t just make the build more complex and more expensive (and yeets cross-compilation), it also makes programming way more difficult, as go tooling doesn’t understand C and C tooling has no clue about Go (because of non-standard calling conventions and an insular tooling community and everything), and for all that cost you get little because the go/c call overhead is stupendous (still, though smaller than it used to be), and of course it locks up the scheduler (after a slice of the goroutine being blocked the go runtime will create a new one, but still…).
Absolutely not, java uses native threading so native calls don't generally break scheduling, and java uses native threading so while native calls have some overhead it's maybe 5x for no-op functions, it's nowhere near "two orders of magnitude".
I’ve noticed cgo dependencies massively slow down compilation.
E.g. the hugo project built with and without extended features. Build without and it’s all go, it compiles in the blink of an eye. The tooling in go, from a devops point of view, is surprisingly good. It’s not just compile speed.
Build with all the c deps and well you’re compiling more code so of course it’s going to be slower but it’s disproportionately so.
Presumably people would want to rewrite sqlite into Rust. But it's still a database, a low level, high performance software system. Even if you write it in rust, some percentage of the code will be unsafe rust or rust that calls a library written in C. It will be safer, but still not a panacea. It would be more viable in my opinion to improve the safety of the C code that currently makes up sqlite.
Let’s be generous and assume that 5% of the overall code remains unsafe. That’s 95% of the code that doesn’t need to be checked extra. Additionally, that 5% is likely to remain static. With C any “securing” effort is a snapshot effort that bitrots quickly as more code is added.
sqlite is renowned as a project that tests meticulously; they claim 100% branch test coverage, to the point where a reason they reject current Rust is that they can't achieve similar coverage against the branches the compiler itself inserts as safety checks (that is: they can't use Rust because they can't test safety checks that don't even exist in the language they use today).
And the track record of that approach is, well, right there for you to see.
The idea that what's needed for sqlite's C to be safe is more of the approach sqlite already uses seems pre-falsified, doesn't it?
I think 100% machine code branch test coverage is legally required in some of the environments that SQLite targets. That is, there must be a test that exercises both sides of every machine code branch instruction emitted in the final binary. Basically, every branch must have a justification for why it exists. I feel like that's not such a bad target for rust to aim for.
> sqlite is renowned as a project that tests meticulously...
Any language that can produce a C library interface identical to the existing one for sqlite3 would be a good candidate for a reimplementation.
If you show me a Rust library that passes all the sqlite3 tests flawlessly on my platform (typically x86-64) then I'd include that in my project, and sleep soundly at night afterwards. There might be problems, but the chances are very low.
Most other libraries don't have nearly as good coverage, and it is much riskier to switch over an implementation.
> sqlite is renowned as a project that tests meticulously; they claim 100% branch test coverage
You realize that there are lots of test coverage metrics and none of them tell you anything about the correctness or safety of the code or about how thoroughly the edge cases have been tested, right? 100% branch coverage is neat and may indicate that the testing is thorough, or it might just indicate the project is chasing it as a metric. I’ll bet on compile time lifetime and ownership analysis guaranteed by the language every time over metrics of how good the test suite is (a test suite is important but classes of bugs are just impossible in Rust and don’t need testing/coverage in the first place).
> that is: they can't use Rust because they can't test safety checks that don't even exist in the language they use today
Can you back up this claim? This feels like a very FUD statement, as any high level language (including C) has a risk that the compiler inserts branches that aren’t present in the code. Regardless, AFAIK, coverage instrumentation happens at a level below where the distinction between C and Rust matters, so any coverage should be the same. I buy the argument that replicating the test suite may be time consuming, but the claim that there’s something inherent about Rust preventing branch coverage feels extraordinary to me. Also, I’m not even sure where the extra branches are being inserted. Are you referring to drop statements the compiler injects for lifetimes?
> And the track record of that approach is, well, right there for you to see.
Constant CVEs and a fundamental inability to handle maliciously crafted files? I’m not shitting on the SQLite team. The project is amazing and a marvel. I’m just saying that its C heritage has some inescapable realities. Also remember that studies tend to show that the bug rate is pretty constant across languages in terms of LOC. Thus you want the language and stdlib doing as much as possible for you if you’re prioritizing correctness and security.
It is better, definitely safer. But it's not perfect. I write rust on a daily basis. I've written a lot of C++ in the past. It's safer, but I have memory safety issues in both. It's more a matter of degree.
For a large existing code base that is well written, I think it's easier to improve the safety of the existing code than to rewrite it completely in a safer language.
People talk about having memory safety issues in Rust and Go, and my general reaction is that these claims tend to be pretty artificial (for instance: they've managed to introduce concurrency bugs that abort their program). If you've got a war story about a security-relevant memory corruption vulnerability you managed to introduce into a Rust codebase (that wasn't in straight-up `unsafe` code), I'd be interested in hearing more about it.
My position right now, before hearing that war story, is that Rust vs. C is more than a difference of degree. It's a difference of degree in security writ large, to be sure. But for memory corruption? I'd say the distinction is close to categorical.
> If you've got a war story about a security-relevant memory corruption vulnerability you managed to introduce into a Rust codebase (that wasn't in straight-up `unsafe` code), I'd be interested in hearing more about it.
No, they're in unsafe code. Except Go where you don't get memory safety with some kinds of race conditions. I'm debugging a segfault in Rust today as it turns out. I guarantee it's in unsafe code. But I have some unsafe code. A database would also definitely have quite a bit of unsafe code.
It's a lot better than the situation in C. But rewrites always introduce bugs. And sqlite is so well written and tested, it's not a low quality code base. I don't think a rewrite makes sense here, it'd be better to improve the existing code to make it safer. That's my opinion anyway.
>> some percentage of the code will be unsafe rust
What would be the significance of that? Unsafe blocks in rust must still satisfy the borrow checker and all the compiler’s safety rules. The only things unsafe gives you are:
* Dereference a raw pointer
* Call an unsafe function or method
* Access or modify a mutable static variable
* Implement an unsafe trait
* Access fields of unions
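As a sketch of the first item in that list (dereferencing a raw pointer) tucked behind a safe interface: the `first_byte` function below is hypothetical and deliberately trivial. The point is only that the `unsafe` block is one line, carries its own justification, and callers never see it.

```rust
/// Returns the first byte of `data`, or `None` if the slice is empty.
/// The signature is entirely safe; the raw pointer never escapes.
pub fn first_byte(data: &[u8]) -> Option<u8> {
    if data.is_empty() {
        return None;
    }
    let p: *const u8 = data.as_ptr();
    // SAFETY: the slice is non-empty, so `p` points at a valid,
    // initialized `u8` that stays borrowed for the duration of the read.
    Some(unsafe { *p })
}
```

Everything outside that one line is still checked by the compiler as usual, so an audit only has to look at the `unsafe` block and the invariant stated next to it.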
That's not a useful or constructive comment. At best it is, indeed, worthless; at worst, you're using your stature on HN to bully another commenter by implying that their opinion can be dismissed out of hand, because you, tptacek, say so.
I don't really agree with it either - it overstates the issues with occasionally using escape hatches in Rust by implying that they're at all comparable with the problems inherent with using C. But yours was still an unnecessary reply.
I don't think we understand enough about computer science or computer engineering (or something in the middle of those two things) to deliver large-scale C projects with rich, flexible interfaces safely. The last 20 years has just been a sequence of events where we've been surprised by new oversights, from buffer overflows to integer mishandling to uninitialized variables to UB. These problems compound; they're never solved, but rather beaten into development teams (at least the conscientious ones) and so even the decades-old problems recur, because it's not enough to know about the general pattern of a problem, you also have to viscerally understand all the combinatorics of those problems and all the new code that you write, all of which has the potential to create some new scenario that allows a well-known vulnerability to re-emerge. And even if you manage to clean the Augean Stables this way, you're still S.O.L. when the next new memory corruption bug class is discovered. A mug's game, a bad bet.
You can use formal methods to sidestep this! But my argument would be that at that point, you're really writing C In Name Only. By all means, if the aesthetics of Rust are that painful to you (I get it!), write C against a formal verifier. :)
I agree with this for the most part. But just because you use Rust doesn't mean you have no security issues - there are still plenty of ways to go wrong, including integer overflow if you're using "as" conversions instead of checked conversions. But there's no doubt it eliminates a whole class of issues and makes others much less likely (and verifiably so.)
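A quick sketch of the "as" problem, with hypothetical function names. An `as` cast between integer types silently truncates, while `try_from` reports the failure.

```rust
use std::convert::TryFrom;

/// Narrowing with `as`: compiles without complaint and keeps only the
/// low 32 bits of the value.
pub fn len_truncating(len: u64) -> u32 {
    len as u32
}

/// Narrowing with a checked conversion: out-of-range values become `None`.
pub fn len_checked(len: u64) -> Option<u32> {
    u32::try_from(len).ok()
}
```

With an input of 2^32 + 5, the first function returns 5 and the second returns `None`.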
For a greenfield project, you shouldn't use C or C++ today, unless you need some huge big C++ library (e.g. game engine).
For sqlite which has 100% branch coverage and is quite well written, I don't think a rewrite makes sense. You'll surely introduce more bugs, including security bugs, than you fix. The cost seems too high. It'd be better to just fix security issues as they come up (and try to clean up bits of code that could be problematic.)
The amount of code which would have to be `unsafe` in a library like sqlite would be absolutely minuscule if it would exist at all, and much easier to check for than literally all of the codebase?
Well, yeah, it's safer, there's no question about it.
Is it sufficiently much safer to warrant the massive investment and the bugs (including security bugs) which will result from a rewrite?
If I was the project manager I would not sign off on that. This is a volunteer project. Someone can of course do all that work free of charge, but they'd really just be creating a fork of sqlite unless they also get most of the other major volunteers to agree.
> in practice, people will deal with the exposure either by treating it as an externality ... or by carefully positioning sqlite so it isn't a meaningful part of the attack surface.
I think you're missing one of the ways. The one where people make no deliberate attempt to engage with the risk at all. No big throwing up of the hands. Sqlite is deeply embedded software, it takes years for a whole generation of smart TVs and security cameras and cars to find themselves in landfill and put a decade of vulnerabilities to bed.
> In particular, I think he's right that rewriting sqlite in a memory-safe language would introduce a bunch of bugs and likely result in a couple of years of instability.
AIUI, the SQLite test suite always is praised for being especially expansive(complete?), so assuming it's not testing C-isms, which would be weird, then I would expect a rewrite to be less risky than trying a trick like that in most business software
Not sure if sqlite can have RCEs, it's an embedded sql engine, not a remotely accessible sql server, so presumably those RCEs are vulnerabilities in programs that expose sqlite to remote access.
> 2. Safe programming languages solve the easy problems: memory leaks, use-after-free errors, array overruns, etc. Safe languages provide no help beyond ordinary C code in solving the rather more difficult problem of computing a correct answer to an SQL statement.
Uhm ... If those were project wide "easy problems", then how come vulnerabilities in a project like Chromium are 70% caused by these "easy problems"?
I'd say with that category of bugs on board, you cannot mistrust yourself enough and that there is nothing easy about preventing these bugs all the time over the life-time of a project.
While there are a lot of good points, there are some which are strange IMHO (wrt. the safe language section):
> import complete binary SQLite database files from untrusted sources
Standard sqlite3 contains features which make opening databases from untrusted sources quite dangerous, and as far as I know there is no "untrusted" open mode, though you might be able to compile a hardened/restricted sqlite. Either way it's true that using a "safe" language would not have helped here.
> Safe languages insert additional machine branches to do things like verify that array accesses are in-bounds. In correct code, those branches are never taken. That means that the machine code cannot be 100% branch tested, which is an important component of SQLite's quality strategy.
But sqlite favors the use of asserts, and what these languages insert are basically asserts...
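For what it's worth, the inserted check does not have to stay implicit. A minimal sketch, assuming a hypothetical `read_cell` helper: plain indexing (`page[offset]`) contains a hidden bounds check that panics on failure, while `get` surfaces the same check as an ordinary branch that a test can exercise on both sides.

```rust
/// Reads one byte from a page buffer. The out-of-bounds case is a normal
/// return value rather than a compiler-inserted panic path.
pub fn read_cell(page: &[u8], offset: usize) -> Option<u8> {
    page.get(offset).copied()
}
```

Whether this satisfies the machine-code-level coverage requirement the SQLite page describes is a separate question. The sketch only shows that the source-level branch can be made explicit and tested.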
> Safe languages usually want to abort if they encounter an out-of-memory (OOM) situation.
This might be true about go (idk.) but isn't this generalizing a bit too much?
Just to be clear: that sqlite is and stays in C makes total sense, I just feel the author tries a bit too hard to find arguments beyond the necessary ones.
I mean:
- When it was written there were no viable "safe" language alternatives to choose from.
- Sqlite is extremely well tested; many (most?) of the recent bugs were not of the kind any safe language would have prevented.
- Safe languages don't necessarily help you with avoiding logic bugs. And while many do have additional abstractions/tooling to help with preventing logic bugs, rewriting sqlite is likely to introduce more logic bugs than it would prevent in the long run. Rewrites are always a chance to fix bugs but also to introduce new ones.
- Sqlite is extremely portable; none of the safe languages have quite the level of portability sqlite wants to provide.
It's a well-known cliche at this point for pretty much every program to be rewritten in the "safe" language of Rust. But it makes sense for C++ to be in there too: before it was the Rust people wanting to rewrite everything, it was the C++ people wanting to rewrite everything.
It has been like that since the 80's as UNIX was gaining market share, but as Rust is gaining ground on that effort, it is easy to shit on the effort.
Yes, even with its C underpinnings, C++ is better than plain old C, provided the safer types for arrays, strings and RAII resource management are used.
Hoare was already complaining about C in his 1980 Turing Award speech.
Problem with C++ has never been that it's "bad" or whatever. I worked with people who did stuff all over the axis (back when I was doing C/C++, which is the interval of 13 to 20 years ago): from drivers to Windows 98/XP MFC GUI programs, and the complaint always was that hey, there is yet another C++ version that fixes old problems but offers no way to (semi-)automatically migrate the existing code to the newer and better/safer API.
I ain't here to shit on C++; I've looked at v20 and I liked what I saw. But the problem remains: there are tens of millions of coding lines written in older style C++ that are never going to be rewritten in a more modern version of C++.
We might as well just rewrite them in Rust, or, if a better systems language with the same (or better) guarantees emerges, rewrite them in that. I in particular don't think it has to be Rust at all costs; I just happen to think it's the best mix of a systems and application language with the borrow checker and (mostly) fearless concurrency -- at least as of today.
---
A small aside:
IMO many people deliberately don't draw a distinction between loud fanboys and people who legitimately and objectively evaluated Rust and found it a net win for their needs. In many HN discussions only the first group is mentioned, the second one almost never.
That's definitely not a fair representation of the Rust community. All programming languages have loud obnoxious fanboy minorities and I don't understand why Rust is being shat on regularly (here on HN) for something that exists everywhere. I can only theorize many people got rubbed the wrong way at the start and then don't care to make an objective analysis afterwards.
A) I don't think this is needed, you could just pin an older rust version. But if it's needed we are not quite there yet.
B) I think that point has been met.
C) Depending on the definition of "obscure" this has been met, but given that sqlite requires very little to run, I guess that for the appropriate definition of "obscure" it is not quite fulfilled yet.
D) This still needs a lot of work; even just for line coverage the tooling isn't quite up to my standards tbh. While my standards are pretty high, I fear the sqlite standards might be even higher.
E) It's kinda met but needs a bit more time to mature.
F) I think this was more or less already met in 2017; if not then, it is by now.
> A) I don't think this is needed you could just pin a older rust version. But if it's needed we are not quite there yet.
The problem I think is that SQLite is also used in embedded systems, they need to be predictable (for example, for all of its flaws C89 is still used in SQLite). So unless that there is a subset that is that stable, they won't move yet.
> So unless that there is a subset that is that stable,
This is kinda a point of rust that every compiler can compile code of all previous versions up to rustc 1.0.0.(Sure it isn't perfect, there was some fallout especially during the earlier rust days.)
So you could require language/library compatibility with e.g. 1.55 while any compiler >=1.55 _should_ be usable/compatible for building it. (You still would need the 1.55 compiler for development to make sure you stay compatible.)
Now I guess there are still problems with that, and what they would like is probably some form of rust LTS version. Theoretically there is little problem with bumping the minimal rust version every year or so (due to the compatibility guarantees), BUT on embedded you might not have "proper" upstream LLVM support, which would mean that when rust bumps LLVM you might lose rust support, I guess.
I'm referring, of course, to those embedded versions. I've dabbled in Rust and Go, and they are certainly nicer than C for non-trivial programming, but knowing how slowly everyone in industrial systems adopts something, I'm not holding my breath. Maybe a SQLite for "normal" applications (the real-world definition, not apps)?
> Rust needs to mature a little more, stop changing so fast, and move further toward being old and boring.
This didn't make any sense at any point post 1.0. Rust 1.0 code still works today, modulo the backward-compatibility breaks required to fix unsoundness.
> Rust needs to demonstrate that it can be used to create general-purpose libraries that are callable from all other programming languages.
Also didn't make much sense at any point, as the story there was always straightforward. But efforts like librsvg have pretty much demonstrated that (interestingly, the librsvg conversion effort started around the time that page was created, circa 2017).
> Rust needs to demonstrate that it can produce object code that works on obscure embedded devices, including devices that lack an operating system.
Rust is used in a bunch of embedded contexts. Whether Rust can produce object code that works on your embedded device is a more debatable question; it depends on the existence (and quality) of the proper llvm backend.
I think there's also a gcc frontend in the works, but I expect it's essentially nowhere yet, as it was only just started (a few months old, I think?). Though I believe it has financial support and a fair amount of manpower. I believe there's also an even more recent effort for a gcc backend in rustc.
So yeah, on this one I'd say there's limited progress so far, but things seem to be moving in the right direction and picking up.
> Rust needs to pick up the necessary tooling that enables one to do 100% branch coverage testing of the compiled binaries.
Unclear what the issue is there so no idea.
> Rust needs a mechanism to recover gracefully from OOM errors.
That was always possible by working in no_std, though of course that required reimplementing your own abstractions.
With the Linux kernel integration effort, a lot more work is going into "fallible allocation" APIs, and thus the ability to gracefully recover from allocation failures.
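To make "fallible allocation" concrete, here is a minimal sketch using the standard library's `Vec::try_reserve` family, which reports allocation failure as an `Err` instead of aborting. The function names `try_copy` and `try_huge` are made up for illustration; this is one possible shape of such code, not what the kernel effort or any particular project does.

```rust
use std::collections::TryReserveError;

/// Copies `src` into a freshly allocated buffer, reporting allocation
/// failure to the caller instead of aborting the process.
/// (Hypothetical helper, named for this example only.)
pub fn try_copy(src: &[u8]) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    // Returns Err rather than aborting if the request cannot be satisfied.
    buf.try_reserve_exact(src.len())?;
    // Capacity is already reserved, so this push does not allocate again.
    buf.extend_from_slice(src);
    Ok(buf)
}

/// Requests an impossible capacity to exercise the error path without
/// actually exhausting memory. (Hypothetical helper.)
pub fn try_huge() -> Result<Vec<u8>, TryReserveError> {
    let mut buf: Vec<u8> = Vec::new();
    buf.try_reserve(usize::MAX)?;
    Ok(buf)
}
```

A caller can then treat out-of-memory like any other error, e.g. `match try_copy(data) { Ok(buf) => ..., Err(_) => /* release caches, retry, or fail the statement */ }`.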
> Rust needs to demonstrate that it can do the kinds of work that C does in SQLite without a significant speed penalty.
The branch coverage thing is a weird, artificial-seeming requirement that all the branches in the compiled code --- not the code as written, but the code ultimately produced by the compiler --- be testable. In other words: if the compiler generates a bounds check anywhere, it should be possible to test what happens when that specific bounds check fails. The problem is that sane Rust code doesn't give you all the tools you'd need to deliberately trip all the checks the compiler generates, because that is part of the point of being a safe language.
> The branch coverage thing is a weird, artificial-seeming requirement
Yes that's because it's a requirement designed by a standards body. I found a paper "Is 100% Test Coverage a Reasonable Requirement? Lessons Learned from a Space Software Project" (2017) that mentions that 100% branch coverage is a requirement in European Cooperation for Space Standardization (ECSS) for Class A software (where failure could result in loss of life, etc). The paper concluded:
> Our findings include that there seems to be a break-even point between 80% and 95%, and everything beyond this points is increasingly costly and could introduce new project risks—which confirms findings reported so far in literature (Section 5). However, the interview revealed that, still, 100% coverage can be a reasonable quality requirement; even though a 100% requirement is not a good indicator for the software quality as such.
It doesn't follow that, because Rust is a safe language, it cannot expose test harnesses that would enable 100% branch coverage. Personally I'm ambivalent on this; maybe it's not useful, but it doesn't seem bad either. But your reaction to this requirement seems weird, like the "fox and the grapes" fable... You can't get 100% branch coverage, so you give up and claim that 100% branch coverage is dumb, why would anyone want that anyway? Do you really think that 100% branch coverage testing should be unavailable to Rust programmers if that's what they need (for whatever reason, including meeting some admittedly arbitrary standard)?
A couple of people have brought this up here, and it's an argument that makes sense. I'll just note that the sqlite page, which is what I'm critiquing, isn't written this way; the project doesn't say "we use C because ECSS requires us to build software in a 100%-branch-coverage language", but rather speaks to the important benefit of literal 100% branch coverage. It's that important benefit I question, not the logistics problems they face, which I concede.
That might just be me twice, I added a new reference this time at least ^_^ I agree that 100% branch coverage as a goal in and of itself is generally dubious.
I seem to recall an interview where the SQLite creator, Richard Hipp, described the reasons behind the branch coverage testing in a bit more detail and mentioned that it was a requirement from one of their customers, which is where I got that idea. Sorry I don't have a specific reference.
Right, so it's really more of an expressivity issue: the compiler is not smart enough to remove all branches which cannot happen in a given program, so some of those branches will be completely untestable despite being in the final object code. For example, take a vec![_;4] and structurally use it such that the index can only be in-bounds: the compiler may not be able to elide the OOB checks, because it might not understand that they're unnecessary in real-world code.
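A small sketch of the situation described here, with a made-up `Ring` type: the index is always in bounds by construction, but that invariant lives in the programmer's head rather than in the type, so the panic branch of the bounds check may survive into the object code with no test able to reach it. Whether a given compiler version actually elides the check is not something this sketch claims.

```rust
/// Hypothetical four-slot buffer. Invariant: `slots.len() == 4` always,
/// but nothing in the type tells the compiler so.
pub struct Ring {
    slots: Vec<u32>,
}

impl Ring {
    pub fn new() -> Self {
        Ring { slots: vec![0; 4] }
    }

    pub fn set(&mut self, i: usize, value: u32) {
        // `i % 4` is in 0..4, so this can never be out of bounds in practice.
        self.slots[i % 4] = value;
    }

    pub fn get(&self, i: usize) -> u32 {
        // Semantically this is still "check the length, panic if too large".
        // If the optimizer cannot prove the invariant, that panic branch is
        // emitted, and no input through the public API can trigger it.
        self.slots[i % 4]
    }

    pub fn get_checked(&self, i: usize) -> Option<u32> {
        // The same check made explicit: the `None` case is just as
        // unreachable for callers, which is the coverage problem.
        self.slots.get(i % 4).copied()
    }
}
```

In C the equivalent access compiles to a bare load, so there is no extra branch to cover; the cost is that a broken invariant becomes silent memory corruption rather than a panic.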
It's substantially less silly than a small-time JavaScript component library adding 100% branch coverage testing requirements as a blocker to accepting a PR. But they do it for the same reason, to be able to advertise it and demonstrate reliability. This is how the sqlite project makes money. I guess someone's got to build the instrumentation tools that let them keep doing this, sounds like it won't be them. (Edit: not sure who else would have the motivation, to be honest. If they had to pioneer one thing, it should be that.)
> depends on the existence (and quality) of the proper llvm backend
LLVM's platform support is so lacking that this issue alone is enough of a justification for a project like SQLite to be written in C. The rust-gcc effort you mentioned will hopefully solve this.
Quite a few of the others really are met; they just don't want to be pioneers, which is quite fair.
With the discussion of getting Rust into the Linux kernel, I think there's more interest in graceful recovery from OOM errors. The new Allocator API will (maybe?) help this.
The Allocator API is about the ability to mix allocators and provide "precise" (per-object) allocation strategies. That's orthogonal to fallible allocation, which is mostly about adding fallible versions of possibly-allocating APIs and being able to statically remove access to the non-failing ones (and thereby implicitly reject any dependency relying on those APIs). An alternative is providing separate implementations which expose a fallible API, which is what fallible_collections does, but the stdlib seems to have gone with adding fallible APIs and probably adding a compiler feature / flag to be able to disable the non-failing ones.
For some more detail, sqlite has makefiles for things like Windows CE and VxWorks, and the generic configure process builds on almost anything else sufficiently POSIXY, like QNX.
When I see projects like this, with such an impressive range of supported platforms, I am always a bit wary of how well tested those platforms effectively are. I know that some bugs have been caught in OpenBSD due to some of the more esoteric platforms making evident some incorrect assumptions, but I also remember Debian boasting a huge number of packages for, let's say, ARM that would build, but any attempt to use them would show they had never been tried out and were wholly unsupported.
True, though sqlite is pretty universally lauded for their approach to testing. Perhaps not all platforms are tested by the Sqlite team, but if you run their tests on your platform, the coverage is pretty good.
Won't rustc_codegen_gcc effectively solve arch support? It's not done and shipped, but it exists and is usable and is landing into rustc, which is a fair amount of progress.
(Or is SQLite portable to architectures that GCC does not support?)
(Yeah, there are some microcontrollers that can only be programmed with the manufacturer's C compiler, which doesn't even support the entire language and is full of bugs. But I would be very surprised if SQLite ran on a PIC.)
IIRC a possible issue is the use of forks of gcc (or something else), I think it used to be common for console toolchains though maybe less so these days.
While I know it's a joke it's still funny on so many levels because it could be true: an interview with Stroustrup claiming that he invented C++ to be purposely hard to preserve the mystique that programming was hard and keep programming salaries up:
The real reason why Bjarne invented C++ was not having to deal with BCPL or C.
After his experience getting his thesis ready in Simula, only to rewrite it in BCPL, he swore never to put himself through similar pain again.
So after working for a while at AT&T, C with Classes was his way to avoid dealing with raw C, while having some of that Simula productivity.
> Safe programming languages solve the easy problems: memory leaks, use-after-free errors, array overruns, etc. Safe languages provide no help beyond ordinary C code in solving the rather more difficult problem of computing a correct answer to an SQL statement.
I see where the author is coming from, but I don't think this is quite true. The way that safe programming languages work is that they have a richer type system that knows about the semantic context of variables, which in turn is a tool that helps a lot with the "more difficult problems".
For instance, one of the tools Rust uses for enforcing memory safety (data races and use-after-free, in particular) is that there's a distinction between "mutable" and "constant" references. But this is, really, a distinction between unique and shared references. If I am statically guaranteed to be the only holder of a reference to X, I can modify it; if some other part of the code might have a reference to X, I cannot.
This is essentially a readers/writer lock enforced at compile time, and it is therefore a pattern that makes it much easier to use actual readers/writer locks: the lock-for-read function gets you a shared reference and the lock-for-write function gets you a mutable one. And Rust makes it easy to say that you cannot unlock the lock (in either variant) until you give back the reference, and you cannot accidentally leak the reference out of the scope of the lock.
If you're using a readers-writer lock on, say, the schema of a table (many simultaneous readers can use a table, but only one task can alter the table and nothing else can touch the table while it's being altered), having the tools to meaningfully distinguish the cases and enforce that your mutable references don't get copied does actually make it easier to compute the correct answer to a SQL statement that's running at roughly the same time as an ALTER TABLE.
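Roughly what that looks like with the standard library's `std::sync::RwLock`. The `Schema` and `Table` types below are invented stand-ins for illustration, not SQLite's structures; the point is only that the read guard hands out shared access, the write guard hands out exclusive access, and neither reference can outlive its guard.

```rust
use std::sync::RwLock;

/// Hypothetical stand-in for a table schema.
pub struct Schema {
    pub columns: Vec<String>,
}

/// Hypothetical table whose schema is protected by a readers-writer lock.
pub struct Table {
    schema: RwLock<Schema>,
}

impl Table {
    pub fn new(columns: &[&str]) -> Self {
        let columns = columns.iter().map(|c| c.to_string()).collect();
        Table { schema: RwLock::new(Schema { columns }) }
    }

    /// Query path: many threads may hold a read guard at the same time.
    /// The guard only gives shared (`&Schema`) access, so mutation here
    /// would be a compile error.
    pub fn column_count(&self) -> usize {
        let guard = self.schema.read().unwrap();
        guard.columns.len()
        // The guard is dropped (lock released) at the end of the scope.
        // Returning `&guard.columns` instead would not compile, because
        // the reference cannot outlive the guard.
    }

    /// ALTER TABLE path: the write guard gives exclusive (`&mut Schema`)
    /// access, and no reader can observe the schema while it is held.
    pub fn add_column(&self, name: &str) {
        let mut guard = self.schema.write().unwrap();
        guard.columns.push(name.to_string());
    }
}
```

The `unwrap()` calls deal with lock poisoning (a panic while the lock was held) in the bluntest way; real code would probably want to handle that case deliberately.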
Another of the tools Rust uses is tagged unions: a C construct like union U {char x; int y;} would be memory-unsafe, so Rust is obligated to forbid it in safe code. Instead, you get an enum type that C would describe something like struct U {int tag; union {char x; int y;};}. If tag == 0, then you're on the first variant; if tag == 1, then you're on the second variant, and the compiler ensures that (in safe code) tag is never equal to anything else, never uninitialized, etc. And there are a bunch of language constructs to assist with this - for instance, the 'match' keyword lets you write cases for each variant, allowing access to x only if tag == 0 and y only if tag == 1. You're not allowed to access x or y directly at all outside of a match keyword or equivalent (like 'if let'), because that would defeat memory safety.
But because you've got good syntax and compiler-checked support for handling tagged unions, you may as well use it for problems even if you don't care about memory safety / security. Take this union from the SQLite source code, for instance: https://www.sqlite.org/cgi/src/file?ci=trunk&name=src/sqlite...
There are three types of tables (eTabType), normal, virtual, or view. There are three union variants with different data. Even ignoring the fact that some of the variables are pointers and others are non-pointers, the data doesn't make semantic sense when interpreted as the wrong variant. If you have code that handles just a normal table, and you extend it to handle virtual tables or views too, you will need to make sure you're not unconditionally accessing addColOffset, pFKey, or pDfltList, because the information you get will be wrong. Rust's enforcement of memory safety means it also prevents you from making this logical mistake.
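As a sketch of how that union might look as a Rust enum: the variant names and most fields below are simplified guesses for illustration, and only `add_col_offset` echoes a field named above (addColOffset). This is not a translation of the real SQLite struct.

```rust
/// Hypothetical, simplified model of the three kinds of table.
pub enum TableKind {
    Normal {
        add_col_offset: i32,
        foreign_keys: Vec<String>, // placeholder for the foreign-key list
    },
    View {
        select_sql: String, // placeholder for the view's definition
    },
    Virtual {
        module_args: Vec<String>, // placeholder for virtual-table arguments
    },
}

/// Code that wants the offset has to say what happens for views and
/// virtual tables; there is no way to read the field "by accident".
pub fn add_col_offset(kind: &TableKind) -> Option<i32> {
    match kind {
        TableKind::Normal { add_col_offset, .. } => Some(*add_col_offset),
        TableKind::View { .. } | TableKind::Virtual { .. } => None,
    }
}
```

If a fourth variant is added later, every `match` without a wildcard arm stops compiling until it handles the new case, which is exactly the "extend it to handle virtual tables or views too" scenario.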
I think we've been selling memory-safe languages as a tool for security, and I don't mean to detract from that argument at all - but there's also the fact that Rust and Go (and D and Vala and even modern C++) are newer languages that have been able to implement more things than C can, which in turn makes it easier to write correct programs in general.
There are plenty of other examples when one looks for what was happening outside AT&T.
C apologists like to pretend C is some kind of special gift to mankind in systems programming languages, and nothing was happening before C and UNIX came into scene.
All it would take for a "C apologist" to explain why they don't want to use JOVIAL, for example, is to point out that it is cumbersome to type programs in uppercase.
The only one I've ever seen talking of a "special gift" was you; most C programmers I know are very honest about the fact that C sucks in some ways. The typical experience is just that it sucks less than many others. What I appreciate most is that it gets out of my way. I don't want to be fighting a language, I want to concentrate on the problem I'm trying to solve.
> What I appreciate most is that it gets out of my way. I don't want to be fighting a language, I want to concentrate on the problem I'm trying to solve.
Please don't take this the wrong way but I read the above as: "leave me alone, I want to introduce more buffer overflows and integer overflows in peace". Truth is, C programmers are not as infallible as many of them believe, and that's sadly a historically proven fact.
At least the above part of your comment makes it sound like you don't understand what languages like Rust and Haskell bring to the table.
Was that your intention?
Additionally, "fighting a language" can mean dozens of things but at least in Rust, getting your program to compile guarantees that a whole class of bugs is now impossible (unless you actively subvert the Rust compiler in your code which some people successfully do; but that's still easier to audit and account for compared to the tooling of C).
Many find that peace of mind and compiler guarantees valuable. It seems that you don't?
> At least the above part of your comment makes it sound like you don't understand what languages like Rust and Haskell bring to the table.
> Was that your intention?
The arrogance in your post is over the top.
> "leave me alone, I want to introduce more buffer overflows and integer overflows in peace". Truth is, C programmers are not as infallible as many of them believe
One of the two possibly, but certainly not both. Or, "leave me alone, I just want to get my program done, and have it do the right thing, with appropriate performance, quick to build, good modularity, few dependencies, and not having to rewrite it next year. Security is important but does not override all of these other things."
Many find these practical properties and the fun of forming something with their bare hands that works more desirable than burning out while waiting for a build to complete or doing the required maintenance when upgrading dependencies. It seems that you don't?
> "leave me alone, I just want to get my program done, and have it do the right thing, with appropriate performance, quick to build, good modularity, few dependencies, and not having to rewrite it next year. Security is important but does not override all of these other things."
Virtually all of these are debatable, not necessarily factual and definitely not at all monopolized by C -- that's the overarching discussion in this entire HN comment section.
...So writing stuff in Rust means the program has to be rewritten in the next year? Factually not true. What does twisting and misrepresenting things win you here? That's not arguing in good faith.
Also, "doing the right thing" is rich coming from practitioners of a language with a ton of undefined behaviour and zero memory ownership checks. I'm sure you have the best of intentions but there's only so much you can do when the language and the compiler itself are willing to pull part of the rug from beneath your feet. That's why I abandoned C a long time ago.
History is not on your side either, with all the yet-another-buffer-overflow submissions here on HN for the last 2-3 years. Even today we had another part of OpenSSL crapping the bed through a buffer overflow, yet again.
So can Rust programmers make mistakes beyond the memory ownership? Of course! Nobody says the language is immune to bugs in general. What a lot of people say -- and it gets grossly misrepresented at every chance that biased people get! -- is that Rust is immune to a class of bugs. Nothing else.
(As for "forming something with your own hands", I think you should pick up pottery or carpentry as a hobby. There's nothing about "your own hands" in computers, we all step on decades old legacy and nothing is truly built by us alone.)
> (As for "forming something with your own hands", I think you should pick up pottery or carpentry as a hobby. There's nothing about "your own hands" in computers, we all step on decades old legacy and nothing is truly built by us alone.)
I mean something like this for example [0], which I just built for experimentation. It's a little GUI basically without dependencies (apart from the actual GUI code it's only some Win32 junk to paint colored rectangles and some text... this stuff is easy to replace with SDL or whatever backend works). The GUI has moveable / resizable windows, buttons, sliders, a simple layout method... The logic is super easy to extend, not the typical message handling mess that is seen in traditional GUI toolkits. Each widget is basically 0-1 structs and 0-2 functions.
This approach does have value. Not only does it make me happy - it gives me performance, flexibility, portability, modularity, build speed, control - each and every pixel coded on my own ... so much good stuff.
I get the appeal for coding something with minimal dependencies.
I do however consider it a hobby and I don't dare imposing that approach in my paid work, which inevitably has a number of limitations -- mostly time / schedule. That makes coding dependency-minimal projects a rare luxury since they often require love and care that don't conform to tight schedules.
If you are in a position where you can do that for money then it's quite the privileged position! And you should be a very happy guy having it.
I keep hoping that such projects will eventually find their way into the mainstream and show the rest of the world how wrong it gets a lot of things. Here's to hoping that it will still happen one day.
For better or worse, I do in fact get paid to do these experiments. One of my employers has this huuuge mess of MFC C++(98) code, with a lot of redundancy and repetition in the code. It's unmaintainable and we are in fact in need for better architecture.
And while much of the problem is due to historical accidents and lack of experience from some former team members, a lot of the problem I ascribe actually to MFC itself and more generally to the traditional GUI approach - Widgets in classes, code sharing using inheritance / method overriding and using other approaches as well. The main issue lies in the premature objectification (compartmentalization) of things, which prevents sharing data => it's hard to share handlers, which means there will always be a huge overhead from communication and bureaucracy and duplicated implementation efforts.
So here you have it, one example where a "feature" (OOP syntax) leads to bad choices (making up "messages" unnecessarily, using methods to implement per-message handlers), leads to an excess of "fibers" and in general of badly aligned structure in the application, and that leads to long term pain.
With the write-from-scratch approach and mindset, by contrast, there is a certain upfront cost, but also an actual chance to acquire a deep understanding of the problem domain and to avoid the long term pain from ill-suited structure. In the end, to get the architecture right, the syntax matters almost not at all. Once a problem is thoroughly understood and a suitable data architecture devised, computer programming is only a question of simple data copies and adds and multiplies. For example, in my code, there is basically no dynamic memory allocation, and there are no messages and no callbacks. C is a good language to program simply, it provides a lot of incentive to do so.
The issue described above is more general, it doesn't only apply to OOP syntax. Many language features are just canned solutions to do memory allocation, ownership tracking, message passing, threads organization, polymorphic dispatch... The problem is always, eventually, that one size does not fit all. And in fact, most language features seem to be optimized not for the most common situations but for small tutorial examples.
The particular code I linked above cost me 2-3 days of work (I had prior experience in the domain), which is minuscule compared to the time that is otherwise spent in a project on business logic / features.
At the top of this thread, I set out a number of reasons why
a) you are not "fighting a language" by using Rust or Go or Haskell or whatever - in fact you are more able to focus on the problem you want to solve
b) there are many reasons to use such a language, not just security, and all the things you mention - correctness, performance, development velocity, modularity, good design, and robustness to new requirements - are aided by features in newer languages
'pdimitar is making a reasonable argument and your dismissal of it as arrogance is uncalled for. I would encourage you to reread both of those comments carefully.
> 'pdimitar is making a reasonable argument and your dismissal of it as arrogance is uncalled for. I would encourage you to reread both of those comments carefully.
... No? It was clearly bordering into personal attack territory, suggesting lack of experience or intelligence as a reason for supposedly not seeing something obvious.
> there are many reasons to use such a language, not just security, and all the things you mention - correctness, performance, development velocity, modularity, good design, and robustness to new requirements - are aided by features in newer languages
Just... no? For example, more features => less orthogonal features => less interoperability, worse modularity. You can see this easily for example when C++ programmers construct 5 versions of trivial APIs (like function/method signatures). As you wrap everything in more and more complicated types - types that do more than just hold data, like allocating memory, constructing, moving ownership, thread synchronization, being optional, offering various implementations for various traits, and so on... stuff is simply less interoperable.
Also, many programming languages, like C++, Swift, Rust, Haskell... are not exactly known for their build speeds.
Haskell for example in my case, I've tried really hard to like it for quite some time, but never felt productive in it except for end-in-itself kind of mind games trying to (unsuccessfully) boost my ego. One of the last things I remember trying with it is building a plain OpenGL Program (like, Tetris level) and I can remember the build for this single file program taking 10 (or was it 40?) seconds, on each and every incremental change in a program that was a couple hundred lines.
Rust I've never touched, except one stint where I unsuccessfully tried to build a high-profile project, finding in the end that it was mostly snakeoil (after literally compiling 500 dependent packages).
Time to broaden your horizons; I have met plenty of C advocates and UNIX books that tell the story that before C there was only Assembly as a systems programming language.
If they actually acknowledged C sucks, lint would have long been adopted instead of being ignored from 1979 until clang-tidy was born, and something like bds for strings and vectors would have long been part of the standard library.
> Time to broaden your horizons; I have met plenty of C advocates and UNIX books that tell the story that before C there was only Assembly as a systems programming language.
Without plenty of examples, one can only assume what you've experienced is an outlier. I can't imagine any significant number of experienced software developers think that. Hell, other systems languages born before C are still being discussed today; go back in time and it only becomes more prominent. Unless a developer has been significantly sheltered, your assertion just isn't reasonable.