
Why Is SQLite Coded in C? (2017) - jeffreyrogers
https://sqlite.org/whyc.html
======
jerf
I've commented several times before that I consider "C" and "C with analysis
backing" to be in practice two different languages. SQLite is the latter:
[https://sqlite.org/testing.html](https://sqlite.org/testing.html)

Writing SQLite in "plain C" without all that would, well, simply not work.

I agree that "C with analysis backing" is the best language for SQLite right
now. However, it should not be used as an example of "How C is a great
language for programming in" unless you are also planning on skillfully using
a very significant subset of the tools listed in that document. SQLite doesn't
prove C is great; it demonstrates how much additional scaffolding is necessary
to wrap around C to get that level of quality, and given the quantity and
diversity of tools we are talking about, it is not particularly complimentary
to "plain C".

~~~
pcwalton
I agree. SQLite gets away with using C because it literally uses military-
grade levels of verification. As John Regehr pointed out, SQLite is quite
possibly the _only_ piece of software that goes to that level of validation
and testing without being required to by law.

It's not just a matter of skill, either. The cost in terms of money and time
needed to develop software in that way is completely impractical in almost any
commercial scenario. Aside from some very specific situations, it's not an
economically viable way to produce software.

~~~
bsenftner
it is not economically viable for entertainment or disposable "apps", but
extremely required for any serious, mission critical software. Seriously, the
comments here betray how many people are in disposable software careers.

~~~
pcwalton
The comments here betray how much of the software _economy_ depends on
developer productivity. The fact is that SQLite style verification is not
practical for almost any software, very much including "mission-critical"
software.

~~~
phaer
Yes, but look how much of the _software economy 's _infrastructure* depends on
underfunded products. OpenSSH, GnuPG and OpenSSL are just 3 projects which are
installed on pretty much every Linux server on the internet, including the
servers of billion-dollar-businesses. It got a lot better in recent years, but
still: Quite a few economically viable software companies just depend on free
labor for mission-critical software product which take a lot of resources to
become solid.

And while we are at it:
[http://www.openbsdfoundation.org](http://www.openbsdfoundation.org)
[https://gnupg.org/donate/](https://gnupg.org/donate/)
[https://www.openssl.org/support/donations.html](https://www.openssl.org/support/donations.html)

------
rwbt
My favorite aspect of programming in C is how little it gets in the way of the
programmer. Once you become familiar with the language there is not much to
lookup in the docs because it's such a small language. I might end up writing
more verbose code, but it's usually very clear to me. You still can write
spaghetti code but that's besides the point.

~~~
ethelward
I'm not sure I get your point. You say that C is a tiny language, hence
leading to small docs, but that can be said of every language? Except that
they may have larger standard libraries, but nothing stop you from using a
subset of them.

~~~
forgot-my-pw
The language specs is 700-page long, I wouldn't call that tiny:
[http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf](http://www.open-
std.org/jtc1/sc22/wg14/www/docs/n1570.pdf)

Sure, C++ is double of that, but it's still long. In comparison, Java language
specs is around 800 pages, Ruby 330 pages, and C# 500 pages.

~~~
Skunkleton
The length of the spec isn't really relevant is it? Or are you implying that C
is about as complex as Java?

~~~
pcwalton
C is significantly more complex than Java. The memory model of Java is simple
(multithreading notwithstanding), because it's memory-safe and garbage
collected. Not so for C, which has very subtle rules, which result in
undefined behavior if broken.

~~~
boomlinde
The question here isn't whether it's easier to write correct C programs than
Java programs, it's whether C is a simple language.

Subtle UB, as much of a headache as it can be to a programmer _using_ the
language, in this case leads to simpler implementation of a compiler.

~~~
pcwalton
No, it doesn't. Maybe it did in 1978, but modern C compilers have to go to
heroic lengths to do things like alias analysis that would be easier if the
aliasing rules of C weren't so subtle. In Java the question of "can pointer A
alias pointer B" is simple (unless typeof A derives typeof B or vice versa,
the answer is no). In C, due to TBAA, it's hideously complex.

~~~
jcranmer
If you look at software engineering and static analysis papers for a language
like Java, there's often a discussion about using stuff like 2-object, 3-call
site context sensitive alias analyses built on more or less fairly
standardized Datalog rules. If you look at C or C++, the response is generally
"you want a precise alias analysis? _fall down in riotous laughter_ ".

The rules for TBAA are complex, and many C/C++ programs violate those rules
because it's so hard to actually make sure you're not violating them, and half
of the purpose of using C/C++ is actually to be able to do the kind of type-
punning that TBAA prohibits.

------
qwerty456127
C is great and SQLite seems doing amazing leveraging its power. Yet I bloody
wish there were alternative (yet 100% compatible on the file format and SQL
dialect levels) SQLite implementations native to specific languages/platforms
(e.g. .Net). In just so many cases the DB speed is the least priority while
simplicity and portability are valued much higher (in these particular cases).
Every time I build a humble desktop .Net app featuring an SQLite database it
turns into in a ridiculous "try this, google a lot, try that, repeat, .... oh
wow, it works!... oh, it doesn't" nightmare and even if it actually starts
working nobody really knows how to deploy such an app on another computer
(especially if it runs a different kind of OS) correctly.

SQLite is arguably the best of what has happened to the world of
desktop/embedded databases ever yet lack of alternative (not necessarily this
fast) implementations in high-level languages is its main problem IMHO.

~~~
laumars
Ive used sqlite a lot in recent years and found the issue you describe to be
largely one that only affects Windows deployments. On Linux, FreeBSD, Solaris
and OS X builds it's all just worked for me. But the same build scripts will
fail for a multitude of subtle ways on Windows. IIRC (I've not targeted
Windows much recently) I've found the easiest way to get it working was just
to have mingw (or similar POSIX layer) and build sqlite from within that.

~~~
WorldMaker
As with most things, your mileage varies a lot based on what you are used to.
The previous poster mentioned "cross-platform" and I think that is where the
most headaches lie because you might know one permutation, but not all of
them.

For instance, Windows binary deployments are quite predictable (and often dead
simple; so easy a lot of Windows installers get it wrong and yet still work),
but on *nix you may have to fight distro differences in libc or get a dozen
different answers on what you should statically link in your compile versus
what you "must" dynamically link and a half-dozen different
installers/bundlers/distro tools to deal with making sure the shared libraries
are both installed and correctly linked to. What "just works" is often
relative and the magic is very easily dispelled in that terrible state of "why
isn't this working?", especially if it "just works" like magic for other
people.

For what it is worth, the Windows UWP platform even bundles a shared SQLite
install (as like Android and iOS) and using it is quite easy in a bunch of
languages (just not easily portable cross-platform).

~~~
laumars
> For instance, Windows binary deployments are quite predictable (and often
> dead simple; so easy a lot of Windows installers get it wrong and yet still
> work), but on _nix you may have to fight distro differences in libc or get a
> dozen different answers on what you should statically link in your compile
> versus what you "must" dynamically link and a half-dozen different
> installers/bundlers/distro tools to deal with making sure the shared
> libraries are both installed and correctly linked to. What "just works" is
> often relative and the magic is very easily dispelled in that terrible state
> of "why isn't this working?", especially if it "just works" like magic for
> other people._

That's only if you're working with C / C++ though. Switch to Go, Java or
basically nearly any other language, and most of your cross platform headaches
go away.

Though for what it's worth, half those problems aren't really problems you'd
generally have to get your hands dirty with as your tooling and distro should
manage that for you (just so long as you write POSIX code that is). I'll grant
you I've not done anything too complex in C++ but what I have done was
portable between Linux, Solaris and FreeBSD (the 3 systems I needed those
POSIX C++ programs to target).

That was compiling _on_ those respective platforms rather than compiling _for_
those platforms then packaging them up for deployment. I've written a lot of
code over the last 30 years and shipped it in a plethora of different ways and
creating installers has historically definitely been easier on Windows. No
question. Go lang is helping somewhat in that it's now really easy to ship a
dependency free binary. But Windows is only one platform and one that creates
more problems for cross-platform portability than all of the rest of the big
platforms put together.

------
chungy
I strongly suspect this article was written as a response against C++, Java,
or C#. Especially given SQLite being started in 2000, C really was undeniably
the best choice, and the existing code in C is a strong argument in favor
against rewrites.

That being said, Rust and possibly even Go would be strong contenders to make
a new SQLite-like library/program today. At least on the Rust side, the C
bindings are excellent too.

~~~
saosebastiao
It's still not there yet, but getting close. And Rust is really the only
language that has come close.

In terms of performance, it's really the only language out there that can
claim to be as fast as C/C++. It has a minimal runtime, with the option of no
runtime. It has pretty great FFI and can produce easily callable libraries for
other runtime-heavy languages.

It's not, however, as broadly compatible as C. I don't think that's a problem
for most cases, as most programming is now done for (MIPS|ARM|X86|X86_64), and
it can handle those well enough. But microcontrollers and OS-less embedded
devices still have a ways to go before Rust beats out C.

And stability is just not there. IMO, that's a good thing. Rust is the best
thing to happen to systems programming in a _really_ long time, and there are
still tons of ideas with amazing potential benefits. The Rust community has
been exceptional at guiding this development. I'm sure some day it will level
off, but until then, I'm happy with it changing pretty rapidly.

~~~
tekknik
It’s also not as fast as C/C++ in all cases, only in some.

[http://benchmarksgame.alioth.debian.org/u64q/rust.html](http://benchmarksgame.alioth.debian.org/u64q/rust.html)

~~~
steveklabnik
The biggest differences there are due to lack of SIMD. We expect it to be
stable quite soon, and that should close the gap quite a bit.

~~~
igouy
Do you say that because you have coded those programs using SIMD and rustc
nightly, and seen that using SIMD eliminates the performance difference ?

~~~
steveklabnik
I have not personally done this, but the people who have identified that as
the issue, and what we’re stabilizing is the exact same thing as what C and
C++ compilers have, so there’s no reason to believe it would be different.
Once it’s stable, we’ll all see!

~~~
igouy
Someone has repeatedly commented on /r/rust/ that differences are due not so
much to SIMD but _iirc_ to not triggering the same LLVM loop unrolling.

~~~
steveklabnik
I think you're confusing loop unrolling and autovectorization. That said, it's
kinda moot, once SIMD is stable, we'll find out :)

~~~
igouy
>> I think you're confusing loop unrolling and autovectorization. <<

Here's exactly what I was told -- _" This was achieved by manually unrolling a
10-step loop, which compiler apparently could not optimize."_

------
zanethomas
It's always tempting to use c++ because of, for example, the availability of
libraries such as STL and Boost. However doing so greatly increases the
complexity of the code, when considered as a whole, and we all know that
complexity spins off bugs even in well-tested libs such as mentioned above.

The great thing about C is that when someone shoots you in the foot you know
who it was.

~~~
alexhutcheson
The big problem with writing a piece of software like SQLite in C++ is that
there is generally no easy way to expose a C++ interface to other languages.
C++ has too much complexity for other languages to be able to easily construct
the data structures it would need to make a function call or handle the
results.

The normal solution to this is for C++ libraries to expose an interface in
plain C, which is much easier for other languages to call. However, this
either restricts what you can do in your implementation (because you're stuck
with the C "subset" of C++ for anything near the API), or you have to maintain
wrapper code mapping C function calls to your actual C++ interface. Neither is
great, so plain C ends up making a lot of sense for a library that is expected
to be called from different languages.

~~~
username223
This. SWIG is a heroic effort at cross-language compatibility with C++, but it
still gets wonky because C++ interfaces are complex, especially when dealing
with templates. I still find that the best way to do language bindings is to
expose a plain C interface that has no name mangling and treats objects as
opaque pointers. Then write or generate bindings to call this interface from
whatever other language you're using. Finally, wrap that basic functionality
by hand.

SQLite is meant to be basic infrastructure that is wrapped by a variety of
other languages. It should have a lowest-common-denominator interface, and C
is good for that. And while it might be easier to implement it in
$FAVORITE_LANGUAGE, SQLite is mature and well-tested. In some sense it's
"finished," and throwing it out and replacing it would be a waste of time.

------
ralfjung
I can perfectly agree to much of what they say, but here...

> The C language is old and boring. It is a well-known and well-understood
> language.

...I think they are very fundamentally mistaken. C is a horribly complicated
language. It is one of the least-understood languages out there. Experienced
programmers and compiler authors can debate for hours about whether C code of
less than 50 lines has defined behavior or not, and still not come to a
conclusion. People can write an entire PhD thesis
<[https://robbertkrebbers.nl/thesis.html>](https://robbertkrebbers.nl/thesis.html>)
studying the semantics of C, and still leave many open question (chapter 2 of
that thesis does not require any academic background to be understandable, and
it comes with tons of links to tickets/questions filed against the C
standard). Consistently writing safe C/C++ is near impossible
<[http://robert.ocallahan.org/2017/07/confession-of-cc-
program...](http://robert.ocallahan.org/2017/07/confession-of-cc-
programmer.html>), and judging from
<[https://sqlite.org/testing.html>](https://sqlite.org/testing.html>) the
SQLite team agrees.

C is old, yes -- and C has boring and well-understood fragments. But full C is
very, very poorly understood.

~~~
SQLite
Just to be clear: The SQLite project strives to fix UB whenever any is
discovered (which is to say, "rarely"). But SQLite also focuses on testing at
the machine-code level, not just at the source code level. The machine-code
generated by GCC, CLANG, and MSVC is all tested to 100% branch coverage and
beyond. So even if one were to find some new UB in the SQLite source code, all
the usual compilers are known to be doing something sane with it, not
something goofy or harmful, and so it is not really a problem.

------
commandlinefan
"Why is SQLite Coded in C? Because 'C is best'". And you know what? I can't
really say I disagree.

~~~
oneplane
When taking the portability and the 'C is portable assembly'-statement into
account I can't disagree either. While there is plenty of hard things and
things you can do wrong with C, there is a lot that you can't do in pretty
much any other language.

------
odammit
That’s impressive:

> Libraries written in C doe not have a huge run-time dependency. In its
> minimum configuration, SQLite requires only the following routines from the
> standard C library:
    
    
      memcmp()
      memcpy()
      memmove()
      memset()   	
      strcmp()
      strlen()

~~~
saagarjha
I think this statement is slightly misleading, because nobody uses SQLite in
its minimum configuration. See the line below:

> In a more complete build, SQLite also uses library routines like malloc()
> and free() and operating system interfaces for opening, reading, writing,
> and closing files.

~~~
andrewmcwatters
A little bit of an overstatement. It's not uncommon to see minimum
configurations in many embedded scenarios.

------
linkmotif
Woah:

> In its minimum configuration, SQLite requires only the following routines
> from the standard C library:

memcmp() memcpy() memmove() memset()

strcmp() strlen() strnc

~~~
3chelon
Surely something from <stdio.h> as well?

~~~
tzs
Minimal configuration only supports in-memory databases.

~~~
3chelon
Of course. Makes sense.

------
israrkhan
C is perhaps the most portable language. Most platforms support C.

I once used sqlite in a embedded system project that was based on a VLIW
processor(Trimedia). The only compiler available was a C compiler. The
integration worked like a charm.

------
strkek
My problem with C isn't even the language; it's the ecosystem around it.
Glibc, auto{conf,tools,*}, CMake, etc.

Writing Makefiles by hand works for small projects, but for bigger ones you're
forced to choose the lesser of all evils. And if you're on linux you'll still
have to deal with glibc most of the time.

And of course you can only choose between compilers that do too much (as in,
not "C compilers" but "compilers that happen to support C as well"), compilers
that don't even aim for standards compliance, or proprietary compilers.

~~~
Cyph0n
I tried using CMake for one of my course projects recently and it worked like
a charm. Of course, I have no idea whether or not CMake can handle larger
projects as effectively. Maybe someone with more experience can chime in.

~~~
jcelerier
> I have no idea whether or not CMake can handle larger projects as
> effectively.

well, it's able to handle whole operating systems so...
[https://github.com/reactos/reactos](https://github.com/reactos/reactos)

------
tzs
This is kind of cheating, but Lua looks like it would have worked.

I consider that a bit of cheat, because it replaces the problem of embedding
one C program (SQLite) with the problem of embedding a different C program
(Lua).

But for those whose objection is that writing the database logic in C is risky
because C is too low level, this would put the database logic in a higher
level language than C.

------
pier25
[https://sqlite.org/fasterthanfs.html](https://sqlite.org/fasterthanfs.html)

This page shows that SQLlite is 5 times faster than Win10 filesystem, but the
performance gains are much small in other systems.

Can anyone comment on this?

~~~
ekingr
See previous discussion:
[https://news.ycombinator.com/item?id=14550060](https://news.ycombinator.com/item?id=14550060)

------
kbumsik
I love C for system-level coding. As mentioned in OP, you don't need to study
much about language implementation e.g. how GC works, the internal structure
of objects. You have the control of every espect of system and that's why
Linus prefer C over C++.

But I have to admit that the biggest downside of programming in C is people
make their projects complicated. I saw some projects that have compilcated
Makefile build systems and have hidden definitions. The worst part is that
they leave these hidden stuff UNDOCUMENTED. This is often very frustrating
when contributing open source projects.

~~~
pcwalton
You absolutely do have to study a lot about the C language to understand when
you slip into undefined behavior. There are even circumstances in which
compiler developers don't agree on whether some code has defined behavior or
not.

~~~
kbumsik
You are correct. But most cases are explicitly mentioned in compiler's
documentations. I would say it's less efforts than getting to know about other
language internals.

~~~
pcwalton
I disagree. It is significantly easier to determine the semantics of arbitrary
Java programs than C programs.

------
alricb
Ada wouldn't fit because of compatibility and availability, but it's old,
fast, safe and relatively simple, as compared to C with extensive scaffolding.

~~~
irundebian
I don't understand this. Ada is still being developed. Other question: Why not
using SPARK?

~~~
jeffreyrogers
It's much easier to include C source code in an application than Ada.

~~~
irundebian
Yes, but you don't want C if you choose Ada, no?

~~~
jeffreyrogers
But sqlite is a dependency for many other applications, lots on platforms that
probably don't have good Ada compilers, so sqlite wouldn't be as useful if it
were written in Ada.

------
camgunz
There are a lot of comments about "well, if you use a bunch of tools and write
a million tests, C is just fine". But do you _enjoy_ doing that? I know I
don't, and anything we can do to achieve whatever we're aiming for with C
(speed, resource usage, startup time, ubiqity) while avoiding that is great.

------
domenukk
Hello, our friends at MITRE have assigned CVE-2018-8740 to an issue in SQLite3
that was discovered by OSS-Fuzz working on GDAL. [http://seclists.org/oss-
sec/2018/q1/244](http://seclists.org/oss-sec/2018/q1/244)

------
JepZ
Yes, SQLite is coded in C and that is okay. But there are situations, when you
wish it would be different. For example if you want to cross-compile a Go
project and you have to find out that the Go lib sqlite3 just provides
bindings for the C library (probably due to the high quality of the C lib) and
therefore breaks the easy to use Go compiler :-/

I don't know an easy solution to this as most C compilers seem to be platform
specific. In general, I like C and even more SQLite. Nevertheless, it is quite
unfortunate if you can't make use of the modern cross platform compilers.

Maybe one day we will live on a world where cross compiling isn't an issue
anymore.

~~~
saagarjha
> the Go lib sqlite3 just provides bindings for the C library

So compile SQLite for your target platform and use the bindings?

~~~
JepZ
Yeah sure, but in order to do that you need an extra C compiler for the target
platform. With pure Go applications it is as easy as changing a parameter for
the standard compiler.

------
elchief
The browser you're using to read this is written in C++, on your C++ or C OS,
talking to a web server written in C.

~~~
ekr
Other than HaikuOS, I can't think of any other real-world kernel written in
C++. Linux, all BSDs, Windows NT, Hurd are all C. I'm not sure but I think XNU
is C as well (the wiki lists it as C/C++).

~~~
aidenn0
While the kernel is C, I believe that most of the windows OS is written in
C++.

------
andrewchambers
rust panics when out of memory, that would not be acceptable to sqlite3.

~~~
steveklabnik
The Rust language knows nothing of dynamic allocation. You're speaking of the
behavior of the standard library, which is trivial to drop. You're free to
have whatever allocator semantics you want, including robustness on OOM.

~~~
pornel
Steve, you usually give better answers. Invoking the distinction between Rust-
as-defined and Rust-as-commonly-used feels like language-lawyery dodging of a
real issue.

Suggesting to drop all of std just to customize just one feature of the
allocator it is not a good solution. You should know harsh OOM handling is a
problem for Rust users, and there's work being done to improve it.

~~~
steveklabnik
Well, suggesting that Rust inherently has this problem is mis-representing the
situation. And while it's true that lots of people use std, many people also
don't, and _especially_ the people that care most about this aspect of Rust.

> You should know harsh OOM handling is a problem for Rust users

A small number of people have this issue. It barely even applies on entire
operating systems, for example.

That said, custom allocators will be nice.

