Hacker News new | past | comments | ask | show | jobs | submit login
The SQLite Database File Format (sqlite.org)
144 points by s16h on Sept 29, 2014 | hide | past | favorite | 42 comments


Apparently the latest and greatest SQLite is sometimes 50% faster than 3.7.17 released in July of last year. Impressive!

I also love using SQLite for testing compiler optimization passes. Their TCL test suite coverage is quite good. Unfortunately, I think you have to splurge tens of thousands of dollars to get the C-based test suite. I believe you can only test full branch coverage with the C test suite.

Is SQLite used as the internal database for some popular systems?

Both iOS and Android use SQLite. That means billions of devices have SQLite on them. That makes SQLite the most popular implementation of SQL. See: https://developer.apple.com/library/mac/documentation/cocoa/... http://developer.android.com/reference/android/database/sqli...

In case you've been living under a rock .. https://www.sqlite.org/mostdeployed.html

That text needs an update. There's no mention of Android and Symbian is all but dead.

Your choice of languages leaves room for improvement.

Firefox and Chrome both use SQLite for some of their data. It's a great package.

Opera did too, also the new one does.

This podcast talks with the creator about some systems that have SQLite in it and the open source project. http://twit.tv/show/floss-weekly/26

It comes bundled with most Python implementations (import sqlite3).

People have provided some good links, but I suspect more Rails apps run on it than you might suspect. I run a few of mine on it. It performs more than well enough until you have tens of people attempting simultaneous writes (or doing crazy stuff like logging or message queuing through SQLite), which perhaps 90%+ of webapps don't need to do anyway. More time is spent rendering the view than grabbing the data in most cases.

Fossil [0] is a scm that does a lot of really useful things that revolve around managing code:

- web interface, native or CGI

- wiki

- bugtracker

Fossil is implemented using SQLite as a storage, which is rather unsurprising given that both are written by the same guy.

[0] http://www.fossil-scm.org/index.html/doc/tip/www/index.wiki

That looks really interesting, and lightweight! Have you used it? Any comments on it? I only ask because I have a rather old CVS tree to migrate; I have used SVN but by the looks of things this is even more lightweight. Comments on it would be welcome.

Fossil is great. I've been using it increasingly for some of my projects, mostly OpenBSD related where I need versioning before the code moves into CVS. E.g. I used fossil for about a year during the initial development phase of OpenBSD's rtsx(4) card reader driver.

Since fossil has the ability to open a repository (which is a single sqlite database file) in any arbitary directory it nicely overlays with other versioning systems without wanting to store its meta data in hidden directories within the same tree. I often add a bunch of files from /usr/src to a new .fossil file in my home directory for temporary local version control.

You can migrate to fossil from CVS by converting to git first (using e.g. uwe's excellent git-cvs converter at https://github.com/ustuehler/git-cvs), and then generate a git-fast-export stream which you can import into fossil with 'fossil import'. I once tried this with the OpenBSD CVS repository but aborted the import after 7 days. By then it had imported history from 1995 to somewhere in the year 2000, and I was convinced that fossil can't handle trees this large (or histories this deep? Not sure which). This was with a hack to the fossil import code to make it use less sqlite transactions... without that it was even worse (sqlite can handle many queries per second but not so many transactions). So if your code base is large your should consider Subversion or (unless your code base is humongous) git instead.

That's great, thanks. I do not have a particularly large codebase so this might be worth a look at!

I haven't used it myself, but the comments I've read are overall similar: it's great to have a single binary to do almost everything you need, making it the ideal choice for projects that have few contributors, little history and/or low SLOC count without having to deploy all the usual jazz.

SQLite is also used in many embedded systems. If you plug in a USB drive to my TV, it creates a SQLite database to keep track of the current position of the video files it plays.

Adobe Lightroom.


by "concurrent users" the site means "concurrent users for writes" because for reads concurrent users is fine in sqlite.

This is an odd, random thing to be on the front page. Did any event in particular happen to make this come up?

Welcome to HackerNews. The front page doesn't have to be plagued with flat/material design boostrap themes, Wired.com garbage and TechCrunch articles about startups that are written by the startup's founders.

Databases are a ton of fun to read and talk about. If I ever have an event in my life that means I won't have to work anymore I'll spend the majority of my time studying databases.

The next time you're cussing at Microsoft Word for being slow while you type out a paragraph, think about SQLite and how it manages to do sub-millisecond queries to something that is just a file on your file system. Wow!

Doesn't seem that odd to me. I love stumbling on things like that. Makes me learn a bit of something I wouldn't have learnt otherwise. I think this kind of knowledge is of common interest to so called "hackers" or people that like to know how things work.

There was this submission[0] yesterday. Probably someone wanted to learn more about it and stumbled upon this amazing document.

[0] https://news.ycombinator.com/item?id=8378894

The format is amazing, however the document is not well written. You have to read about half of it to begin to understand how the format works. I had to read it whole for a project and it was really time consuming, though I don't think I could write it any better than it is right now.

There were some blog posts that were far clearer to understand the file format, but unfortunately I lost track of them.

I've grown to like the SQLite documentation to the brink of adoration. It reads verbose but has a certain mathematical succinctness and completeness to it.

Took me some time to get used to, and yes, you always have to read the entire paragraph.

In the unlikely case that this is intentional it's genius: it doesn't let you get away with "what do I need to know to make this code compile/kinda-sorta-run", but almost forces you to at least notice the things you should know when using it (protecting you from dangerous half-knowledge).

I'm not sure if this is perfect for file format documentation, but I would guess that a more "traditional" presentation would answer certain questions faster - such as "oh! this looks easy" / "this will take me weeks to work through", or "no, they don't seem to use RFC455692673 index optimization". However, for actually uisng the format for something productive, the time invested would be about the same.

All in all, SQLite is a pleasure to work with, and that's something outstanding on its own.

As a pedagogical exercise, to help me understand the design, I wrote some Go code to read a sqlite database and dump table contents. I remember having to peek at the source code to figure out a few things that weren't explained well in that document.

I'd previously written code to read the lucene database format (python code that time), which was also a good learning experience. I ended up leveraging some of that knowledge for a work project.

And I've also played around with reading/writing git archives.

I'd like to improve the document so that reference to the SQLite source code is not required. Can you explain (perhaps via private email) what the file format document did not describe sufficiently for you to implement a reader in Go?

It was over a year ago, so my memory is a little fuzzy. I do remember that near the end of the process I looked at the source code to answer a question or two. I'll skim over the docs again and see if I can remember what I was confused about.

Yeah, these no-context submissions are annoying but common here.

Also very interesting: http://www.sqlite.org/testing.html

The problem with SQLite is that there's no official RFC, it's a pseudo database.

When you're weighing up what to implement, it certainly helps when there is an agreed upon technical specification.

Hmmm. This presses one of my buttons.

An RFC that justs explains a data format is essentially useless to me. Well, the format of a file is an important part, but it's less than half the story behind what's there, because you're missing a lot of semantics (that are not easily described in english).

There are many "campfire story" class RFCs that raise my blood pressure because they take a complex subject and describe its syntax and formatting and layout pretty well, but leave the hairy algorithmic stuff in the domain of hand-wavy and loose English.

For some reason we leave critical infrastructure open to interpretation. We don't have standards that actually help implementers make sure that what they've written is correct, instead we have documents that try to describe correct behavior and maybe provide a little guidance. For some reason it's gauche to ship working code in an RFC [I've been reading standards for 35 years and I used to work for NIST, I think I know the issues with code in an RFC -- they seem easily surmountable].

There are very important RFCs that don't have reference implementations in their (HTTP, I'm looking at you. If you've ever written an HTTP proxy then you know the particular hell I'm talking about, and don't get me started on PKI or -- oh Lord -- CSS).

On thing that caught my eye here was the section on the B-tree in the sqlite file format. I've written these, they are hard to get right. But the section here is an english description of a very complex data structure, and if this appeared in an RFC and you asked ten different developers to reimplement it then you'd wind up with wildy different interpretations of the text, all non-interoperable until the world finally decided to have some kind of "interop fest".

So I'm happy that SQLite already has a publically available b-tree implementation. Because you won't have a "standard" without that code, and QED.

Complex standards need reference implementations. Just calling something an RFC doesn't make it more useful. Also, forcing ivory tower types to write working code is the best anodyne I know against unimplementable and overly complex standards.

This seems to be a valid comment, but I'm not sure I understand it. Can you show some examples from other databases? How are they doing it right?

I assume that they're referring to the great browser wars, where Web SQL was abandoned in favor of IndexedDB because the former was essentially a call out to use SQLite as it is. Many argued against a single implementation because it didn't have an underlying high level specification that would allow for other implementations.

Why are you assuming I am referring to anything? I simply said I don't completely understand the parent comment and would like some examples. Maybe you intended to respond to another comment?

And I was obviously trying to address your question, speculating what jpswade was talking about.

If you fail to get the basic context of a thread, maybe the internet isn't for you.

An RFC might be useful to define the file format as a general purpose data-set format.

"MIME type application/my-example is defined as a resource conforming to RFCxxxx with the following tables..."

The basic data-set RFC could leave out the internal locking and transaction support, leaving "must be zero" placeholders.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact