Hacker News new | comments | show | ask | jobs | submit login
SQLite 3.8.0 Release with next-generation query planner (sqlite.org)
245 points by conductor on Aug 26, 2013 | hide | past | web | favorite | 91 comments

I'm here to declare my love for sqlite, a database that doesn't asks me to choose a root password (wtf, everybody hate passwords) and an application user password (double wtf), doesn't open a socket (being it unix on TCP), doesn't install a daemon somewhere in the system, doesn't need to be root to start, doesn't asks me to create a new dba user in my system, doesn't require me to be understand how the root daemon starter hands over the control to the dba user when something goes wrong at startup, and generally doesn't bother me with stuff that has really nothing to do with storing and querying data.

I don't run this in production only because I would have a hard time convincing everybody to do it or by cargo-culting my fears (depending on the project). But one day I will.

I love SQLite but I don't think it's fair to compare it directly to other rdbms. As the official website itself puts it:

"Another way to look at SQLite is this: SQLite is not designed to replace Oracle. It is designed to replace fopen()."


There are very few instances where I end up in a position where I have to decide whether I need to use postgres or sqlite, they have very different use cases.

I don't think it is designed to replace fopen().

They can't say it, but it is designed to replace FoxPro and Access, or more generally: a local ODBC connection.

So in short... if anyone gets the chance of an sql injection, they have access to all tables and all schemas from any app that can access your database. Privilege separation is a very good idea. Not opening the database file directly is a very good idea too.

Just to be clear, I'm not saying every application needs it. If you actually use it as a replacement for local files, that's great. If you use a multi-user app containing sensitive data - "root password (wtf, everybody hate passwords)" this is so wrong. Reminds me of "chmod 777 everything" - wtf, everybody hates dealing with permissions.

The gp point that sqlite gets rid of all the "stuff that has really nothing to do with storing and querying data" still stands.

Sqlite eliminates almost all of the management overhead that other databases force on you, so its trivial to have one database per application. Indeed, I have never seen anyone bother with a multi-application sqlite db, which could also cause serious performance problems. So that renders you sql injection point moot.

As for permissions, its better to let the OS handle that than some half-baked database permissions system. Root accounts/passwords are a wtf in a production environment because they destroy auditability. If you really want permissions, it is trivial to create system user accounts for the application and assign the correct file permissions, or even isolate particularly sensitive information in its own file. Combine this with AppArmor to further lock down individual applications.

All of this is overkill, but so is a full blown RDBMS in many cases. If you are starting out building a new application, just target sqlite until you actually need something bigger. Odds are, YAGNI.

> Sqlite eliminates almost all of the management overhead that other databases force on you

Multi-user support is what forces that overhead on you.

It's just that SQLite consciously aims to be a single-user system.

> As for permissions, its better to let the OS handle that than some half-baked database permissions system

This is nonsensical. There isn't an isomorphic mapping from the Unix permissions model to SQL's permissions model, particularly when you start introducing stuff like row-level permissions.

If you write an application which uses a DB backend, and you require multi-user capability, you can either delegate some of that to the database, or roll your own. In which case, half-baked permissions systems are back! Especially if you can't isolate database connections (because it renders any app-code permissions system moot if you allow 3rd party code execution, such as plugins).

All that overhead is there for a reason. Those reasons are irrelevant in the single-user case. But they are not irrelevant in the multi-user case.

> All that overhead is there for a reason.

SQLite's classic use case is where SQL-like access to data would be convenient but a full RDBMS would be tremendously inconvenient, e.g. embedded systems. People seem to be citing inappropriate environments for SQLite as reasons to say it is of no value. You get the same sort of silliness from NoSQL zealots.

> As for permissions, its better to let the OS handle that than some half-baked database permissions system.

Sorry, but that doesn't make sense. For example: if you want your application to be able to add new rows, but not delete them (for audits actually) you cannot do this with OS permissions. Only database-layer permissions will make it possible.

There are many other examples of such use-cases. You simply cannot use OS/apparmor/selinux/whatever external to the database to grant/revoke per-table access.

> Root accounts/passwords are a wtf in a production environment because they destroy auditability.

On this side however - why would anyone know / have access to your root account? Applications should never have it. Servers should never have it (in a form different than the hash). Only people should have it.

Interestingly, there is SELinux integration for Postgres: http://wiki.postgresql.org/wiki/SEPostgreSQL

AFAIK, one database per application would not protect other databases from a SQL injection exploit.

The "ATTACH DATABASE" command (http://www.sqlite.org/lang_attach.html) can open any other database that the current user has permissions to.

You could, as you mentioned, set up a new user for each database. At a certain point, the permissions systems in a traditional RDBMS may be easier to manage.

The stuff that would get rid of the SQL injection, is the database API, removing the general purpose query(string) function. And forcing some structure on the parameters. First a little breaking to get the easy injections out, and progressively forcing the whole host language mapping of the query, including the result schema. People will scream about the debugging, but they will gain a little bit of stuff in exchange (like projection in a variable, where clause re-use etc).

It's a single user database, hence it doesn't need multiple users. It's great for a single user application (e.g. Firefox) but not at all good for multi-user, multi-session applications.

I don't intend to say its no good, btw. It's fantastic! But you have to use it where it is meant. You aren't comparing apples with apples I'm afraid.

If you are using sqlite for hosting a production scale app, you are doing it wrong.

If you want an database for your mobile app, or a web app use a full fledged database like MySQL or Postgres.

sqlite is for embedded storage applications, especially in the backend. Where the resource utilization foot print is supposed to be minimum, and where you running into situations where are you are reimplementing SQL like DSL's to query and write to files. Or say you just wish to maintain some structured form of inventory.

Best example I can give you is a web browser. There is a requirement to store a good deal of structured information, in a mechanism which requires little resource utilization, which doesn't require additional software like demons or a server to run or complicated deploy and maintain scenarios. Or even a strict configuration.

If you look at this use case you will see tons of backend application would be better off using a tool like sqlite than play with huge xmls and endless rewrite parts of SQL badly.

Most of this is not true for PostgreSQL either. The only two things PostgreSQL needs to run is a daemon (it does not have to be actually daemonized) plus a unix socket, passwords, user accounts, root privileges, etc are all optional.

Sure SQLite (which I love) is easier to get running, but do not exaggerate issues of traditional RDBMs.

In PostgreSQL 9.3 which will be released in a couple of weeks they will also fix the cases where you may need to use sysctl to iincrease shared memory limits.

My favorite thing about SQLite is the business model. My understanding is the product is free, but they charge for their test suite. If you build hardware that is going to use SQLite, then you should pay for the test suite, and that funds the future development. It is a tautologically perfect business model. Revenue toward a rock solid project pays to make it even more rock solid.

I don't know how much the test suite costs by itself, but it is included in Sqlite Consortium membership, which is a typical open source support-style business model. For $75k, you get almost a month of sqlite developer time and tons of support (eg their home phone numbers). Not a bad deal for companies that want deep sqlite integration or customized versions that fit on tight embedded systems.


They also sell advanced features such as encryption and compression.

Charging for encryption now days is horrible. We should be encouraging devs to put encryption everywhere. It should be the default.

Personally I don't like it when projects mix money with opensource.

Given that SQLite is intended as a replacement for fopen(), I'm not sure how many normal files an application uses (or should use) are (or should be) encrypted. I cannot think of a lot of use cases.

At work we need to because we are distributing map data that was licensed to use under the condition that we don't ship the data unencrypted to the clients, however nonsensical that is, given that you have to distribute the encryption key the same way.

But for use cases like Lightroom which stores its catalog in an SQLite database or Far Manager which stores the config there I very much doubt they should be encrypted by default.

> Given that SQLite is intended as a replacement for fopen(), I'm not sure how many normal files an application uses (or should use) are (or should be) encrypted. I cannot think of a lot of use cases.

SQLite has been under development for thirteen years and has hundreds of releases. I don't think the "feature X? you're doing it wrong, we're just a simple replacement for fopen()!" refrain holds a lot of water at this point.

FTR when you work in the financial industry, all kinds of files and databases get encrypted. I agree that encrypting a file or database is not extremely secure since the decryption key is sure to be nearby, but it's still an important need, there are security audits and all kinds of things that call for encryption when available.

That said, it's the financial industry. If the encryption feature is offered as a paid one, that's not so terrible.

Nothing stopping you encrypting your stuff before you put it in to the database.

Or encrypting the filesystem that the database is stored on.

Back in the sands in the time Sun also did this with Java. IIRC the associated license was a great cause of pain for open implementations of the JVM, since without passing the suite you couldn't call yourself Java, and with passing the suite (insert some horrible restriction here I forget)

Without passing the suite, you couldn't get the patent licences, so Sun could put you out of business if they chose to. Sun used the test suite to define what was and wasn't "Java" (you had to pass the whole test suite or you couldn't ship product) & thereby prevented anyone else from using the core language in interesting ways that Sun didn't approve of.

Google did an end-run around these restrictions by using the Dalvik virtual machine instead of the JVM where the crucial patents lurked, so Android didn't have to implement the whole of the Java library before they shipped.

But these days, any where you could possibly deploy SQLite are generally linux based platforms.

And you don't really worry about underlying hardware on those machines.

I'm sure every smartphone vendor would probably disagree with you there.

We've got a big windows SQLite deployment. We use it as the back end for our entire test suite and it works wonderfully.

SQLite is one of the most important pieces of software you never see. I've used it in projects from many years back to recently and you know what's awesome about it? It Just Works.

I LOVE SQLITE - both only when Core Data handles it for me haha

Unless you want to use it with multiple users and connections, in which case it stops working. It's good for embedding data storage in clients and testing.


Until 2010, I ran a forum on SQLite with 600 users/day + around 1000 posts/day on a single OpenBSD machine with 2gb RAM and 7200RPM HDD. We moved to Postgres when we passed 1600 posts/day.

There was some minor slowdown noticeable around 6AM - 10AM, but besides that, we never had major issues. The bottleneck was usually the network, not the DB.

Edit: Anyone interested in playing with SQLite should give the SQLite Manager plugin for FF a try : https://addons.mozilla.org/en-US/firefox/addon/sqlite-manage...

(I'm not affiliated with the developer, I just like the tool)

600 users/day and 1000/posts a day is miniscule.

You can run something like that with no noticeable performance issues on a classic 120MHz Pentium and flat files (I'm speaking from experience of running a USENET news server handling 30,000 groups for an ISP on a machine like that, which was shared with lots of other functions), so if the bottleneck had been anything other than the network, it'd have been shocking.

To add to this, since 2010 SQLite improved concurrency a bit with write-ahead loging: http://www.sqlite.org/draft/wal.html

I thought SQLite locked the entire DB when doing an INSERT. Seems like that would slow a forum app considerably. Though I guess at 1600 INSERTs per day, which averages to about 1 per minute assuming even distribution throughout the day, you won't have that many lock collisions.

The sqlite locking model can eat 1600 inserts before breakfast with only minor lock contention. The whole file lock doesn't take effect until the moment the write is ready to go to disk.


Writes were generally quite fast so the next write rarely got queued. Also we ran it single-threaded. When we moved to Postgres, that also gave us the opportunity to safely switch to phpfpm without worrying about the lock issue.

The nice thing about Postgres as opposed to SQLite is that we finally got "searching" as a feature. ;)

Not if you use it in the WAL mode.

Really? I thought SQLite is for development purpose only. but 1000posts/day? Wow..

> but 1000posts/day? Wow..

While a forum that gets 1000 posts/day is quite active, a database that does an INSERT ever 80s is not even moving.

For comparison, 8 years ago I wrote a queuing system that did either in-memory queues, or wrote to sqlite for durability. With a little bit of tuning, the Sqlite queues could handle at the very least hundreds of thousands of messages a day - we never pushed it to it's limit.

Presumably there were many SELECTs for every INSERT.

Yes. Yes there were. :)

We always limited transactions for UPDATES and INSERTs that involved more than one table.

It was quite the learning experience since you do have to think of different ways of working with data. You get to learn very quickly the difference between what like to store vs. what you actually need to store to get a particular functionality. But that turned to being a boon in the end because we got a much simpler forum as a result. Fewer bells and whistles meant users focused on actual discussion and not ancillary, shiny bits and bobs.

But on a machine with 2GB of RAM running a forum that small, pretty much none of those selects should ever need to hit disk - it should all pretty much be in the buffer cache, unless there were lots of full text searches of the entire posting history.

EDIT: And that is without any app specific caching.

Put simply, you thought wrong. SQLite is one of the most-used pieces of software out there, absolutely including production software. http://www.sqlite.org/mostdeployed.html

I should add, that we ran nothing else on the box. Just the forum and it was text only (no image uploads etc...).

Backups were pretty easy. ;)

Please clarify "for development purpose only"? SQLite is used in a lot of production software you rely upon daily, and is one of the most robust, useful pieces of code going.


Now it's an embedded database, which is why leokun's comment is out of place. However you can absolutely drop a queue in front of it, for instance, and serve many "simultaneous" users for a purpose such as a web forum.

I'm really glad to read that, and I was stupid not to do it.

You basically just said your car is reliable unless you need a semi-truck, in which case your car is worthless.

Apparently I'm wrong! Too late to edit. Good to know though, not afraid of saying something and getting downvotes if I'll learn something. :)

I don't think you were downvoted because you were wrong. I like to think the HN community isn't that shallow.

Reasons for downvoting is usually due to posts containing no new information, excessive snark, flippant attitude etc... Users and moderators are extra touchy about avoiding the Reddit ambiance here. Basically if it doesn't contribute to the discussion, then it may be downvoted. That said, I didn't downvote you because what you said allowed me to add to the discussion.

I try to follow this when I vote : http://news.ycombinator.com/item?id=1065084

Just to remind us all what "testing" means[1]:

    As of version 3.8.0, the SQLite library consists of approximately 84.3 
    KSLOC of C code. (KSLOC means thousands of "Source Lines Of Code" or, 
    in other words, lines of code excluding blank lines and comments.) 
    By comparison, the project has 1084 times as much test code and 
    test scripts - 91452.5 KSLOC.

[1] http://www.sqlite.org/testing.html

The related article on the new planner is great reading too: https://www.sqlite.org/queryplanner-ng.html

For me at least, partial indices are the killer new feature of this release - https://www.sqlite.org/partialindex.html

"Query planner are what make SQL database engines so amazingly useful and powerful. (This is true of all SQL database engines, not just SQLite.)"

No it isn't. I have been using SQL databases which got absolutely horrible query optimizer totally ruining any performance exceptations. I have blogged about that, and I was very frustrated and disappointed with results.

Which is faster if id>100 or simply walking database by using series of selects select ... id=101 ... select ... id=102 ... Turns out that the later method is a much faster. Engineers also then suggested that I could optimize the query by using structure select ... id=101 UNION select ... id=102. That's simply insane. How ever the query was written, query otpimizer / planner should make sure that all these queries are delivered using similar background process. Of course that walk is impossbile to optimizer, because it's series of individual queries. But it's just crazy that select id>100 is slower than that!

Yes, you might have guessed that the SQL database I'm referring to isn't SQLite.

And finally, I love SQLite and use it for most of my (work & hobby) projects as local database.

Just curious, what database is that? If the index on id is a B+ tree rather than a hashtable, it's very unlikely id > 100 would be slower. You are issuing one query with it vs. N queries with the 2nd approach.

Were there a lot of qualified rows for id > 100? If there're a million rows qualified for id > 100, which way is faster?

The optimiser might not be using an index in the most optimal way. Sometimes you have to guide the optimiser and it may not have enough statistical data to get the best plan. One way of doing this is to rewrite the query. In this case, if I had to hazard a guess, the developers have suggested you break down the where clause from OR conditions to instead use a UNION ALL, which may help influence the optimizer to choose the most optimal plan.

Adding statistics or an index may help. Just understand that if your data distribution changes radically the index or rewritten query may no longer be valid.

I love query planners and optimisers. However, here's where I get a bit bemused: as it states in the article, a query planner must often go on incomplete information. Therefore you have to know the underlying assumptions built into the planner.

This is where explain/execution plans are great. Unfortunately, even these aren't completely documented. SQL Server is not bad, and Postgres is not only well documented but you can read the source. Oracle is often a basket case - there are literally thousands of undocumented parameters that you can set, not to mention underlying data views that also haven't been documented. In fact, there have been whole books trying to unravel the Oracle CBO, but even the most obscure often admit that Oracle fix optimiser bugs which can cause an impact on query plans.

In some cases, you must implement hints, because there is no way of catering for all queries under all loads based on all possible data distributions. Sure, you can add a variety of different types of indexes, histograms and other statistics to try to influence the execution plan, but at the end of the day sometimes you simply have to add a hint to force a particular sort of join, etc.

I guess my main point as I muse on query optimizers/planners is that none of them are perfect. It occurs to me that one of the best so far is actually Postgres, but alas - they don't allow for hints.

What's my main point here? It really comes down to the fact that the ideal for database queries is to allow for a declarative syntax ("give me all rows of product x that are over $100") without having to worry about the mechanics of retrieving the data, it's just not entirely possible. You really need to give developers and database folks the ability to nail down the actual mechanics of data manipulation.

The write up of the NGQP seems (at least superficially?) weird: they say that the reason for the rewrite was poor performance on a particular artificial benchmark (8-way join.. how often do people join 8 tables in practice?). At the same time, I was surprised to learn that the only kind of join algorithm SQLite supports is nested loops. I would think they would benefit more from adding hash and merge joins instead of rewriting the QP from scratch.

I regularly do 20+ table joins.

SQLite 3.7.17 already makes automatic indices, if none exists for the join, if the planner recommends it.

> how often do people join 8 tables in practice?

If I were to judge based on my current experience, I'd say more often than they make queries with less than 8 joins. In the stored procedure I'm working on now, there are many queries with up to 20.

People in this thread say SQlite is fine for apps with lots concurrent writes (people advocate running forums and other multi-user apps using it) but that's not correct in my experience. I wrote a blog[1] in Python that uses a Sqlite database and on each post view a view counter was incremented[2] using a simple UPDATE ... views = views + 1 on the post row. Removing this update caused page speeds to increase hugely, from ~10 requests a second to ~200 or so. Maybe I was doing it wrong, but I can't see how.

1. https://github.com/orf/simple

2. https://github.com/orf/simple/commit/2121401030cab4d500330e9...

I'm not familiar with the ORM here, but is it really just an increment ("update posts set ctr = ctr + 1 where id = 9") or a select, add one in code, and then an update? Also bear in mind that one operation in an explicit transaction like this is quite slow, at least prior to WAL.

Be ready to hear more and more from SQLite as the world moves to mobile (iOS/android) - it really shines in that environment.

Ah, the beta presupposition: SQLite version 3.8.0 is actually one of the most heavily tested SQLite releases ever. Thousands and thousands of beta copies have be downloaded, and presumably tested...

I really wish there was something like sqlite to replace MS Access - actually I think what I really want is something to replace MS Excel that isn't MS Access. The ability to have a self contained file based database that can be moved around and shared, but that can also hook up to external data sets has a lot of use cases in enterprises. But it would require a good query view, the ability to create forms and reports, and have a sane sql dialect.

Well SQLite is already around and stable, so all you'd need then is a really good Access style front-end for it.

"May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give."

can it nest joins in parenthesis yet?

  SQLite version 2013-01-09 11:53:05
  Enter ".help" for instructions
  Enter SQL statements terminated with a ";"
  sqlite> create table a(id integer);
  sqlite> create table b(id integer);
  sqlite> create table c(id integer);
  sqlite> select a.id, b.id, c.id from a 
     ...>     left outer join (b join c on b.id=c.id) on b.id=a.id;
  Error: no such column: b.id
SQLite is now the only major database I'm aware of that still can't handle this syntax (postgresql, mysql, oracle, SQL server, firebird, etc., no problem. I'm also well aware of how to use a SELECT subquery as a workaround).

Supporting some crude form of stored procedure would be nice too (you can already add SQL functions, why not procedures?), so that a 3rd party can provide semi-native ALTER support (don't worry, I don't want to put ALTER into SQLite itself, I understand it requires rewriting of table data, I'd just like there to be a standard implementation of that approach somewhere).

I haven't tried SQL Lite, I always used Firebird SQL embedded.

It's a pretty nice little DB for a lot of things.

If you just want to play with it a bit, there's a Firefox extension available : https://addons.mozilla.org/en-US/firefox/addon/sqlite-manage...

It's not very powerful, but you can play around with different schemas. You can even export to different formats and edit manually if you like.

Wish it work on iOS...

Is there something about sqlite that postgres can't come in to replace it completely (a kind of dynamic postgres that starts up and shuts down like sqlite). With mysql,postgres,and sqlite, the open source db industry gets diluted. Some kind of merger would be great.

edit: to below posters, I'm fully aware sqlite is embedded, i use it everyday for android. I'm just asking if there's actually a technical reason why sqlite and postgres (and mysql) couldn't be merged. A db that can be both embedded or run as server. This would reduce the duplication of effort in open source db community.

SQLite isn't a replacement for postgresql (or vice versa). SQLite is a replacement for fopen.

Among its many advantages in that situation: a database is contained in a single file with minimal overhead, the source is shipped as one .c file and encouraged to be statically linked, and the code size is relatively small (compared to something like postgresql).

I hope sqlite someday add functions and/or store procedures. Plus better error messages (like when a foreign key is violated)

Why exactly do you need stored procedures in sqlite? You can just write code. Are you using it in a scenario where you can't write code to interact with sqlite?

For the same reason are used in others DBs.

Some things are better expressed in the DB native language. Help to abstract and share implementation of DBs that are used by different languages.

Why would you want to use PostgreSQL in the contexts that SQLite is used?

SQLite is fantastic when you need a data store for mobile or embedded use, you'd like ad-hoc queries over your data with the option of adding indexes to speed up frequently used ones, you want transactional update to the data, and multi-user access is not a problem.

It's basically a much better replacement for the hash table on a disk approach of DBM libraries.

PostgreSQL would just be total overkill for this kind of scenario.

I just would like the simplicity of use of sqlite on my server. No socket, no connection, no driver that has to match the server, no config file in /etc, no user in /etc/passwd, no cruft.

The separation between the client and server that brings all the 'cruft' makes it a lot simpler to have high performance concurrent operations, a major advantage of a full DBMS over SQLite

> I just would like the simplicity of use of sqlite on my server.

The simplicity comes from largely from not supporting multiple user accounts (authn/authz are left to the owning application), and secondarily (I think) from not being designed to support large numbers of connected processes. If your server need fits that profile, then you can use SQLite and benefit from its simplicity. But if -- as is often the case -- it doesn't, you can't, and its not because no one has merged SQLite with PostgreSQL, its because there is inherent complexity that comes with serving certain use cases.

This is a brilliant idea.

If postgresql could have a single file, embedded mode, then it could replace sqlite.

The advantages of such a design would be: 1. All the optimisations and transaction guarantees of a full-featured DB 2. All the conveniences of a single file DB 3. Better utilisation of open-source resources as the OP points out.

On the other hand, the postgresql system might be bigger (in terms of code size and heap usage) than sqlite. So it may not be feasible to use in all embedded applications.

I think embeddable mode is more realistic than single file since the single user mode used by for example pg_upgrade is quite close to that. Some features are missing though for wide use.

SQLite is database for clients and relatively small application specific databases. Postgres is database for servers.

Your Firefox browser and Skype client come with SQLite.

It might be off-topic, but my dream is that SQLite would also become a document-based database, like MongoDB, but keep SQLite's lightweight and speed!

If you really wanted, you could always take a FriendFeed style approach and create a document store on top of SQLite. http://backchannel.org/blog/friendfeed-schemaless-mysql

Quote: "Do not fear the zero!"

I love SQLite ...

How does SQLite compare to H2? When would I want to use one or the other?

I guess when you want to use a pure Java solution. H2 is good that it's just a jar file. You pre-configure and package up everything in one bundle. There will be no setup. Things just work when you copy the bundle over.

I wish I could use it in the browser in a future proof way.

Web SQL Database rocks.

Sqlite and Core Data are synonymous in my mind (hint: it's because I'm a terrible programmer)

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact