Hacker News new | comments | ask | show | jobs | submit login
Appropriate Uses for SQLite (sqlite.org)
359 points by tzhenghao on Feb 12, 2018 | hide | past | web | favorite | 150 comments

This is one of my favourite technical sites. It is very clear and to the point about its good use cases while being clear where an alternative would be a better choice.

In a similar vein, VueJS has a comprehensive exploration/comparison of itself compared to other frameworks[0].

I really appreciate open source maintainers that have the humility to compare their work fairly with others within the domain while still taking pride in their own acheivements.

[0]: https://vuejs.org/v2/guide/comparison.html

I'm trying to make a habit of adding a "Do NOT use if..." section near the top of my projects' READMEs whenever there are simple exclusion criteria.

For example https://github.com/sapcc/swift-http-import#do-not-use-if -- This particular instance was born out of watching users bending over backwards to use the software even though standard software does the job just fine or better in their concrete case.

Agreed. One of my pet peeves are projects that gleefully point out all of the issues that other projects face ('Not ACID compliant! Bound by CAP theorem! Difficult replication!') then present themselves as magical solutions free of all trade-offs.

Yeah, sadly those comparisons are mostly feature driven. Coming from riot.js I have to say that while I was very happy with the features riot.js offered, I am kinda unhappy how the development is done lately (discussing the same issues over and over again, but nothing changes).

So while I have no idea about the status of vue.js regarding that topic, I belief that even if they are just in a normal state it is a huge positive aspect in comparison to riot.js, but the whole issue might be a little subjective too.

Well, actually it is very simple when you understand the why SQLite performs differently than other database systems. The major difference is that 'normal' database systems have a process which receives and processes SQL queries and has the exclusive access to the database file.

SQlite on the other hand doesn't have such a process so, every client has to open the database file itself, process the SQL and close/write the file again.

So the biggest difference is, that if you want to use SQLite, you will have to limit your database operation to one or fewer ;-) processes to not get into a situation where one process has to wait until the other finished writing the database file to the filesystem just to read it again in the next moment. Otherwise your filesystem and the continuing reading and writing of the database file will make the whole thing very slow.

But as long as you have only one process using the SQLite file and keep it open for a while you will see a comparable performance to other database systems. Actually, SQLite can be very fast if you do use it in a sane way.

As far as I understand, with WAL, SQLite supports multiple readers plus one writer on a database. And multiple concurrent readers were always supported.

So the limiting factor is not the degree of concurrency, but the amount of write-activity. Workloads that mostly read and only occasionally write to the database work well with SQLite.

SQLite is fine with multiple processes as long as they are read mostly. And if you switch journal to WAL mode, it is as good any multiprocess DB without MVCC (which, depending on access patterns might be as good as it gets - although mist patterns do benefit immensely from MVCC)

I'm not sure what you mean by "multi-process DB", but there are plenty of workloads in which SQLite under-performs non-MVCC databases. It is simply not designed for highly concurrent access patterns. In a previous job, I had to replace the persistence layer for a medium-load (100-500 events/sec) event tracking system from SQLite to SQL Server which is, by default, non-MVCC. The latter was considerably more complicated and considerably faster and more scalable than the former.

I say this not because SQLite is not good. It's _great_ for what it intends to do which is, as the linked page says, offer an excellent alternative to fopen(). It doesn't really offer an alternative to a full DBMS, though. There may be some gray area between a "heavy fopen()" workload and a "light DBMS" workload where there might be a legitimate case to be made for SQLite over a DBMS, but standard OLTP or OLAP workloads with any degree of concurrency are almost uncertainly outside of SQLite's intended use case.

There was a recent conversation on a similar topic with input from Richard Hipp, SQLite's author, here: https://news.ycombinator.com/item?id=15303662

Not really.

In a usual quassel (irc bouncer that can, optionally, store logs in SQLite) setup, you’ll have between 1 and 10 writers, constantly writing hundreds of IRC messages a second to SQLite, while you’ll also have readers that need to respond with low latency for queries loading hundredthousands of messages, including those that were just written.

The result is that SQLite gets so slow that the writers queue up so much work that the IRC connection times out.

SQLite is absolutely useless for multiple writers.

> SQLite is absolutely useless for multiple writers.

I don't really quite agree with this, but would say that with SQLite it's more important be be aware of how concurrency is being used with regards to the database, in comparison to something like MySQL or PostgreSQL. As far as IRC logs, I think the important thing (that would almost certainly hamstring any SQL database that's actually being ACID) is to not use autocommit on inserts, and to not use a transaction for every single insert (which is possibly what's happening). I wrote a plugin for ZNC to do exactly what you're doing above, and by batching my writes (either waiting 15 seconds or until n (where n is maybe 1000 or more) messages are queued), I've gotten superb performance out of SQLite3 running with a WAL journal, with the trade off being a little bit of data might be lost if your bouncer happens to fall down in an unhandled way. I was also able to get really good read performance without hurting raw write performance by leveraging partial indices, which something like MySQL doesn't give you.

Well, with PostgreSQL everything works just fine. Read and write performance is amazing, and full text search returns instantaneous results.

With SQLite, it’s basically useless.

It really depends... I've worked on systems that PostgreSQL (or any SQL rdbms) wouldn't be able to keep up with. I've seen many others that would do fine with SQLite. It depends.

In the end, it's a matter of testing. Also, with faster drive options (Optane and whatever is next), it could very well be a decent option. It all depends.

Frankly, for logging, I'd lean towards elasticsearch (ELK) these days, which is about the most write heavy situation you're likely to see. Depending on the scaling Mongo, RethinkDB, ElasticSearch and Cassandra can all be better solutions than RDBMS. I'd still reach for SQLite or PostgreSQL first depending on the situation though.

Sure, with optane it may work — but the usual usecase for SQLite is on the slowest SD cards or spinning HDDs you can imagine, where the user didn't want to spend time configuring postgres

The key thing you omitted is you can have multiple readers at once. As long as you don't have really slow reads or writes and don't have a tonne of writes trying to queue up, nobody should get starved with database level locking.

You’re right, SQLite can be very fast if you know what you’re doing. Expensify was able to achieve 4M queries per second.


One of my favorite points about SQLite which hasn't been addressed yet in the comment thread is that its databases are just single flat files. Need to copy data from a server to a local dev machine? No need to log into the server, make a dump, copy the dump file to the local machine, and import the dump. Just copy the database file as you would any other file. That's it.

Making a backup before trying something risky (and then restoring the backup when something screws up) is similarly trivial.

Maybe to some of you it's not worth that much, but it's a legitimate timesaver in my opinion.

That's why it is sql "lite". It's great for certain things because it's generally lightweight and manageable.

But you wouldn't want a single file format on enterprise level systems where enterprise level performance and stability is required. At the very least, you'd want your data and transaction log files to be different files on different disks. Especially when IO throughput is important.

Also, since DBs tend to have important data on them, being able to copy a file (db) and move it anywhere you want isn't much of a selling point on serious systems.

Having said that, sqlite is great for application/client side storage.

This is the main reason I choose SQLite for simple stuff and experiments. Just being a file also makes it really easy to come back to a half finished project or pick up from another machine with a simple rsync.

Try https://github.com/tinspin/rupy/wiki/Storage it stores each value (not column/row/table or database) as a single JSON file! Make sure you "mkfs.ext4 -T small"! ;) Upside is tar+gzip compresses the db 100x!

Sadly, in the browser SQLite has been kicked to the curb, replaced by IndexedDB.

It's deeply disappointing to not be able to use SQL on both client and server. Yes, there's Lovefield, which employs its own DSL that has nothing to do with SQL, so you can scrap everything on the server and redo it on the client.

Or just give up and go with NoSQL on client and server, that seems to be what the browser vendors are implicitly pushing developers toward.

> Sadly, in the browser SQLite has been kicked to the curb, replaced by IndexedDB.

SQLite hadn't been kicked to the curb, the WebSQL API whose “standard” was essentially “act like version x of SQLite” was kicked to the curb.

The concept of WebSQL was good, but the standard itself wasn't great, and the absence of any intent to implement from other vendors or intent to improve the standard from those who had embraced it made it a dead-end standard.

> WebSQL API whose “standard” was essentially “act like version x of SQLite” was kicked to the curb.

Nobody said what was so wrong about that.

A standard needs to have multiple open source implementations to be accepted. What you're asking is basically to reimplement sqlite, feature for feature, bug for bug, and keep up with it as soon as it changes. No wonder no one wanted to try that.

What about JSON? That became a standard without multiple implementations or even any discussion about what people would’ve wanted, like comments.

I just don’t think there’s any rule that says that we need multiple implementations of thing to make it a standard. Also, we already have a standard for SQL and they could’ve just used the parts of SQLite that were compliant with that.

JSON became an official standard in 2013, by which time many open-source, independent implementations existed.

> I just don’t think there’s any rule that says that we need multiple implementations of thing to make it a standard

There are rules that say that we need multiple implementations of thing to make it a standard:

* The IETF requires it to be accepted as "Internet standard", ie the highest form: https://en.wikipedia.org/wiki/Internet_Standard#Standardizat...

* The W3C strongly desires that proposed standards have multiple, independent implementations to advance to the state of "Proposed reccomendation": https://www.w3.org/2005/10/Process-20051014/tr.html#cfr

> Also, we already have a standard for SQL and they could’ve just used the parts of SQLite that were compliant with that.

That would probably have been the best solution, I agree. And then people would have complained about bloat again, that browsers are now required to expose a SQL engine, that they are too big, etc...

I know that you need multiple implementation of a proposed standard to make it a standard. What I was trying to (and failed to) convey is that you don’t need multiple implementations of a thing, as in SQLite itself, to let that become the standard because it’s mostly already compliant with SQL, the standard.

Anyway, there are multiple implementations of SQLite itself. It’s been implemented in JavaScript and .NET and Java as well I believe.

> Also, we already have a standard for SQL

I know right, there are in fact several (which may have been part of the problem). Most SQL database vendors tend to stray slightly (or not so slightly) from the SQL standard that they target; SQLite is no exception.

Just unfortunate that the browser vendors couldn't come together and build a client-side SQL based API rather than punting and implementing a very basic key/value store on top of which someone will eventually reimplement SQLite. It's like 5 or 10 years from now we'll have what we had 10+ years ago, such progress!

they said. it is not a standard.

Yeah, Mozilla killed it out of concerns that there was only one implementation (which admittedly has some quirks)...so the question was, do we standardize SQLite and all its quirks? That didn't seem like a good option to them. Plus this is the heyday of nosql fanboi-ism, so you have all the sheep at mozilla clamoring for NoSQL in the browser. "Devs don't want SQL" Mozilla said.

It's hard to say which reason was more compelling for Mozilla, but my gut feeling is that they were betting hard on NoSQL and came up with convenient technical justifications afterwards.

> Sadly, in the browser SQLite has been kicked to the curb, replaced by IndexedDB

Are you thinking of WebSQL? I don't remember a SQLite interface client-side in any browser. I might be wrong, but I thought IndexedDB was originally implemented on top of SQLite in both Chrome and Firefox. I think Chrome moved to a different backing store, but that has no bearing on the API.

SQLite has been ported to the browser via Emscripten. I’ve seen it used in a couple niche projects (generally those that revolve around the SQL language), but I don’t remember it being a standard or even a primary use of SQLite.

[0]: https://github.com/kripken/sql.js/

Sure, but that's got to be a massive download. Probably not really an option for most sites.

I'm aware of it, nice project, but 2.6MB + in memory only storage, no thanks.

It shouldn't be super difficult to backup/restore your db from localstorage using a dataURL.


At that point, why wouldn't you just look for some SQL to LocalStorage API/shim, since it's bound to be faster than (and likely more stable with regard to page resets than) putting a full SQL engine in place that just serializes to a LocalStorage object anyway?

Are you thinking of WebSQL? I don't remember a SQLite interface client-side in any browser.

Well, yes and no. The whole problem with WebSQL was that it effectively meant using SQLite and never anything else that wasn't completely bug compatible.

So the whole problem with WebSQL was that it was, de facto, a web standard built around giving an SQLite interface client-side.

> Are you thinking of WebSQL?

Yes, exactly, but Web SQL uses SQLite as the backing store. IndexedDB is its own thing, a standalone key/value store, nothing to do with SQLite.

IndexedDB may be backed by SQLite. I think it likely is in Firefox?

> so you can scrap everything on the server and redo it on the client.

Were you writing the same queries on the client and the server? Generally client and server are different moving parts of the same machine, so they don't really share a lot of code anyway. (Perhaps in some cases computations could optionally run on client or server depending on the circumstances, but security concerns limit the opportunities to do that.)

Yes, it's a pain to have to know two different technologies, but it doesn't seem like this often leads to literally needing to scrap code and rewrite it.

> Were you writing the same queries on the client and the server?

Well, yeah, that's the idea, isn't it? These days every language under the sun compiles to javascript, why wouldn't you want shared model, validation, queries, etc.?

> Yes, it's a pain to have to know two different technologies, but it doesn't seem like this often leads to literally needing to scrap code and rewrite it.

Currently working in an Angular 5 + TypeScript frontend with Scala on the backend, a shared nothing architecture, really blows compared to a single typed shared everything stack.

All the stats based queries run on the server? Throw 'em out and manually recalculate on the client, probably literally 100X the amount of code; less efficient and error prone to boot. Same for form validation, model layer (hey, why does this new property exist client-side and not on the server?), etc. -- it's a house of cards.

Not the greatest solution, but you should be able to compile to asm.js or WebAssembly.

I find this page to be somewhat vague. For example, any data that revolves around dates or time may require some addition hoops to jump through since, as https://www.sqlite.org/datatype3.html points out, "SQLite does not have a storage class set aside for storing dates and/or times."

As much as I love SQLite, I find this little issue always adding an extra bit of work to deal with. Not a blocker, but useful to know. I've found that in some (rare) cases, H2 became a better solution (horrendous start time, slow data load, but ability to use real date and time types reduced errors and improved analyses).

Finally, knowing that SQLite is not right for a situation is great, but it wouldn't hurt for them to mention a potential alternative, if the devs or community know of one.

The lack of a datetime data-type hasn't been a problem in my experience. I have, at different times, stored the timestamp as an integer representing number of microseconds since the epoch, or stored in YYYY-mm-dd HH:MM:SS (which sorts lexicographically so comparisons come out right). SQLite has some built-in facilities for working with datetimes as well, along the lines of strftime, etc.

H2, being java, is in a whole different universe from SQLite...not a valid comparison in my view.

...in UTC. Right? RIGHT?!

Not going to lie, after reading that I quickly went to check that "Date.now()" in javascript returned UTC.

For others who didn't know about H2, it's apparently an embeddable SQL database, like SQLite, but in Java: https://github.com/h2database/h2database

It's dual-licensed under the Mozilla Public License and the Eclipse Public License.

It grew up arm in arm with Hibernate, one of the better ORM wrappers. Fast enough for local use that you could do functional tests against your domain objects without using mocks, or crying.

This may not be helpful to you, but I've always used the epoch time stored in sqlite. It's blazing fast, indexes well(integers !), and is 100% convertible to any format you might need to represent.

Make sure you're using signed integers larger than 32 bits if you ever need dates before ~1970 or after ~2030.

SQLite's INTEGER type is 64 bits, and its REAL type is double-precision (64-bit) floating point. Either type is quite sufficient for representing a time. :)

Not as lightweight as sqllite. But I always wonder why firebird embedded-able server[0] does not get more love? The 32-bit so library just under 7MB and then you have a full rdbms available to your application.

[0] https://www.firebirdsql.org/pdfmanual/html/fbmetasecur-embed...

Well, for one, SQLite's 32-bit dll is only 858KiB :)

Feature richness comparable to oracle, mysql, postgres, comes at a price.

Another perspective; embeddable py 10MB, electron app 50MB[0], free pascal exe 1.5MB.

So 7MB for a RDBMS seems reasonable, but not comparable to SQLite as you pointed out.

[0]: https://github.com/electron/electron/issues/2003#issuecommen...

Poorly advertised? With a lack of marketing/funding I can understand it though. But as for me, I had heard of Firebird but not the embeddable, and of course of SQLite everywhere. Maybe it's also in part because the OSS space has limited room for this niche and SQLite kind of happened to grab the largest share.

That's a valid observation, but the status of sqlite as a couple of ansi c file you can compile with or link to any damn thing is really what we're talking about here. And H2, awesome as it is, is no comparison.

I don’t know if they want to be in the business of recommending alternatives. Doing so has the unfortunate tendency of making people think it’s your problem if you use them and they don’t work out.

That bite me more than once with Sqlite. I inherited small webapp with extremely messed up datetimes because Sqlite doesn't defend its data in any way. Same column, three different formats. Ouch. It's also a problem with the original developer but the database should not tolerate such mistakes.

One of my (many) tasks at my current position is to maintain a SQL Server database where a lot of the date times are stored as raw strings in various different formats.

If you're determined enough you can turn any datastore into a mess, though I suppose SQLite gives you less tools.

Though I would agree that the option to have stricter type checking in SQLite would be a good thing, I don't think your criticism is fair in this case, since it sounds like a case of poor discipline on behalf of the coder(s).

I wish SQLite was used for data exports from web-sites instead of CSV files. It would work better especially when the exported data is a some kind of star schema. I wrote about it here: http://bi-review.blogspot.com/2017/04/websites-should-offer-...

That will happen as soon as Excel implements an SQLite import feature.

> Dynamic content uses about 200 SQL statements per webpage.

Genuinely curious, what kind of page would require 200 queries? I've written plenty of moderately-complex applications and don't think I've come close to 10% of that on a single (non-crazy dashboard) page.

A Wordpress page with 12 posts will do 60 queries - shamusyoung.com/twentysidedtale/?p=41607

I should imagine something running on a generic ORM (ie without the optimised queries) would use considerably more (Rails, IIRC, will often do two queries to load a single object.)

Click on the linked site (or: https://sqlite.org/np1queryprob.html ), the sqlite project explains it

Ah, thank you. Explains it perfectly.

It doesn't say "require". Possibly they are selecting the same information over and over again, instead of passing it along the different functions, because it is fast enough.

Even so, querying data once and reusing it is usually the more maintainable option and probably means less code, too. It is not only about performance.

(And just to be clear, for my use cases, I have been very happy with SQLite's performance.)

This requires an information architecture and these days those seem to be harder to come by. I think we may be favoring velocity over Stop and Think time a bit too much.

(If you’re thinking, “no dummy, just use caching” then I regret to inform you that cache invalidation and eviction without an info architecture are as hard if not harder. And caching bugs make you look like a clown in front of your bosses. Because the whack-a-mole process never looks professional)

Worst I've seen is a CMS site which, with caching disabled, used upwards of 40,000 queries to load the homepage.

When the app is being dumb, people put caching in to hide the data flow problems. When the data flow problems are hidden, they multiply, and then turning it off again is highly improbable.

Caching has sucked all the fun out of too many projects I’ve been associated with.

Do cache headers and cache busting first. You can always put a transparent cache in front of your REST endpoints in prod. After debugging the browser caching. Don’t write your own caching implementation. Please.

Please tell me you held down the zero key accidentally ;)

Its common for people who don't know the Django ORM well to use it in a way that produces a lot of queries.

Unoptimized Drupal sites of moderate complexity with caches disabled (a use case only intended for development workflows, but it's of course quite simple to run a live site under such situations) can easily invoke 200-400 queries per page load.

Let me introduce you to the Rails ORM...

Bulk loading a CSV or similar into SQLite for analysis with SQL statements is an incredibly productive technique. With a few lines of code you can load any log file, load and merge multiple logs by timestamp, whatever you like

The amazing but criminally under-appreciated 'lnav' (https://lnav.org) facilitates exactly this approach. SQLite under the hood.

Oh, I hope there is a Windows-compatible way to use this (i.e.: in a .NET dev environment), if not now, then in the future.

Would probably work under the Windows Subsystem for Linux (WSL), in Win10, or via Docker for Windows.

I've done similar with node.js streams pretty easily.

And automate it easily.

I use SQLite for logging data from a bunch of distributed sensors. The server (RPi) sends a measurement request over MQTT, the sensors reply and the server logs each response grouped by the request timestamp. Seems to work very well for 25 sensors on the same WLAN and for continuous monitoring (about 100 measurements per minute).

This can then be displayed to the user via a web server which has read-only access to the measurement db.

I run a complex CRM that handles 1.2 million USD in transactions every year. It is running sqlite with a simple in-memory cache (just a dict) that gets purged when a mutating query (INSERT, UPDATE or DELETE) is executed. It is simple and fast enough.

Like the other commenter mentioned, I'd love to hear more about this too. How are you doing backups and deployment?

Tell us more! How many transactions per second do you process? Do you use multiple databases?

If you assume each of these transactions is a dollar, you get a bit under 0.04 transactions/sec on average. If peak load is 25x average, that would come out to 1/sec.

is this single threaded? single client?

I may have gone a bit beyond what SQLite is useful for, but I just wanted to try this. I built an experimental application-level file system (so not FUSE-like, you can't mount this, it's just within an application).


Haven't had much time to finish it, but it's been fun to play with.

> SQLite works great as the database engine for most low to medium traffic websites (which is to say, most websites).

I appreciate that there's just a hint of snark there. YAGNI, after all, is one of the most often-forgotten important principles in this industry.

I've recently used SQLite to store some metadata for a FUSE filesystem that pretends all your data is available but that just keeps a local cache and fetches it from the server on demand[1]. Initially performance was atrocious because INSERT is really slow if you don't batch several into transactions. I've worked around it with some application caching but there does seem to be a large gap between what the postgres engine can do and what sqlite can do. I definitely don't want to depend on a postgres setup for this application but it would be nice to have an SQL database for app use that had a bit more write performance. It seems that when you want that you end up going to LevelDB or LMDB and losing the niceness of an SQL interface.

[1] https://github.com/pedrocr/syncer

A lesson I've learned the hard way, from using Sqlite in many apps I've written, is that if you need any kind of insert performance you can't abstract away the transaction model, and retrofitting transactions onto an application which previously ignored them is not easy either. You really have to design for it up front.

Mainly this is because there is at most one, global transaction per Sqlite database (in other words you can't nest transactions), but also because the optimum window to gather up inserts for a bulk transaction might not match up well with the logical function call structure of you program. I recently learned about the Sqlite SAVEPOINT feature; this might help implement nested-transactions, but cannot help with the second problem.

For example, what if you made a "logging.db" file for a GUI application, and you're grouping writes to a log table in batches of 50 for each INSERT transaction. What happens if your GUI app gets to 48 inserts but then stops receiving user interaction events? You'd probably still want those 48 inserts to show up (eventually) in the logging.db rather than just being "lost" if the app subsequently crashes. So what do you do? Register a timer event to create a watchdog to flush the transaction if enough time goes by? OTOH what if a subroutine of your app is actually at that moment performing a lengthy "real" (logical) transaction and wouldn't want it flushed? Or maybe the application should use different database files for logging versus data where consistency is vital (so that a database can be opened either in "accumulate INSERTs" mode or "real logical transactions" mode)?

My point is not the detail of how to answer those questions; it is just to highlight that these are questions which have to be thought about at a whole-application architecture level.

Sadly the lack of a decimal type complicates monetary calculations sufficiently for me to stay with postgresql even for small projects that could easily be done with SQLite.

Store your monetary values as cents instead of dollars and use integers. Convert to dollars in the presentation layer. My assumption here is that you're dealing in US currency but the same idea applies to any currency; use integers, not decimals.

Say you have a savings account with an interest rate of 0.125% and any interest has to be converted into another currency for taxation purposes at a rate of 0.88619.

Working around these sorts of problems with integers is just not worth the trouble.

And yet the proper way to handle monetary values has always been to operate on integer values with a predetermined precision.


I can understand the convenience if you are setting up a simple online store, but more complicated cases (like the one in your example) absolutely need a proper handling of decimal values.

Can you explain to me what particular cases postgresql's decimal type cannot handle properly _and_ conveniently?

> Can you explain to me what particular cases postgresql's decimal type cannot handle properly _and_ conveniently?

I entirely agree that numeric is a good solution for this case.

But to try to answer your question nevertheless: I'd say there is potential for storage density and math performance issues. You really need to have a number of numerics for each row, and a lot of rows to make storage density matter. If you do a lot of aggregates over numerics you definitely can notice the CPU overhead of them over say a plain float. But most of the time that doesn't matter.

I agree completely.

Yes it is. You convert to a big decimal representation, compute the tax/interest and then intentionally round back to an integer format using the appropriate rounding mode.

GP didn't say it's impossible, he said it's not worth the trouble which is an opinion statement that can't be countered with "yes it is".

Yes, it can. The case and point is that you need to control where those moments of imprecision happen, and using fixed point a fixed point (often integer/long) representation is absolutely worth the effort because it's not much effort at all and you DEFINITELY care about when your money calculations involved rounding.

And why would I not prefer to get help from a database system to do that?

If that's the only reason to use a "real" database, then what you have is a 100MB+ multi-process (and potentially costly) library to perform a few calculations.

(I don't disagree with you, just answering your question)

Of course there are many other considerations, but there is a class of small finance related multi-tenancy systems for which monetary calculations are key and either a database server or a database library can potentially make sense.

Resource consumption isn't an issue at all. 100MB+ is less than loading an average news website in a browser and it costs almost nothing. The reason why I would have wanted to use SQLite sometimes is that it makes it easy to distribute and run the entire app on premises if needed. People are rightly concerned about losing access to web based software.

There are also also many use cases where you don't want to "store" something at all, just compute the value (eg: a shopping cart that is not yet a purchase), and you don't want to to either 1. duplicate implementations or 2. wait for a network request to your database to complete.

You could just store decimal strings and use application-language decimal functions to operate on them, even in queries, since it's fairly trivial to use application-language functions as SQLite (scalar or aggregate) database functions, at least in most decent SQLite drivers I've seen. It's one of the advantages of an in-process database.

Strings don't work well with numeric aggregate SQL functions like SUM or MAX. Also, won't work well with comparisons or sorting: "111.00" is less than "9.99" string-wise (unless you store all numerics with a bunch of leading "0"; but then that takes a lot more space).

Interesting. I'll look into it next time it comes up. I wonder how it interacts with indexes and how it affects performance generally.

Why wouldn't you just store the value in cents? And divide by 100 to display it.

As I said in my reply to asaph, it doesn't work well with interest rates and exchange rates. Also, it simply adds to the mental load in many places (e.g ad hoc queries)

And floats do? With all the (implicit) rounding and precision issues, and some base-10 numbers not being representable in an exact form?

Serious question, what's the proper way to handle this case? I'm having hard time to believing it's IEEE-754 floats.

The proper way to handle this is fixed-point representation (whether the DB natively supports this or not). E.g., you won't find floating point fields in binary protocols for any of the major stock exchanges. A typical scaling factor is 10^8, so if the exchange wants to tell you that the price is now 101.27, they send you an int64 value of 10127000000. You may then choose to take it further and store e.g. in 128-bit decimal.

Of course, there's an issue of how to handle mul/div (i.e., rounding/truncation), but that's a separate question -- at least you'd be making decisions yourself rather than relying on how IEEE-754 floats work.

Some DBs like PG, do, as a matter of fact, support decimals natively which makes things quite a bit simpler.

Certainly for decimalized exchanges, binary floating point is the wrong choice. (On a side note, it's also the wrong choice for spread sheets, at least as the default. The average freshly graduated accountant can have a reasonable grasp on decimal floating point rounding after a 5 minute explanation, whereas most professional software engineers with a decade of experience have a very weak understanding of IEEE 754 binary floating point.)

> E.g., you won't find floating point fields in binary protocols for any of the major stock exchanges.

That's true of NASDAQ's OUCH, but some (many?) other protocols use decimal floating point.

NYSE's UTP Direct[0] (See section 4.15) Is not fixed point, but actually their own decimal floating point system with a 32-bit significand and single ASCII character exponent ("0" to "6", indicating negated base 10 exponent, not allowing "5").

FIX SBE uses an 8 byte significand and a 1 byte (power-of-10) exponent, if I remember correctly.

It's a bit unfortunate that (as far as I know) there aren't any major messaging formats supporting IEEE 754-2008 decimal64 values.


OP mentioned a decimal data type, which in postgres can have a very large precision: https://www.postgresql.org/docs/10/static/datatype-numeric.h.... Not the same as a floating point.

16383 digits ought to be enough for everyone:

"up to 131072 digits before the decimal point; up to 16383 digits after the decimal point"

Non jokingly: I think precision wise that's really plenty enough for pretty much any halfway realistic usecase (no, don't do public key crypto using SQL datatypes). There's some issues with the numeric / decimal type, but lack of potential precision isn't one of them.

No, binary floating point doesn't work at all. Decimal floating point can work if the representation is large enough.

Arbitrary-precision numbers. DBs like Postgres support them, as do most languages (e.g. Ruby has BigDecimal, Python has Decimal).

Neither `BigDecimal` in Ruby or `Decimal` in Python are quite what I'd expect for "arbitrary-precision".

From the docs for [`BigDecimal`](http://ruby-doc.org/stdlib-1.9.3/libdoc/bigdecimal/rdoc/BigD...):

> Ruby provides built-in support for arbitrary precision integer arithmetic. For example: > > 4213 -> 1265437718438866624512 > > BigDecimal provides similar support for very large or very accurate floating point numbers.

My takeaway from the above is that, if arbitrary precision is needed, you should still just use integers. Tho, in practice, it may not be significant.

Integers are not magic. If you want arbitrary precision on top of integers you're going to need arbitrarily long integers as well.

I was referring to Ruby and based on the first sentence I quoted:

> Ruby provides built-in support for arbitrary precision integer arithmetic.

[I'm also assuming that it's really arbitrary precision and not just 'really big but limited' precision.]

Serious monetary work often needs values smaller than cents. Exact decimals or, better, exact fractions are often what you want.

What is the typical use-case of a unit smaller than the smallest tradable currency unit? Mind you, I understand there are smaller units than the circulated currency (For the USD you have a mill which is 1/10 of a cent).

Bitcoin currency trades? Large fractions. Convert to/from yuan/yen, large whole numbers, back to US, somewhere in the middle.

Tracking transactional values for large stock and currency trades would be an example. That said, could use integers to represent arbitrary precision.

The computer has only integers and floats for purposes of calculation (except for some IBM units implementing IEEE fixed point). In storage though, there are only whole numbers to which we assign meaning.

Store your as a single integer then write a small layer that converts that integer into decimal (that's all a language or DB does anyway). Alternatively, use two entries in the database for each side of the decimal if you want even more precision or store it as a string if you want arbitrary precision.

>(that's all a language or DB does anyway)

Well, that's my point. Postgres does it for me. SQLite doesn't. I'm lazy.

Also, SQLite's largest integer type is 64 bits. That's awfully tight for some use cases.

I'm curious how 64-bits would be tight for monetary data. A 64-bit integer is +/- 9e18, which is so big I don't even know how to actually say that number. Given that economies are only in the trillions, you have at least three orders of magnitude left when representing the largest amounts of money on the planet. Anything smaller would be a piece of cake.

Also, seems like the obvious thing to do to store money in an integer would be to store it in pennies. Or if the numbers get too big, do what corporate financial reports do and use thousands.

Pennies (2 decimals) is not fine enough. Exchanges and other financial services use 4, 6, or maybe even 8 decimals. With 6 decimals, a few trillion is the limit.

If you implement multiplication and division with integer arithmetic, you need the additional decimals to keep the error small. See https://en.wikipedia.org/wiki/Salami_slicing

The largest Zimbabwean banknote during hyperinflation was Z$100,000,000,000,000 or 1E14. I assume they printed more than 10,000 of those.

I recall something about Oracle sending people down there to help with the bank or perhaps government databases deal with those numbers. I can't find confirmation of that, but https://en.wikipedia.org/wiki/Hyperinflation_in_Zimbabwe says "Computers could not handle the amount of zeros such that other forms of money had to be used to act as normal money (bearer’s cheques)."

64 bits can be tight if you have to set aside, say, 7 digits for decimal places in order to perform a lot of divisions/multiplications by small numbers, like exchange rates, without rounding errors adding up.

But if you sacrifice 7 places you can no longer represent the market cap of Toyota Motor at the Tokyo stock exchange (24 trillion Yen) or the US GDP (~19 trillion USD).

So if you don't have a data type that can represent all of these quantities as well as the results of any calculations up to the required precision then your code will become a lot more complex and error prone.

import decimal; decimal.Decimal('3.14159')

A few words of explanation wouldn't hurt. Are you saying that I shouldn't ever do monetary calculations in the database at all? If so then I have to disagree.

No, they are giving ways to do it outside of the db. You really answered your own bias on this. You want the db to handle it, so you wouldn't use sqlite in the first place.

Could look at Firebird if you want something embedded still. Can even scale to use a server mode. I wrote a distributed system about a decade and a half ago using Firebird, worked pretty well.

I've used sqlite as an index for S3 objects ..and stored the DB file in S3 itself. It was very practical to use, and a handy pattern to apply to object storage where the number of objects to be indexed for a given time frame was reasonable for sqlite (I had in 100k range).

Also - easy to dump the sqlite db into a SQL text file then import into something like postgres :)

I love SQLite and am currently working on a project that makes extensive use of it. In the book "The Definitive Guide to SQLite" it says that SQLite can scale to 2TB of data Looking at the website it says

  An SQLite database is limited in size to 140 terabytes (247 bytes, 128 tibibytes).

That's ludicrous. I wonder if anyone has come close to that in day to day use.

Almost certainly not seeing as the single hw largest device is currently 12TB. Someone would have to pool together 10+ (and more for redundancy) just to store the single file.

I have heard of people running ZFS pools that were hundreds of TB in size. Not sure if they were using those for huge SQLite databases. ;-)

I presume "247 bytes" is supposed to be "2^47 bytes".

What would be a bottleneck / limit for SQLite? If you have to many SELECT query because your site is too busy, what would happen?

SQLite can probably handle a lot of SELECT queries at once, but only one process can be writing to the database at once.

I think it's a hard question to answer, "what's the upper limit on SQLite?"

Better question would probably be, "We have this app, what database should I use?"

I use SQLite for a lot of smaller apps. Applications that have a focus on single-business, low-volume use.

They might do a good amount of computation and data-fetching, but they don't make heavy use of the database.

If you need features Postgres has, use that. If you need features MS SQL has, use that... if you don't need anything special, think about using SQLite :)

I've benchmarked SQLite on my laptop at just north of 20,000 queries (SELECTs) per second for the workload I was interested in at the time.

However, the limit depends heavily on your workload. If your working set (the data that all the simultaneous queries require) fits in memory then you'll be limited by memory bandwidth or the language you're calling from: GC stalls and query preparation can really hammer the throughput; if you're doing any actual work with the results of the query then I'd hope that time spent there would dominate the data access.

If your data doesn't fit in memory then you'll be limited by the speed of your disk system. If you have a fixed working set then you can increase the performance by adding RAM until the working set fits again.

If you have a streaming workload then you'll never be able to fit it in RAM (by definition) so you'll be limited by the speed of the IO systems involved; either network or disk.

If things are still too busy then stuff will start to queue up. This will cause each individual process to run slower as it'll be contending with the others. This, in turn will cause more incoming work to queue up. At some point the machine will run out of some other resource (memory for incoming connections, network sockets, space in queues, etc). How the system responds to that depends on how it's configured but, traditionally, your system is no longer providing service.

While not strictly related to computation, sqlite is a non-starter for deploying to PaaS without a persistent file system.

"SQLite on Heroku" [1], essentially: don't do it, switch to postgres.

I suppose it would work with AWS Beanstalk and EBS.

[1] https://devcenter.heroku.com/articles/sqlite3

If you're already in that environment RDS is probably a safer bet though.

SQLite only allows one writer at a time, so depending on your access pattern that might be an issue, but if your site is getting a million requests a day then your only processing an average of 11.5 requests per second, and what percentage of them are writes? Front your app with varnish, nginx (w/cache), or cloudflare and absorb some additional load. Though as far as I can tell www.sqlite.com is running on a single linode instance with no fancy hacks.

Could anyone using SQLite FTS share their opinion/story? I'm especially interested in indexing hundreds of gigabytes.

Every once in a while I have sqlite eat it, and end up with a corrupt db file. About as often as mysql really. I wish I had some more serious automatic recovery in sqlite. I have to have a boot script check the db, export everything to an sql file if the db is corrupt then create a new file.

That does not mirror my experience, I hate to be the guy suggesting this but have you considered the fact that you're using it wrong?

What I mean by that is that SQLite really is a replacement for open() and as such is not multi-thread compatible. If you're doing a lot of work that's corrupting you can also turn up the level of binary logging until recovery becomes easy enough.

I'm using SQLite effectively on a few thousand multi-threaded windows machines and I've encountered corruption exactly once- which was when a hardware raid had degraded non-gracefully.

My understanding is that using SQLite is completely fine in a multithreaded architecture, but you have to ensure that the simultaneous database operations match the correct threading mode:


I also believe that the common way of gaining performance while protecting safety is to use multi-thread mode with connection pooling. The idea being that your application can use different database connections to achieve multiple writes without relying on the overhead of additional locking required within SQLite itself for serialized mode.

Would be interesting if anybody has performance metrics that backs up what I said with data converging on an optimum number of database connections.

that's exactly the case; make sure only one thread writes, but how you feed data to that thread is up to you.

Oh I guarantee I am doing something wrong. This is running on some crappy sandisk sd cards as well. So it is entirely possible that the corruption I run into is just an issue with the sd card.

Applications are open for YC Summer 2019

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact