
Can anyone share some cool applications or bigger production systems built with sqlite? I love virtually everything I read about it, but I have never encountered a single use case where I am not better served by storing the data in some flat file (parquet, csv, etc.) and running a pandas program or simple Spark program to analyze it (locally on my laptop, or through a managed service like Dataproc or Databricks - it’s exceedingly cheap for any data size conceivably processable with sqlite).

I am asking sincerely: what is the comparative advantage of sqlite, or what are some example scenarios or trade-off circumstances where sqlite is a comparatively more effective solution?




Most flat-file formats I'm aware of can't be updated in place. SQLite databases can.

SQLite is used throughout Chrome and Firefox as a storage format, e.g. for browser history, bookmarks, cookies, etc. It's also commonly used as a storage format on macOS, iOS, and Android.


Once in memory, the flatfile’s data can be mutated in place, and creating the equivalent of a WAL-like system or oplog to protect against sudden failure & loss of in-memory data is a pretty easy part to solve.


> ...creating the equivalent of a WAL-like system or oplog to protect against sudden failure & loss of in-memory data is a pretty easy part to solve

That's a pretty bold claim. Judging by the number of applications (including "real" databases) which fail to get this right, I'd have to say it's probably harder than you think.

Besides, SQLite has done that work already, and has done so very thoroughly. I would definitely trust SQLite over something home-grown.


I've done the flatfile thing several times on various projects, and I'm starting to appreciate the wisdom of letting sqlite handle it. Things it can handle correctly include:

- crashes/power failures

- multiple processes opening the same file

- transactional updates: it rolls back if you throw an exception in the middle of writing the update. This seems like a common failure mode while under development

- scaling to huge size: many gigabytes are no problem. These get slow with a flatfile

- manual surgery on data: it comes with handy command line tools for manipulating its databases

- upgrading format: you can check a version number when opening the file and do "alter table add column" for any new features.
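
A minimal sketch of that last point, assuming a hypothetical notes table, using SQLite's built-in user_version pragma to decide which migrations still need to run:

    import sqlite3

    # Hypothetical schema: check the stored version on open and apply
    # whichever migrations this file hasn't seen yet.
    conn = sqlite3.connect("app.db")
    version = conn.execute("PRAGMA user_version").fetchone()[0]

    if version < 1:
        conn.execute("CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)")
        conn.execute("PRAGMA user_version = 1")
    if version < 2:
        # A column added for a later feature; old files get it on next open.
        conn.execute("ALTER TABLE notes ADD COLUMN tag TEXT")
        conn.execute("PRAGMA user_version = 2")
    conn.commit()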


I feel like if I am worried about those issues, when would I ever use sqlite instead of Postgres?

Sqlite seems like it is only appropriate for the knife’s edge boundary between small-data, low-reliability, in-memory situations (better served by server applications using tools like pandas or R) and bigger-data, transactionally structured, reliability-constrained situations (better served by Postgres).

I just can’t understand what use cases live in between them and are better served by sqlite.


There are literally billions of smartphones in the world that fall in between those exact constraints...

It's honestly bizarre to me to see someone claim that Sqlite has few use cases when there are likely hundreds of Sqlite files sitting in the phone in your pocket.


There are probably files on their computer as well that use SQLite. Lightroom libraries are SQLite databases, for example, which is a pretty leaky abstraction in their case


> I feel like if I am worried about those issues, when would I ever use sqlite instead of Postgres?

Sometimes you do not want to run an additional process/daemon like postgres. Your state is in essence now a file on some file system that you can atomically (full ACID) update from multiple processes without the need for more complex machinery - you can get very far with this architecture.
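
A rough sketch of that shape (file and table names hypothetical): the whole "database server" is a file, and a transaction gives you the atomic multi-process update.

    import sqlite3

    # No daemon to run; any process on the machine can open state.db.
    # isolation_level="IMMEDIATE" makes the transaction take the write lock
    # up front, so concurrent writers queue instead of corrupting the file.
    conn = sqlite3.connect("state.db", timeout=30, isolation_level="IMMEDIATE")
    conn.execute("CREATE TABLE IF NOT EXISTS counters (name TEXT PRIMARY KEY, value INTEGER)")

    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "INSERT INTO counters (name, value) VALUES ('hits', 1) "
            "ON CONFLICT(name) DO UPDATE SET value = value + 1"
        )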


For example, SQLite is excellent for making an email client. Even on old embedded Arm devices with 64MiB memory, it’s fast enough to handle a gigabyte of messages and an address book with thousands of entries, provided the SQLite DB is located on a good CF or SD card. SQLite is really quite remarkable for the number of occasions where it has been immensely valuable to me and countless others, on such a variety of platforms and in such a variety of circumstances.


For interactive use, sqlite queries are snappier than pandas at the ~4gb dataset size. I’ve switched over for some logfile analysis tasks and both memory usage and speed are a bit better.

They’re not massive differences but I definitely appreciate the snappiness. Interactive work is just nicer that way.

I could do postgres - the docker version is pretty handy I find, but it’s just more faffing about (-v /path/to/wherever:/var/postgres, setting a password, cleanup when you’re done, etc.)
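
A sketch of that interactive workflow, with a hypothetical access_log.csv layout; the import is a one-off and the queries afterwards stay snappy:

    import csv
    import sqlite3

    conn = sqlite3.connect("logs.db")
    conn.execute("CREATE TABLE IF NOT EXISTS requests (ts TEXT, status INTEGER, latency_ms REAL)")

    # One-off import of the logfile (hypothetical column names).
    with open("access_log.csv", newline="") as f, conn:
        rows = ((r["ts"], int(r["status"]), float(r["latency_ms"])) for r in csv.DictReader(f))
        conn.executemany("INSERT INTO requests VALUES (?, ?, ?)", rows)

    conn.execute("CREATE INDEX IF NOT EXISTS idx_status ON requests (status)")

    # Interactive poking around: aggregate straight from disk.
    for row in conn.execute(
        "SELECT status, COUNT(*), AVG(latency_ms) FROM requests GROUP BY status ORDER BY 2 DESC"
    ):
        print(row)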


For logfile analysis, take a look at https://lnav.org -- it has SQLite embedded, along with a bunch of helpful utils for ETL on a small scale.


I'll need to give that a whirl. One feature I like the look of from glancing at the docs - https://docs.lnav.org/en/latest/hotkeys.html#chronological-n...

Sure beats regexing my way to the next minute marker or whatever.


If you're using pandas, try DuckDB


Perhaps you could answer the opposite question:

Why would I use flatfile (aside from the fact I've never heard of it) over SQLite? So far you've just said SQLite has some features you can't imagine I'd ever need, but even if that were true then it wouldn't be a problem in itself. What are things that are actually bad about SQLite compared to the alternative you're suggesting?


Possibly adding SQL, or an ORM, to their use case is a layer of complexity beyond reading and writing serialized data structures to text.

Not sure why the sibling posters determined s/flatfile/flatfiles/ the most interesting part of your question.


You've never heard of using a file to store data in?


The question was what is the benefit of flatfile over SQLite. SQLite stores the database in a single file, so "it uses a file" is not an answer to that.


Flatfile means a plain text file, generally one you can process on a row-by-row basis. It sounds like you think there is some software called "flatfile", but it is a reasonably common term.

There are advantages over SQLite, eg: the ability to use unix command line tools against the file directly.

In my view SQLite has more advantages though.


Altering data with text editors is more or less the opposite of the integrity afforded by a database.

Of course principled command-line tools for text (e.g. awk) and the possibility of running arbitrary SQL queries on a database blur the line, but the distinction between files that are read or written once as a step in a transformation pipeline or as something to load or save and databases that represent a persistent state, with transactions, remains sharp.


My comment was that you seem to be under the impression that there is a piece of software or file format named 'flatfile'.


Oh. It would've been easier if you'd just said that in your first comment.

Looking back, you seem to be right. I thought mlthoughts2018 was referring to a specific file format, or at least a specific library because (1) They made up that weird terminology "flatfile" which looks like a product name instead of just saying "plain text file" ... I wonder why they did that? (2) They made claims about transactional integrity with a WAL-like log. Now I realise they were suggesting you homebrew that anew for each fresh project that needs to store data! The idea that this would be easier than just using SQLite's existing mechanisms is so bizarre it didn't even enter my head they could've meant that.


Why on earth do I want the overhead and attack surface of a full, multi-user, networkable database with user-defined languages, extensions, and all of the other amazing things that PG can do, when all I need is a reliable, single-user structured data store?


Until you deploy this custom database of yours and customers start raising tickets reporting data loss on remote devices you can no longer access. Much of the value in SQLite is that it has achieved the level of clout where big players deploy it remotely, without much second thought, to devices that are hard (or impossible) to debug.


We did deploy it. This whole architecture was chosen in an adtech business I worked in before, supported by data platform teams, analytics teams and more. The RFC process was intense and the decision was vetted with seriously heavy effort.


Sounds like you can't have multiple processes concurrently access it while it's being updated, even if the updates are small and infrequent (you'd have to load the whole database into memory and write it back out again if I've understood your comment correctly?).

With SQLite you can have multiple processes access the same database and they can make changes to it that are immediately seen by the other processes. The nature of SQLite locks means this doesn't scale well if you have lots of processes all wanting to make heavy updates, but so long as you're not in that situation it works very well.


Apple use SQLite in a lot of places, both in macOS and in iOS, and both in their applications and in their frameworks.

For example, they use it in the Photos app on macOS and in the Photos app on iOS. Their Core Data framework is an abstraction that is built on top of SQLite, and is available for application developers on iOS, macOS, etc.

Specifically about the Photos apps on macOS and iOS though, I was underwhelmed by the performance when you deal with 100,000+ pictures and videos. I don’t know if the sort of performance issues I was seeing was tied to SQLite or not though. My own solution to this has been to move away from using the Photos app on macOS altogether, and to store my photos and videos in directory hierarchies instead. And to move photos and videos off of my phone as well, into the directory hierarchies on my hard drive.


> I don’t know if the sort of performance issues I was seeing was tied to SQLite or not though

It's certainly doable to serve million-asset libraries with SQLite (PhotoStructure is proof of this).

You need to be careful with indexed queries and make sure you've set up a large enough RAM cache, but for PhotoStructure, almost all queries are kept under 10-50ms.
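
For reference, the knobs being described look roughly like this; the sizes and the assets table are illustrative, not PhotoStructure's actual configuration:

    import sqlite3

    conn = sqlite3.connect("library.db")
    conn.execute("CREATE TABLE IF NOT EXISTS assets (id INTEGER PRIMARY KEY, captured_at TEXT)")

    # Illustrative cache settings: a negative cache_size is in KiB (~256 MiB here).
    conn.execute("PRAGMA cache_size = -262144")
    conn.execute("PRAGMA mmap_size = 268435456")

    # Keep the hot queries on an index instead of a table scan.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_assets_captured ON assets (captured_at)")
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM assets WHERE captured_at > ? LIMIT 50",
        ("2020-01-01",),
    ).fetchall()
    print(plan)  # should mention the idx_assets_captured index, not a full scan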


Performance is an odd metric to judge a database by, not because it's unimportant but because it's mostly in the developer's court whether they are configuring and accessing data in a performant way. If the data is poorly indexed or queried it's going to be slow no matter what database you use. Odds are the Photos app is not configured well for your edge case of having hundreds of thousands of photos.

For the record I dislike SQLite - especially the cargo culting of SQLite by people who aren't even using it in production.


Apple tries to force you to use the Photos app on the Mac to store your photos. But I never trusted it. Maybe because I didn’t understand what data model it was using behind the scenes. They don’t advertise that part. Even though for the most part it works pretty well. In fact, for most daily uses, the Photos app works almost magically.

But once, after a few years of perfect usage, the program crashed. Then it started crashing more regularly. Then finally it completely died. Data corrupted. Completely unusable. Major system malfunction.

I lost all my photos and videos in the Photos app.

But because I was always paranoid of it, I had the original pictures backed up elsewhere. So data saved. But I never trusted the Photos app again.


I regularly use SQLite as the database for production apps.

User registrations, content posts, in-app purchase receipts etc. Up to hundreds of thousands of concurrent users it works just fine.

It saves me a ton of time and complexity from setup I don’t have to do for a dedicated database engine.

There is a good overview with general use cases here: https://sqlite.org/whentouse.html


I have one of the simplest toy services you can envision and am already running into a problem. Basically, I'm logging a temperature from a sensor to a SQLite db every minute or so (whenever it sends the reading). Sometimes I run a query manually through the SQLite CLI to look into the data -- the query tabulates about 1.5 years' worth of data, by month and with averages, and can take about 20-30s as it's all running on an RPi3. As far as I can tell SQLite locks the entire database instead of just the records involved in the read, so if one of my readings (new data to be inserted) happens to come in while I'm running the query, I get "database locked" and the write fails.

Yes, this is a heavy query that would not make it to a production system, but I am still surprised that the load placed on a sqlite db by "hundreds of thousands of concurrent users" would not surface problems due to this simple detail.


Have you tried journal_mode=wal?

https://sqlite.org/pragma.html#pragma_journal_mode

This lets sqlite read w/o a write lock (among other things)

Since it stores the log to an adjacent file you have to make sure the process can write to the whole directory containing the db.
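
For what it's worth, the whole fix is a couple of pragmas; a sketch with a hypothetical readings table:

    import sqlite3

    conn = sqlite3.connect("sensor.db", timeout=30)
    conn.execute("CREATE TABLE IF NOT EXISTS readings (ts TEXT, temp_c REAL)")

    # WAL lets the long 20-30s read proceed while the once-a-minute insert
    # commits; busy_timeout makes a blocked writer wait instead of failing
    # straight away with "database is locked".
    conn.execute("PRAGMA journal_mode=WAL")   # persistent: stays set on the db file
    conn.execute("PRAGMA busy_timeout=5000")  # per-connection, in milliseconds

    with conn:
        conn.execute("INSERT INTO readings (ts, temp_c) VALUES (datetime('now'), ?)", (21.4,))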


Will give that a shot, thanks!


Please update us about this!


Would you share details about the architecture of such systems?

Especially on how to deal with concurrent writes on SQLite.


I avoid concurrent writes as much as possible.

With this approach, services are usually separated (e.g. microservices or a monolith with several db clients) and use several sqlite db files. For example, users.db, receipts.db, posts.db etc.

For data which is frequently accessed, it really helps to consider caching (e.g. HTTP).

I realise of course this won't scale infinitely so I usually make use of ORMs (e.g. GORM for Go/Gin or Diesel for Rust/Actix) just in case the SQL engine would need to change (so I don't have to rewrite queries etc). I haven't had the need to do so yet.
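
Side note, with hypothetical table names: splitting into several .db files doesn't rule out the occasional cross-cutting query, since a single connection can ATTACH the other files:

    import sqlite3

    # Each service owns its own file (users.db, posts.db, ...); for an ad-hoc
    # report you can still attach them to one connection and join across them.
    conn = sqlite3.connect("users.db")
    conn.execute("ATTACH DATABASE 'posts.db' AS posts_db")

    rows = conn.execute(
        "SELECT u.name, COUNT(p.id) "
        "FROM users AS u LEFT JOIN posts_db.posts AS p ON p.user_id = u.id "
        "GROUP BY u.id"
    ).fetchall()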


100k CCU on sqlite? I just don't believe it.


I use it in a network video recorder. https://github.com/scottlamb/moonfire-nvr/blob/master/design...

Data in a flat file would be much less efficient, and a SQL server (eg PostgreSQL) would be more for the administrator to deal with. SQLite's a nice sweet spot.

I've thought about using a key/value database (either a BTree one like SQLite uses under the hood or a LSM one like RocksDB). It'd be faster and less total code. But SQLite's too convenient to give up. The SQL interface is nice for debugging in particular. And it's already plenty fast enough, so my time's better spent adding some glaring missing features and better UI.


this is a cool project!


Consider how inserts, updates, single-point lookups, indexes etc work with your analytical system.

You use SQLite to back and operate an application. SQLite is a wonderfully lightweight transactional database; it compares more to e.g. MySQL than with OLAP systems like Spark.

SQLite isn't competing for analytical use cases :)


No, I meant using e.g. pandas even for that. We built a data annotation processing system at my last job like this. Literally customer-specific csv files that got booted up, with an online flask service receiving restful updates of data annotations that were transactionally written back to the backing csv.

Managing it as separate csvs per customer allowed some incredible optimizations for fanning out processing and performing reporting and dashboarding. The process running pandas allowed us to do much nicer aggregations, pivots, filters, etc., and by not writing it in SQL, we had so much more flexibility in application code and especially in unit & integration testing code.

If data size per customer was going to grow substantially larger, we would have needed to migrate the workload to a backing SQL database, likely Postgres, but the nature of the problem meant this axis of data size was not a problem (every separate csv represented a completely isolated advertising campaign from a customer, with only up to a few million records per campaign).

Using a flask server program to do this in pandas was an aspect that really, really paid off for us.


One of the reasons to prefer SQLite even in simple cases is that "filesystems are hard" and "transactionally written back to the backing csv" has a ridiculous number of failure modes, many of which SQLite handles correctly: https://danluu.com/file-consistency/

But in many cases, yeah, simple file use is good enough, as long as important stuff is backed up somewhere / a human can easily re-upload and repair anything needed. It's a <0.1% optimization, it really only saves you noticeable effort when you're doing like millions of those operations per day.
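
To make that concrete: even the "simple" flat-file version of an atomic update needs roughly the write-temp/fsync/rename dance below (a sketch, and getting every step right on every platform is exactly what the linked article covers), whereas with SQLite it's one transaction.

    import os

    def atomic_replace(path: str, data: bytes) -> None:
        """Replace `path` so a crash leaves either the old or the new
        contents, never a torn mix. A sketch of the usual dance."""
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())      # data must reach disk before the rename
        os.replace(tmp, path)         # atomic rename on POSIX filesystems
        dir_fd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
        try:
            os.fsync(dir_fd)          # persist the directory entry as well
        finally:
            os.close(dir_fd)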


SQLite would give you these benefits over CSV:

- The parsing code for SQLite is exactly the same C code in every language, making it impossible to make mistakes in reading/writing.

- SQLite likely has stronger transactional writing ability than what your application created.

- Any SQL GUI will allow you to inspect and iterate on queries during development/production.

- SQL is a good first step to writing “join/filtering” queries and is backed by C code/indexes so should be fast for simple stuff.

Of course if you do not need any of those and are able to spend the extra time to write out your “queries” in pandas/Python that will work well too - just another way of doing the same thing.

Personally I like SQL as a first stop for prototyping, with the hope I do not need to use other tools. Joins, transactions and using the disk for state are all ”good enough” starting points, and I can take those techniques to any language I use.
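
As a small illustration of the join/filter point (hypothetical tables), the sort of thing that would otherwise be a pandas merge() plus a boolean mask:

    import sqlite3

    conn = sqlite3.connect("shop.db")
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    """)

    # Join + filter + aggregate, answered from disk using whatever indexes exist.
    rows = conn.execute(
        "SELECT c.name, SUM(o.total) AS spent "
        "FROM customers AS c JOIN orders AS o ON o.customer_id = c.id "
        "WHERE o.total > ? GROUP BY c.id ORDER BY spent DESC",
        (100,),
    ).fetchall()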


SQLite is so ubiquitous you'd be hard pressed to find an app or service that doesn't use it.

Look through the licenses for the software included as part of your phone's OS. You'll find SQLite in there.

Search GitHub for sqlite, there are several projects with thousands of stars that use SQLite. Here's one: https://github.com/Tencent/wcdb


I'm using magnetico (https://github.com/boramalper/magnetico), a selfhosted bittorrent DHT scraper. After a few months of running I now have a little bit under 1M magnets in a ~3G database. SQLite is useful not just because I can query it with the full power of a relational database, but because I'm using its FTS capability for searching.

If I had to install a typical database or some search engine I would never have used it. It is more than enough for what I'm using it for.
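
The FTS part is only a few lines to set up; a sketch with a hypothetical table (magnetico's real schema differs), assuming the FTS5 extension is compiled in, as it is in most sqlite3 builds:

    import sqlite3

    conn = sqlite3.connect("magnets.db")

    # Full-text index over torrent names; no external search engine needed.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS torrents_fts "
        "USING fts5(name, info_hash UNINDEXED)"
    )
    conn.execute(
        "INSERT INTO torrents_fts (name, info_hash) VALUES (?, ?)",
        ("Some Linux Distro 22.04 amd64 iso", "aabbccdd"),
    )

    # MATCH does tokenised full-text search; bm25() ranks the hits.
    rows = conn.execute(
        "SELECT name FROM torrents_fts WHERE torrents_fts MATCH ? ORDER BY bm25(torrents_fts)",
        ("linux iso",),
    ).fetchall()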


> it’s exceedingly cheap for any data size conceivably processable with sqlite

As most people do, you are severely underestimating how much data sqlite can quickly work with. I doubt that any csv based solution could compete.


Most web browsers use it for history. That seems like a good example. Browser history needs reliable single user read/write and decent query performance and flexibility.


Cross-language tools, cross platform as well, enforceable schema, far lighter weight whilst still being fast. More durable and supports ACID transactions.


Conversely, I don't know of a situation where I'd use csv or json for storing data instead of SQLite. Maybe when the user is expected to change the data, but even then I think I'd rather make that part of the program depending on the target audience.


Appending lines of json or csv data to a file is very, very fast. Also takes up much less space because sqlite can't keep pages completely full. Also sqlite becomes quite slow on large datasets.

For some applications these properties make sense. I have hundreds of gigabytes of data that is only ever processed sequentially in a pipeline, I don't need random access -- for this use json/csv are fantastic.
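
For that access pattern the append-only code really is trivial; a minimal sketch:

    import json

    # Append-only log: one JSON object per line, processed sequentially later.
    record = {"ts": "2020-05-04T12:00:00Z", "event": "click", "user": 42}
    with open("events.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

    # Pipeline pass: stream the file line by line, never load it all at once.
    with open("events.jsonl", encoding="utf-8") as f:
        clicks = sum(1 for line in f if json.loads(line).get("event") == "click")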


SQLite not only makes it easy to store data, it makes it very easy to search that data.


Adobe Lightroom catalog - sqlite is used for lrcat, which can hold metadata for hundreds of thousands of pictures along with the (non-destructive) edit sequences on those pictures.

Some examples [0][1], or just play with it yourself.

[0]: https://dphacks.com/2019/05/07/how-to-search-multiple-lightr... [1]: http://regex.info/blog/2006-07-29/221


Sqlite is the de-facto persistent store for most iOS (Core Data) and Android (Room) apps.


Android apps use SQLite all over the place... not sure if you're unaware of this or looking for different examples.



