SQLite is not a toy database (antonz.org)
986 points by nalgeon on March 25, 2021 | 354 comments



> There is a popular opinion among developers that SQLite is not suitable for the web, because it doesn’t support concurrent access.

No, the issue is it doesn't have high availability features: failover, snapshots, concurrent backups, etc. (Edit: oops, a comment pointed out it does have concurrent backups.)

SQLite isn't a toy DBMS, it's an extremely capable embedded DBMS. An embedded DBMS is geared towards serving a single purpose-built client, which is great for a desktop application that wants a reliable way to store user data.

Once you have multiple clients being developed and running concurrently, and you have production data (customer accounts that are effectively legal documents that must be preserved at all times) you want that DBMS to be an independent component. It's not principally about the concurrent performance, rather it's the administrative tasks.

That requires a level of configuration and control that is contrary to the mission of SQLite to be embedded. They don't, and shouldn't, add that kind of functionality.


I think that the absence of 'high availability' is not an issue for small websites or web apps. Transactions are ACID, concurrent readers are fully supported. Backups and administrative tasks are super-easy.


I just looked, and the entire database for my webapp (which typically serves around ~50 people) is 139k. As you say, you can do a data dump quickly and safely; at that size, you could afford to just dump the entire DB every hour and keep the last year's worth of snapshots if you wanted.


https://litestream.io/ That's exactly what litestream accomplishes.


That's close. Litestream takes a snapshot of the database and then continuously replicates out WAL frames. On restore, it grabs the snapshot and replays all those WAL frames (or all the frames up to a certain time if you want point-in-time recovery).


Also if you really need HA there are nice projects like https://dqlite.io/


Looking at https://dqlite.io/docs/protocol it seems like it does not allow for embedded usage though. Doesn't this turn it into more of a traditional client-server RDBMS?


Looking at the architecture page [0] on that same website, the second heading is "Embedding".

[0] https://dqlite.io/docs/architecture


Ah, right, I should have looked deeper!


From the home page:

> Dqlite is a fast, embedded, persistent SQL database with Raft consensus that is perfect for fault-tolerant IoT and Edge devices.


I see availability as an issue orthogonal to scale/size. You might be processing one transaction per day, but it can still be super important that the service is available for that one transaction.


So if you're saying it does support everything a web database needs... then what's the reason people aren't using it for websites?

Why do you say it works for "small" websites but presumably not large ones? If it's not transactions, concurrent reading, backups, or administrative tasks... then what's the issue you run into?

Genuinely curious... I'm wondering if everything I've heard about "don't use SQLite for websites" is wrong, or when it's right?


The primary reason for me is that Postgres has stronger default constraints. If you care about keeping your data logically consistent then Postgres has more of that out of the box.

SQLite just makes the tradeoff to be simpler, since often it doesn't matter. But don't assume it never matters. Because PG helps avoid data problems, and because you might need to scale out web servers, Django for instance recommends switching to Postgres (or whatever you're actually going to use) ASAP, since there are differences. You may end up relying on PG to reject things SQLite doesn't care about by default; SQLite might let you get away with inserting data that PG refuses to handle.

Not to mention the DB specific features can differ. Like PG's JSON field types or etc.


Not sure why you were downvoted, because that is a legitimate concern and it's my only beef with sqlite3. I would kill for an embedded PostgreSQL. If sqlite3 ever becomes that, I'll absolutely pay for a license if they require one.


Check out DuckDB! It is designed for OLAP instead of OLTP, but it uses Postgres syntax and types! It's columnar and lightning fast for big queries.


Been looking for an excuse to use it, this might just be the one. :)


> then what's the reason people aren't using it for websites?

I'd guess the reason to be that people keep hearing things like "don't use SQLite for websites" and thus don't even try.

> Why do you say it works for "small" websites but presumably not large ones?

Not the GP, but the main reason I wouldn't use SQLite for a large website is that SQLite itself doesn't offer much re: failover/replication (i.e. multiple servers, one database), and I haven't used RQLite enough (or at all; I should fix that) to be comfortable with it in production. Because of that, I'm more likely to reach for / recommend PostgreSQL instead.

That being said, if your website has crazy "web scale" FAANGesque needs and you're at the point where you need to write your own replicated datastore, using SQLite as a base and building your own replication layer on top of it (or using RQLite and maybe adjusting it for your needs) seems like a reasonable way to go.


I'm in the process of adding read replication to Litestream[1] so folks can scale out the read-side of their SQLite applications to multiple nodes (or replicate to edge nodes for low-latency).

[1]: https://litestream.io/


Amazing. This was missing for my use-case and I gave up on LiteStream because of it but now it seems I have to revisit!

Wonderful job.


Neat. I'll have to give that a whirl.

Would this enable any node to be a writer (i.e. would it lock the DB across all nodes)? Or would I have to designate some "master" server with exclusive write access and have any other servers forward write requests to that server?


It would be the latter. You'd need to have a single primary and replicas would need to redirect writes to that node.


> That being said, if your website has crazy "web scale" FAANGesque needs and you're at the point where you need to write your own replicated datastore, using SQLite as a base and building your own replication layer on top of it (or using RQLite and maybe adjusting it for your needs) seems like a reasonable way to go.

Exactly what Bloomberg did with Comdb2: https://github.com/bloomberg/comdb2


Crazy web scale folks flip from Mongo (NoSQL) to Postgres: https://www.theguardian.com/info/2018/nov/30/bye-bye-mongo-h...


Great article. It took them 10 months to successfully migrate! No mention of how many man-hours, but probably a lot.


Depending on the use case (SaaS offerings with per-customer shards) you can actually scale SQLite quite high. Expensify scaled to 4 million qps (queries per second) [1]. If one of your customers exceeds that, you'd better be looking at another DBMS, but below that SQLite is an option.

[1] https://blog.expensify.com/2018/01/08/scaling-sqlite-to-4m-q...


I would use SQLite on a single-machine web server. However, I prefer to build my web stuff as stateless machine images that connect to a separate database instance, because it lets me scale them independently.

I should revisit this policy now that you can run a truly huge site off a 1U slot (I work for an Alexa top-10k site, and our compute would fit comfortably in 1U); computers are so fast that vertical scaling is probably a viable option.


That's exactly what I am leaning toward in my current job. Looking at the load of our servers, I can comfortably put all our software on my gaming machine (which is mid-range!) -- with the DB included -- and I bet no request would ever go above 100ms.

IMO a lot of organizations should start re-investing in on-premise as well. Having a mid-range AMD EPYC server can serve most businesses out there without ever having more than 40% CPU usage.

That, plus scaling down Kubernetes clusters. Most companies absolutely didn't need them in the first place.


I don’t know about on-prem for serving external workloads; getting a reliable, redundant, high-speed connection can cost far more than paying for colocation at a DC that’ll do at least as good a job. If you’re lucky they’ll even install fire suppression systems.


Yep, can't argue with that.

Still, for the cloud I've started preferring Rust (where applicable!) and optimizing the hell out of the performance hot-spots. So far the results have only been crushing successes. Horizontal scaling has been employed only as a means of backup instances with a load balancer in front of them. And for blue/green deployments.

Horizontal scaling hasn't been at all necessary otherwise. A $25 instance never got north of 15% CPU for one of the services I rewrote. I/O utilization never got beyond 60% and is usually 3-5%.

But I can agree that the cloud has unquestionable benefits. It's just that I feel that their number is gradually dwindling.


I think one valid fear is that a web application might, for multiple reasons, scale beyond a single process on a single server. This is common enough that using an embedded database can be problematic, compared to a separate RDBMS process that can be shared between processes. Some web applications are just never going to scale out for any reason, and for those SQLite might be appropriate.


Agreed, but it’s not only a question of scaling. Any app with high uptime requirements will need more than a single process from day 1.


It depends on what you consider high uptime. You can achieve 99.95% uptime with 4h of downtime a year. A lot of downtime occurs because of overly complicated systems so running a single process on a single server can give you relatively high uptime.


I just want to be able to deploy updates during business hours without downtime. Hard to do with a single process.


Most web servers support graceful restarts, i.e. reloading code and configuration without downtime. Usually you just send a pre-specified and documented signal to the process to do it.


For a web server serving from the filesystem, sure. That doesn't help you deploy a new version of a web application you're making code changes to (which is the kind of thing that would need a database).


I was actually talking about both. For example for python, which is what I usually use, both uwsgi and gunicorn support graceful restarts. Admittedly I don't have much experience with other languages, but it's technically possible. If your application doesn't support it out of the box, you can still achieve it with something like systemd's socket activation.


SQLite is catching up quickly with SQL support, but as with any fast-moving technology - unless they're really interested in it - most devs are lagging a few years behind in their knowledge of what can be done with it.

For instance, I remember that at some point SQLite didn't have foreign keys, so I couldn't run my existing migrations on it (without rewriting everything), which was a big issue for me at the time. Now I see that they've since added support for referential integrity, but I had no idea about it, simply because there are so many other libs and technologies to follow - one obviously can't keep track of every single tool in the world - so some great tools just fall out of focus.

My guess is the word will slowly get out and in a few years people will probably shift to using it more in a web world - but it will take some time.


If writes are infrequent I think it might be fine. Otherwise I’d be wary.


I ran it once on an NFS disk. There must be some file locking happening, I think. I could read from multiple nodes, but for some reason deletes were failing (could not delete a database). I did not test writes, as my app at that time was writing from one location only. Has anyone tested such an HA scenario?


How do backups work? Is there a locking mechanism to prevent file corruption if the file is copied during writing?


It supports multiple readers, and writes are atomic - no reader should ever get corrupt data just because there is a write happening in parallel. You probably should not use the OS to copy the live db file, rather, have an in-process thread that reads the db and writes to a backup location periodically.


Exactly, many forget the little detail that they should use the backup feature, rather than just copy the database file.
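
A minimal sketch of that with Python's built-in sqlite3 module (its backup() method, available since Python 3.7, wraps SQLite's online backup API; the file names are just placeholders):

    import sqlite3

    # Copies the database safely even while other connections keep writing.
    src = sqlite3.connect("app.db")
    dst = sqlite3.connect("backup.db")
    with dst:
        src.backup(dst)
    dst.close()
    src.close()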


You can also combine this with file system snapshots to get versioning, which would allow you to pull the db file out of the latest snapshot (right after taking it) and send it somewhere.


It also doesn't have strong/static typing (it's dynamically typed) so you have to typecheck your inputs or do type coercion upon read.

And it doesn't have a native date type. Date handling has to be done at the application layer. It can be tricky to do massive time-series calculations or date-based aggregations.

You can use integers or text types to represent dates, but this open-endedness means you can't share your db because everyone implements their own datetime representation.


Datetimes really are the achilles heel of SQLite.

The JSON extension library is amazing and works well. If SQLite were to grow a first-rate RFC 3339 library, one which could read from tz when available and do the things which strftime can't, acting on your choice of Unix timestamp and valid RFC 3339 date string, this would be a real boon to the ecosystem.

I haven't found typechecking inputs to be a real barrier. Sure, `val INTEGER CHECK (val = 0 or val = 1)` is a long way to spell `val BOOLEAN` but `CHECK json(metadata)` is a reasonable way to spell `metadata JSON`, and a similar function would surely exist for a SQLite datetime extension. You can do it now by coercing the string through an expected format, but that doesn't generalize well.
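
For illustration, a rough sketch of that style of constraint with Python's sqlite3 (json_valid needs the JSON1 extension, which most bundled builds include; datetime() returning non-NULL serves as the "parseable date" test):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE event (
        done     INTEGER NOT NULL CHECK (done IN (0, 1)),     -- a "boolean"
        metadata TEXT CHECK (json_valid(metadata)),            -- a "json" column (needs JSON1)
        created  TEXT CHECK (datetime(created) IS NOT NULL)    -- a parseable date/time string
    );
    """)
    conn.execute("INSERT INTO event VALUES (1, '{}', '2021-03-25 12:00:00')")  # ok
    try:
        conn.execute("INSERT INTO event VALUES (1, '{}', 'not a date')")
    except sqlite3.IntegrityError as err:
        print(err)  # CHECK constraint failed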


> If SQLite were to grow a first-rate RFC 3339 library, one which could read from tz when available and do the things which strftime can't, acting on your choice of Unix timestamp and valid RFC 3339 date string, this would be a real boon to the ecosystem.

OTOH, as an embedded database with freeform advisory typing and easy extensibility (for functions, etc.) in most host languages, it's not hard at all to get whatever you need for datetimes if you are using a host language that has a decent datetime library itself (and, as a bonus, you then don't have to worry about subtle differences between manipulations of datetimes through SQL and manipulations through other mechanisms in the app.)


This. I'm working on a hobby project that is essentially a web app for personal use. I started with sqlite because it was the simplest to start with, but I'm about to reluctantly migrate to postgres. Why? Constraints, data types, and programmability.

In terms of scale, sqlite is just fine. But I am tired of fiddling with the dates, it's too easy for bugs to sneak into my code, and I want to use table valued functions to essentially parameterize views instead of having to build complex queries in the app layer.

If your web app is mostly reading and writing single rows, yeah, sqlite is just fine. But if there's substantial and complex logic involved, it has its limits.


Check out rqlite (https://github.com/rqlite/rqlite) for that functionality - "rqlite is a lightweight, distributed relational database, which uses SQLite as its storage engine. Forming a cluster is very straightforward, it gracefully handles leader elections, and tolerates failures of machines, including the leader. rqlite is available for Linux, macOS, and Microsoft Windows."


rqlite is a separate daemon written in Go, negating most of the reasons to choose Sqlite in the first place.

It absolutely has good use cases, but those are rather niche. You'll mostly be better off with the traditional postgres etc.


For web applications, having a separate daemon for sqlite is best practice anyway. SQLite doesn't play nice with multiple concurrent connections (costly locking, the possibility of data corruption). You typically need to offload transactions to a queue in a daemon thread.


To me this is obviating a lot of the advantage of SQLite over any other RDBMS.


I thought the same, but maybe it's useful if you start with SQLite and decide to scale up to a distributed RDBMS without having to rewrite too much?


Two thoughts:

SQLAlchemy, at least, lets you switch from SQLite to Postgres in a configuration file.

Isn't there a useful SQL subset which allows you to switch from one database to another without rewriting? There seems to be such a subset for C, for example, which multiple compilers all interpret the same way, and SQL is a standardized language, too.


Yes, both of those are alternative solutions.

Using a wrapper like SQLAlchemy is probably an improvement for most medium and large projects, but it has costs of its own. Using portable ANSI SQL - above the Hello World level - is something basically nobody does unless they make a serious effort, lean on linting tools and forego some of the most useful features of their DBMS. sh is perhaps a closer comparison than C here: it's at least possible to write portable C accidentally.

And neither of these help you if you started with a SQLite database - an excellent choice for most - and decide that you need to support a dozen or a hundred concurrent users.


> Isn't there a useful SQL subset which allows you to switch from one database to another without rewriting?

Only if you're happy throwing away a lot of the features, which is taking away from what makes SQL an attractive solution in the first place. Plus, even some basic things aren't the same between different implementations (e.g., SELECT TOP 1 * FROM t vs SELECT * FROM t LIMIT 1), so you'd really have to be testing with different RDBMSes the whole way through to make sure you didn't accidentally break compatibility.


I agree that high availability features are outside of the goals of an embedded database. There's an ecosystem of tools around SQLite to provide these benefits though. There's dqlite & rqlite for providing HA over SQLite. I'm the author of Litestream[1] which provides streaming replication to S3 for SQLite databases so folks can safely run a single node instance.

[1]: https://litestream.io/


One could, for instance, run “static” sites with the data stored in a SQLite database. Update-on-write apps can often do okay with 90% uptime. It’s the “generate everything in every request” crowd that needs an HA solution - and why do we keep doing this to ourselves?


Yes, static generation is a great way to go if that works for your use case. Throw it up on a CDN and you'll likely have five 9s of uptime. I don't think that's a viable solution for a lot of applications though.


Half of arguments on the Internet are someone trying to win an argument they lost elsewhere. If I could get some people I work with to actually run the numbers on how much it costs us to make everything dynamic, versus how much we earn versus static, I think we’d quickly arrive at an agreement that most of it should be static with a little AJAX thrown in for the 10% that needs to be special.


SQLite itself does not have those features, but at the disk level you can get those features:

On GCP:

- A regionally replicated disk will replicate writes synchronously to another zone (another data centre 100km away). This means when the SQLite write call returns, the data will be geographically replicated, and the second disk can be used as a fail over. This is all transparent to app/db.

- Disk snapshots are incremental, and are stored in Cloud Storage, which is geo replicated across regions (E.g. Europe and US).

In a way, this gets you even better failover and backups than traditional server DBMS's that have these features built in, and often need custom administration to ensure they are working.


> "No, the issue is it doesn't have high availability features: failover, snapshots, concurrent backups, etc."

I think it has concurrent backups via the backup api: https://www.sqlite.org/backup.html


I'd expect this to be unnecessary in WAL mode (which you should use, when possible), since WAL allows concurrent readers while permitting up to one writer. In the old undo-log mode (which remains the default for compatibility) writing excluded readers.
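
Turning it on is a one-liner; a quick sketch in Python (the busy_timeout pragma is optional but commonly paired with it, and the values are examples):

    import sqlite3

    conn = sqlite3.connect("app.db")          # placeholder file name
    conn.execute("PRAGMA journal_mode=WAL")   # persistent: stored in the db file, applies to later connections too
    conn.execute("PRAGMA busy_timeout=5000")  # per connection: wait up to 5s on a locked db instead of erroring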


I don't think you can do naive file copy backups even in WAL mode. WAL will checkpoint the log into the main database file from time to time. You need to use the SQLite backup API to let you make a backup while blocking any potential checkpointing.


Sorry, I was unclear. I meant the "copy a few pages then sleep for some time to release the database lock" part. With WAL mode you should just be able to copy as fast as you can without disturbing other readers or the writer. SQLite also has a similar SQL built in, VACUUM INTO. That's nice because the backup API just copies all pages (iirc).


I think the biggest issue can be distilled to "lack of concurrent network access". You start getting into a lot of hairy management problems with distributed file systems/copying files around.

IMO part of that is just that web applications tend to be not very efficient compared to something like Unix CLI utilities, so to get performance you end up with potentially massive amounts of horizontal scaling.

On the other hand, the lack of performance /usually/ buys you higher productivity so you can make product changes faster (you let a GC manage the memory to save time coding, but introduce GC overhead).


> IMO part of that is just that web applications tend to be not very efficient compared to something like Unix CLI utilities, so to get performance you end up with potentially massive amounts of horizontal scaling.

It's less about performance than about resilience IMO. Of course if you don't have a proper HA datastore (master-master) then you kind of undermine that, but a datastore SPOF is better than the whole application being a SPOF.


Right, and Expensify plausibly solved this by wrapping SQLite to produce a different DB, BedrockDb. But that shouldn't imply at all that SQLite alone could satisfy mainstream web applications.


>Once you have multiple clients being developed and running concurrently, and you have production data (customer accounts that are effectively legal documents that must be preserved at all times) you want that DBMS to be an independent component. It's not principally about the concurrent performance, rather it's the administrative tasks.

For my reporting/read-only use case https://datasette.io/ solves the above beautifully.


>No, the issue is it doesn't have high availability features: failover, snapshots, concurrent backups, etc. (Edit: oops, comment pointed out it does have concurrent backups.)

Actually, I think it's network access that is the issue. SQLite isn't really designed to have 4-5 network-connected apps using it as a shared data store. In most shared hosting setups, the database is treated as a network service, so you can have 200 different sites leveraging the same db server... and sometimes the db server isn't on the same hardware as the http server.


Not to be misunderstood: it fully supports concurrent reads, and also concurrent reading while some other process is writing the same database.

The only thing it doesn't support is concurrent writes.


For replication of SQLite there are some options:

http://litereplica.io - single-master replication

http://litesync.io - multi-master replication

https://aergolite.aergo.io - highest security replication, using a small footprint blockchain


Have you used litereplica or litesync? It doesn't inspire confidence that their websites haven't been updated since 2016 and 2017 respectively.



A kludge---I forget from where---for low traffic sites is one SQLite DB per user.


Not sure I understand: sqlite is file based, so snapshots and concurrent backups are literally just file copies/backups.

I'd much rather SQLite not waste its time on implementing features they're not good at, leaving that to the tools we already have available for rolling file backups, instead spending their time and effort on offering the best file-based database system they can.

(Heck, even failover is just a file copy initiated "when your health check sees there's a problem")


Unfortunately backing up with a simple file copy operation isn't guaranteed to work if you have write traffic at the same time.

Instead you need to use the .backup mechanism or the VACUUM INTO command, both of which safely create a backup copy of your database in another file - which you can then move anywhere you like.
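
For example, from Python (VACUUM INTO needs SQLite 3.27+; the paths are placeholders):

    import sqlite3

    conn = sqlite3.connect("app.db")
    # Writes a consistent, compacted copy while other connections keep using the live db.
    conn.execute("VACUUM INTO '/backups/app-2021-03-25.db'")
    conn.close()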


Not really, it still works exactly as you should expect: writes are handled atomically, using a separate file, so if you copy the database file "mid-write" you're not actually copying a database in flux, you're copying the database before the write operation gets applied, and SQLite applies the write after the copy lock is gone.

The main issue here is time: because copying temporarily locks the db out of further changes, it will be "out of sync" if you naively believe that performing a write means your next call will see that data, and you bake that assumption into your code. While there's a wait, on modern hardware using SQLite for its intended purposes (any volume of reads, but low volume of writes), that's just not an issue.

There are other problems associated with file-based backup, of course, such as missing out on in-memory data, or corruptions caused by power outages, but those problems only exist if we were to copy a db once. Backups run regularly, and any data we miss out on the current pass, we'll get on the next pass.

Basically: when used for the purpose that SQLite was created for, file copies are a perfectly fine backup strategy that really only shows its limitations when you start to push your project into "SQLite isn't really appropriate here anymore" territory.

(having said that: the fact that VACUUM can be run into a new file is super nice, and everyone should know that it exists)


SQLite uses advisory locks but the `cp` command won't adhere to those so you can get a copy that contains only some of the writes from a transaction. If you're lucky, you just get some wonky data. If you're unlucky, you can get a corrupted copy.

Transactions are only atomic from the perspective of other transactions. Other processes or commands on the system can see partial state. This applies to both the rollback journal & the WAL modes.

File copies are also covered in their "How to Corrupt a SQLite Database File" web page[1].

[1]: https://www.sqlite.org/howtocorrupt.html#_backup_or_restore_...


Depends on the file system. On Windows you can use shadow copy for example.


Oh man, I didn't know about VACUUM INTO, thanks!


Every database is ultimately on the file system but there’s a reason that method of backup is rarely used.


Except not every database was designed from the ground up to BE a file; that's one of SQLite's benefits. An SQLite database is by definition a single, fully self-contained file. Unless you go out of your way to make Postgres, MySQL, MariaDB, etc. do this, that is very much not the case.

For SQLite, "copying files", rolling file backups, shadow volumes, etc. are therefore valid strategies.


Postgres lets you write directly to block devices. In practice, nobody does this though.


From the dbadmin POV, that's understandable. From the dev side, if you want to take a stab at your own failover, shards, concurrents and snaps, there may be no better platform to learn on. Writing a custom backend to do tricks with sqlite dlls is extremely satisfying.


I learned this the hard way when I stuck some app containers down on a few RPis and mapped the folder of stateful stuff (including a SQLite database) to an NFS share on my NAS.

It's....not great.


The FAQ suggests that NFS is problematic for concurrent access:

> But use caution: this locking mechanism might not work correctly if the database file is kept on an NFS filesystem. This is because fcntl() file locking is broken on many NFS implementations. You should avoid putting SQLite database files on NFS if multiple processes might try to access the file at the same time.

https://www.sqlite.org/faq.html#q5


Oh yeah, I read all that after I started seeing problems, I know I'm in the wrong here. It works just good enough that I haven't addressed it yet (I just kick over the process when things get weird, and the db usually recovers just fine).


I can't remember where I read this now, but about a year ago I found material saying that sqlite3 over NFS is a very bad idea, and even though it's supported it's also strongly advised against as a use case.


no, those things have never stopped me. What has stopped me is that the default configuration for a Rails app throws intermittent exceptions when I'm trying to save, because concurrent writes fail and Rails (at least as of a couple years ago) does the wrong thing by default


I've built a complex CRM that handles 2.1 million USD in transactions every year. It is running sqlite with a simple in-memory lru cache (just a dict) that gets purged when a mutating query (INSERT, UPDATE or DELETE) is executed. It is very simple and more than fast enough.

Friendly reminder that you shouldn't spend time fine tuning your horizontal autoscaler in k8s before making money.


Do you mean I don't need Go microservices talking gRPC deployed in multiple kubernetes clusters with bash script based migrations via GitOps with my hand made multi cloud automation (in case we move clouds) following all the SCRUM practices to ship working software?

Mindblowing.


You will for your blog series that you mention prominently on your resume that gets you your next Senior Architect gig.


Agh!... that's the catch... what am I gonna give talks about, and what do I write on Medium then... now I get it. Thanks!


> my hand made multi cloud automation (in case we move clouds)

Aren't these symptoms of a deeper problem? Many product managers I talk to want me to build something that is as flexible as possible and solves all problems for everyone, everywhere. Go microservices with gRPC and Kubernetes feel like the only high-level technical decisions I can make in light of such information. :)


> Friendly reminder that you shouldn't spend time fine tuning your horizontal autoscaler in k8s

Oh, but most companies need this!

The biggest feat of microservices (which require ways to manage them, like k8s) was to provide the ability for companies to ship their organizational chart to production.

If you don't need to ship your org chart and you can focus on designing a product, then you can go a long way without overly complicating your architecture.


I think this org chart issue has been overstated.

If you have two services that are completely orthogonal, then combining them into a single application for deployment and operation purposes can be limiting. Developers of all people should understand the benefits of decoupling.

There are a lot of downsides that come with jamming a whole lot of unrelated functionality into the same deployment unit. It's similar to the problems created by global variables. Once you start to depend on your services operating in a monolith, you can easily create rigidity that can be difficult to roll back.

It's not that you can't design a monolith well, but microservices force you to consider important boundaries as opposed to simply violating them for the sake of convenience. It's another "human" issue, but it's unrelated to org charts.

This doesn't mean that everything should be a microservice, though. Good architectures address the requirements of the systems being developed.


I think the deployment benefits have been even more overstated.

This is an absolutely enormous trade-off between logical modularity and run time operational complexity.

If your devs aren't good enough to enforce modular design in a single application, what makes you think they're good enough to handle complex distributed systems?


> If your devs aren't good enough to enforce modular design in a single application, what makes you think they're good enough to handle complex distributed systems?

That's a simplistic take on the reasons that monoliths tend towards breaking modularity.

What makes me think managing microservices is perfectly viable is that the last three companies I've been at have been able to do it successfully, with a typical mix of good and less good developers, including cheap offshore devs.

As for handling "complex distributed systems," in many cases all that's really needed is something like a managed container platform. Services like Fargate or Cloud Run, or managed Kubernetes, can do a good job of this. Developers can deploy and publish new services with minimal effort, and most of the operational complexity is managed by the platform.

You do ideally want someone paying attention to overall architecture to avoid obvious pitfalls, such as effectively doing distributed joins via REST calls, and so on. This isn't that hard to understand, though, and teams that don't do the necessary architecture upfront tend to figure it out once they run into those problems themselves.


How do you ensure data is not lost to oblivion if a catastrophic system failure occurs?


You put in place a loss-mitigation strategy. This strategy will vary by application. In my case, I have a similar setup where we write 25-30k records to SQLite daily. We start each day fresh with a new SQLite db file (named yyyy-mm-dd.db) and back it up to AWS S3 daily under the scheme /app_name/data/year/month/file. You could say that's 9 million records a year, or 365 mini SQLite dbs containing 25-30k records each. Portability is another awesome trait of SQLite. Then, at the end of the week (after 7 days that is), we use AWS Glue (PySpark specifically) to process these weekly database files and create a Parquet (snappy compression) file which is then imported into Clickhouse for analytics and reporting.

At any given point in time, we retain 7 years' worth of files in S3. That's approx. 2275 files for under $10/month. Anything older is archived into AWS Glacier... all while the data is still accessible within Clickhouse. As of right now, we have 12 years' worth of data. Hope it helps!
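
Not the actual code, but a rough sketch of that daily pattern (the bucket and key layout are made up, and it assumes boto3 with AWS credentials configured):

    import datetime
    import sqlite3

    import boto3  # assumes AWS credentials are configured in the environment

    def backup_todays_db(bucket="my-app-backups"):  # hypothetical bucket name
        today = datetime.date.today()
        db_name = f"{today:%Y-%m-%d}.db"
        # Take a consistent copy first rather than uploading the live file.
        src = sqlite3.connect(db_name)
        src.execute(f"VACUUM INTO '{db_name}.bak'")
        src.close()
        key = f"app_name/data/{today:%Y}/{today:%m}/{db_name}"
        boto3.client("s3").upload_file(f"{db_name}.bak", bucket, key)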


This sounds interesting. Have you thought of doing a talk or blog article about it?

p.s., I run the SF Bay Area ClickHouse meetup. Sounds like an interesting topic for a future meeting. https://www.meetup.com/San-Francisco-Bay-Area-ClickHouse-Mee...


I'd be interested in hearing more about this design.


Backups.


Does that mean it's okay for your application to lose transactions (which occurred between the backup point and the failure point), or do you have other mitigations?


I'm the author of Litestream, which is an open-source tool for streaming replication for SQLite. That could be a good option if you need to limit your window for data loss. We have a pretty active Slack if you need help getting up and running. https://litestream.io/


I guess this daily 2 seconds of downtime is worth it when it reduces cost from, say, $2000/month to $20/month.


Many banks still “shut down” for hours every night to do backups.


I’m not anywhere near the banking industry but from HN alone I’ve been led to believe dailyish huge file transfers are also the norm in a variety of situations (aka SQLite’s backup strategy).


ftp or sftp if you're lucky - upload a giant CSV or receive one. It is crazy to me it still works this way.


and sometimes CSVs sent by mail at a set frequency (by a bot, I'm guessing?)


Isn't that how all backups work? If you need to prevent data loss then backups probably aren't your tool of choice. And if you're paranoid about data loss then any replication lag is also unacceptable.

* I'm worried about my server blowing up: Transactions have to be committed to more than one DB on separate physical hosts before returning.

* I'm worried about my datacenter blowing up: Transactions have to be committed to more than one DB in more than one DC before returning.


With no sense of overstatement here, SQLite is one of my favorite creations in the entire world, so I have a bunch of links some of you might find interesting if you want to dig further:

https://github.com/sql-js/sql.js - SQL.js lets you run SQLite within a Web page as it's just SQLite compiled to JS with Emscripten.

https://litestream.io/blog/why-i-built-litestream/ - Litestream is a SQLite-powered streaming replication system.

https://sqlite.org/lang_with.html#rcex3 - you can do graph-style queries against SQLite too (briefly mentioned in the article).

https://github.com/aergoio/aergolite - AergoLite is replicated SQLite but secured by a blockchain.

https://github.com/simonw/datasette - Datasette (mentioned at the very end of the OP article) is a tool for offering up an SQLite database as a Web accessible service - you can do queries, data analysis, etc. on top of it. I believe Simon, the creator, frequents HN too and is a true SQLite power user :-)

https://dogsheep.github.io/ - Dogsheep is a whole roster of tools for doing personal analytics (e.g. analyzing your GitHub or Twitter use, say) using SQLite and Datasette.


Using sql.js we have built an online SQL course where the code is executed in the browser itself. https://academy.bigbinary.com/learn-sql


That’s pretty cool! Thanks for sharing this. My SQL knowledge has been a hodgepodge of random stuff for years, so maybe going through your course will help fill in the gaps.


This is cool!


Seconded; it's a fantastic tool and ecosystem. We use sql.js in production (https://www.executeprogram.com). The SQL course's live code examples all run directly in the browser via sql.js.

I initially had low expectations because it's such a weird use case, but it's been totally reliable. We did have to ignore a few types of errors from old browsers that don't support wasm properly, but we've never had a bug in current browsers caused by sql.js.


Wow! The sql.js bundle is only 8kb. This is a hidden gem for sure. Thanks for pointing it out!

EDIT: It's actually 1.2MB. Thanks for pointing it out :)


Nope, try using the demo website, it loads a 1.2mb wasm file (https://sql.js.org/dist/sql-wasm.wasm). SQLite might be impressively small but it's not _that_ small.


I was actually looking into options for storing some data for a game I'm working on a while back, and decided against using SQLite because it was too big of a dependency (project has a specific size limit/target).


I'd point out that it's 1.2MB uncompressed, but only around 390k compressed.


There's a 1.2 Mb WASM file as well


SQLite is one of those things that also shows by contrast how bloated and inefficient much of modern software is. It accomplishes so much so efficiently with such a small footprint.


Agreed. There's a real beauty to it, to me, like looking at a Swiss watch or something. Redis gives me similar vibes, though it is a little more complex given its networked nature (although the C is nice to read).


I love it. SQLite is light, easy, and useful in a variety of situations.


>so I have a bunch of links some of you might find interesting if you want to dig further

oooh, thank you! I'm starting to see SQLite the same way!


SQLite can actually be scaled quite high: 4 million qps (queries per second) https://blog.expensify.com/2018/01/08/scaling-sqlite-to-4m-q...


It's truly a shame that Web SQL was opposed by Mozilla. The web would be a better place if a tool as powerful as SQLite was available by default on billions of devices.


Can you share more details? Why did Mozilla oppose SQLite?



There's also a pretty nice built-in sqlite extension for transitive closures that helps in searching hierarchical/tree structures in sqlite: https://web.archive.org/web/20141127001741/https://charlesle...
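
If loading that extension isn't an option, plain recursive CTEs in core SQLite (linked elsewhere in this thread) cover many of the same tree searches. A small sketch with made-up table names:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE org (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    INSERT INTO org VALUES (1, NULL, 'root'), (2, 1, 'child'), (3, 2, 'grandchild');
    """)
    rows = conn.execute("""
        WITH RECURSIVE descendants(id) AS (
            SELECT 1                                   -- start at the root
            UNION ALL
            SELECT org.id FROM org
            JOIN descendants ON org.parent_id = descendants.id
        )
        SELECT name FROM org WHERE id IN (SELECT id FROM descendants)
    """).fetchall()
    print(rows)  # [('root',), ('child',), ('grandchild',)]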


Clicking the above link takes me to https://imgur.com/32R3qLv.

I fail to understand why have a blog at all if its author doesn't like people linking to it.


How strange, sorry, I didn't know that it did that. Just updated it to an archive link. This is the only source I know of that gives an overview of closures.c so it's worth the read.


Definitely not your fault :)


I believe jwz did this first


well, he doesn't like one particular group of people linking to it :-)


we can also have a SQLite database with branches, like in Git:

https://github.com/aergoio/litetree


Which, if any, of these is efficient enough to run on mobile and can serialize an entire database to localStorage?


SQLite underpins large parts of both iOS and Android, so... all of them? Whether you can serialize "an entire database" just depends on the size of the database and whether it will fit in the local storage. SQLite scales up to a few TB at least, so that is unlikely to be the bottleneck.


I don't fully get it -- this is an SQLite implementation in JavaScript, which is different from what you'd use in a native app. Unless you store it to localStorage or cookies, all variables disappear when you navigate away from the page.

Can you write an offline HTML5 webapp with any of these libraries such that it can serialize the entire database into a string and then store that to localStorage and reload it the next time?

It's an honest question, and I'm not sure why people are downvoting without giving reasons.


Really you'd want to do this into an indexeddb store rather than local storage. Fewer issues with size limits and string serialisation, as you can chuck Blob instances straight into it.

You'd have to handle loading/saving though


Is there some way by which the indexedDB store can stay synchronised with the server database running in the backend?

Something simple so that the front-end app developer can just focus on implementing business requirements.


indexedDB is too low level for that kind of thing by itself. It's just a fast Key-Value store with secondary indexes that's in your browser.

You either need to find a solution that has a sync system built in (such as <https://dexie.org/> or <https://pouchdb.com/> ) or you'd roll your own atop whatever you were using.


localStorage is a web API, the storage limit is ~5MB.

However it can only store strings so it's pretty taxing.

Chrome and Safari actually embed SQLite directly as WebSQL; however, it's not becoming standardized because Mozilla doesn't consider "add SQLite" a sensible web standard.


Two major gripes I have with SQLite:

1. SQLite doesn't really enforce column types[0], the choice is really puzzling to me. Schema-enforced type checking is one of the strong suits of SQL/RDBMS-based data solutions.

2. Whole-database lock on write; this makes it unsuitable for high-write usage like logging and metric recording. WAL mode will help, but it only alleviates the issue - you will eventually need a row-based locking solution.

Just like the official docs say, SQLite competes with fopen[1], not with RDBMS systems.

--

[0]: https://sqlite.org/datatype3.html

[1]: https://www.sqlite.org/whentouse.html
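
To make point 1 concrete, a quick demo of type affinity with Python's sqlite3 - nothing here is rejected:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (n INTEGER, s TEXT)")
    # Declared types are affinities, not constraints, so both values are accepted.
    conn.execute("INSERT INTO t VALUES ('not a number', 123)")
    print(conn.execute("SELECT n, typeof(n), s, typeof(s) FROM t").fetchone())
    # ('not a number', 'text', '123', 'text')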


> SQLite doesn't really enforce column types[0], the choice is really puzzling to me.

This is acknowledged as a likely mistake, but one that will never be fixed due to backward compatibility:

> Flexible typing is considered a feature of SQLite, not a bug. Nevertheless, we recognize that this feature does sometimes cause confusion and pain for developers who are accustomed to working with other databases that are more judgmental with regard to data types. In retrospect, perhaps it would have been better if SQLite had merely implemented an ANY datatype so that developers could explicitly state when they wanted to use flexible typing, rather than making flexible typing the default. But that is not something that can be changed now without breaking the millions of applications and trillions of database files that already use SQLite's flexible typing feature.

https://sqlite.org/quirks.html


SQLite already has different operating modes, right? e.g. WAL is turned on and stays on, I think. It seems like you could at least make type-checking an opt-in mode.


WAL is a journaling mode set on the connection. It’s not a db-level setting.


WAL mode will stick across subsequent connections.


Something like "PRAGMA STRICT = ON;" should be trivial to add after the strict mode is actually implemented.

But IMO that would just warrant an entirely separate software package. Shoving it inside sqlite3 is likely to make it much more complex.


Maybe there could be a fork called SQLITE, an acronym for SQLite Including Type Enforcement


I think that would win the prize for most ungooglable name. (As Google is not very case sensitive.)


Yeah, still, I really wish SQLite were a featherweight RDBMS rather than an almost-but-not-quite-your-typical-RDBMS that's a better `fopen`.

Having a tool like this would make my life a whole lot easier. Well, one can dream.


I would not say that it is acknowledged as a mistake. SQLite originally comes from the Tcl world and thus it is somewhat natural that it follows Tcl's stringly-typed object model.


> but one that will never be fixed due to backward compatibility

I wonder if a fork/"new version" could address this. Like, sqlite2 (v1.0, etc.).


Considering that we're on sqlite3 already, it'd probably be something for the v4 ;)


There is already an sqlite4 but they haven't done any work with it for a long time.


This would be something on the order of the python2 -> python3 transition. Meaning, it would likely take a decade. After going though that, I'm not sure it would be worth it to just change from (default) flexible data types.


It's more accurate to phrase point 1 as: SQLite column constraints are opt-in. If you explicitly add a check constraint to a column schema, SQLite will dutifully enforce it for you. It's a little extra work, and as siblings have pointed out, a greenfield SQLite would probably not have done things this way. But it's also easy; I have check constraints on many columns and they serve the purpose.

Think of SQLite as having a weird dialect where `Col1 INTEGER` is spelled `Col1 INTEGER CHECK (typeof(Col1) IN ('integer', 'null'))`. Ideal? No, but also not a showstopper.

There are a few good reasons not to use SQLite. Your point 2 is one of them; running the client on one machine and accessing the database file via a network file system is another. Although I've pushed write-heavy workloads pretty hard with some care, it's easy to create a situation where contention becomes untenable. You can really pound on it with one client, but with several it gets dicey.


Using a single write thread and multiple readers gives perfectly sound and high performance concurrency in SQLite. Of course the particulars of how one actually does that depend on the language in use.


1: Actually this is a feature. It's awesome, and it's easy to map typing to your language's types. In Python, see: https://docs.python.org/3/library/sqlite3.html#using-adapter... specifically the DECLTYPES option.

Other language bindings do things like this too, and it makes things pretty idiot-proof. You `create table test (mydict dict);` so your tables know their types, and then at bind time you say a SQLite column type of dict == a Python dictionary.

Obviously Python is sort of a terrible example, because Python typing is somewhat non-existent in many ways, but you see the point here.
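
Roughly, that looks like this (a sketch using the stdlib sqlite3 adapter/converter hooks; "dict" is just the declared column type name the converter keys on):

    import json
    import sqlite3

    # Teach the driver how to store dicts and how to read them back,
    # keyed on the declared column type name ("dict" below).
    sqlite3.register_adapter(dict, json.dumps)
    sqlite3.register_converter("dict", lambda raw: json.loads(raw.decode()))

    conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
    conn.execute("CREATE TABLE test (mydict dict)")
    conn.execute("INSERT INTO test VALUES (?)", ({"a": 1},))
    print(conn.execute("SELECT mydict FROM test").fetchone()[0])  # {'a': 1}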

2: There are definitely cases where it won't work out well; highly concurrent write load is definitely its big weak spot, but those are usually fairly rare use cases.


> SQLite doesn't really enforce column types[0], the choice is really puzzling to me.

It's not the worst thing in the world; you're validating on data ingest anyways to prevent sqli, for example, right?


Data validation is a dangerous method to try to prevent SQL injection. The only surefire method is to used parameterized queries, which you should be doing anyways.


Huh? By data validation I mean a validation library in your surrounding PL; it's flowing through the types of that language, and at no point is unprepared SQL entering your system.


The expensify blog linked from a comment here claims:

> But lesser known is that there is a branch of SQLite that has page locking, which enables for fantastic concurrent write performance. Reach out to the SQLite folks and I’m sure they’ll tell you more


I love that I don't have to define a length for the 'varchar' columns and I can store strings of any length to those columns ;)


If you are in WAL mode, you can have unlimited readers as one writer is writing.


Also, another issue when not in WAL mode is that a long read will hold a lock and block writes.


> unsuitable for high-write usage like logging and metric recording.

If the speed of your write-only workload is limited by whole-file locks rather than by raw I/O speed, you can probably consolidate your writes into fewer transactions (i.e. fewer disk accesses, amortizing lock cost over more data) and write to several databases in parallel according to any suitable sharding criteria. Which is what any RDBMS would have to do anyway.
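
A sketch of the "fewer, larger transactions" part in Python (the table and numbers are made up):

    import sqlite3

    conn = sqlite3.connect("metrics.db")  # placeholder file name
    conn.execute("CREATE TABLE IF NOT EXISTS metrics (ts INTEGER, name TEXT, value REAL)")
    batch = [(1616630400 + i, "cpu", 0.5) for i in range(1000)]
    # One transaction per batch instead of one per data point: the lock and
    # the fsync are paid once for the whole batch.
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO metrics VALUES (?, ?, ?)", batch)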


Don't forget about User-Defined Functions.

https://www.sqlite.org/appfunc.html

We just started enhancing our SQL dialect with new functions which are implemented in C# code. One of them is an aggregate and it is really incredible to see how it simplifies projections involving multiple rows.

One huge benefit of SQLite's idea of UDFs is that you can actually set breakpoints and debug them as SQL is executing.
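
For illustration, the same idea from Python's stdlib sqlite3 - a rough sketch of a user-defined aggregate (a median, here) registered from the host language:

    import sqlite3

    class Median:
        # A tiny user-defined aggregate registered from the host language.
        def __init__(self):
            self.values = []
        def step(self, value):   # called once per row
            self.values.append(value)
        def finalize(self):      # called once at the end
            self.values.sort()
            return self.values[len(self.values) // 2] if self.values else None

    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("median", 1, Median)
    conn.execute("CREATE TABLE nums (n)")
    conn.executemany("INSERT INTO nums VALUES (?)", [(1,), (7,), (3,)])
    print(conn.execute("SELECT median(n) FROM nums").fetchone()[0])  # 3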


They're awesome!

One common use case is REGEXP. SQLite has a keyword for regular expression matching, but it has no implementation for it. What your application needs to do is to take whatever regex library it's using, make (through whatever FFI method it's using) a C function that interfaces with it, and register it with SQLite.

A more advanced feature of this binding mechanism is that, if you provide a bunch of specific callbacks to SQLite, you can expose anything you like as a virtual table, that can be queried and operated on as if it was just another SQLite table.

See: https://www.sqlite.org/vtab.html. The plugin for full-text search is implemented in terms of this mechanism.
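
For example, in Python (SQLite invokes a user function named regexp(pattern, value) for the REGEXP operator, so registering one is enough):

    import re
    import sqlite3

    conn = sqlite3.connect(":memory:")
    # SQLite parses the REGEXP operator but ships no implementation;
    # this 2-argument function supplies one.
    conn.create_function("regexp", 2,
                         lambda pattern, value: re.search(pattern, value) is not None)
    print(conn.execute("SELECT 'hello' REGEXP 'h.*o'").fetchone()[0])  # 1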


Application-defined functions are very useful.

One of my company's applications is already designed to work with different SQL systems and a new customer desperately wanted SQLite for a very special use case. As SQLite is quite simple and doesn't support many functions that are standard in SQL Server, MySQL, Oracle, etc., we used application-defined functions to implement all functions the application needs in C#. It's not very fast but also not slow and the customer is happy, which is what really counts.


Yes, that's a very useful feature to extend the functionalities of SQLite. mORMot (https://github.com/synopse/mORMot) added JSON support to SQLite even before JSON1 was introduced.


Sounds very interesting. Can you elaborate on how you're leveraging C# for this?


Directly through Microsoft's SQLite provider - Microsoft.Data.Sqlite.

See: https://docs.microsoft.com/en-us/dotnet/standard/data/sqlite...


SQLite is so robust that I bet most websites could use it without really needing to move onto a client/server RDBMS.[1] I use MySQL, and I know PostgreSQL has a large marketshare now, but I wonder how much of either is really necessary when you think about traffic usage alone. I know at least in my use cases, neither seems necessary.

[1]: https://sqlite.org/whentouse.html


I ran a niche community social bookmarking site (around 100-200k pageviews per month) on SQLite for several years and it was no problem at all. If a write was occurring, having a simultaneous request wait 100 milliseconds was no big deal. It only became a problem when I got tired of ops and wanted to put it on Heroku at which time I had to migrate to Postgres.

I've always been surprised WordPress didn't go with SQLite though - it'd have made deployment so much easier for 99% of users running a small, simple blog.


Someone did write a plugin to have wordpress use SQLite as the backend: https://wordpress.org/plugins/sqlite-integration/

Perhaps not great for production since Wordpress automatically updates itself, and you would have to keep up with any changes. And not just for wordpress, but for any other plugins that use the database.

Edit: A single file fork (albeit 5k lines of PHP) of the plugin that looks interesting: https://github.com/aaemnnosttv/wp-sqlite-db


I maintain a corp Ghost blog that's backed by SQLite, it's been solid for years.


Same thing here.


At the time WordPress rolled out, SQLite was still in its infancy while MySQL was already a mature DB, hence the choice. These days, however, they could at least offer a nice wizard asking "do you want to run a blog only?" and deploy SQLite instead.


I don't have any data on this, but I do know that a significant share (if not the majority) of Wordpress sites now aren't even blogs. They are small websites for things like restaurants, hair salons, construction companies, photographers, and personal portfolios. Especially since most of these sites are also running on $3/mo shared hosts, switching to SQLite would be a substantial improvement.

Most of these sites are running a homepage, an about page, a contact page, and maybe one or two misc pages. They might have a blog that has two blog posts on it from nine years ago. But that is really it. MySQL is really overkill considering the scenario. SQLite's big "limitation" is non-concurrent writes. But this is rarely a problem with most Wordpress sites because they are single-author and they aren't updated very often. SQLite can handle plenty of reads to support even heavily trafficked websites.

Not to mention, SQLite's greatest advantage is portability. A single file contains your entire database. A non-technical user could transfer hosts or backup their data by copying their database file like it was a photo or an excel document. That's pretty incredible when you think about it.


You don't even need SQLite for that. Something like PicoCMS with markdown files is more than enough. If you want a frontend for the owner, that's another story, but if you're the one maintaining the sites, WordPress is overkill 99% of the time.


Pleasantly surprised to hear PicoCMS mentioned here, since it's not typically the most common PHP markdown CMS of choice folks like to mention.

I was a casual developer (i.e. not for work, just for personal) deep into the PicoCMS ecosystem for a couple years, a few years ago. I both started a site and helped a family convert an old static site to PicoCMS and really had no complaints. Re: frontend for the owner, I started with Pico Admin and made a bunch of modifications to it (including an image uploader) and the non-tech owner has no complaints and it's been working well since.

Nowadays for my own blog I'm into the whole SSG/JAM trend, but I'd still run a PicoCMS site any time, if the use case is right.


I've been reviewing CMSs for some time, and PicoCMS is the best in terms of simplicity, hands-off and easiness.


I looked at using SQLite with Wordpress for sites I maintain. In theory Wordpress core is quite DB-agnostic, and according to random blogs SQLite could work. But plugins use so many MySQL-specific features that it was hopeless.


SQLite is very limited because of its threading model; IMO it's not usable outside of the single-app model where you have a single user.

https://sqlite.org/threadsafe.html

https://sqlite.org/lockingv3.html


The article addresses this. Basically, you can have any number of concurrent readers, but only a single writer. Writing and reading can happen concurrently just fine. So the question is -- how many users does a website need before having only a single concurrent writer becomes a bottleneck?

That number will obviously depend on the read/write ratio of any given website; but it's hard to imagine any website where [EDIT: the maximum number of concurrent users] is actually "1". And for many, that will be in the thousands or hundreds of thousands.

FWIW the webapp I use to help organize my community's conference has almost 0 cpu utilization with 50 users. Using sqlite rather than a separate database greatly simplifies administration and deployment.


>That number will obviously depend on the read/write ratio of any given website; but it's hard to imagine any website where that number is actually "1".

I can imagine a static website where the content is read-only for users and is only editable by admins/developers/content managers through some CMS.


That'd be a near-infinite ratio. The parent is discussing a site where the ratio is near 1 (i.e. roughly as many reads as writes).


It's not even the percentage of reads vs writes; it's about the total time spent writing. Suppose that while a single user was actively using the website, the time spent writing was 70%. That's a totally mad write load; but even then, if you had two concurrent users, things would probably still be quite useable -- you'd have to go up to 4 or 5 users before things really started to slow down noticeably.

Suppose, on the other hand, that a single user generated around a 1% write utilization when they were actively using the website (which still seems pretty high to me). You could probably go up to 120 concurrent users quite easily. And given that not all of your users are going to be online at exactly the same time, you could probably handle 500 or 1000 total users.


When I was working on database services there were actually many applications with more writes than reads. A common example is applications where you want to keep the user's state saved frequently, but you only need to reload it when the application is restarted. A lot of games work this way. So there can be a write every few seconds or every minute for every active user, but you only need to do a read a few times at the beginning of a new session.

Of course applications like a blog will have far more reads than writes. It just really varies depending on the type of application.


Right; lots of modern document software, for instance, basically saves continuously; so a Google Docs-style application could have such a load. But if you have a write transaction every second, but each transaction only takes 1ms, that's still ~1000 concurrent users before you start to get noticeable lag.

Remember, the person I was replying to claimed SQLite was "not useable outside of the model... where you have a single user". Yes, if you need 100k concurrent users doing a 1ms transaction every second, SQLite isn't for you. But 1000 concurrent users is a lot more than 1.

Although on reflection, what they may have meant for "single user" is a single process (perhaps with multiple threads). That sounds reasonable to me: people just don't realize how much you can actually do with a single multi-threaded process running on a modern server.


1 ms per transaction is quite optimistic! In practice occasionally you will have some sort of latency spike caused by an external event or a single extra-slow transaction, and then your single-threaded queue will get backed up and requests will fail, if your database is dependent on a single thread. SQLite is great for the right problem but it just isn't the right tool for a web service with even a small amount of traffic.


That's not been my experience. I write applications in Go using a SQLite backend, and it's not uncommon to see write transactions in the 100s of microseconds. I typically see read transactions with multiple queries around ~50µs. That's all running on very modest $5/month DigitalOcean droplets. The vast majority of web applications do not see thousands of requests per second, so I think SQLite is a great fit for most web apps.


As if a transaction only takes 1ms... Imagine you want to update the lastLogin field for a logged-in user, for example.


> you can have any number of concurrent readers, but only a single writer

Only if your readers and writers are cleanly segregated.

Most languages and web frameworks don't have SQLite drivers out of the box (or have extremely bad ones). Unlike SQLite, most databases don't really care about the distinction between read-only and writable connections. So there is a good chance that you will always open a writable connection by default, because that is what your framework/ORM does. Furthermore, seemingly read-only web middleware often ends up writing to the database on each request for one reason or another. If you try to reuse/pool connections (which is also important under high load), you need to be wary of keeping writable connections open in the cache — again, something that does not matter with any major database other than SQLite.
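
To make the read/write split concrete, here is a rough sketch of doing it by hand with Go's database/sql and the mattn/go-sqlite3 driver (the underscore-prefixed DSN options are that driver's, mode=ro is a standard SQLite URI parameter, and the file name is illustrative):

    package main

    import (
        "database/sql"
        "log"

        _ "github.com/mattn/go-sqlite3"
    )

    func main() {
        // Read-only pool: many of these can read concurrently in WAL mode.
        ro, err := sql.Open("sqlite3", "file:app.db?mode=ro&_journal_mode=WAL")
        if err != nil {
            log.Fatal(err)
        }
        defer ro.Close()

        // Single read-write handle: SQLite allows only one writer at a time,
        // so pooling more than one write connection buys nothing.
        rw, err := sql.Open("sqlite3", "file:app.db?_journal_mode=WAL&_busy_timeout=5000")
        if err != nil {
            log.Fatal(err)
        }
        rw.SetMaxOpenConns(1)
        defer rw.Close()

        // Route SELECTs to ro and INSERT/UPDATE/DELETE to rw from here on.
    }

The point is only that the distinction has to live in your code, because the driver and most ORMs won't make it for you.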

I was involved in the maintenance of a small web app (db size < 5 MB), written in Django, that had to serve ~1000 dynamic requests per second (the contents of each response depended on the caller's IP address). The app worked with PostgreSQL, albeit poorly, but immediately ground to a halt under load with SQLite — which was our default database choice for historical reasons.

We ended up briefly caching the results of most database queries in memory, which removed most of the load from the database (we also did a lot of other optimizations, but this was the decisive one). Eventually the app was able to withstand up to 9000 requests per second, but none of that was an achievement of SQLite — we just bypassed the database, Django and Python altogether on the majority of requests.

While we are on this topic, the most widespread OS in the world, Android, also has extremely low-quality SQLite drivers — despite shipping SQLite as its default database for many years. Android has a broken-by-design Cursor implementation (the devs admitted it themselves [1]) that always tries to count query results, even if you don't call getCount(). And a broken connection cache that does not support read-only connections [2] (the method used to have a "TODO", but eventually they forgot why they wanted it, so they removed it).

1: https://medium.com/androiddevelopers/large-database-queries-...

2: https://android.googlesource.com/platform/frameworks/base/+/...


SQLite doesn’t require you to know which connections are read and which are write. A connection gets promoted to the write lock when it executes a write statement, or right away if you begin an IMMEDIATE transaction.

Also, for your small app, were you using WAL mode? Readers running concurrently with a writer only work in WAL journaling mode.
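
For illustration, a BEGIN IMMEDIATE transaction looks roughly like this with database/sql (a sketch; the users table and the pinned-connection approach are just one way to do it):

    package app

    import (
        "context"
        "database/sql"
        "time"
    )

    // touchLastLogin takes the write lock up front with BEGIN IMMEDIATE,
    // so the connection is a writer from the first statement onward.
    func touchLastLogin(ctx context.Context, db *sql.DB, userID int64) error {
        conn, err := db.Conn(ctx) // pin one connection from the pool
        if err != nil {
            return err
        }
        defer conn.Close()

        if _, err := conn.ExecContext(ctx, "BEGIN IMMEDIATE"); err != nil {
            return err
        }
        if _, err := conn.ExecContext(ctx,
            "UPDATE users SET last_login = ? WHERE id = ?",
            time.Now().Unix(), userID); err != nil {
            conn.ExecContext(ctx, "ROLLBACK")
            return err
        }
        _, err = conn.ExecContext(ctx, "COMMIT")
        return err
    }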


SQLite can promote connections, but it does not demote them back to read-only. As I understand it, when a database connection is tainted by a single write, its lock on the SQLite file is promoted to a write lock, which prevents other processes from opening any kind of connection to it.

Our Django setup needs multiple processes to work around the Global Interpreter Lock. Disabling connection reuse in the Django config slightly changed the behavior we observed, but didn't solve the performance problem.


With a single thread & SQLite connection instance, I have been able to insert hundreds of thousands of rows per second when using NVMe drives.

Note that the WAL and synchronous flags must be set appropriately. Out of the box, the standard "one connection per query" meme will handicap you to <10k inserts per second even on the fastest hardware.

The trick for extracting performance from SQLite is to use a single connection object for all operations, and to serialize transactions using your application's logic rather than depending on the database to do this for you.
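
A sketch of that pattern in Go, assuming the mattn/go-sqlite3 driver (its _journal_mode/_synchronous DSN options set the pragmas; the events table and batch shape are illustrative): one connection, WAL with relaxed fsync, and an application-level mutex serializing batched write transactions.

    package bulk

    import (
        "database/sql"
        "sync"

        _ "github.com/mattn/go-sqlite3"
    )

    type Store struct {
        db *sql.DB
        mu sync.Mutex // application-level write serialization
    }

    func Open(path string) (*Store, error) {
        // WAL + synchronous=NORMAL: far fewer fsyncs than the defaults.
        db, err := sql.Open("sqlite3",
            "file:"+path+"?_journal_mode=WAL&_synchronous=NORMAL")
        if err != nil {
            return nil, err
        }
        db.SetMaxOpenConns(1) // one connection object for all operations
        return &Store{db: db}, nil
    }

    // InsertBatch writes many rows per transaction; the batching is what gets
    // you into hundreds of thousands of rows per second, not the pragmas alone.
    func (s *Store) InsertBatch(rows [][2]string) error {
        s.mu.Lock()
        defer s.mu.Unlock()

        tx, err := s.db.Begin()
        if err != nil {
            return err
        }
        stmt, err := tx.Prepare("INSERT INTO events (kind, payload) VALUES (?, ?)")
        if err != nil {
            tx.Rollback()
            return err
        }
        defer stmt.Close()
        for _, r := range rows {
            if _, err := stmt.Exec(r[0], r[1]); err != nil {
                tx.Rollback()
                return err
            }
        }
        return tx.Commit()
    }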

The whole point of an embedded database is that the application should have exclusive control over it, so you don't have to worry about the kinds of things that SQL Server needs to worry about.

SQLite is not a direct replacement for SQL Server, but with enough effort it can theoretically handle even more traffic in your traditional one-database-per-app setup, because it's not worrying about multiple users, replication, et al.


"The trick for extracting performance from SQLite is to use a single connection object for all operations, and to serialize transactions using your application's logic rather than depending on the database to do this for you."

It's not the right approach because it's hard to get right; you want to offload that to the DB.


It's actually not that hard to get right. You can put a simple lock statement around a SQLiteConnection instance to achieve the same effect with 100% reliability, but with dramatically lower latency than what hosted SQL offerings can provide.

Also, the only reason we ever want to lock a SQLiteConnection is to obtain a consistent LastInsertRowId. With the latest changes to SQLite, we don't even have to do this anymore as we can return the value as part of a single invocation.


You won't have to worry about the last insert row id anymore. `RETURNING` was added recently. Enjoy the speeeed.
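
For example, assuming an open *sql.DB and an illustrative users table, the id comes back in the same statement (RETURNING needs SQLite 3.35.0 or newer):

    // insertUser returns the new row's id in the same round trip, so there is
    // no need to lock the connection just to read last_insert_rowid().
    func insertUser(db *sql.DB, name string) (int64, error) {
        var id int64
        err := db.QueryRow(
            "INSERT INTO users (name) VALUES (?) RETURNING id", name,
        ).Scan(&id)
        return id, err
    }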


Yes, this is exactly how mORMot (https://github.com/synopse/mORMot) implements the DB server with SQLite.


> I have been able to insert hundreds of thousands of rows per second when using NVMe drives.

That's... amazing! What is your setup like?


As far as I'm aware that threading information is on a per-connection basis.

SQLite is perfectly capable of supporting multiple, parallel reads.

SQLite must serialize writes, which makes it a poor fit for highly parallel, write-heavy workloads. However, with WAL enabled, writes do not block reads.

Basically, highly parallel read loads with a low write count (low enough that serializing the writes doesn't slow them down unacceptably), or loads where latency is acceptable for writes (but not reads), are a perfect use case for SQLite. And it turns out that a lot of web services are heavily biased towards reads.


A nice fix for this might be to extract just the wire protocol from dqlite to make a network-connected, but not Raft-replicated, SQLite.

https://github.com/canonical/dqlite


There's also Expensify's wrapper of SQLite, bedrockdb.com. But at that point, why not just use a traditional RDBMS?


An embedded DB with an optional network protocol, but without Raft, replication, etc., seems "normal" to me. Raima used to be popular in the 90s and was pretty much that. It's still less to manage than MySQL or Postgres.


I've used SQLite a bit, but not enough to say I know where you run into performance issues. I would anticipate, though, that if it's similar to fread/fwrite with some marginal overhead for dealing with SQL, and considering most queries aren't very complex, most people are going to have a hard time hitting those limits.

And that's assuming you're making queries every time an event occurs versus persisting data at particular points in time.


I disagree. I wrote a multiplayer web-based game using SQLite in Go: http://decwars.com/


Most websites/frameworks access their database through a singleton pattern/single-connection anyway.

Edit: Sometimes you have to lie and lead people down the wrong path to enlightenment... ;)


That's not the case, no; you usually access your database with a connection pool. Otherwise everyone would wait until the single connection is free. Once you have a few more users trying to write, everything will fall apart.


One could implement a write queue and caching for the front end. That might sound like a lot of work to go through to avoid moving to a proper multi-user database. But given that writes are probably done through a handful of API endpoints, it's probably not really that much work to implement.
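
A minimal sketch of such a queue in Go: handlers push write jobs onto a channel and a single goroutine owns all writes, so nothing ever contends for SQLite's write lock (the names and buffer size are illustrative):

    package queue

    import "database/sql"

    type writeJob struct {
        query string
        args  []interface{}
        done  chan error
    }

    type Writer struct {
        jobs chan writeJob
    }

    // NewWriter starts one goroutine that owns all writes to db.
    func NewWriter(db *sql.DB) *Writer {
        w := &Writer{jobs: make(chan writeJob, 256)}
        go func() {
            for j := range w.jobs {
                _, err := db.Exec(j.query, j.args...)
                j.done <- err
            }
        }()
        return w
    }

    // Exec queues a write and waits for it; API handlers call this
    // instead of hitting the database directly.
    func (w *Writer) Exec(query string, args ...interface{}) error {
        done := make(chan error, 1)
        w.jobs <- writeJob{query: query, args: args, done: done}
        return <-done
    }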


Well, let me throw out a crazy idea:

Nobody said you have to use a single database/file. Obviously, you are going to want to spend a couple minutes thinking about referential integrity. But how often do you delete records in your web app?


UPDATE is also a write, and references crossing files sound like a nightmare.

If your deployment environment has serious constraints, I'm sure you could make it work. The product you deliver would be SQLite + a custom DBI layer to hide SQLite's limitations.

It would be a lot more work, and not be as robust or scalable, compared to a more traditional selection. But I can imagine cases where it would be appropriate.


A big database is not an issue. Contention requires a large number of writers, not a large amount of cold data.


As an IRC client with an SQLite backend: several hundred times per second.

Per user.

While reads are more like once every day per user.


Yes.

;)


Any site with real volume will have a connection pool.

All of those connections might need to write, and this is where SQLite gets tricky to implement at scale.

I love SQLite! It's perfect for many use cases, but not all. Fortunately, Postgres is also excellent.


Can you provide some examples of frameworks with such a pattern? I have never seen it in any of the Python or Go web applications that I've had a chance to work with.


I'm sure most websites could, traffic-wise. But for me it is not a question of whether I could, but whether I should.

If my goal is building a website, I don't necessarily want to experiment with different technologies if I already know Postgres will work perfectly fine without adding much operational overhead, and will cover use cases I don't have yet, versus the unknown unknowns of using SQLite and maintaining it over time. Again, this is not about some problem with SQLite, just me not having experience using it this way. It's the same reason I wouldn't add any database system I haven't used before, even if on paper it were the "better tool for the job" for a particular use case and would surely be an interesting learning experience.

In my opinion practicality and prior experience often beats what is strictly necessary or "best".


Completely fair. It's just that some of us have gotten obsessed with minimalism lately. With the mind-boggling power of today's computers, it pays off to be able to make stuff small and still very functional.

But not wanting to get out of your comfort zone is a completely valid stance to take. We do this for money after all.


For me it is more about not leaving all the comfort zones all at once. Say if I start doing a project with 5 different technologies in the stack, I pick at least 4 that I’m solid with and maybe 1 wildcard.

E.g. I got interested in SvelteJS precisely because I got jaded by how ridiculously complex many front-end applications have gotten, even though they are doing barely more than fetching JSON from a server and turning it into HTML. Then I got a really good use case for using it in production, because the low-end mobile phones and bad internet connections of our users were struggling with the very heavy SPA the company started off with.

For another project in the future, it might well be SQLite that is the more experimental part, but in the end it comes down to managing risks and benefits.

While one part of me would love to experiment with everything all the time, the other part likes to finish the work day on time, have plenty of time for non-tech-related things, and sleep well at night being fairly sure that stuff is running smoothly.


Very reasonable approach, I like it.


SQLite is a really neat thing. I was looking at extensions and how to augment it; you could even add a pg_notify-like feature: https://sqlite.org/c3ref/update_hook.html and have worker processes doing what would amount to out-of-process stored procedures in Postgres (or UDFs in SQLite) -- in any language you'd like.

You can only register one callback per connection though, although from that callback you could fire other functions... All in all it's an awesome tool for a project like Tailscale, but I think the hackers there went for a flat file.
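
A rough sketch of wiring that hook up from Go, assuming the mattn/go-sqlite3 driver (ConnectHook and RegisterUpdateHook are that driver's bindings around sqlite3_update_hook; the channel and driver name are illustrative):

    package notify

    import (
        "database/sql"

        "github.com/mattn/go-sqlite3"
    )

    // Changes receives (table, rowid) pairs for every insert/update/delete
    // made through connections opened with the "sqlite3_notify" driver name.
    var Changes = make(chan [2]interface{}, 128)

    func init() {
        sql.Register("sqlite3_notify", &sqlite3.SQLiteDriver{
            ConnectHook: func(conn *sqlite3.SQLiteConn) error {
                // One hook per connection; it fires for all rowid tables.
                conn.RegisterUpdateHook(func(op int, db, table string, rowid int64) {
                    select {
                    case Changes <- [2]interface{}{table, rowid}:
                    default: // drop rather than block the writer
                    }
                })
                return nil
            },
        })
    }

A worker draining that channel would be roughly the out-of-process "stored procedure" described above.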

Personally I would love to see an in-process Postgres: a build of Postgres geared towards being embedded alongside your own threads, that builds the whole of Postgres with your app on all major OSes and only listens internally by default. For the same reason I'm using Node.js: to be able to run the same code anywhere. I think bundle size would be a minor issue, really. I downloaded Sage 9.2 yesterday and it's 2GB! VSCode is a 100MB download, and they refer to it as a small download...

cheers! happy coding,


The Firebird database can run both as a separate server and in embedded mode. I wish more databases were designed for this sort of flexibility.


My only "problem" with SQLite was that it was so fast, running locally, that it hid just how much my app was needlessly hitting the database. It was a surprise when I transitioned to a networked Postgres server and performance completely tanked due to my crappy code...


I know you put "problem" in quotes, but in case you haven't seen it, this document was posted here a little while ago: https://sqlite.org/np1queryprob.html

It describes how access patterns that would be bad practices with networked databases are actually appropriate for SQLite as an in-process DB.


That's interesting! Now I have to resist the urge to move back...


For me, the main killer feature of sqlite is portability. (Like moving it around, not porting it to new systems).

Just write the code to create an empty database if the db file does not exist, move your code elsewhere, and the database will be created on the first run. No usernames, passwords, firewall rules, IP addresses, no nothing... just a single file, with all the data inside.

Migration? Copy the whole folder, code and database. Clean install? Copy the folder, delete the database. Testing in production? Just back up the file, do whatever, then overwrite the file.
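
The "create it if it's missing" part is only a few lines; a sketch in Go with an illustrative schema (opening the file creates it, and an idempotent CREATE TABLE IF NOT EXISTS makes a fresh deployment behave like an existing one):

    package store

    import (
        "database/sql"

        _ "github.com/mattn/go-sqlite3"
    )

    // Open returns a ready-to-use database: SQLite creates the file if it
    // doesn't exist, and the idempotent schema makes first run == every run.
    func Open(path string) (*sql.DB, error) {
        db, err := sql.Open("sqlite3", path)
        if err != nil {
            return nil, err
        }
        if _, err := db.Exec(`
            CREATE TABLE IF NOT EXISTS pages (
                id   INTEGER PRIMARY KEY,
                slug TEXT UNIQUE NOT NULL,
                body TEXT NOT NULL
            )`); err != nil {
            db.Close()
            return nil, err
        }
        return db, nil
    }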


I wish SQLite had a PostgreSQL compatibility layer. Right now, to add SQLite support to a Go project (or anything without an ORM), you have to rework all your queries and migrations. It's probably an impossible ask, but having a compatibility flag within SQLite so it would accept PostgreSQL-formatted queries would be extremely helpful.


Abstracting over two complex and evolving systems is a bottomless well of bugs. Contributors would shy away from this sort of compatibility because it would encourage users to ignore the differences at first, and then complain loudly when they are discovered.


https://www.jooq.org/translate/ This does simple syntax translation; I haven't used it though.

Combine with a Golang AST lib and you might be able to make a CLI translate tool like `fix`.


I'd also like to add the possibility of using SQLite databases as an application file format: https://news.ycombinator.com/item?id=23508923

I had to work on a data import/export tool some time ago, and SQLite simplified the design a lot.


While this makes things easy, you should not do this for any application file format where you expect your users to share files. This is because opening a SQLite database file makes the SQLite library execute any arbitrary code that may be stored in that file. [0] Therefore, SQLite is really only suitable for local-only file formats, such as configuration files, and not for files that users will e-mail to each other.

[0] https://media.ccc.de/v/36c3-10701-select_code_execution_from...


This is misinformation. SQLite does not execute arbitrary code found in the data file. There was a bug, long since fixed, that could be used by an attacker to cause arbitrary code execution upon opening the database file. The referenced video talks about it. It was a very clever attack. But the bug that enabled the attack was fixed even before the talk shown in the video was given.

Let me say that again: SQLite does NOT execute arbitrary code that it finds in the database file. To suggest that it does is nonsense.

See https://www.sqlite.org/security.html for additional discussion of security precautions you can take when using SQLite with potentially hostile files. The latest SQLite releases should be safe right out of the box, without having to do anything mentioned on that page. But defense in depth never hurts.
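
For what it's worth, several of the precautions on that page are just statements you can run right after opening a file you didn't create; a hedged sketch in Go (the pragma choices follow that page, everything else is illustrative):

    package safeopen

    import (
        "database/sql"
        "fmt"

        _ "github.com/mattn/go-sqlite3"
    )

    // OpenUntrusted applies some of the defense-in-depth settings from
    // sqlite.org/security.html before touching data in a file we didn't create.
    func OpenUntrusted(path string) (*sql.DB, error) {
        db, err := sql.Open("sqlite3", path)
        if err != nil {
            return nil, err
        }
        for _, p := range []string{
            "PRAGMA trusted_schema = OFF", // restrict functions usable from the schema
            "PRAGMA cell_size_check = ON", // extra validation of b-tree pages
            "PRAGMA mmap_size = 0",        // avoid memory-mapped I/O on hostile files
        } {
            if _, err := db.Exec(p); err != nil {
                db.Close()
                return nil, err
            }
        }
        var status string
        if err := db.QueryRow("PRAGMA quick_check").Scan(&status); err != nil || status != "ok" {
            db.Close()
            return nil, fmt.Errorf("quick_check failed: %v %q", err, status)
        }
        return db, nil
    }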


Are custom FTS3 tokenizers disabled by default in current SQLite versions? The documentation does not mention it (and also doesn't warn of the security concern around enabling them). Otherwise, I do not see how the attack vector discussed in the video could be "fixed". The security guideline recommends disabling triggers and views, which is also a logical remedy, but seems brittle, never mind impractical for a wide range of applications.

The documentation on the fts3_tokenizer function merely states that

    Prior to SQLite version 3.11.0 (2016-02-15), the arguments to
    fts3_tokenizer() could be literal strings or BLOBs. They did not
    have to be bound parameters. But that could lead to security
    problems in the event of an SQL injection. Hence, the legacy
    behavior is now disabled by default.
However, it does not discuss any mitigations for the case where SQL injections are not needed, because the attacker controls the database file.


See https://www.sqlite.org/fts3.html#custom_application_defined_...

Since 2016, arguments to the fts3_tokenizer() function must be variables (ex: ? or :var or @var or $var) to which values are supplied by the application at run-time using sqlite3_bind_pointer(). There is no way to do this from pure SQL script. Nor is there any way to do this from within a view or trigger. There is no way to invoke fts3_tokenizer() from a maliciously corrupted schema or database.


Weren't all of these issues quickly fixed by the developers after being reported?

E: oh whoops missed the dev post above mine


Thanks for the heads-up!

In my use case, files are shared between different instances of the application, usually without user intervention, but there's an attack vector to be addressed here.


> There is nothing more convenient than SQLite for analyzing and transforming JSON.

the example query given:

    select
      json_extract(value, '$.iso.code') as code,
      json_extract(value, '$.iso.number') as num,
      json_extract(value, '$.name') as name,
      json_extract(value, '$.units.major.name') as unit
    from
      json_each(readfile('currency.sample.json'))
    ;
that sure looks fun to type into a repl!

nothing against sqlite, which I like and use, just found the idea of that query being convenient for one-off analysis to be off.


If I have a choice between making a file format and doing the I/O myself or using SQLite, I’ll choose SQLite. It's hardened against filesystem shenanigans to a degree my own code never will be. If I need to export a file, that's still easy enough.


> There is nothing more convenient than SQLite for analyzing and transforming JSON. You can select data directly from a file as if it were a regular table.

Personally I love jq[0] for this purpose. I haven't really used SQLite for working with JSON, but the examples given are very verbose.

[0] https://stedolan.github.io/jq/


I wrote a plugin for Datasette that adds a jq() custom SQLite function, it's pretty fun: https://datasette.io/plugins/datasette-jq


I never managed to wrap my head around jq syntax, however.


I agree it can be a little obtuse at times and I still do occasionally find myself having to consult the documentation despite using it pretty regularly for some time. I do also sometimes make use of gron[0] since it integrates well with standard Unix tools. The pipelines I write with gron are probably pretty brittle in the sense that I expect they might be incorrect for some edge cases. But for some quick exploratory analysis, it's a tool I like to have handy.

[0] https://github.com/tomnomnom/gron/


I'm with you too. I've never found a query language except SQL that has really stuck in my head. Pandas, jq, XSL, etc... it all has me running screaming back to SQL.


It takes a while. jq really needs a repl.


Check out ijq... I used it this past week to deal with some ugly nested FHIR data. Not perfect, a bit laggy, sometimes crashes your terminal, but really cool in a lot of cases: https://sr.ht/~gpanders/ijq/


Seems like you can get something pretty close with fzf.

echo '' | fzf --print-query --preview 'jq {q} input.json'

