It must sound naive to some of you, but the first time I stumbled upon SQLite I was so excited!
SQLite is a row store, which is best for OLTP (point queries; inserting, updating, or deleting one or a few rows at a time), while DuckDB is a column store: values from the same column are stored contiguously, which makes aggregation queries (GROUP BY) perform much better.
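To make that concrete, here's a sketch of the kind of query where the columnar layout wins (table and column names made up):

  -- a column store only scans the two columns it needs;
  -- a row store has to read every full row of the table
  SELECT region, SUM(amount) AS total
  FROM sales
  GROUP BY region;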
By the way, here's a YT video of a talk given by one of the DuckDB implementers about why they made it, what it's for, and how it works: https://www.youtube.com/watch?v=PFUZlNQIndo
Not SQLite itself. One of the DB Browser for SQLite devs is an Aussie though. :)
And check out SQLiteStudio for Windows, it's nice too.
PowerPivot is amazing.
Loading hundreds of millions of rows into it takes a while, but given a commensurate amount of RAM and a reasonable data model (single table or star schema), performing aggregations with a pivot table is pretty snappy.
Here's a video showing rapid pivots of a 100M-row dataset
If it helps, our development builds for quite a few months have:
Our recent 3.12.0-alpha1 release does as well:
The first beta release for 3.12.0 should be out next week. There's not much change in it though (mostly language string changes), as the alpha1 has turned out to be really stable. :)
Still very rough, but usable.
How far off do you reckon it is from being "production" ready?
Asking because we've been adding useful SQLite extensions as optional extras in our (sqlitebrowser.org) installer. Can add yours too, if you reckon the code is reasonably cross platform and shouldn't cause (many) weird issues. :)
There's also a long tradition of "faking" decimals with integers. I have always found that to be extremely tedious and error prone.
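For anyone who hasn't seen it, a minimal sketch of the trick (schema made up), with all the tedium that implies:

  -- store money as integer cents to dodge floating-point rounding
  CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount_cents INTEGER);
  INSERT INTO invoices (amount_cents) VALUES (1999);  -- $19.99
  -- and every read has to remember to convert back
  SELECT amount_cents / 100.0 AS amount_dollars FROM invoices;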
Why naive? That's the experience of all of us with SQLite.
How do you make sure you don't lose data in a SQLite DB?
I deal with massive scale all day every day for work (Kubernetes/OpenShift/public clouds), so I find the simplicity amazingly refreshing.
I might also say the performance of my "single box" websites is also amazingly good. I've had even developer friends ask me how I get response times so fast, and they are often surprised when I tell them.
I do regular file backups and have a cron job restarting the web apps every 20 minutes; it's enough for now. I would love to have a more solid architecture, but it matches the traffic volume and it's good enough.
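One low-effort upgrade, if you're on SQLite 3.27 or later, is taking the snapshot with VACUUM INTO, so the copy is always consistent even while the app is running (the path here is made up):

  -- writes a consistent, defragmented copy of the live database
  -- to a new file, without taking the app offline
  VACUUM INTO '/backups/app-snapshot.db';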
I've never thrown more than 20GB of data at it, but it always handles things admirably, and it often (to me) makes the right calls about the level of complexity required (it accepts column type defs but really just doesn't care if a value maps to a string).
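That flexible typing is easy to see firsthand:

  CREATE TABLE t (n INTEGER);
  INSERT INTO t VALUES (42), ('not a number');  -- both succeed
  SELECT n, typeof(n) FROM t;  -- 42|integer, then: not a number|text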
I'm a front-end dev that can wing backend from time to time, and I use SQLite as much as possible. On multiple projects now I've run into complications due to complexity or environments, and adding a simplified local development backend with sqlite kept down time to a minimum.
SQLite is awesome.
It's basic but enough to start.
We still have not encountered a scenario that has caused us to consider switching to a different solution. Every discussion that has come up regarding high availability or horizontal scaling ended at "build a business-level abstraction for coordination between nodes, with each node owning an independent SQLite datastore". We have yet to go down this path, but we have a really good picture of how it will work for our application now.
For the single-node-only case, there is literally zero reason to use anything but SQLite if you have full autonomy over your data and do not have near-term plans to move to a massive netflix-scale architecture. Performance is absolutely not an argument, as properly implemented SQLite will make localhost calls to Postgres, SQL Server, Oracle, et al. look like a joke. You cannot get much faster than an in-process database engine without losing certain durability guarantees (and you can even turn these off with SQLite if you dare to go faster).
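For reference, those durability knobs are just PRAGMAs; a minimal sketch of the speed-over-safety end of the dial:

  -- stop waiting on fsync; fastest, but an OS crash or power
  -- loss can corrupt the database or drop recent commits
  PRAGMA synchronous = OFF;
  -- PRAGMA synchronous = NORMAL is the usual compromise in WAL mode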
There even are solutions that do this already:
- rqlite (https://github.com/rqlite/rqlite)
- actordb (https://www.actordb.com/)
Managed services for the major SQL variants are readily available and fairly affordable for applications which do not have access to such shared service layers.
That does not mean SQLite has no place; I love using it in native applications and mobile apps. However, multi-user web applications are perhaps not one of them, even when concurrency is not a problem.
I regularly see 50TB total of databases on SQL Server, and scaling up to thousands of clients.
That said, if we had a situation where we were pushing such volume that the transactional store was 50 TB, and we were still within the scope of the overall project, I see no reason why this would be a problem for us. As I mentioned before, the context of our applications is single process, single server. Obviously, this is not suitable for all edge cases and you will potentially be better off with a hosted solution in such cases.
At the end of the day, there is no arbitrary number of bytes where B-Tree indices just stop working because you didn't shell out for a SQL Server license.
I would prefer to implement things like horizontal partitioning myself using business keys and logic that make sense for my problem. I do not want a database to manage this sort of thing for me, because I want perfect visibility into exception cases so business logic can be developed around them.
For example, if one of the drives fails but I was only storing a lower-priority business entity on that particular partition (e.g. user preferences), I could decide to just continue processing without it (e.g. reset to defaults). With a hosted database solution where you do not have this degree of granularity, a catastrophic loss of part of the storage array would mean the entire database is down, even if the important data isn't directly impacted.
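A sketch of what that granularity can look like with SQLite, one file per business entity (file and table names made up):

  ATTACH DATABASE 'users.db' AS users;
  ATTACH DATABASE 'prefs.db' AS prefs;  -- the lower-priority partition
  -- if prefs.db is ever lost, the app skips that ATTACH and serves defaults
  SELECT u.name, COALESCE(p.theme, 'default-theme') AS theme
  FROM users.accounts AS u
  LEFT JOIN prefs.settings AS p ON p.user_id = u.id;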
I'm sure I just lack the imagination, but I can't imagine how there can be that many companies who can generate terabytes of data they then want to shove into a database. The complete sales history of even a medium-sized company should still easily fit in a GB or two.
Some of the huge ones absolutely do binary storage for HA/DR reasons, but payroll companies and real estate companies are the mainstay, and they have a tendency to store a lot of XML crap on top of the lovely normalized rows.
How big is each database? With SQLite, each database is its own file.
The theoretical limit for such a file is 140 TB, but the practical limit is probably much lower (https://www.sqlite.org/whentouse.html).
One is to simply zip up the entire database and send it across the wire. This is most applicable for local development and QA testing scenarios.
Another is to have something in the business application and relevant tooling that allows for programmatic querying of the data we need to look into.
We also have some techniques where we do ETL of the data range we care about from 1 SQLite db to another, then pull down the consolidated db for analysis.
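That ETL step is pleasantly simple with ATTACH; a sketch with made-up names:

  ATTACH DATABASE 'consolidated.db' AS target;
  -- copy the schema once, then just the date range we care about
  CREATE TABLE IF NOT EXISTS target.events AS SELECT * FROM events WHERE 0;
  INSERT INTO target.events
    SELECT * FROM events
    WHERE created_at BETWEEN '2024-01-01' AND '2024-01-31';
  DETACH DATABASE target;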
SQLite really shines when you have a more-or-less final idea of what tables you want to build and don't have to worry about users inputting crazy data. Anything up to that and it is a model of simplicity. Anything beyond that and it is undercooked.
In my case the data was always 10s of MBs.
Remember, the point of SQLite is to be a replacement for a custom file format. Although it's a database, it lets us (developers) re-use our knowledge of databases when doing basic file I/O.
Do the FK relationships get destroyed??
But dropping a column is different: even if the DBMS performs a similar smart trick (ignoring the value of the dropped column contained in old rows), space is still wasted, and it can only be reclaimed by rewriting old files.
Deleting a row is similar: the row is not removed from the heap page and the database does not get smaller (though if that page gets rewritten the row is not kept). Last time I used InnoDB it didn't actually return free heap pages to the filesystem at all, so no matter how much you deleted, the database file never got smaller.
EDIT: Looks like that's still the case now: https://bugs.mysql.com/bug.php?id=1341
MySQL did DDL for years with full table rewrites behind the scenes. It's not particularly hard to emulate, though not entirely pleasant.
(Although I really raise an eyebrow when people talk about performance. Every time I write anything server-like on SQLite I end up regretting it, and get an enormous speed boost, not to mention actual concurrency, by switching to PostgreSQL.)
create table foo2 (col1, col2);
insert into foo2 select col1, col2 from foo;
drop table foo;
alter table foo2 rename to foo;
create index foo_col1_idx on foo(col1);
As for the reason, see the next section on that link. It's not perfect, but it is what it is. SQLite wasn't designed to solve 100% of the use cases, and that's not a bad thing.
For my use cases I've thus never had to drop/alter a column... but I understand it could be very annoying.
UPDATE docs SET body=edit(body) WHERE name='report-15';
UPDATE pics SET img=edit(img,'gimp') WHERE id='pic-1542';
So SQLite leaves the datetime storage decision to the user: TEXT, REAL, or INTEGER. This means certain datetime optimizations are not available, depending on what the user chooses. And if one needs to ETL data containing datetimes, a priori knowledge of which representation a file is encoded with is required.
In that sense, SQLite really is a "file-format with a query language" rather than a "small database".
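Concretely, the same instant in the three storage classes, using the built-in date functions:

  SELECT datetime('now');                        -- TEXT, e.g. '2024-01-01 12:00:00'
  SELECT julianday('now');                       -- REAL, Julian day number
  SELECT CAST(strftime('%s','now') AS INTEGER);  -- INTEGER, Unix epoch seconds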
I don't want to know the use case for this.
Keep rocking on, SQLite. It's the first tool I reach for when prototyping anything that needs a DB.
I use docker volumes to write to disk. I pass the disk directory to my process via a CLI arg.
When running on a VM, these disk writes are replicated between zones (this is the default for regional GCP disks). So you get zero-config high availability (if you can tolerate downtime during a reboot).
For example, I have an app that you upload files through. Files can be local to the server or on S3, and have metadata. I end up making a new table for each of the API endpoints: a table for listing files/directories, a table for local files, a table for S3 files, then a table for the metadata, a table for the kind of file it is, etc. It works, but it feels like a heavy hammer.
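Something like this, roughly (all names made up), which shows how quickly the table count grows:

  CREATE TABLE entries     (id INTEGER PRIMARY KEY, name TEXT, is_dir INTEGER);
  CREATE TABLE local_files (entry_id INTEGER REFERENCES entries(id), path TEXT);
  CREATE TABLE s3_files    (entry_id INTEGER REFERENCES entries(id), bucket TEXT, s3_key TEXT);
  CREATE TABLE file_kinds  (entry_id INTEGER REFERENCES entries(id), kind TEXT);
  CREATE TABLE metadata    (entry_id INTEGER REFERENCES entries(id), key TEXT, value TEXT);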
I was recently choosing a database for a medium-size website and SQLite seemed like an obvious choice. One thing I was worried about was that the database locks for each write - but this is apparently no longer true with the write-ahead log.
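Enabling it is a one-liner, and the setting persists in the database file:

  PRAGMA journal_mode = WAL;  -- readers no longer block the writer, and vice versa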
This won't hold true for all use cases; one table now has 11 million rows, and I'm not sure how well SQLite would perform on that. The benchmark was very simple anyway, and it's mostly a read-only workload where users don't update/insert new stuff. Would be interesting to re-test all of this.
As soon as you get into privilege levels or heavy loads, then those others make more sense.
I doubt it but we’re both guessing. Personally I’ve never worked on a professional project that had all readers/writers on a single computer. So in my bubble SQLite is not an option.
TL;DR: if your data-access scenario is read-mostly write-sporadically, and your data being in a single place is ok, SQLite is fine.
It does make sense for many applications.
Search for it on Google; it's possible to do.
Specifically, you'll need to recompile libsqlite3, put it somewhere, and then set LD_LIBRARY_PATH before invoking Python. You can do that globally in your shell by modifying your .bashrc or similar file.
Or if you're super brave, you just replace the libsqlite3.so that Python is pointing to (really depends on your use case).
One of our developers (Manuel) started putting together lists of tutorials and videos for SQLite + DB Browser for SQLite a while back.
There are probably more we've missed, and contributions to those pages (etc) are welcome. :)
Very much so. PostgreSQL easily handles lots of concurrent writing. It's a use case where PostgreSQL is much better than SQLite. :)
In this case though, it seems a bit weird.
SQLite is widely known to be for single writer workloads, whereas PostgreSQL is similarly widely known for being extremely good in concurrent usage scenarios.
Those are the things they're each designed for, e.g.:
(the "High-volume Websites", "High Concurrency", and "Many concurrent writers?" pieces)
Feel free to run benchmarks to demonstrate this to your own satisfaction though. :)
Raspberry Pi: Yes
Mobile Apps : Yes
Desktop Apps: Yes
Big Monolith: Yes
Browsers    : No
If that is the case, most if not all of them have already been using it for ages; otherwise, please clarify what you mean.