Migrations are also hilariously easy with SQLite if you use the user_version pragma. We keep a monotonically incrementing number to indicate each schema version. On startup, our migrator queries user_version and runs every migration between the current value and our highest version number (defined as a constant in code). Some might argue that you should just use EF (or whatever your language's ORM is), but I prefer the level of control and the substantial performance boost you get with raw SQL and a simple migrator. I can't speak for EF Core, but EF6 performance was absolutely abysmal in some of the more complex cases.
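A minimal sketch of that kind of migrator in C# with Microsoft.Data.Sqlite (table names and migration bodies here are made up):

    using System;
    using Microsoft.Data.Sqlite;

    static class Migrator
    {
        // Index 0 migrates version 0 -> 1, index 1 migrates 1 -> 2, and so on.
        static readonly string[] Migrations =
        {
            "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);",
            "ALTER TABLE users ADD COLUMN email TEXT;",
        };

        public static void Migrate(SqliteConnection conn)
        {
            var versionCmd = conn.CreateCommand();
            versionCmd.CommandText = "PRAGMA user_version;";
            var current = Convert.ToInt32(versionCmd.ExecuteScalar());

            for (var v = current; v < Migrations.Length; v++)
            {
                using var tx = conn.BeginTransaction();
                var step = conn.CreateCommand();
                step.Transaction = tx;
                // Apply the migration and bump user_version in the same transaction.
                step.CommandText = Migrations[v] + $" PRAGMA user_version = {v + 1};";
                step.ExecuteNonQuery();
                tx.Commit();
            }
        }
    }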
I would also say that performance might surprise you if you turn on WAL. I have saturated NVMe disks inserting lots of data into SQLite. The trick is to use a single connection and let SQLite handle the serialization for you, rather than trying to manage multiple connections to the same file in code (i.e. one connection per logical operation sucks for SQLite).
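The setup for that, sketched with Microsoft.Data.Sqlite (connection string illustrative):

    using Microsoft.Data.Sqlite;

    var conn = new SqliteConnection("Data Source=app.db");
    conn.Open();

    var cmd = conn.CreateCommand();
    // WAL: readers no longer block behind the writer. Pairing it with
    // synchronous=NORMAL is the usual durability/throughput trade-off.
    cmd.CommandText = "PRAGMA journal_mode = WAL; PRAGMA synchronous = NORMAL;";
    cmd.ExecuteNonQuery();

    // Then hand this one connection to everything and let SQLite serialize,
    // as described above (SQLite defaults to serialized threading mode).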
I've always added a `schema_version` table with a single row containing a single column with the version, so thanks for letting me know about this!
I don't think I'll use user_version though. I prefer my current solution: a version table with multiple rows. One for initial installation and one for each schema upgrade with timestamps and the software version that performed the operation. The current version is just the max. If my migration procedure goes badly, I have some clue what happened.
If my application supported schema downgrades, I'd want to have a separate operation_id and schema_version; the max operation_id would win rather than the max schema_version.
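For anyone curious, the table described might look something like this (column names are my guess at the scheme):

    // Hypothetical shape of the version-history table described above.
    const string Ddl = @"
        CREATE TABLE IF NOT EXISTS schema_version (
            operation_id   INTEGER PRIMARY KEY AUTOINCREMENT,
            schema_version INTEGER NOT NULL,
            applied_at     TEXT NOT NULL DEFAULT (datetime('now')),
            app_version    TEXT NOT NULL  -- software version that ran the operation
        );";
    // Current version: SELECT schema_version FROM schema_version
    //                  ORDER BY operation_id DESC LIMIT 1;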
According to their mailing list, the way to register your own ID is to submit a patch that adds your ID to the file.
Some of the application IDs are made up using hexadecimal (such as Fossil), and some using ASCII (everything else listed in the magic.txt file in the SQLite repository). I have used base thirty-six; I later saw RADIX-50 mentioned, which would be another way.
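If it helps, packing four ASCII characters into the 32-bit application_id looks like this (the four-character code is hypothetical; you'd still register it via a magic.txt patch):

    static int AsciiAppId(string fourChars)    // e.g. "MYAP"
    {
        var id = 0;
        foreach (var c in fourChars)
            id = (id << 8) | (byte)c;          // four bytes -> one 32-bit value
        return id;
    }
    // then: PRAGMA application_id = <that value>;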
The 'lite' is even a bit of a misnomer. Properly tuned, I've seen 1TB+ SQLite databases work just fine. It doesn't scale horizontally across CPU cores as well as a "real" database, but it can perform very well with large data sets if there are only one or two writers.
> [...] I think it should be pronounced "S-Q-L-ite". Like a mineral. But I'm cool with y'all pronouncing it any way you want.
For most who browse HN that immediately means you reach for Postgres (and you would be wise to do so), but I would argue you can continue to use SQLite indefinitely (one instance per node) if you manage replication and transactions between nodes at your application layer.
I realize this is probably beyond what most developers want to screw with, but I have seen some extremely promising results in experimental implementations of this scheme. Being able to replicate items conditionally, based directly on business needs, is a very powerful capability on paper - e.g. user session data must be synchronously replicated to all nodes, while other entities can be asynchronously replicated according to their use cases. You could handle replication and transactions based upon any business factor or state if you go down this path.
PS Apparently my previous comment outraged the HN hive mind, judging by downvotes. I find this hilarious and revealing given that all I’ve said is obvious to any actual distsys researcher (ask eg the HyperDex or FawnKV folks about why they chose CR instead of Paxos). And at one large cloud provider I worked at, this was taken for granted back in 2012. Apparently “distsys” pop culture has yet to catch up.
The reason to use dependencies is so you don't have to reinvent complicated wheels.
Agreed. I wrote a performance-oriented lightweight ORM on top of SQLite calls: https://github.com/zmj/sqlite-fast . Its focus is minimizing allocations - treating SQLite as something closer to an in-memory store than a network API.
Look into Linq2Db. (Disclaimer: I contribute to the project a little bit.)
It's the perfect in-between if you want LINQ-like abstractions but don't want the overhead of a full ORM; it's more like Dapper than, say, EF or NHibernate.
Best of both worlds: I can quickly prototype a design using SQLite, and then flip it over to SQL Server/Postgres by changing a config string.
EF Core is somewhat improved from EF in some ways (it's faster, often as fast as Linq2Db or raw SQL) but has moved backwards in others (lots of stuff still doesn't work right - more importantly, things that -used- to work in EF6). IMO EF Core tries to do too much: it tries to give the same API for interfacing with both relational and KV stores, and ends up not being great at either.
That is one other point - SQLite has enabled us to do some really deep consolidation of our codebase. We generally use one SQLite database per business entity or abstraction: Users.db, Logs.db, Sessions.db, Settings.db, etc. It allows for a truly 100% independent vertical stack (aside from coupling against business models) for developing each piece of business functionality. This, in my view, is the perfect ideal of 'microservice' architecture: one big folder called services, one class per 'microservice', one SQLite database per class, one collection of mappers per class, etc. I honestly feel like we have settled on the perfect grain for our codebase structure (at least in C#/.NET Core land). The benefit in our world is that instead of HTTP-based RPC, we are doing direct method invocation throughout. We also use Blazor to extend this concept as close to the screen as technically feasible.
Because it's not that easy with Generics. :)
The signature of the method is as follows:
public static int InsertWithInt32Identity<T>([NotNull] this IDataContext dataContext, T obj, string tableName = null, string databaseName = null, string schemaName = null)
If we tried...
public static TIdentity InsertWithIdentity<TInsert,TIdentity>([NotNull] this IDataContext dataContext, TInsert obj, string tableName = null, string databaseName = null, string schemaName = null)
You'd actually have to write:
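    // Both type arguments spelled out, because TIdentity appears only in the
    // return type and can't be inferred (the Person entity is hypothetical):
    var id = dataContext.InsertWithIdentity<Person, int>(person);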
which is also potentially confusing if you're trying to glance at the code.
But wait, some might ask, Select has two types, how come it doesn't have this problem?
public static IEnumerable<TResult> Select<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector)
But as the signature shows, the return type is based on the expression passed as a parameter, which is why you don't have to do it there. :)
Edit: That said, there's support for inserting with granular control over columns... which is when I stopped waiting for EF to get its act together. :)
How is this typically handled? Using OS file locking primitives?
Saturating an NVMe drive requires writing with very large block sizes or very high queue depths.
I will admit that it would be very difficult to saturate a disk that can write on the order of gigabytes per second if you are trying to insert rows that are only a few KB in size (at least without batching these into larger entities or requests).
The type system in SQLite is not as strict as it is in other RDBMS.
If you don't need irrational numbers (and few financial applications do), the tuple of (value, scale) can take you very far.
Floating point (up to a certain amount of absurdity) you can describe with integers (value, scale).
Real numbers you can describe with integers (numerator, denominator, scale).
Irrationals are more difficult. That's why we try not to use them :). Hopefully your company's commission tiers are not defined in terms of circles or Euclidean distances.
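To make the (value, scale) idea concrete, a minimal sketch; the rounding policy (truncation below) is exactly the part you'd want to choose deliberately:

    // Fixed-point: Value * 10^-Scale.
    readonly record struct Fixed(long Value, int Scale)
    {
        public static Fixed Multiply(Fixed a, Fixed b, int resultScale)
        {
            // The exact product has scale a.Scale + b.Scale; rescale once, at the end.
            var raw = checked(a.Value * b.Value);
            for (var drop = a.Scale + b.Scale - resultScale; drop > 0; drop--)
                raw /= 10;   // truncation; real code would round half-even
            return new Fixed(raw, resultScale);
        }
    }
    // $12.34 * rate 0.0750 => Multiply(new(1234, 2), new(750, 4), 2) => 92, i.e. 0.92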
Sure, you can round at the end, but then you have to worry about compounding errors.
I know how to do it, it is just not practical.
Unless you're working with currency polynomials of degree 2 or higher, it may be OK to store money in minimal integer units at the application level too, adjusting only in UI formatters. One should be careful with constants in code and external APIs, but nothing you can't get used to or catch with a test. We store timeouts in milliseconds, weights in grams, and so on anyway - not a new thing.
As an example, the primary database I maintain for our company can hold decimal .00 natively, but we use its fields as if they were in thousands, i.e. 12,345.00 is stored and operated on as 12.35 (and goes to the ODS as int64 1235) - rounding the 5.00 off is fine for this particular accounting model. Nobody has made a three-digit mistake in a whole decade, because square/cubic money is not a thing naturally, and ratios & exchange rates work as usual. It is like storing cents, but the other way round.
Storing in-app amounts in floats has the downside of accumulating error in looped calculations and needing periodic rounding (although it takes a really big dataset to accumulate enough error to matter). If a language has a money type and can be programmed accordingly, the problem disappears. If not, it persists either way, in different forms.
I mean I know there are min/max ranges in the float specification that can guarantee no loss of accuracy to two decimal places... but then I need to manually make sure there's never anywhere in the code that might ever sum up transactions to beyond that, or whatever. Or people thinking it's OK to divide and then multiply, which you're hopefully less likely to do if it's an integer.
I feel much more peace of mind storing currency data as integers and doing the conversion. It feels quite natural really, since I'm already used to doing that with datetimes for different timezones.
You do not want to invent your own arbitrary-precision numbers.
Strings do not overflow and are easy to transfer.
You can store up to 900 trillion dollars with 0.01 cent resolution in an int64. I doubt many applications need more range than that.
> when [integers] overflow, they usually “wrap around” silently
If you want to write an application handling money, you really want to use a Money type of some description, regardless of whether the underlying storage format is an integer. So it's very easy to avoid silent overflows, since basically all languages that are prone to them also offer primitives that perform addition, subtraction and multiplication with checks for over/underflow.
> The implicit decimal point is hard to change and extremely inflexible
If you have a Money type, it's not hard to change. The pain point would normally be serialization, but that's not necessarily different from using decimal floating point internally, since very few APIs, serialization protocols or even databases have native support for fp decimals either.
> The number of subunits in currencies change with time in the world.
I suspect governments are equally or more likely to drop trailing zeros (e.g. Iraq did, not so long ago), which you also need to deal with anyway and which does not get easier by using a floating-point format.
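For illustration, a Money type along those lines in C# (the details are assumed, not from any particular library):

    readonly record struct Money(long MinorUnits, string Currency)
    {
        public static Money operator +(Money a, Money b)
        {
            if (a.Currency != b.Currency)
                throw new InvalidOperationException("currency mismatch");
            // checked(): throw OverflowException rather than wrap silently.
            return a with { MinorUnits = checked(a.MinorUnits + b.MinorUnits) };
        }
    }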
It seems to me that in that case you cannot use transactions though, or am I wrong?
You can just batch them up by mod N inserts or every Z milliseconds.
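Roughly this shape, with one writer draining a queue (the Row type, stats table, and queue plumbing are placeholders):

    using System;
    using System.Collections.Concurrent;
    using System.Threading;
    using Microsoft.Data.Sqlite;

    record Row(long Ts, string Payload);   // placeholder row shape

    static void WriterLoop(SqliteConnection conn, ConcurrentQueue<Row> queue)
    {
        const int N = 500;   // commit after this many rows...
        const int Z = 50;    // ...or after this many milliseconds
        var tx = conn.BeginTransaction();
        var pending = 0;
        var deadline = Environment.TickCount64 + Z;

        while (true)
        {
            if (queue.TryDequeue(out var row)) { InsertRow(conn, tx, row); pending++; }
            else Thread.Sleep(1);   // crude; a BlockingCollection would be nicer

            if (pending >= N || Environment.TickCount64 >= deadline)
            {
                tx.Commit();        // one fsync for the whole batch
                tx = conn.BeginTransaction();
                pending = 0;
                deadline = Environment.TickCount64 + Z;
            }
        }
    }

    static void InsertRow(SqliteConnection conn, SqliteTransaction tx, Row row)
    {
        var cmd = conn.CreateCommand();
        cmd.Transaction = tx;
        cmd.CommandText = "INSERT INTO stats (ts, payload) VALUES ($ts, $p);";
        cmd.Parameters.AddWithValue("$ts", row.Ts);
        cmd.Parameters.AddWithValue("$p", row.Payload);
        cmd.ExecuteNonQuery();
    }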
Although you are correct that it (the ORM) doesn't handle complex queries very fast, there is definitely a valid motivation to use/enforce it.
Complex queries are mostly handled through something like Elasticsearch or a hosted search engine (Azure); the goal is to keep queries simple, and if the code is loosely coupled, it's easier to do this.
But one can get pretty far with this design. Thousands of writes per second from a few dozen writers: no problem. Millions of writes and hundreds of writers: I'd use something else for that.
Though I have to admit, I was and am still disappointed to learn that the official way to pronounce SQLite is "S-Q-L-ite", i.e. "like a mineral".
You see it even in ostensibly public-consumption open-source projects, where the goal is clearly not merely to scratch one's own itch but to market the project for uptake for this reason or that. I feel like SQLite is a great example of what happens when a team is able to foster the kind of empathy needed to really understand what users want it to be.
2020 is turning out to be pretty terrible guys.
Apparently I've been using S-Q-L-ite, not S-Q-Lite.
All these years I've been using "Engine-X", not "en-jinx", and didn't know it.. https://www.nginx.com/resources/wiki/community/faq/
There's a certain irony in getting the name of a debugging tool wrong.
PostgreSQL is pronounced "post-gres". You absolutely never pronounce the "QL".
Luckily I code on my own, so nobody hears this abomination out loud
Their first product was SQLBase, pronounced "sequel base". The next product, which I worked on, was SQLWindows, pronounced "sequel windows".
We knew that some people pronounced SQL "ess cue ell", but no one at the company pronounced it that way, ever.
The "it's always S-Q-L and never `sequel`" is IBM's fault and an early trademark infringement issue in computing. (IBM was told it couldn't call it "SEQUEL" by an aircraft company.)
>> The forerunner of SQL, which was called SEQUEL (for Structured English Query Language), first emerged in the specifications for System R, IBM’s experimental relational database, in the late 1970s. However, two other products, with various names for their query language, beat IBM to the marketplace with the first commercial relational database products: Relational Software’s Oracle and Relational Technology’s Ingres. IBM released SQL/DS in 1982, with the query language name shortened to “SQL” after IBM discovered that “SEQUEL” was a trademark of the Hawker-Siddeley Aircraft Company. When IBM released its next generation RDBMS, called DB2, the SQL acronym remained. To this day, you will hear the name pronounced as an acronym (S-Q-L) and as a word (see-quel), and both are considered correct pronunciations.
I was hoping for a bit more interesting or detailed story.
Somewhere, I believe in an HN comment, I saw a Postgres developer say that one of the biggest regrets of the project naming was capitalizing that "shared S".
I always interpreted it as Post-gr-S-Q-L. Which is basically the same thing but makes more sense to me because it’s as if the name had a silent “e”.
I’m half joking but it might alleviate people from being confused.
The name confusion is all IBM's fault.
> Chamberlin and Boyce's first attempt of a relational database language was Square, but it was difficult to use due to subscript notation. After moving to the San Jose Research Laboratory in 1973, they began work on SEQUEL. The acronym SEQUEL was later changed to SQL because "SEQUEL" was a trademark of the UK-based Hawker Siddeley Dynamics Engineering Limited company.
And Salvatore (of Redis) has a way of simplifying issues that gives me a great deal of confidence in the decisions he makes and the code he writes. I really like the code and reasoning behind his 'linenoise' library in particular.
The same could probably be said about Redis and Salvatore Sanfilippo.
It's all about choosing the right tool for the job.
There are great reasons to use XML files over SQLite, but it mostly comes down to needing a file to be human-readable, versus file size.
In the product that I work on, I moved very small tables out to xml files because it's easier to inspect them in a text editor instead of expecting non-developers to figure out a database. These files are also small enough and are written infrequently enough that any performance difference between xml and SQLite is negligible.
The vast majority of data is in SQLite. An older version of the product would always write out a giant xml file, but when the xml got big, performance suffered. In this case, SQLite is the correct choice because it can make a small change to the database without needing to rewrite the entire file.
There are plenty of SQL CLI/GUI clients that support SQLite.
But if you need end users to be able to change some configuration, sure, move those settings to standalone files and parse them.
The other points are valid, but not everything can be shoehorned into JSON or XML. Sometimes the data size is too large for the read-everything-into-memory approach. Sometimes you need to deal with a lot of binary blobs. And so on.
Can you please elaborate on this?
Which, with a little bit of work, can be used to revert to earlier versions.
> SQLite really started life as a Tcl extension, Tcl being the programming language, the Tcl/Tk. The project I was working on was written in Tcl/Tk, and so SQLite began as a Tcl extension. And in a scripting language like Perl or Python, you can put any type of value you want in a variable - a variable might hold a string, a number, a byte array or whatever. So I made SQLite the same way: just because you've declared a column of a table to be text doesn't mean you can't put an integer in there, or just because you declared a column in the table to be a short int, why not put a 2-megabyte blob there? So what? It'll do that.
> Or if you have a column that’s declared integer and you try to put text in it, it looks like an integer and it can be converted without loss. It will convert and store it as an integer. But if you try and put a blob into a short int or something, there’s no way to convert that, so it just stores the original and it gives flexibility there. And this is useful in a lot of cases, because sometimes you just have a miscellaneous column in a table that you might need to store lots of different things in. And in traditional database systems you actually have to have multiple columns, one for each possible data type, whereas in SQLite you put it all in one column. So it works well.
> And for that matter, with SQLite you don’t have to give the column a type at all. You can just say, CreateTable T1 (a,b,c) and then you’ve got a table with three columns named a, b and c and you put whatever you want there.
> In retrospect, perhaps it would have been better if SQLite had merely implemented an ANY datatype so that developers could explicitly state when they wanted to use flexible typing, rather than making flexible typing the default. But that is not something that can be changed now without breaking the millions of applications and trillions of database files that already use SQLite's flexible typing feature.
I wish they had done this in the first place. I wish even now that they implemented a pragma to enforce types (and introduce the ANY type for folks who explicitly want it). But SQLite has enough good things going for it that I'm happy to use it in spite of this problem.
(1) Add a flag when opening a database to show that you want data types enforced.
(2) Add code to all the input pathways to check for type compliance. This way a database with correct data types will stay correct.
(3) Do not change anything in the storage layer.
What am I missing?
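For what it's worth, something close to (2) is expressible today with CHECK constraints on typeof(), at some insert-time cost (schema is illustrative):

    const string Ddl = @"
        CREATE TABLE users (
            id   INTEGER CHECK (typeof(id)   = 'integer'),
            name TEXT    CHECK (typeof(name) = 'text')
        );";
    // Affinity still coerces losslessly first, so inserting '52' into id
    // stores integer 52 and passes, while 'abc' now fails the constraint.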
This is not a man who fears the existential dread of discovering, just after you've stated loudly that some part of the system could not possibly break that way, that some other developer has pulled the rug out from under you when you weren't looking.
The odds that I add the string "52" to a column I know damn well has integers is pretty low, but non-zero. And I'm gonna transpose some fields eventually. The odds of any particular coworker doing the same are some multiple of that. But q^n across my entire team, across months and years, across new hires and people switching into and out of the team? We are so deep into Murphy's Law territory that it's almost a certainty.
It's a fool's errand to try to prevent absolutely every one of those scenarios happening. All I ask is that we have a way to discover them early and cheaply.
But SQLite has the worst of all worlds: you have to specify a type, yet you cannot rely on it, and it sometimes does type conversion for you (e.g. int to string and back) and sometimes does not.
They should've just implemented a single ANY type, or had the syntax not use types at all and deviated from the SQL spec (which they do a lot anyway).
But I have also run into hard problems using it. Deleting and renaming columns is not supported natively. You can find procedures on the internet to do it, but they generally don't take foreign key references into account, and getting that right is hard. In the end we had a function that could rename or delete a column even without switching off foreign-key checking, but it was not simple, and it was not the first iteration of said function - earlier iterations caused lots of subtle problems, like a column having a default value or not depending on whether the database was created fresh or via schema updates.
Use at your own peril.
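For reference, the table-rebuild dance being described follows the shape the SQLite docs recommend; names here are made up, and this is the easy case (indexes, triggers and views referencing the table all need recreating too):

    const string DropColumnSql = @"
        PRAGMA foreign_keys = OFF;   -- a no-op inside a transaction, so do it before BEGIN
        BEGIN;
        CREATE TABLE users_new (id INTEGER PRIMARY KEY, name TEXT);
        INSERT INTO users_new (id, name) SELECT id, name FROM users;
        DROP TABLE users;
        ALTER TABLE users_new RENAME TO users;
        PRAGMA foreign_key_check;    -- verify nothing dangles before committing
        COMMIT;
        PRAGMA foreign_keys = ON;";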
(I chime in only to remark that I myself am constantly surprised that SQLite continues to add these kinds of features – e.g. WINDOW functions were also in 3.25 – long after I had accepted their absence as a fact of life)
But if you're using SQLite for simplicity, there's a good chance you're not interested in writing raw SQL statements either.
I'm no ORM hater (I use Hibernate daily), but claiming that an ORM will simplify working with the database is at least misleading.
Now, there’s nothing magical about alembic’s handling of renaming or deleting columns for SQLite; it has to recreate the table just like what you would do if you’re hand rolling the SQL, so it has the same problems.
The main and real reason I love Fossil is that I'm always at peace knowing that I can't accidentally bungle the history. I'm always nervous about that when I use git... still.
I just read a great insight yesterday:
> Understanding there’s no harm to be done eases things a lot. It encourages experimenting, which is one of the most effective ways to learn new things and to get familiar with them. Imagine this: you start rebase, mess things up, hit Ctrl+Z several times and try again. Good news is, we can implement Undo/Redo given just ordinary Git implementation. Git already stores all the possible states our repo was in. All Undo/Redo will have to do is to shuffle some branch pointers here and there.
For me, there have been cases where I've worked on an angle, only to discard it for another approach... and much later wanted to revisit that angle because it was a good idea. If I discard history to "clean it up", such a resurrection becomes impossible.
With WAL turned off (the default), when you write in a transaction SQLite copies the affected data into a rollback journal, changes the original file, and deletes the journal when the transaction is committed. With WAL turned on, it instead appends each change to a write-ahead log file and merges that into the main file periodically (by default once the log reaches 1000 pages) - so only a single write is needed per change.
Great tool, I can't recommend it enough.
The only gripe I have with SQLite is with full-text search, FTS5. The last time I tried to store and retrieve text with associated metadata from another table, it was really confusing. It kinda worked in the end, but I also realized that you can't do a contains-query on your text, just starts-with and ends-with. This is a major flaw in my book. So many times I just want to dump tons of text into a table and query single words naturally, but I can't do that without resorting to hacks or unoptimized LIKE queries. And if I need to take into account typos or similarity, I'd have no idea how to do that with SQLite. Does someone have experience regarding this?
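For whole-word matching plus metadata, an external-content setup looks roughly like this (names illustrative); MATCH finds a token anywhere in the text, and 'sqli*' gives prefix matches, but there's still no suffix/contains matching on tokens without tricks like also indexing reversed text:

    const string FtsSql = @"
        CREATE VIRTUAL TABLE docs_fts USING fts5(body, content='docs', content_rowid='id');
        -- (external-content tables also need insert/update/delete sync triggers, omitted)

        SELECT d.id, d.author, snippet(docs_fts, 0, '[', ']', '...', 8)
        FROM docs_fts
        JOIN docs d ON d.id = docs_fts.rowid
        WHERE docs_fts MATCH 'sqlite OR sqli*';";

For typo tolerance there's nothing built into FTS5 itself; the spellfix1 extension is the usual pointer for fuzzy matching.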
I do what you suggest: run dev/test against a real PostgreSQL (of the same version) running in a container. In fact, I take this further and run builds against an actual PostgreSQL, since I like to generate code based on the DB schema using jOOQ.
The build proceeds like so:
1. Spin up a fresh PostgreSQL container
2. Run my schema migrations against it
3. Run the jOOQ code generator against it
4. Compile code
This provides some additional guarantees, including:
1. My queries are fully type-safe from my code down to the database schema level
2. My schema migrations can be run against an empty database
This at least is possible to fake. But in the era of premade docker packages for postgres, I do wonder why the practice you describe persists.
The micro-ORM I use in C# manages to work around this well enough; the only thing in SQLite that's really hard to work with is time, since there's no real structure defined for it.
Also, my docker-compose for a Postgres container is literally:
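    # (representative minimal file; image tag and password are placeholders)
    version: "3"
    services:
      db:
        image: postgres
        environment:
          POSTGRES_PASSWORD: postgres
        ports:
          - "5432:5432"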
With Postgres I just buy a managed setup from Heroku, GCP or AWS. Backup is automatic, and there is a follower to take over if the leader dies/crashes.
It's easy to backup an sqlite snapshot, but you can't do that after every operation.
Curious, how do people solve this?
For better or worse, the web has always been driven by specifications rather than implementations. Properly standardizing all of SQLite in a spec is not trivial, and as much as I think SQLite is great software, I don't think relying on SQLite the library specifically would be a good idea. What if the latest version of SQLite does something that's not in the spec, or prohibited by it?
Either way, there's IndexedDB now, which is supported by almost everything, including IE.
The people who rejected the WebSQL proposal lacked decisiveness and a technically-focused vision. They also interpreted everything very politically.
What? The specifications document what browsers already do. Have you seen the HTML5 spec? It's mostly formalization of error-recovery, showing how to interpret previously technically-broken documents in the way modern browsers have converged on. It's not what anyone would design, but there's no one driving.
>So companies which are typically large companies that really depend on SQLite as part of their product, they just pay us an annual fee.
>... we are a 100% engineering shop.
An intriguing option for anyone sitting on an original infrastructure solution, and looking to turn it into a livelihood, with quality of life as first priority.
If the people who did the original Macintosh "resource fork" updater had been serious about maintaining consistency of the data structure, that approach would have lasted longer. That was a really good idea, badly implemented because they had to cram it into 128KB and use a floppy. Unfortunately, it wasn't rewritten when the Mac line got more memory and hard disks.
Somebody had to get the open source world past using text files for everything.
Having identical environments makes life easier. I never run anything on my Mac, despite many Unix daemons being available; I either run it remotely on Linux or locally in a Linux VM with the same distro.
I've tried SQLite for this use case, but queries fall down when touching millions of rows.
I know there's the Parquet file format, but I'm wondering if there's a SQL engine that can sit on top of something like that, similar to SQLite.
Back in the day on Windows you would use the Jet (MS Access) DB for this purpose, or maybe dBase before that in DOS. Many apps that needed to store data used them, since they didn't need a server process, just a library.
I guess you have some NoSQL options like LevelDB, but it's really nice to have a full ACID SQL engine running in-process.
I always thought it would be great if PostgreSQL could be separated into a library for the DB, with a server that uses the lib, or the option to use the lib directly in your process - but I would imagine the process model is baked heavily into the DB engine.
* Simple to embed into your application (no setting up a separate database, &c).
* Fantastically backwards compatible
* Public domain (so easy to comply with the license)
* High quality: Reasonably fast, reasonably efficient, reasonable feature support.
The lack of any of these would add significant drag. It's difficult to imagine any other project catching up and displacing it at this point.
E.g., compared with a Model T, modern cars are much easier but much more complex because the car itself understands things like adjusting the ignition timing and safely getting the motor turning, things that a Model T user had to understand if they wanted to avoid getting a broken arm before they got out of the driveway: https://www.fordmodelt.net/how-to-drive-ford-model-t.htm
They are unusually thorough in their testing. I see that as a good thing, not a sign it isn't (relatively) simple. In fact, I don't think any other database engine is tested as thoroughly. I can't even imagine how much effort it'd take to do so for Oracle or even PostgreSQL, except that it'd be prohibitive.
However the particular platform I was developing for was an embedded system using the JFFS2 filesystem. JFFS2 doesn't support shared mmap, which is a feature SQLite needs for WAL.
So basically, it was a big pain - I had to set up a tmpfs mount point where the WAL-enabled DB resided and then periodically sync it to flash.
Basically there is one process periodically dumping stats into a sqlitedb. There is an HTTP API that can retrieve said stats. How do I make it so the HTTP API doesn't occasionally fail when trying to read from the DB?
Really though, if you are trying to read and write to a DB concurrently and over the network, it sounds like a much more clear-cut case for something like Postgres.
There's a small chance that, even retrying 3 times, you could fail all 3 times in a row. WAL journaling is a solution.
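Concretely, the combination usually suggested, sketched with Microsoft.Data.Sqlite (connection string illustrative):

    using Microsoft.Data.Sqlite;

    var conn = new SqliteConnection("Data Source=stats.db");
    conn.Open();
    var cmd = conn.CreateCommand();
    // WAL lets the HTTP reader proceed while the stats writer is mid-transaction;
    // busy_timeout makes SQLite retry internally instead of surfacing SQLITE_BUSY.
    cmd.CommandText = "PRAGMA journal_mode = WAL; PRAGMA busy_timeout = 5000;";
    cmd.ExecuteNonQuery();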
> 1. Is the data separated from the application by a network? → choose client/server
> Relational database engines act as bandwidth-reducing data filters. So it is best to keep the database engine and the data on the same physical device so that the high-bandwidth engine-to-disk link does not have to traverse the network, only the lower-bandwidth application-to-engine link.
> But SQLite is built into the application. So if the data is on a separate device from the application, it is required that the higher bandwidth engine-to-disk link be across the network. This works, but it is suboptimal. Hence, it is usually better to select a client/server database engine when the data is on a separate device from the application.