Hacker News
Why SQLite succeeded as a database (2016) (changelog.com)
589 points by Tomte 48 days ago | hide | past | web | favorite | 281 comments

SQLite is a wonderful database. We use it in production many times over for all of our clients. Not having to worry about whatever arbitrary SQL Server installation is available in a particular environment has saved us so much time and frustration. Combining SQLite with .NET self-contained deployments means that we (our automated tools) now copy our software distribution zip to the target, extract it to a path, run the executable as administrator, and walk away. Without SQLite we could not do this.

Migrations are also hilariously easy with SQLite if you use the user_version pragma. We have a monotonically-incrementing number to indicate each version. Our migrator first queries this upon startup and runs all the migrations that exist between current user_version and whatever our highest sequence # is (defined as a constant in code). Some might argue that you should just use EF (or whatever language-specific ORM), but I prefer the level of control and the substantial performance boost you get with raw SQL and a simple migrator approach. I cannot speak for EF Core, but EF6 performance was absolutely abysmal in some more complex cases.
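A minimal sketch of that user_version migrator pattern, here in Python's stdlib sqlite3 (the migration statements and table names are illustrative, not the poster's actual schema):

```python
import sqlite3

# Ordered migrations; index + 1 is the schema version each one produces.
# The statements here are placeholders.
MIGRATIONS = [
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
    "ALTER TABLE users ADD COLUMN email TEXT",
]

def migrate(con: sqlite3.Connection) -> int:
    """Run every migration between the stored user_version and the latest."""
    current = con.execute("PRAGMA user_version").fetchone()[0]
    for version, sql in enumerate(MIGRATIONS[current:], start=current + 1):
        con.execute(sql)
        # PRAGMAs don't accept bound parameters; version comes from enumerate.
        con.execute(f"PRAGMA user_version = {version}")
        con.commit()
    return con.execute("PRAGMA user_version").fetchone()[0]

con = sqlite3.connect(":memory:")
print(migrate(con))  # 2: both migrations ran
print(migrate(con))  # 2: already current, nothing to do
```

Since user_version lives in the database file header, a fresh database reports 0 and the loop naturally runs everything from the start.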

I would also say that performance might surprise you if you turn on WAL. I have saturated NVMe disks inserting lots of data into SQLite. The trick is to use a single connection and let SQLite handle the serialization for you, rather than try to manage multiple connections to the same file in code (i.e. one connection per logical operation sucks for SQLite).
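A sketch of the single-connection WAL setup described above, again in Python's sqlite3; the pragmas are real, the workload is invented:

```python
import os
import sqlite3
import tempfile

# WAL requires a real file, so use a temp path for this sketch.
path = os.path.join(tempfile.mkdtemp(), "app.db")
con = sqlite3.connect(path)  # one connection, shared by the whole process

print(con.execute("PRAGMA journal_mode = WAL").fetchone()[0])  # wal
con.execute("PRAGMA synchronous = NORMAL")  # fewer fsyncs; WAL stays consistent
con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

# Let SQLite serialize the writes itself: one connection, one big transaction.
with con:
    con.executemany(
        "INSERT INTO events (payload) VALUES (?)",
        ((f"payload-{i}",) for i in range(10_000)),
    )
print(con.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 10000
```

The key points are the single long-lived connection and committing in large transactions rather than per insert.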

I've been using SQLite for something like 20 years - how did I not know about `user_version` until now?!

I've always added a `schema_version` table with a single row containing a single column with the version, so thanks for letting me know about this!

I didn't know about user_version or application_id either until just now. I think I'll start setting application_id. Probably not registering it in the SQLite repository - there's no procedure to do that at the top of the file and it looks like almost no one does. I'll just pick a random 32-bit int and assume I don't collide.

I don't think I'll use user_version though. I prefer my current solution: a version table with multiple rows. One for initial installation and one for each schema upgrade with timestamps and the software version that performed the operation. The current version is just the max. If my migration procedure goes badly, I have some clue what happened.

If my application supported schema downgrades, I'd want to have a separate operation_id and schema_version; the max operation_id would win rather than the max schema_version.
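The multi-row history table with a separate operation_id could look something like this (column names are my guesses at the scheme described, not anyone's actual schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE schema_history (
        operation_id   INTEGER PRIMARY KEY,  -- monotonically increasing rowid
        schema_version INTEGER NOT NULL,
        applied_at     TEXT NOT NULL DEFAULT (datetime('now')),
        app_version    TEXT NOT NULL
    )
""")
con.execute("INSERT INTO schema_history (schema_version, app_version) VALUES (1, '0.9.0')")  # install
con.execute("INSERT INTO schema_history (schema_version, app_version) VALUES (2, '1.1.0')")  # upgrade
con.execute("INSERT INTO schema_history (schema_version, app_version) VALUES (1, '1.2.0')")  # downgrade

# With downgrades, the latest operation wins, not the highest version.
row = con.execute(
    "SELECT schema_version FROM schema_history ORDER BY operation_id DESC LIMIT 1"
).fetchone()
print(row[0])  # 1
```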

application_id is really meant for scenarios where the SQLite database is treated as a document - a file that is intended to be copied around by the user, and opened by the same or another app elsewhere.
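Setting and checking application_id is a one-liner in each direction; the ID value below is an arbitrary example, not a registered one:

```python
import sqlite3

APP_ID = 0x53514C54  # arbitrary 32-bit value; pick your own and hope it doesn't collide

con = sqlite3.connect(":memory:")
con.execute(f"PRAGMA application_id = {APP_ID}")

# Later, before treating some arbitrary .db file as ours:
stored = con.execute("PRAGMA application_id").fetchone()[0]
print(stored == APP_ID)  # True
```

Like user_version, the value is stored in the database file header, so tools like `file` can identify the format without opening the database.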

According to their mailing list, the way to register your own ID is to submit a patch that adds your ID to the file.

I tried to do that, but maybe the message didn't go through.

Some of the application IDs are made up using hexadecimal (such as Fossil), and some using ASCII (everything else listed in the magic.txt file in the SQLite repository). I have used base thirty-six; I later saw RADIX-50 mentioned, which would be another way.

We also do something very similar, but our version table has two columns: module_name and version. Instead of doing central schema management like in a usual monolith, every module manages its own schema.

I came here to say this. It succeeded because it's an extraordinarily high quality piece of code!

The 'lite' is even a bit of a misnomer. Properly tuned I've seen 1TB+ SQLite databases work just fine. It doesn't scale horizontally with CPU cores as well as a "real" database but it can perform very well with large data sets if there's only one or two writers.

There is actually no "lite" in SQLite. Think of it as SQL-ite. The "-ite" is a deliberate allusion to rocks/geology, e.g. granite, magnetite, etc. Make of that what you will. This is documented in the SQLite docs, BTW.

While I couldn't find anything in the docs, there's a snippet of a transcript here: https://changelog.com/podcast/201#transcript-67

Another: http://sqlite.1065341.n5.nabble.com/SQLite-Pronunciation-td8...

>[...] I think it should be pronounced "S-Q-L-ite". Like a mineral. But I'm cool with y'all pronouncing it any way you want. :-)

What is the underlying storage structure that scales to TB+?

Just regular B-trees for row and index storage. There's a reason why it's such a popular data structure!


Not to mention many filesystems are implemented as B-trees.

If you want TB+ in the context of indefinite scaling, you are talking about multiple computers.

For most who browse HN that immediately means you reach for Postgres (and you would be wise to do so), but I would argue you can continue to use SQLite indefinitely (one instance per node) if you manage replication and transactions between nodes at your application layer.

I realize this is probably beyond what most developers want to screw with, but I have seen some extremely promising results in experimental implementations of this scheme. Being able to replicate items conditionally based directly on business needs is a very powerful capability on paper - e.g. user session data must be synchronously replicated to all nodes, while other entities can be asynchronously replicated based on their use cases. You could handle replication and transactions based upon any business factor or state if you go down this path.

There is a derivative called rqlite which uses the raft consensus protocol to give you a highly fault-tolerant distributed database.


It’s a great idea and I’m working on something similar, but Raft or any other quorum-based consensus protocol is a poor fit for the data plane.

Can you expand on why you think this is a poor fit ? Is this because of added latency due to the multiple round trip involved ?

It’s not latency, it’s throughput. Broadcast and incast network communication patterns in the LAN kill throughput, and are present in any quorum protocol. Atomic broadcast protocols like chain replication or LCR which use strictly sequential communication patterns (via chain or ring topologies) have much better throughput in LAN environments than Paxos/Raft (except for Ring Paxos, which requires IP multicast). In addition, because they outsource group membership and failure detection to a separate (likely Paxos-based) service, they require only f+1 members to tolerate f faults, rather than 2f+1 (see Lamport et al’s Vertical Paxos work). But they have two main downsides relative to Paxos/Raft-like protocols: latency grows linearly with cluster size and is bounded below by the slowest node (so they aren’t a good fit for high-latency WANs), and as noted above, they can’t “self-host” like Paxos/Raft.

PS Apparently my previous comment outraged the HN hive mind, judging by downvotes. I find this hilarious and revealing given that all I’ve said is obvious to any actual distsys researcher (ask eg the HyperDex or FawnKV folks about why they chose CR instead of Paxos). And at one large cloud provider I worked at, this was taken for granted back in 2012. Apparently “distsys” pop culture has yet to catch up.

I should also mention that a great virtue of quorum-based protocols (for majority or basic read/write quorum systems) is their ability to mask tail latency/transient faults. That is difficult to replicate in sequential protocols and is a good reason to use quorum protocols in high/variable-latency and otherwise unreliable environments like geo-distributed databases.

Yes, please do expand on this, since fast Paxos and similar are used for this in mission critical distributed databases for exactly this use case.

I’m not sure how often Fast Paxos is really a good tradeoff, since in the LAN it has inferior throughput to the other protocols I mention below, while in the WAN >3/4 quorums expose you to additional tail latency, probably enough to negate the advantage of eliminating a single message delay if clients are distant from the cluster (see the Google SRE book for discussion).

At some point though, aren't you essentially reinventing a traditional db over top of sqlite?

The reason to use dependencies is so you don't have to reinvent complicated wheels.

> Some might argue that you should just use EF (or whatever language-specific ORM), but I prefer the level of control and the substantial performance boost you get with raw SQL and a simple migrator approach.

Agreed. I wrote a performance-oriented lightweight ORM on top of SQLite calls: https://github.com/zmj/sqlite-fast . Its focus is minimizing allocations - treating SQLite as something closer to an in-memory store than a network API.

>Some might argue that you should just use EF (or whatever language-specific ORM), but I prefer the level of control and the substantial performance boost you get with raw SQL and a simple migrator approach. I cannot speak for EF Core, but EF6 performance was absolutely abysmal in some more complex cases.

Look into Linq2Db [1] (Disclaimer, I contribute to the project a little bit.)

It's the perfect in-between if you want LINQ-like abstractions but don't want the overhead of an ORM; it's more like Dapper than, say, EF or NHibernate.

Best of both worlds: I can quickly prototype a design using SQLite, and then flip it over to SQL Server/Postgres by changing a config string.

EF Core is somewhat improved from EF in some ways (It's faster, often as fast as Linq2Db or Raw SQL) but has moved backwards in others (lots of stuff still doesn't work right, more importantly things that -used- to work in EF6). IMO EF Core tries to do too much; It tries to give the same API to interface with both Relational and KV stores, and results in not being great at either.

[1] https://github.com/linq2db/linq2db/

We exclusively use Dapper for all database access. I have zero problems with a few string constants in my services that contain the requisite SQL statements for managing my business entities against the database. It's really nice having such a consolidated context where your SQLite implementations are just single partial classes tacked onto the services you are providing persistence for.

That is one other point - SQLite has enabled us to do some really deep consolidation of our codebase. We generally use one SQLite database per business entity or abstraction: Users.db, Logs.db, Sessions.db, Settings.db, etc. It allows for a truly 100% independent vertical stack (aside from coupling against business models) for purposes of developing various pieces of business functionality. This, in my view, is the perfect ideal of 'microservice' architecture. One big folder called services, one class per 'microservice', one SQLite database per class, one collection of mappers per class, etc. I honestly feel like we have settled on the perfect grain for our codebase structure (at least in C#/.NET Core land). The benefit in our world is that instead of HTTP-based RPC, we are doing direct method invocation throughout. We also use Blazor in order to extend this concept as close to the screen as technically feasible.

Thanks for reminding me about this, just taking a quick look at it now. Quick question: do you know why it uses methods like db.InsertWithInt32Identity(product) rather than db.InsertWithIdentity<int>(product) or similar?

> Quick question: do you know why it uses methods like db.InsertWithInt32Identity(product) rather than db.InsertWithIdentity<int>(product) or similar?

Because it's not that easy with Generics. :)

The signature of the method is as follows:

public static int InsertWithInt32Identity<T>([NotNull] this IDataContext dataContext, T obj, string tableName = null, string databaseName = null, string schemaName = null)

If we tried...

public static TIdentity InsertWithIdentity<TInsert,TIdentity>([NotNull] this IDataContext dataContext, TInsert obj, string tableName = null, string databaseName = null, string schemaName = null)

You'd actually have to write:

    db.InsertWithIdentity<Product, int>(product);
which is also potentially confusing if you're trying to glance at the code.

But wait, some might ask, Select has two types, how come it doesn't have this problem?

public static IEnumerable<TResult> Select<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector)

But as the signature shows, the return type is based on the expression passed as a parameter, which is why you don't have to do it there. :)

Edit: That said, there's facilitation for inserting with granular control over columns... which is when I stopped waiting for EF to get its act together. :)

Thanks, I really appreciate the reply and hadn't thought about those side effects. I then thought "why not derive the identity type from an attribute on the identity property of the table class?", but then you have a dynamic return type, so you either return an object needing a cast, or it leads you back to the same problem you identified.

Can't you take object instead of TInsert and call .GetType to lookup the (cached) way to serialize it? I assume this is what json.net etc do.

one connection per logical operation sucks for SQLite

How is this typically handled? Using OS file locking primitives?

I love SQLite, but I'd be curious how you were able to saturate an NVMe drive with it. Were you doing the inserts in large batches?

Saturating an NVMe drive requires writing with very large block sizes or very high queue depths.

Inserting rows with ~500KB blob columns (JSON data) on a single connection w/ WAL enabled & pragma synchronous == NORMAL or OFF.

I will admit that it would be very difficult to saturate a disk that can write on the order of gigabytes per second if you are trying to insert rows that are only a few KB in size (at least without batching these into larger entities or requests).

My only grievance, and a blocker, is that SQLite does not support decimal numbers, making it less practical to work with finance related data.

You probably already heard of storing amount in cents as an integer, but it's a pattern worth mentioning if it fits your use case.

I don't know how it's done in the US, but in Europe financial calculations are done with four digits after the dot.

So store the value as milli-cents? (haha)

Shouldn't that be centi-cents? ^^

Or in MySQL: innocents.

More likely: varchar(255)

varchar(255) == varchar(2000000) in SQLite

The type system in SQLite is not as strict as it is in other RDBMS.
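The claim is easy to check from Python's sqlite3: the declared length is parsed and then silently ignored.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (s VARCHAR(255))")

# SQLite does not enforce the declared length: no truncation, no error.
con.execute("INSERT INTO t VALUES (?)", ("x" * 100_000,))
print(con.execute("SELECT length(s) FROM t").fetchone()[0])  # 100000
```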

It is possible, but it makes things somewhat impractical, because you need to convert numbers back and forth every time you touch them at the application level. Not worth the extra headache.

Speaking from experience, if you exclusively deal with integers internally and then apply the appropriate formatting when outputting to the user (which you need to deal with regardless of internal representation), it makes dealing with monetary data so much easier. Integers are generally better supported, faster, and more convenient for most languages, databases, and serialization formats.
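A minimal sketch of the integer-cents approach (assumes non-negative amounts with exactly two decimal places; a real implementation would handle signs, currencies, and validation):

```python
CENTS = 100  # store amounts internally as integer cents

def to_cents(text: str) -> int:
    """Parse user input like '12.34' into integer cents without touching floats."""
    dollars, _, cents = text.partition(".")
    return int(dollars) * CENTS + int(cents.ljust(2, "0")[:2])

def fmt(cents: int) -> str:
    """Format integer cents for display."""
    return f"{cents // CENTS}.{cents % CENTS:02d}"

total = to_cents("19.99") + to_cents("0.01")  # plain integer addition, exact
print(fmt(total))  # 20.00
```

All the arithmetic in between is plain integer math, which every language, database, and serialization format handles exactly.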

+1 adding my anecdata to this.

If you don’t need irrational numbers (few financial applications do?), the tuple of (value, scale) can take you very far.

you never really deal with irrationals ... do you mean floating point?

Nope, specifically meant irrationals.

Floating point (up to a certain amount of absurdity) you can describe with integers (value, scale).

Real numbers you can describe with integers (numerator, denominator, scale).

Irrationals are more difficult. That's why we try not to use them :). Hopefully your company's commission tiers are not defined in terms of circles or euclidean distances.

How do you handle things like multiplying by some interest rate of 3.125 or (1 + 0.03125)^-354 as examples.

Sure you can round at the end but then you have to worry about compounding errors.

I think that's just part of the process of deciding how small an increment each integer represents (dollars, cents, centi-cents, milli-cents).

Accounting rules often care more about consistency than precision per se. As long as the code follows accepted accounting rules (say bankers rounding) choosing a known precision works. Often GAAP and other standards don’t define all details for, say amortization, but if you use a scheme the code should always use that method.

Numeric type, as supported by other SQL databases, does exactly this. I do not want to reinvent it at the application level.

I know how to do it, it is just not practical.

Do you then use variable length for the precision depending on the currency?

Not insisting, and could be seen as a bad practice, but I think I have experience worth sharing.

Unless you operate on 2+-degree currency polynomials, it may be okay to store money in minimal integer units at the application level too. It may be adjusted only in UI formatters. One should be careful with constants in code and external APIs, but it's nothing you can't get used to or catch with a test. We store timeouts in milliseconds, weights in grams, and so on anyway - not a new thing.

As an example, the primary database I maintain for our company is able to hold decimal .00 natively, but we use its fields as if they were multiplied by 1000, i.e. 12,345.00 is stored and operated as 12.35 (and goes as int64 1235 to ODS) (rounding 5.00 off is fine for this particular accounting model). Nobody made a three-digit mistake ever in a whole decade, because square/cubic money is not a thing naturally, and ratios & exchange rates work as usual. It is like storing cents, but the other way round.

Storing in-app amounts in floats has the downside of accumulating error in for-looped calculations and requiring periodic rounding (although it takes a really big dataset to accumulate enough error to matter). If a language has a money type and it can be programmed accordingly, the problem disappears. If not, it persists either way in different forms.

I dunno... the only alternative seems to be storing as a float everywhere, since most coding languages don't support a decimal type... and using floats for currency honestly just feels scary to me, since they're not fixed-precision.

I mean I know there are min/max ranges in the float specification that can guarantee no loss of accuracy to two decimal places... but then I need to manually make sure there's never anywhere in the code that might ever sum up transactions to beyond that, or whatever. Or people thinking it's OK to divide and then multiply, which you're hopefully less likely to do if it's an integer.

I feel much more peace of mind storing currency data as integers and doing the conversion. It feels quite natural really, since I'm already used to doing that with datetimes for different timezones.

Python has Decimal, JavaScript has BigInt, and so on. I disagree with you - most languages support arbitrary-precision numbers. Floats are not needed or wanted.
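For instance, with Python's stdlib decimal module:

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floats drift; Decimal does exact base-10 arithmetic.
print(0.1 + 0.2 == 0.3)                                    # False
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))   # True

# The rounding mode is explicit, which accounting rules usually demand.
interest = Decimal("100.00") * Decimal("0.03125")
print(interest.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))  # 3.13
```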

You do not want to invent your own arbitrary-precision numbers.


What is the drawback of just using a string as storage? Together with something like bcmath for calculations.

Strings do not overflow and are easy to transfer.

Applications can handle integers just fine, even more of a reason to use them. 64-bit integers will work everywhere seamlessly and you only need to convert once, when showing them in the UI and/or collecting user input.

Use a 64-bit integer and store the value as microdollars (or whatever main currency you're using). Simple, fast, correct, and compatible everywhere.

Just to give context, the link points out possible drawbacks with the approach mentioned above

> Integers have complete precision, but very limited range

You can store up to 900 trillion dollars with 0.01 cent resolution in an int64. I doubt many applications need more range than that.

> when [integers] overflow, they usually “wrap around” silently

If you want to write an application handling money, you really want to use a Money type of some description, regardless of whether the underlying storage format is an integer. So it's very easy to avoid silent overflows, since basically all languages that are prone to them also offer primitives that perform addition, subtraction, and multiplication with checks for over/underflow.

> The implicit decimal point is hard to change and extremely inflexible

If you have a Money type, it's not hard to change. The pain point would normally be serialization, but that's not necessarily different from using decimal floating point internally, since very few APIs, serialization protocols or even databases have native support for fp decimals either.

> The number of subunits in currencies change with time in the world.

I suspect governments are equally or more likely to drop trailing zeros (e.g. Iraq did, not so long ago), which you also need to deal with anyway and which does not get easier by using a floating-point format.
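Tying those points together, a toy Money type that range-checks against the int64 storage limits might look like this (Python ints never overflow, so the check has to be explicit; all names here are illustrative):

```python
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

class Money:
    """Minimal sketch: amounts held as integer minor units, range-checked to int64."""

    def __init__(self, units: int):
        if not isinstance(units, int):
            raise TypeError("Money requires integer minor units")
        if not INT64_MIN <= units <= INT64_MAX:
            raise OverflowError("amount exceeds int64 storage range")
        self.units = units

    def __add__(self, other: "Money") -> "Money":
        # The constructor re-checks the range, so overflow is loud, not silent.
        return Money(self.units + other.units)

print((Money(1999) + Money(1)).units)  # 2000
```

A real type would also carry the currency and scale, but the shape is the same: one checked constructor, and every operation funnels through it.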

Wait, there are finance applications that don't forbid floating point numbers ??

They're talking about decimal numbers as a different thing from floating point numbers:


I'd like you to meet my friend "Excel"...

I mean for professional applications, not amateur stuff.

Excel is used for very very expensive professional tasks.

You know, it's strange - we work with an incredible number of decimal facts (financial data), but we find that this has zero impact on our SQLite usage, because we have a policy where extremely complex business entities that are prone to rapid change are serialized to JSON and stored as BLOB columns. Within this JSON-serialized data is where virtually all of our decimal values reside. Obviously, it's not a solution for any arbitrary case, but you can usually find workarounds in interesting places further up the stack. We use only 3 data types in SQLite - BINARY, TEXT and INTEGER.

There is no BINARY type in SQLite. (However, the declared type mostly does not restrict the data the column may store. All columns can store blobs and nulls (unless NOT NULL is specified). Columns declared CHAR, VARCHAR, CLOB, CHARACTER, or TEXT can store only text, blobs, and nulls; trying to store a number will convert it to text. Columns declared REAL, FLOAT, or DOUBLE cannot store integers; integers are converted into floating-point numbers, and if you try to store text representing a number, it is converted to that number. Columns declared INT, INTEGER, NUMERIC, or an unrecognized type (such as BINARY) can store any value, but text representing a number is converted to a number, and floating-point numbers that can be losslessly represented as integers are converted to integers. Columns declared BLOB or with no type specified can store any value.)
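These affinity rules are easy to verify from Python's sqlite3; typeof() reports the storage class actually used:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# BINARY is an unrecognized type name, so that column gets no restrictive affinity.
con.execute("CREATE TABLE t (a TEXT, b REAL, c INTEGER, d BINARY)")
con.execute("INSERT INTO t VALUES (15, 15, 15.0, x'0f')")

print(con.execute(
    "SELECT typeof(a), typeof(b), typeof(c), typeof(d) FROM t"
).fetchone())
# ('text', 'real', 'integer', 'blob')
```

The integer 15 became text in the TEXT column, a real in the REAL column, and 15.0 was losslessly converted back to an integer in the INTEGER column, while the blob passed through the unrecognized-type column untouched.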

How do you do reporting on these blobs?

In addition to the other workarounds suggested, you can store the numbers in two columns - one for the coefficient and one for the power of ten, negated. For instance 2.50 becomes 250 and 2, meaning 250 x 10 ^ -2. You can then do math at the application level, or even in SQLite with user-defined functions or even with recursive CTEs if you’re a bit of a masochist.
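A sketch of that two-column scheme, doing the math at the application level with Python's Decimal (table and column names invented):

```python
import sqlite3
from decimal import Decimal

con = sqlite3.connect(":memory:")
# value = coeff * 10^(-scale), both stored as exact integers
con.execute("CREATE TABLE prices (coeff INTEGER, scale INTEGER)")
con.execute("INSERT INTO prices VALUES (250, 2)")   # 2.50
con.execute("INSERT INTO prices VALUES (1999, 2)")  # 19.99

total = Decimal(0)
for coeff, scale in con.execute("SELECT coeff, scale FROM prices"):
    total += Decimal(coeff).scaleb(-scale)  # exact base-10 math in the application
print(total)  # 22.49
```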

There is a SQLite decimal extension, unfortunately still under development and not finished: https://chiselapp.com/user/lifepillar/repository/sqlite3deci...

Interesting extension. I'm not from the finance field so I don't know the kind of regulations/standards are used. Do you happen to know?

> The trick is to use a single connection and let SQLite handle the serialization for you, rather than try to manage multiple connections to the same file in code (i.e. one connection per logical operation sucks for SQLite).

It seems to me that in that case you cannot use transactions though, or am I wrong ?

You can use transactions. They determine batch persistence of the data rather than persisting every insert.

You can just batch them up every

    N inserts or Z milliseconds
if you have enough data. If you don't particularly care, you can disable syncing too.
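One possible shape for batching by N inserts or Z milliseconds, as a Python sketch (the thresholds, table, and class names are arbitrary):

```python
import sqlite3
import time

class BatchWriter:
    """Sketch: flush every N inserts or Z seconds, whichever comes first."""

    def __init__(self, con, batch_size=1000, interval=0.050):
        self.con, self.batch_size, self.interval = con, batch_size, interval
        self.pending, self.last_flush = [], time.monotonic()

    def write(self, payload):
        self.pending.append((payload,))
        if (len(self.pending) >= self.batch_size
                or time.monotonic() - self.last_flush >= self.interval):
            self.flush()

    def flush(self):
        with self.con:  # one transaction per batch, not per insert
            self.con.executemany("INSERT INTO events VALUES (?)", self.pending)
        self.pending.clear()
        self.last_flush = time.monotonic()

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (payload TEXT)")
writer = BatchWriter(con)
for i in range(2500):
    writer.write(f"event-{i}")
writer.flush()  # drain the tail
print(con.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 2500
```

The trade-off is the usual one: anything still in `pending` when the process dies is lost, so the interval bounds how much you can lose.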

I know SQLite's SQL query engine is implemented as a VM. Are there ways of optimizing repeated queries, similar to compiling a query to avoid the VM churn? Perhaps more importantly, have you ever been disappointed by the query engine's performance?

You can "prepare" a SQL statement and use it multiple times. I.e. you can use the same VM for a statement many times.

You can prepare a query to use multiple times (if you are using the C interface, at least; as far as I know the command-line interface does not support this), as another comment mentions. You can also prefix a SQL statement by EXPLAIN to view the VDBE codes. But, I see that sometimes the VDBE code is not as good as it should be.

You can put each query and the result of preparing that query into an LRU cache (or even just a static pointer in C if your application only ever deals with a single connection). Then you can reuse these without preparing them over and over.
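In Python's sqlite3 this caching is built in: each connection keeps an LRU cache of prepared statements keyed by the SQL text (sized by the `cached_statements` parameter), so reusing identical SQL with bound parameters avoids re-parsing:

```python
import sqlite3

# Each connection caches compiled statements; identical SQL text reuses
# the prepared VDBE program instead of re-parsing it.
con = sqlite3.connect(":memory:", cached_statements=256)
con.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
con.execute("INSERT INTO kv VALUES ('a', '1')")

QUERY = "SELECT v FROM kv WHERE k = ?"  # keep the text identical; vary only parameters
for key in ("a", "a", "a"):
    con.execute(QUERY, (key,))  # prepared once, rebound each time
print(con.execute(QUERY, ("a",)).fetchone()[0])  # 1
```

The same idea applies in C with `sqlite3_prepare_v2()` plus `sqlite3_reset()`/`sqlite3_clear_bindings()`, just managed by hand.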

For programming, the use-case for EF or expression-tree queries is the specification pattern.

Although you are correct that it doesn't handle complex queries very fast ( the ORM), there is definitely a valid motivation to use/enforce it.

Complex queries are mostly handled through something like elastic search or search engine ( azure), the goal is to keep queries simple and if the code is loosely coupled, it's easier to do this.

Doesn't SQLite fail fast when many users commit at once?

There's a level of write-heaviness after which SQLite, even in WAL mode, is the wrong choice.

But one can get pretty far with this design. Thousands of writes per second from a few dozen writers, no problem. Millions of writes and hundreds of writers, I'd use something else for that.

Or if you have 10,000 writers each taking 10ms, that's 100 WPS, not bad for most apps.

I remember listening to this when it was recorded and I still remember details from it years later – probably my favorite Changelog podcast ever. It's obviously very foolish to make software choices based on how much you like its creator as a person. But Richard Hipp comes off as a guy so likable that goshdarnit, I hope his little project succeeds. Jokes aside, though, I don't think it's entirely coincidence that SQLite, something that is so good and reliable, was made by someone who seems so conscientious and thoughtful.

Though I have to admit, I was and am still disappointed to learn that the official way to pronounce SQLite is "S-Q-L-ite", i.e. "like a mineral".

Conscientiousness and empathy are rare commodities in software development. It's difficult for somebody to get out of their own head and consider a problem fresh, and it's not always personally rewarding to do so. It's also expensive, if you want to back it up with user testing.

You see it, even, in ostensibly public-consumption open-source projects, where the goal is clearly not merely to scratch one's own itch but to market the project for uptake for this reason or that. I feel like SQLite is a great example of what happens when a team is able to foster the kind of empathy to really understand how the user wants it to be.

Well, to be fair, it sounds like that it at least started out as a project where the programmer and the user were the same person?

I thought it was always pronounced "sequel-ite" personally

Last month I discovered NGINX is not pronounced "en-jinx", and today I discover SQLite is not pronounced "sequel-light".

2020 is turning out to be pretty terrible guys.

You and me both, my friend.

Apparently I've been using S-Q-L-ite, not S-Q-Lite.

All these years I've been using "Engine-X", not "en-jinx", and didn't know it.. https://www.nginx.com/resources/wiki/community/faq/

I only knew it because I had to look up the pronunciation. I kept calling it N-G-I-N-X, but could never remember the spelling.

Wait until you hear how 'valgrind' rhymes with 'grinned'!

I've been getting that wrong for quite a while.

There's a certain irony in getting the name of a debugging tool wrong.

I have always pronounced "NGINX" as /ŋɪŋks/. So, I was wrong, too. But I always pronounced "SQLite" and "valgrind" properly, at least.

No love for "seekwelitay"?

This is how I will pronounce SQLite from now on.

I'm totally in on this. We can make it happen!

Diet Seek-well

Squeal-ite. Because SQL is "squeal", like the pig.

PostgreSQL is pronounced "post-gres". You absolutely never pronounce the "QL".

Sequel-lite here. I know that makes no sense phonetically.

And I thought it was S-Q-lite. S-Q-L-ite makes sense though. Most consistent.

I say S-Q-Lite except I pronounce the S and Q in Portuguese and the rest in English...

Luckily I code on my own, so nobody hears this abomination out loud

I do the same thing. Maybe we should talk.

I think there's a case for disputing the official pronunciation, because it's based on an "official" pronunciation of SQL. I've heard it "ess-queue-elle" and "sequel", but personally side with "sequel". It was originally called "Structured English Query Language", abbreviated and pronounced "SEQUEL"; later shortened to "SQL" with the same pronunciation. So I figure "sequel-ite" is a perfectly legitimate pronunciation.

Adding to that, in the mid-1980s I worked at a company called Gupta Technologies, founded by Umang Gupta and Bruce Scott of Oracle.

Their first product was SQLBase, pronounced "sequel base". The next product, which I worked on, was SQLWindows, pronounced "sequel windows".

We knew that some people pronounced SQL "ess cue ell", but no one at the company pronounced it that way, ever.

That's how I do it. I understand the etymological reasons for "S-Q-L-ite", but SQLite is just too ubiquitous in my everyday work for me to speak 4 syllables when referring to it.

and it does sound like a mineral which is quite in line with how I see sqlite.. pretty, timeless and rock solid

MySQL is the same way. A lot of people say My-sequel but I think it's supposed to be my-ess-que-ell.

PostgreSQL is the odd one out because they officially say it is Postgres-Q-L.

The "it's always S-Q-L and never `sequel`" convention is IBM's fault and stems from an early trademark infringement issue in computing. (IBM was told by an aircraft company that it couldn't call it "SEQUEL".)

For anyone curious about the trademark infringement, I hunted down the excerpt from the book that Wikipedia uses as a source (a book called Databases Demystified) and this is what it says in the book:

>> The forerunner of SQL, which was called SEQUEL (for Structured English Query Language), first emerged in the specifications for System R, IBM’s experimental relational database, in the late 1970s. However, two other products, with various names for their query language, beat IBM to the marketplace with the first commercial relational database products: Relational Software’s Oracle and Relational Technology’s Ingres. IBM released SQL/DS in 1982, with the query language name shortened to “SQL” after IBM discovered that “SEQUEL” was a trademark of the Hawker-Siddeley Aircraft Company. When IBM released its next generation RDBMS, called DB2, the SQL acronym remained. To this day, you will hear the name pronounced as an acronym (S-Q-L) and as a word (see-quel), and both are considered correct pronunciations.

I was hoping for a bit more interesting or detailed story.

I typically go with just "postgres" for phonetic pronunciation. Not sure how else it would be done; "postgres sequel?"

I've heard Postgres pronounced as if it were French. "Postgree."

I've heard that, but assumed it came from splitting up the name as "Postgre"+"SQL"; if you leave off the SQL part you get a word ending in E, and "postgreh" can't be right, can it? :)

Which is why I've also heard developers who either assumed the 'g' was silent or had a transposition problem, giving "poster SQL", "postreg SQL", or worst of all "posgret SQL".

Somewhere, I believe in an HN comment, I saw a Postgres developer say that one of the biggest regrets of the project naming was capitalizing that "shared S".

I go with psql personally

genuinely not sure if you are joking or not, but that's not the same thing as postgres! :)

Postgres queue ell

> they officially say it is Postgres-Q-L.

I always interpreted it as Post-gr-S-Q-L. Which is basically the same thing but makes more sense to me because it’s as if the name had a silent “e”.

i'm weirded out by the fact that i always use the “official” pronunciations for all these products

We need a normal form for pronouncing and naming relational databases.

I’m half joking, but it might alleviate some of the confusion.

I thought the norm was to pronounce SQL as "sequel" in Microsoft products and "ES-QUEUE-EL" in every other context

The norm is to pronounce it by how likely you think you might get yelled at for trademark infringement by a holding company of the assets of an old aircraft company: https://en.wikipedia.org/wiki/Hawker_Siddeley

The name confusion is all IBM's fault.

> Chamberlin and Boyce's first attempt of a relational database language was Square, but it was difficult to use due to subscript notation. After moving to the San Jose Research Laboratory in 1973, they began work on SEQUEL. The acronym SEQUEL was later changed to SQL because "SEQUEL" was a trademark of the UK-based Hawker Siddeley Dynamics Engineering Limited company.

So that trademark has been un-revivable since 2005... long past time to start saying "sequel" for real :)

I preferred to call it "Squeal Server."

Absolutely. Squealer for short.

I looked it up, you're right, it is officially my-ess-que-ell. This leads to another question, how do you pronounce "NoSQL"? I couldn't find a convincing answer on the web. I used to say "sequel" for everything SQL related but clearly I was wrong.

"red iss"

Noskul, rhymes with nostril

my-ass-que-ell is the way to go

I'm still just gonna keep calling it "sequel light".

I pronounce it "ess que lite"

One of the many reasons I love sqlite and redis is the attitude of the developers. Sqlite has just an uncanny amount of testing; I wonder if it's the most heavily tested piece of user software (outside of formally proven software and the space/medical industries, I suppose).

And Salvatore (redis) has a way of simplifying issues that gives me a great deal of confidence in the decisions he makes and the code he writes. I really like the code and reasoning behind his 'linenoise' library in particular

I'll be honest and say that in my head, I think 's-q-l lite'

> Jokes aside, though, I don't think it's entirely coincidence that SQLite, something that is so good and reliable, was made by someone who seems so conscientious and thoughtful.

The same could probably be said about Redis and Salvatore Sanfilippo.

What’s the episode number?

One thing that I wish more people realized, especially ones developing native apps, is this: unless you really need a specific file format for your application, don't design your own. Just use SQLite. You get so much out of it (including features like undo), as well as a huge toolkit.

"SQLite does not compete with client/server databases. SQLite competes with fopen()." --- https://www.sqlite.org/whentouse.html

Even Apple, home of not-invented-here, built their "Core Data" object storage API (which is extremely prevalent throughout iOS and macOS apps) using SQLite as a backing store.

And you lose the ability to quickly change settings in a text editor, to have multiple settings files, to have tiered (system, local, project) settings files, to version them in source control, etc. Instead of learning ini, json and xml you'll have to learn whatever schema project x uses, and there will be more schemas than file formats. Plus SQLite might be fast, but it's still a lot more overhead than parsing an ini file.

> unless you really need an specific file format for your application

It's all about choosing the right tool for the job.

There's great reasons to use xml files over SQLite, but that mostly comes down to needing a file to be human readable versus file size.

In the product that I work on, I moved very small tables out to xml files because it's easier to inspect them in a text editor instead of expecting non-developers to figure out a database. These files are also small enough and are written infrequently enough that any performance difference between xml and SQLite is negligible.

The vast majority of data is in SQLite. An older version of the product would always write out a giant xml file, but when the xml got big, performance suffered. In this case, SQLite is the correct choice because it can make a small change to the database without needing to rewrite the entire file.

> And you lose the ability to quickly change settings in a text editor

There are plenty of SQL CLI/GUI clients that support SQLite.

But if you need end users to be able to change some configurations sure, move those to standalone files and parse them.

You don't lose the ability to have multiple or tiered settings files - there's nothing preventing you from using as many SQLite database files as you want.

The other points are valid, but not everything can be shoehorned into JSON or XML. Sometimes the data size is too large for the read-everything-into-memory approach. Sometimes you need to deal with a lot of binary blobs. And so on.

Why are you comparing schemas to file formats? Those configuration files still have their own user defined schemas.

> (including features like undo)

Can you please elaborate on this?

Transaction rollback is probably what he meant. If that is a familiar topic to you the relation to undo should make sense and his point should be clear. If that doesn't make sense, look up ACID, Transactions, and Migrations in the context of a database.

I had a different interpretation. It's not uncommon in commercial code to use a sort of copy-on-write behavior where every edit creates a new row and marks the other one as dead (for instance, filling in or blanking out a column to mark it defunct). You leave an audit trail behind.

Which, with a little bit of work, can be used to revert to earlier versions.

While everyone's praising SQLite, I want to note that SQLite's type system is absolutely horrible. When you can store a string in an integer column, you don't have a type system. And it's absolutely not intuitive for people with experience in other databases.

All you have to do is put a column constraint on there: CREATE TABLE t (intcol CHECK (typeof(intcol) = 'integer'));
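A quick sketch of that trick in action, using an in-memory database (the table and column names are just illustrative). Note that because the column has no declared type, SQLite applies no affinity conversion, so even the numeric-looking string '42' is rejected:

```python
import sqlite3

# Demonstrate the CHECK(typeof(...)) constraint from the comment above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (intcol CHECK (typeof(intcol) = 'integer'))")

conn.execute("INSERT INTO t VALUES (42)")        # stored as integer: accepted

try:
    conn.execute("INSERT INTO t VALUES ('42')")  # stored as text: rejected
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```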

I agree, if you want dynamically typed columns then just define a separate Any or Variant type. I don’t see the point of schema if it’s not enforced.

Hipp talks about that in the interview a little bit:

> SQLite really started life as a Tcl extension, Tcl being the programming language, the Tcl/Tk. The project I was working on was written in Tcl/Tk and so SQLite began as a Tcl extension and as a scripting language, like Perl or Python, where you can put any type of value you want in a variable. So a variable might hold a string, a number, a byte array or whatever. So I made SQLite the same way, where just because you’ve declared a column of a table to be text doesn’t mean you can’t put integer in there, or just because you declared a column in the table to be a short int, why not put a 2-megabyte blob there? So what? It’ll do that.

> Or if you have a column that’s declared integer and you try to put text in it, it looks like an integer and it can be converted without loss. It will convert and store it as an integer. But if you try and put a blob into a short int or something, there’s no way to convert that, so it just stores the original and it gives flexibility there. And this is useful in a lot of cases, because sometimes you just have a miscellaneous column in a table that you might need to store lots of different things in. And in traditional database systems you actually have to have multiple columns, one for each possible data type, whereas in SQLite you put it all in one column. So it works well.

> And for that matter, with SQLite you don’t have to give the column a type at all. You can just say, CreateTable T1 (a,b,c) and then you’ve got a table with three columns named a, b and c and you put whatever you want there.

> Well, it flows directly out of the scripting language traditions. You don’t declare types for variables in Tcl; you didn’t used to do it in Python, I guess you can do it some, now. You don’t do it in JavaScript… You just say it’s a var.

This apparently has roots in some ancient practices (Tcl is not exactly young). In the 21st century, we likely don't need to save a few bytes and reuse a variable, and can afford to preserve our sanity instead. Well, we can even afford a built-in ACID-compliant SQL database!

SQLite's quirks page [1] actually suggests the same thing:

> In retrospect, perhaps it would have been better if SQLite had merely implemented an ANY datatype so that developers could explicitly state when they wanted to use flexible typing, rather than making flexible typing the default. But that is not something that can be changed now without breaking the millions of applications and trillions of database files that already use SQLite's flexible typing feature.

I wish they had done this in the first place. I wish even now that they implemented a pragma to enforce types (and introduce the ANY type for folks who explicitly want it). But SQLite has enough good things going for it that I'm happy to use it in spite of this problem.

[1] https://www.sqlite.org/quirks.html

I don't see why this must be a breaking change.

(1) Add a flag when opening a database to show that you want data types enforced.

(2) Add code to all the input pathways to check for type compliance. This way a database with correct data types will stay correct.

(3) Do not change anything in the storage layer.

What am I missing?

Seems like (1) is doable, as long as it's opt-in. But if it becomes the default behavior, existing applications would break when they do something against the type system. I'm not sure what (2) buys you.

(1) is the interface to trigger (2).

This is very similar to what they did for foreign key support.


I finished the podcast a couple hours ago. It was pretty clear to me that Hipp has the perspective of a person working on small teams, where most of the changes to the system either don't affect you or are highly visible.

This is not a man who fears the existential dread of discovering, just after you've stated loudly that some part of the system could not possibly break that way, that some other developer has pulled the rug out from under you when you weren't looking.

The odds that I add the string "52" to a column I know damn well has integers is pretty low, but non-zero. And I'm gonna transpose some fields eventually. The odds of any particular coworker doing the same is some multiple of that. But q^n across my entire team, across months and years, across new hires and people switching into and out of the team? We are so deep into Murphy's Law territory that it's almost a certainty.

It's a fool's errand to try to prevent absolutely every one of those scenarios happening. All I ask is that we have a way to discover them early and cheaply.

I actually prefer SQLite's take on type affinity: https://www.sqlite.org/datatype3.html

Can't you just add CHECK constraints then?

It's absolutely intuitive for people used to programming languages that aren't strongly typed, like Python.

Forgive me, I was confusing strongly typed with statically typed. My point still stands though, for someone used to a dynamically typed language like Python having database columns that aren't locked to a single rigid type will seem quite natural.

It doesn't though. There are almost no dynamically typed languages where you supply a type yet don't enforce it, maybe forth is the closest with its stack effect comments. If you give me a type, I do expect a type, if you omit the type fine, I'm gonna assume it'll be dynamic at runtime.

All these dynamic languages are also strongly typed for a reason, even javascript uses === everywhere instead of the weakly typed ==. The bugs that occur because of automagic are just ridiculously hard to track down, and even statically typed languages can fall into this pit, for example Scala with its implicits shudder.

But SQLite has the worst of all worlds: you have to specify a type, yet you cannot rely on it; it sometimes does type conversion for you (e.g. int to string and back) and sometimes does not.

They should've just implemented a single ANY type or have the syntax not use types at all and deviate from the SQL spec (which they do a lot anyways).

Sure, it is simple and that is why people use it.

But I have also run into hard problems using it. Deleting and renaming columns is not supported natively. On the internet one can find procedures to do that, but they generally do not take foreign key references into account. Getting this right is a hard problem. In the end we had a function that could rename or delete a column without even switching off foreign key checking, but it was not simple, and it was not the first iteration of said function; earlier iterations caused lots of subtle problems, like a column having a default value or not depending on whether the database was new or had been created through schema updates.

Use at your own peril.

Not disagreeing with your overall point about SQLite's feature set, but IIRC, `ALTER TABLE RENAME` was added in 2018, for version 3.25.0: https://www.sqlite.org/releaselog/3_25_0.html

(I chime in only to remark that I myself am constantly surprised that SQLite continues to add these kinds of features – e.g. WINDOW functions were also in 3.25 – long after I had accepted their absence as a fact of life)

Ah, that is interesting. My experiences were before 2018 and I had not kept an eye on it, so I did not know this.

Ah, but looking at the docs they still do not seem to support deleting columns yet.

Yeah, it's a pity it can't do anything more than ALTER TABLE statements that add to a schema. Having to rename the table, create a new schema and copy the contents from the old table can be quite tricky and troublesome if you've got a live system running with users on it.
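For reference, the recreation sequence being described is roughly what SQLite's own ALTER TABLE documentation recommends. A sketch of it wrapped in Python (the schema and names here are made up, and on a real database you'd also run PRAGMA foreign_key_check before committing):

```python
import sqlite3

# Drop a column by recreating the table, per the documented workaround.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, obsolete TEXT);
INSERT INTO users (name, obsolete) VALUES ('alice', 'x'), ('bob', 'y');

PRAGMA foreign_keys=OFF;
BEGIN;
CREATE TABLE users_new (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO users_new (id, name) SELECT id, name FROM users;
DROP TABLE users;
ALTER TABLE users_new RENAME TO users;
COMMIT;
PRAGMA foreign_keys=ON;
""")
```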

I've never had an issue altering or deleting columns in SQLite. Granted, I've used both Sequelize and SQLAlchemy as ORMs.

But if you're using SQLite for simplicity there's a good chance you're not interested in writing raw SQL statements either.

Yeah, I'm using SQLite for simplicity and exactly because of that I want raw SQL instead of some leaky complex ORM abstractions.

I use EF core and instrument all my queries - if I see a slow one I hand optimize it. Best of both worlds I think - and I say that as someone who loves to fiddle with SQL.

Also the worst of both worlds: you need to know very well both the underlying database (SQL) and the complex leaky abstraction (EF Core), and how those abstractions are translated into SQL.

I'm no ORM hater (use Hibernate daily) but claiming that using ORM will simplify working with the database is at least misleading.

For some reason people think it's okay to be ignorant while using an ORM. What you are interacting with is still a database! Treat it like one and you will save a lot of time in the easy situations.

Pretty sure SQLA the ORM has nothing to do with schema migration, which is usually handled with alembic, the sister project. Not sure about Sequelize.

Now, there’s nothing magical about alembic’s handling of renaming or deleting columns for SQLite; it has to recreate the table just like what you would do if you’re hand rolling the SQL, so it has the same problems.

Details: https://alembic.sqlalchemy.org/en/latest/batch.html

While on SQLite, Hipp's fossil-scm is also another great piece of work. My personal favourite to start projects since I can keep code, docs and issues all in a single .. SQLite file! fossil is also just a single executable with no other dependencies and comes with a built-in web server complete with user management if you want to expose to a small group.

The main and real reason I love fossil is that I'm always at peace knowing that I can't accidentally bungle up the history. I'm always nervous about that when I use git .. still.

> knowing that I can't accidentally bungle up the history. I'm always nervous about that when I use git

I just read a great insight yesterday:

> Understanding there’s no harm to be done eases things a lot. It encourages experimenting, which is one of the most effective ways to learn new things and to get familiar with them. Imagine this: you start rebase, mess things up, hit Ctrl+Z several times and try again. Good news is, we can implement Undo/Redo given just ordinary Git implementation. Git already stores all the possible states our repo was in. All Undo/Redo will have to do is to shuffle some branch pointers here and there.


You can still bungle history with badly authored commits, though. This is exactly why I like Git - no need to think about how messy the history is while working on something, I can create lots of small commits tweaking things as I go, and then clean it all up into larger commits representing logical groups of changes, with proper description etc.

I'm fine with my commit flow and commenting, which perhaps helps. I do use orphaned dirty branches. It helps that fossil's tags are reusable, unlike git's.

For me, there've been cases where I've worked on an angle, only to discard it for another approach .. and much later wanting to revisit that angle because it was a good idea. If I discard history to "clean it up", such a resurrection becomes impossible.

We used SQLite at my company to allow users to write SQL queries against their db. When we hit the limit of it, we had to switch to Postgres. That migration is quite difficult and I wish we had used Postgres from the start. 20/20 hindsight but that was my first thought.

Did you use WAL?

For anyone wondering why: I was working on a hobby project a few years ago that had to do a lot of inserts in a SQLite DB. Performance was ok, but it wasn't great either. Turning on WAL greatly sped up performance. In both cases just a single thread was writing to the database.

With WAL turned off (the default), when you write in a transaction SQLite copies the original pages into a rollback journal file, modifies the main database file, and deletes the journal when the transaction is committed. With WAL turned on it appends each change to a write-ahead log file and merges that into the main file periodically (by default when the log grows to 1000 pages), so a commit only needs a single write to the log.
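Enabling it is a one-line pragma. A sketch of the single-connection setup described above (the path is made up; WAL is a property of the database file, so an on-disk database is used):

```python
import sqlite3, os, tempfile

# Open one connection, switch the file to WAL, and batch inserts.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(path)
conn.execute("PRAGMA journal_mode=WAL")    # returns 'wal' once enabled
conn.execute("PRAGMA synchronous=NORMAL")  # common companion setting

conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

# One connection and one big transaction, rather than a connection
# per logical operation.
with conn:
    conn.executemany(
        "INSERT INTO events (payload) VALUES (?)",
        (("row %d" % i,) for i in range(10000)),
    )
```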

Were you using an ORM? I ask because most people use database switching as a selling point for using an ORM. I'm rather indifferent on the matter, but I'm curious.

Ha! This selling point is just one in a miserable litany of poor reasons to use an ORM - you absolutely cannot simply switch between databases without doing some work to ensure that the data is migrated correctly and the SQL statements translate properly.

Yeah, but verification of correctness is much less work than implementation.

As an alternative to an ORM, there is another great abstraction layer that works across a large number of databases. It's called SQL!

Until you realize that not even booleans are standardized between SQL dialects...

That's true in theory, but unfortunately you can still run in to issues when different databases support different parts of SQL, and the db you're migrating from has different features than the one you're migrating to.

There are a huge number of differences in the SQL supported by different engines. You can't just switch from one to another, unless you're only using a small subset of SQL to begin with.

How do you portably insert a row and get back the id the database assigned automatically? That’s a standard create operation.

You still have to migrate the data. I faced the same kind of dilemma, but with MySQL, and kind of noped out when I got to stuff like [0].

[0] https://stackoverflow.com/a/87531/1210797

An ORM does indeed force you to write lowest common denominator code but I wouldn't rely on that.

Do you have any good articles or posts you can think of for managing the migration, in case there're some lifesavers out there.

Not an article, but I've used pgloader for this purpose in the past:


Great tool, I can't recommend it enough.

I'm glad something like SQLite exists. It's awesome if you want to store data in your applications without falling back to a full SQL database. I am also thankful for those client applications like "DB Browser for SQLite" [1]. It nearly provides the same functionality like SQL Management Studio, which is quite an achievement.

The only gripe I have with SQLite is with full text search FTS5. The last time I tried to store and retrieve text with associated metadata from another table was really confusing. It kinda worked in the end, but then I also realized that you can't do a contains-query on your text, just starts-with and ends-with [2]. This is a major flaw in my book. So many times I just want to dump tons of text into a table and want to query single words naturally, but I can't do that without resorting to hacks or unoptimized LIKE queries. And if I need to take into account typos or similarity, I'd have no idea how to do that with SQLite. Does someone have experience regarding this?

[1] https://sqlitebrowser.org/

[2] https://stackoverflow.com/a/24670334/998987
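For what it's worth, FTS5's MATCH does find whole tokens anywhere in the text, and `*` gives prefix matching; it's substring-within-a-word queries that need workarounds. A minimal sketch, assuming your SQLite build includes FTS5 (the one bundled with Python usually does; table and contents are made up):

```python
import sqlite3

# Minimal FTS5 demo: whole-word and prefix queries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
conn.execute("INSERT INTO docs VALUES ('the quick brown fox jumps')")

# Whole-word match: finds the token anywhere in the text.
hits = conn.execute("SELECT body FROM docs WHERE docs MATCH 'brown'").fetchall()

# Prefix query: matches tokens that merely start with 'jum'.
prefix = conn.execute("SELECT body FROM docs WHERE docs MATCH 'jum*'").fetchall()
```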

Something I see some developers do which I don't recommend is using SQLite for testing environments when they're using PostgreSQL in production. SQLite3 doesn't support a lot of types that PostgreSQL does and it doesn't support things like full outer joins. Save the headaches of debugging in production and spin up a docker PostgreSQL instance since testing environments should emulate production environments as much as possible.

Yes, I often see Java developers using an embedded H2 database [1] for development and test, and a "real" database in production, and there are always unexpected issues of false-positive tests and functionality due to this. Terrible anti-pattern. I am glad "Dev/prod parity" is recognised in The Twelve-Factor App manifesto [2].

I do what you suggest: run dev/test against a real PostgreSQL (of the same version) running in a container. In fact, I take this further and run builds against an actual PostgreSQL, since I like to generate code based on the DB schema using jOOQ [3].

The build proceeds like so:

1. Spin up a fresh PostgreSQL container

2. Run my schema migrations against it

3. Run the jOOQ code generator against it

4. Compile code

This provides some additional guarantees, including:

1. My queries are fully type-safe from my code down to the database schema level

2. My schema migrations can be run against an empty database

[1] https://www.h2database.com/html/main.html

[2] https://12factor.net/dev-prod-parity

[3] https://www.jooq.org/doc/3.13/manual/code-generation/

> and it doesn't support things like full outer joins

This at least is possible to fake. But in the era of premade docker packages for postgres, I do wonder why the practice you describe persists.

I know some people with a certain generation of MacBook or macOS version have had bad luck running docker, which makes their fans go nuts and drains resources, so they prefer to avoid docker if possible.


The micro-ORM I use in C# manages to work around this well enough; the only thing SQLite is really hard to work with is time, since there's no real type defined for it.

SQLite is a brilliant lightweight database which is available everywhere. All mobile OSs support it and it's a breeze to run on even embedded Linux. I just wish web browsers would also support it and make webSQL a standard, it would make programming for the web so much easier.

Once I discovered SQLite, I never went back. It’s so easy. I’m still shocked when I see docker compose files that spin up entire MySQL dbs when a simple SQLite connector would be so much easier!

Whether it is in staging, in dev or in tests, use the same DB engine as in prod. It will save your butt sometimes.

Also, my docker-compose for a Postgres container is literally:

        image: postgres:10
        ports:
          - "5432:5432"
"so much easier" seems very hyperbolic. Not denying that SQLite is an amazing piece of software though!

How do you ensure persistence in production?

With postgres I just buy a managed setup from Heroku, GCP or AWS. Backup is automatic, and there is a follower to take over if the leader dies/crashes.

It's easy to backup an sqlite snapshot, but you can't do that after every operation.

Curious, how do people solve this?
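One way to take that snapshot without stopping writers is SQLite's online backup API, exposed in Python as Connection.backup(). A sketch (file names here are made up):

```python
import sqlite3

# Snapshot a live database: the backup API copies it page by page
# and is safe to run while the source connection is in use.
src = sqlite3.connect("app.db")
src.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
src.execute("INSERT OR REPLACE INTO kv VALUES ('state', 'ok')")
src.commit()

dst = sqlite3.connect("app-backup.db")
src.backup(dst)   # consistent copy of the whole database
dst.close()
src.close()
```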

Actually, it was about to become a standard. Some browsers implemented it, but then they decided not to go ahead, because all browsers would have ended up with the same SQLite implementation.

It is weird in hindsight, as with just WebKit (which is most of Blink still) and Gecko, there are just two engines left. And when Gecko goes down (it is losing market share), this will seem even more stupid. SQL is a standard. They could have standardized on it rather than on an implementation.

SQL is a standard, but SQLite doesn't implement "standard SQL", it implements "SQLite SQL".

For better or worse, the web has always been driven by specifications, rather than implementations. Properly standardizing all of SQLite in a spec is not trivial, and as much I think SQLite is great software, I don't think that relying on SQLite the library specifically would be a good idea. What if the latest version of SQLite does something that's not in the spec or prohibited by the spec?

Either way, there's IndexedDB now, which is supported by almost everything, including IE.

> For better or worse, the web has always been driven by specifications, rather than implementations.

Surely you're joking? Having things done by specification first has been extremely rare; pretty much everything followed the pattern of "make the specification after the fact to be whatever X browser has done and is used". Protocols, markup, CSS, JavaScript: the whole thing was done by implementation.

There's a difference between "standardizing an implementation so everyone can use it" and "just use this standard implementation".

But IndexedDB has been struggling very hard from day 1 to be as fast and performant as WebSQL. You can run the benchmark yourself [1]. Try inserting 10,000 records. WebSQL is orders of magnitude faster than IndexedDB.

The people who rejected the WebSQL proposal lacked decisiveness and a technically-focused vision. They also interpreted everything very politically.

[1] http://scaljeri.github.io/indexeddb-vs-websql/v2.0.0/

> For better or worse, the web has always been driven by specifications, rather than implementations.

What? The specifications document what browsers already do. Have you seen the HTML5 spec? It's mostly formalization of error-recovery, showing how to interpret previously technically-broken documents in the way modern browsers have converged on. It's not what anyone would design, but there's no one driving.

Sure, but the requirement is there should be at least two independent implementations (as there is for the error recovery in the HTML5 spec). This was never going to happen with SQLite.

Internet Explorer would like to have a word with you. It was clearly an “implementation over specification” effort.

But the policy is to only accept a standard where there are at least two independent implementations. This was realistically never going to happen with SQLite.

>... they realized that if this is a critical part of their infrastructure, they needed to make sure my business was sustainable.

>So companies which are typically large companies that really depend on SQLite as part of their product, they just pay us an annual fee.

>... we are a 100% engineering shop.

An intriguing option for anyone sitting on an original infrastructure solution, and looking to turn it into a livelihood, with quality of life as first priority.

The interview transcript is much appreciated! I hate it when a random two-hour podcast has a 15-minute interview I want to listen to, and I have to go hunting for it.

Which is a shame because most podcast apps have chaptering systems where particular interviews and segments can be easily bookmarked. Even without that feature show notes can include links and timestamps to particular segments too but many shows don't bother because it's extra work.

Because they took reliability seriously.

If the people who did the original Macintosh "resource fork" updater had been serious about maintaining consistency of the data structure, that approach would have lasted longer. That was a really good idea, badly implemented because they had to cram it into 128KB and use a floppy. Unfortunately, it wasn't rewritten when the Mac line got more memory and hard disks.

Somebody had to get the open source world past using text files for everything.

I came to SQLite recently having used big iron enterprise tools for my analytical work previously (namely Oracle), and wow, what a lovely elegant tool. It isn't just super useful for web and embedded applications, it is quite a powerful and convenient analytical tool for quickly ingesting huge flat files and querying them.

AFAIK nomadlist.com by @levelsio is using SQLite in production. He has a bunch of data, plenty of users and his site is very fast. Proof that you can use SQLite in production on a successful project.

Serious question, how in the heck did the podcasters get their transcript? Did they pay someone to edit a machine translation? That is one of the coolest things I have never seen for any podcast!

I don't know how they seeded the transcript set (looks like this one landed in 2017), but nowadays, transcripts are accepted via GitHub.


Yes, we pay a human to transcribe each episode and then we open source the markdown on GitHub for community collaboration/improvement.

Yeah except that is a different podcast...

... except it's the same answer, only their response is more detailed.

Meta: this podcast's RSS feed only goes back so far (2018). So I have no way to add/listen to this episode in my podcast app?

We used paged feeds[1] for our RSS, but not all podcast clients support it. I might end up going back to a full feed, but remove the show notes, etc from older episodes to keep the file size smaller...

[1]: https://podlove.org/paged-feeds/

Ok that makes sense. But I use Overcast, so how can I listen to it?

I use Huffduffer[1] either by searching for someone who has added it already[2] or adding it directly with details and a link to the mp3 myself. Then add your huffduff feed to Overcast (can add based on tags also).

[1]: https://huffduffer.com/ [2]: https://huffduffer.com/neaj/543363

I had the same problem and ended up uploading it via http://overcast.fm.

I've worked on a few internal tools for the company I intern at and I've always been told to use Postgres. Don't get me wrong, I love Postgres, but for an internal application that won't get any significant concurrent interaction, Sqlite is the way to go in my opinion. Maybe I'll try to mention it at the next meeting.

And you'll be told their production is using PostgreSQL.

Having an identical environment makes life easier. I never run anything on my Mac despite many Unix daemons being available; I either run it remotely on Linux or locally in a Linux VM with the same distro.

Is there a database file format that supports columnar storage, queryable via SQL? For OLAP-style queries.

I've tried SQLite for this use case, but queries fall down when scanning millions of rows.

I know there is the Parquet file format, but I'm wondering if there is a SQL engine that can sit on top of something like that, similar to SQLite.

Here's the MP3 file for those who can't stand 1x playback speed: https://cdn.changelog.com/uploads/podcast/201/the-changelog-...

I found this fascinating. As students we are taught to use SQLite and then told to use PostgreSQL for production. I am told that a large number of native mobile apps use SQLite. Does anyone have any stats on this? Seems like a nice choice, but the searches I have done are all over the place.

Is there really any other file-based SQL alternative anymore?

Back in the day on Windows you would use the Jet (MS Access) database for this purpose, or maybe dBase before that in DOS. Many apps that needed to store data would use them, since they did not need a server process, just a library.

I guess you have some NoSQL options like LevelDB, but it's really nice to have a full ACID SQL engine running in-process.

I always thought it would be great if PostgreSQL could be separated into a library for the DB, with a server that uses the lib, or the option to use the lib directly in your process. But I would imagine the process model is baked heavily into the DB engine.

h2db for java

Don't forget about SpatiaLite! Having a GIS-capable lightweight database, without relying on a typical DB system is pretty nice too.

My app had been running on CodeBase (dBase clone from Sequiter) for twenty-five years. When we converted to macOS, CodeBase wouldn't compile for 64-bit. Panicked, I turned to SQLite. Miraculously, we were able to switch in a couple of weeks. It is indeed a pleasure to work with. Especially with tools like DB Browser to poke around with.

We are currently working on an IoT solution where we deploy many distributed computing units with leaf devices, and SQLite works flawlessly for keeping local state on the remote devices. There may be other solutions, but this was a no-brainer for us and so far we haven't had any major issues with it.

Obviously because it's awesome in so many ways. The only thing I miss when using SQLite is implementations in high-level programming languages. In many cases (e.g. with .NET WinForms) using a native library is a pain, while the speed native code provides is not always necessary.

I also recommend this great video walkthrough of SQLite from Richard Hipp: https://www.youtube.com/watch?v=Jib2AmRb_rk

Benedict rule.

It does one thing and does it well.

I use SQLite for demo projects and it is very cool!

TL;DR because it's simple.

I'd say there are a load of reasons, all of which combine together to make it unbeatable:

* Simple to embed into your application (no setting up a separate database, &c).

* Fantastically backwards compatible

* Public domain (so easy to comply with the license)

* High quality: Reasonably fast, reasonably efficient, reasonable feature support.

The lack of any of these would add significant drag. It's difficult to imagine any other project catching up and displacing it at this point.

I personally cannot see any near term replacement for what SQLite offers. In my world, SQLite became the way in which I now manipulate a scope of structured data on a single logical disk. I mostly view it as a very reliable and friendly layer on top of basic file IO which also happens to be compatible with virtually every computer on earth. I can copy a SQLite database out of an embedded microcontroller environment and directly open it on my x86_64 workstation, edit some values, and then copy it back down without any fears of compatibility issues.

Is it simple, though? I mean, reading about how they test it[1] leads me to believe it is not simple. Maybe it's simple compared to Oracle, and maybe using it is simple, but the actual implementation is definitely not simple.

[1] https://www.sqlite.org/testing.html

I don't care about the implementation and I congratulate the developers on abstracting all of that complexity away from me. That's what good software does.

I misinterpreted your comment. I thought you were saying SQLite was successful because SQlite itself (i.e. the implementation) was simple, not because it was simple to use.

For what it's worth, I use "simple" for talking about the complexity of the thing, and "easy" for the user experience. Quite frequently easy things have to be complex because the cognitive load gets shifted into the machinery.

E.g., compared with a Model T, modern cars are much easier but much more complex because the car itself understands things like adjusting the ignition timing and safely getting the motor turning, things that a Model T user had to understand if they wanted to avoid getting a broken arm before they got out of the driveway: https://www.fordmodelt.net/how-to-drive-ford-model-t.htm

> I mean, reading about how they test it[1] leads me to believe it is not simple.

They are unusually thorough in their testing. I see that as a good thing, not a sign it isn't (relatively) simple. In fact, I don't think any other database engine is tested as thoroughly. I can't even imagine how much effort it'd take to do so for Oracle or even PostgreSQL, except that it'd be prohibitive.

It's simple to use.

Most of the time, yes, if SQLite supports the filesystem you are using. If the filesystem doesn't have a particular feature SQLite needs, it can be difficult to get working the way you want it to.

What scenarios/filesystems lead to these kinds of conflicts?

Well, the issue I ran into was that I wanted a SQLite DB to have one writer and many readers. However, what I was seeing was that queries were failing because the DB was locked at the time of reading. The solution is to switch the journaling mode to WAL[1].

However, the particular platform I was developing for was an embedded system using the JFFS2 filesystem. JFFS2 doesn't support shared mmap, which is a feature SQLite needs for WAL.

So basically, it was a big pain: I had to set up a tmpfs mount point where the WAL-enabled DB resided and then periodically sync it to flash.

[1] https://www.sqlite.org/wal.html

Why didn't you just do more granular writes and let the different processes lock the file? SQLite is impressively fast; it can handle a lot before you need a full database server.

I don't know how to do that.

Basically there is one process periodically dumping stats into a sqlitedb. There is an HTTP API that can retrieve said stats. How do I make it so the HTTP API doesn't occasionally fail when trying to read from the DB?

Retry until it gets the lock, and don't hold the lock longer than you need to write to the db.

Really though, if you are trying to read and write to a db concurrently and over the network, it sounds like a much more clear-cut case for something like Postgres.

I don't really like the idea of a retry loop. Why doesn't sqlite have an API to just block until the DB is available?

You asked the question and now you don't like the 'idea' of the answer. Use real numbers instead of feels, and again if you need real concurrency, use a real database.

"Retry" is not a solution; it's a workaround. Imagine if that's how opening files worked: "Oh yeah, if the file fails to open just retry a few times until it works."

There's a small chance even retrying 3 times you could fail all 3 times in a row. WAL journaling is a solution.

Not exactly a filesystem, but it doesn't work on SMB shares, which is an enormous pain in the ass. And I'm not even talking about concurrent access, just one user. I get that SMB file locking is a mess, but making it a hard fail by default was a big mistake IMO, as most developers using the library don't add special cases to allow this.

The documentation is very clear that SQLite over a network isn't a good use case, even for reasonably sized data. Item 1 from the "Checklist for choosing the right database engine"[1]:

> 1. Is the data separated from the application by a network? → choose client/server

> Relational database engines act as bandwidth-reducing data filters. So it is best to keep the database engine and the data on the same physical device so that the high-bandwidth engine-to-disk link does not have to traverse the network, only the lower-bandwidth application-to-engine link.

> But SQLite is built into the application. So if the data is on a separate device from the application, it is required that the higher bandwidth engine-to-disk link be across the network. This works, but it is suboptimal. Hence, it is usually better to select a client/server database engine when the data is on a separate device from the application.

1. https://www.sqlite.org/whentouse.html

I do wonder, in how many of the cases where SQLite is used, it wouldn't be even simpler to just dump things to an XML file.

Here are some considerations why: https://danluu.com/file-consistency/

Thanks for sharing that link; it's one of the most valuable things in this thread for me! (I loved SQLite already, and one HN thread more or less isn't going to change that.)
