I teach SQL to non-techie students. I used to give the option of doing either MySQL or SQLite, but not only did I underestimate how different the syntaxes were, I also underestimated how non-trivial it is for students to successfully install and run both the MySQL server and client. These are students who can't even use a spreadsheet well, not that that makes a huge difference in understanding databases.
I've moved everything to SQLite and couldn't be happier. Not only is it easier to distribute assignments (e.g. a single SQLite file, instead of CSVs that need to be manually imported), it does everything I need it to do to teach the concepts of relational databases and join operations. This typically just needs read-only access, so our assignments can involve gigabytes of data without issue.
"I used to give the the option of doing either MySQL or SQLite"
SQLite is a good choice for absolute beginners.
Later, when teaching "real" multi-user RDBMSes, although MySQL may be more popular, it makes more sense to teach PostgreSQL as the "default open source database". Both will do the job, but PostgreSQL has gotten more things right from the beginning, which is especially important when learning. Think PHP vs. Python: PHP may cut a few corners, but it's not an ideal language for teaching general concepts.
SQLite teaches virtually all of the necessary concepts to jump right into a multi-user RDBMS, even most of the syntax, with the exception of being more lax about datatypes. My audience is not people intending to build applications, but people who want to do data analysis beyond what they can do in a typical spreadsheet. I don't think the PHP vs. Python analogy is particularly accurate. I don't feel that PHP is easier to learn than Python unless the main goal is to produce something public-facing on the web. Never mind that the syntax is significantly different.
I'm not intimately familiar with Go, but I figure its clean and simple design makes it a good first language, and in particular a good preparation for C, because of its focus on memory layout.
Rust? NO! One of the worst choices for a first language. Similar to C++, Rust is really heavy on concepts.
If it's just to learn SQL, then would a LibreOffice database do? Perhaps they removed it, but you used to be able to use (or perhaps just import?) a spreadsheet as a DB and run SQL against it. It definitely did joins and ORDER BYs using standard SQL syntax.
LibreOffice now uses Firebird under the hood, I think.
Firebird has more complete SQL support than SQLite. It also has the advantage of being usable either in a manner analogous to SQLite or as a full-blown server database.
I actually think SQLite could probably be used from beginner to advanced level, depending on what you're teaching (relational theory, for example), but perhaps I'm not aware of its limitations in that regard.
That being said, have you considered using Docker (or other popular container tech) to make installation of MySQL easier?
And I should also add: what about cheap cloud SQL instances/services as well? Perhaps the students could learn a bit about how simple it is to use some cloud resources too, without the management overhead?
The one thing missing from SQLite is stronger type casting...however, SQLite does have a solid, cross-platform GUI in DB Browser [0]. It's not as nice as Sequel Pro and some of the commercial clients for MySQL. Postgres, unfortunately, doesn't have nearly the variety of reliable clients that MySQL and SQLite do.
However, SQLite follows most of the standard syntax that Postgres does, so having students move right into CartoDB, which lets you run raw Postgres, has never been a problem. It's just the self-hosting that's a pain :).
One more thing that I wish I could have from Postgres as a teacher: how it returns an error when you include non-aggregated columns in the SELECT clause. In MySQL and SQLite, selecting extra columns won't throw an error, and the results look close enough to correct to be fairly dangerous for novices. Better to just throw an error as Postgres does.
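For anyone curious, here's a minimal sketch of the behavior using Python's built-in sqlite3 module (table and data are made up): SQLite happily returns a value from some arbitrary row for the bare column, where Postgres would refuse the query outright.

  import sqlite3

  con = sqlite3.connect(":memory:")
  con.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INT)")
  con.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                  [("north", "apples", 10), ("north", "pears", 99)])
  # 'product' is neither grouped nor aggregated; SQLite picks a value
  # from some row instead of raising an error, as Postgres would.
  print(con.execute(
      "SELECT region, product, SUM(amount) FROM sales GROUP BY region"
  ).fetchall())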
Hipp based the syntax and semantics on those of PostgreSQL 6.5. In August 2000, version 1.0 of SQLite was released, with storage based on gdbm (GNU Database Manager).
SQLite is quite possibly one of the most useful pieces of software ever created. It's small, relatively fast, and unbelievably solid. It's up there with bash, curl, grep, emacs, and nano: tools that are just so good at their job that we don't even notice how amazing they are.
I mean, really. SQLite is remarkable, impressive, and used everywhere, and we never talk about it. Emacs is a remarkably impressive piece of engineering, bash is the world's default shell for a reason, Nano is the newbie's text editor, and, well, just imagine for a second what would happen if grep or curl stopped working.
Clearly you've just arrived from some wonderful alternate universe where bash means something different than it does on Earth. Welcome traveler!
bash is the world's default shell for a reason
Here on Earth that reason is network effects ("If I write it in bash, it will run anywhere!"). Bash is an bad language. If you've mastered bash you can have the honorable feeling of mastering a difficult, ugly, but practical skill (see also: knife-fighting, driving a motor vehicle, running for office). But there's no need to be mean to SQLite by comparing the two.
I spent some time making it work on modern machines, and it's... I don't really want to use the word 'bad'; because I don't think it is. I kept finding issues which I would spend ages tracing through the code muttering about ancient C which didn't understand alignment and stuff, and then discover that the code was actually doing everything right and the problem was at my end. I couldn't find a single bug in it.
But incomprehensible --- oh god yes. The way it handles memory is really bizarre. The parser is really bizarre (there's the famous Tom Duff quote: "Nobody really knows what the Bourne shell's grammar is. Even examination of the source code is little help."). And then they interact!
Funny, I instead had that reaction to nano being there after I read "good at their job." I mean, that's one way of looking at nano, I suppose, depending on how you define "job"...
Partly because listing emacs or vim would launch a flame war, but Nano's intent is to be a small text editor, friendly enough for newbies, to handle small jobs, like writing an email, or editing /etc/fstab. And it does that pretty well.
I forgot I did that. It was because Emacs is really technically impressive. It's a bit ugly in places, but on the whole, it's remarkably well designed.
Before bash was the world's default shell, it was csh, then tcsh. At my university, if you switched your shell to bash, the sysadmins would put you under extra scrutiny, suspecting you were a Linux user and thus likely to cause trouble.
We switched to a default shell with incompatible syntax not because of network effects but because it was much better than tcsh — initially just for scripting, later for interactive use as well.
zsh already existed at that point, btw.
As for mastering bash? Nobody masters bash. Brian Fox hasn't mastered bash.
Presumably the original sentence was something like "bash is an abysmal language" and OP forgot to change the article when they replaced "abysmal" with "bad". This happens to me regularly.
Yeah, just tough to read, really jarring. I can read right through typos but for some reason this particular grammatical error is arggghhh inducing ;-)
To be fair, we talk about it every time a link from sqlite.org is posted here (/testing.html comes up frequently). What I also like about this project is how well written their documentation is. Thanks to it, it's very easy to read and understand how a particular command or property works and how to use it.
$ curl --version
curl 7.47.1 (x86_64-w64-mingw32) libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8 libidn/1.32 libssh2/1.7.0 librtmp/2.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp scp sftp smtp smtps telnet tftp
Features: IDN IPv6 Largefile SSPI Kerberos SPNEGO NTLM SSL libz TLS-SRP
$ grep --version
grep (GNU grep) 2.24
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Mike Haertel and others, see <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.
$ uname -a
MINGW64_NT-6.3 redacted 2.5.0(0.295/5/3) 2016-03-20 18:54 x86_64 Msys
$
I've run into a few issues with SQLite when I get to more complex cases. Right joins are a great example. I have to do a lot of searching, stackoverflowing and talking to peeps in #sqlite on freenode to try to figure out how to rewrite more complex queries to work with SQLite.
I still like SQLite, and most people don't need anything beyond simple joins, but when you do need some more advanced SQL, it can be a bit challenging.
I found it useful for logging. Logging tends to have structured data associated with it. Trying to re-parse log files back into metadata gets tiresome and is error-prone.
I'm a pretty big fan of line-separated JSON records... since JSON serializers on most platforms escape linefeeds (\n) inside strings by default, you can separate records with a linefeed.
This can be streamed and even gzipped in said stream, meaning you get compression and a pretty easy format you can use in most platforms these days with very little intermediate processing or extensions.
With JSON you can have additional metadata on each record, and not have it affect the mainline... now, this cannot be queried directly, but it can be very easily streamed/imported elsewhere.
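A minimal sketch of that kind of line-delimited logging in Python (file name and fields are made up); json.dumps escapes any embedded newlines, so one record per line stays safe:

  import json, time

  def log_event(fh, **fields):
      record = {"ts": time.time(), **fields}
      fh.write(json.dumps(record) + "\n")  # exactly one JSON object per line

  with open("app.log.jsonl", "a") as fh:
      log_event(fh, level="info", msg="line one\nline two", user=42)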
For certain tasks, I like to work on streams of JSON records too. jq(1) makes it really easy. However, requiring records to be line-delimited is a point of failure when accepting input from external sources. There should be no difference between:
{"foo": 1}{"bar": 2}
and
{"foo": 1}
{"bar": 2}
or any variations of common whitespace between the objects. Just parse objects incrementally, one at a time. Most JSON libraries support this.
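In Python, for instance, the standard library can do this with raw_decode, which consumes exactly one value and reports where it ended; a rough sketch:

  import json

  def iter_json_objects(text):
      decoder = json.JSONDecoder()
      pos = 0
      while pos < len(text):
          while pos < len(text) and text[pos].isspace():
              pos += 1           # skip whitespace between objects
          if pos >= len(text):
              break
          obj, pos = decoder.raw_decode(text, pos)
          yield obj

  print(list(iter_json_objects('{"foo": 1}{"bar": 2}\n{"baz": 3}')))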
As for general logging: I prefer it to be structured text written to stderr, and have a PEG parse it for me. stderr is pretty much a catch-all log stream, and some 3rd party software insists on writing to it (most do however offer a way to override this behavior, e.g., chromium-headless, libxml2, &c). Having a PEG for those cases means I can still get JSON for everything if I need to, even if the data in the log is unexpected. If I need to, I can then modify the PEG for that use case.
Of course, it's a case-by-case, pragmatic decision. If I only want to log certain data and disregard stderr, or if I can guarantee everything that gets written to stderr will be valid JSON, writing logs as JSON may be better.
Not meaningfully more complicated. You know when you've reached the end of your object if you are actually parsing it, so you also know when the next one can begin.
Actually, no, one is not necessarily parsing it. Xyr point was that using line delimiters allows one to retain one-log-record-per-line semantics and use the log stream with text processing tools that understand those semantics but that do not understand JSON.
When you're working with a platform that has native JSON parsing, it generally doesn't support streams as described... so you either count braces and quotes, create a stream processor or use a delimiter... a delimiter is an easy enough solution, also allows for easier viewing in a text editor. It's about being pragmatic.
Honestly, I'd say that logging should be text. Yes, SQLite DBs are hard to corrupt, but it can happen, especially in the sort of catastrophic failure you would want a log for. And when data corrupts, it's generally easier to extract some degree of data from a text file than from a binary.
The big problem with text logs is the moment something you log doesn't fit what you were expecting. A message has a newline? All of a sudden you either have to escape characters or you have to handle the data being put on the next line rather than in the next log message. How do you detect when that happens? A message doesn't fit the format properly? OK, so let's encode the data, base64 maybe? Now you can't grep the logs for information anymore; it's an opaque format with metadata, so you might as well use SQLite or some other structured format.
As someone else said, JSON is great for logging. Newlines get escaped, and you just write line-separated JSON entries to a log file. JSON can be easily read by humans, and is trivial to parse in most programming languages.
I like SQLite, but I don't think logging is a good use case.
Except if one line of JSON gets corrupted ... all your log storage goes to the trash.
Heard about people logging JSON? (Which may be truncated, because log lines can be truncated when write is called with size > PIPE_BUF in a concurrent environment.)
I know JSON is the new XML, but dinosaurs like me have learnt the hard way that logging should not be stored as a document but as a journal of chunks that may be truncated, and that relying on the atomicity of the system read/write/close/open/seek/tell/unlink calls is a damned good idea, because the day you need logs is usually the day a major crash happens.
Hence a day when log corruption is more likely to happen.
True, when you have no crashes, you love the JSON format. But it's even truer that when an incident happens, you want a resilient system that can still log in a reliable way.
The log format, as described above, is one JSON object per line. So if one line gets corrupted, this only affects that one line.
And this is possible because the only place where a newline can appear in JSON is whitespace (where it can be safely replaced with a space; although most JSON serializers provide enough control over the output that you can just make sure that it never appears there in the first place).
Well, what a nice way to make grep lose orders of magnitude in speed; grep is fast because it does not split lines.
And if \n is prohibited in strings, it is NOT JSON per the ECMA spec anymore; it is something else.
Additional complexities that do not seem meaningful, but under congestion they add up to make your life hell. But well, energy is cheap, VMs are cheap, cloud operational expenses cost so much less than coders, why bother?
With this kind of reasoning we will end up with internet appliances so broken and so sloppily coded that we will experience a massive DDoS made of connected toasters and poorly coded cameras.
"\n" (that is, ASCII character "\", followed by ASCII character "n", producing escape sequence "\n" for newline) is not prohibited inside a string literal in that model. Only actual newlines (that is, ASCII characters 10 and 13) are prohibited, and that is already a part of JSON spec.
Like I said, the only place where JSON permits actual newlines is as part of insignificant whitespace. But because it's insignificant, newlines can be replaced with any other whitespace character, and the resulting JSON will have the same exact meaning.
And why would grep lose "order of magnitudes in speed"? If you were actually using grep, it'd work exactly the same as it does for any text file. The only thing that JSON does is add some structure to the contents, so that it can be easily converted to some tabular format that's more convenient for structured queries, aggregation etc. But you can still treat it as plain text for all purposes.
The days that I frequently need logs are the days that I need to prove that a particular application did a particular thing -- handled a particular transaction, sent a particular message, responded to/triggered a particular event, and so forth. Faults in these areas are applications software faults (often not even fatal ones), and are unlikely to affect the logger process at all, especially since it is insulated from the applications software by each running under the aegis of its own dedicated unprivileged user account.
Why don't you think sqlite is a good fit for logging?
Given the extensive testing, including crash testing, it's plausibly more robust than plain text - because you're almost surely not using any kind of fsyncs in your plain-text logger, so that text file isn't as incorruptible as you may think. And you may write a buggy logger, or use a buggy JSON implementation, or write incorrect error-recovery code when reading the file.
I'm skeptical that robustness is an argument in favor of plain text logging over sqlite.
Writing to files in general is fairly difficult if you care about not losing any data under any circumstances, because you have to use the right syscalls for the semantics of your filesystem, which may sometimes be why you're looking at corrupted files in the first place.
Correctly implemented transactions can prevent you from dealing with that. Maybe that's worth giving up easily readable, searchable and processable text files, maybe not.
This is a bad idea. What you want from text logs is the possibility to start anywhere in the file, look for a nearby newline and know it is the start of a record.
This is a very important property. It's also what makes UTF-8 resilient.
Why can't I have both? I am not against text logs, but it is good to have options. When a binary format is an option, I'd say that netstrings are probably a better option.
For literal binary data you need to store the length information separately.
Netstrings are one option for ephemeral streams. But they are a bad tradeoff for persistent information: you have to read from the beginning of the stream until you find the relevant information. And a single corrupted byte destroys all the information after it.
Better options for persistent data are (pointer+length)s or memory-pools + ranges.
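For reference, netstrings are just length-prefixed records ("<length>:<payload>,"), which is why they carry arbitrary binary data but force sequential reads; a rough Python sketch:

  def encode_netstring(payload: bytes) -> bytes:
      return str(len(payload)).encode() + b":" + payload + b","

  def decode_netstrings(data: bytes):
      pos = 0
      while pos < len(data):
          sep = data.index(b":", pos)
          length = int(data[pos:sep])
          payload = data[sep + 1:sep + 1 + length]
          yield payload
          pos = sep + length + 2   # skip payload plus the trailing comma

  framed = encode_netstring(b"line one\nline two") + encode_netstring(b'{"x":1}')
  print(list(decode_netstrings(framed)))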
Don't worry, you can just have journald send loglines across the network to a centralised logserver...
... oh, wait, that hasn't been solved yet. Instead, there's a variety of hacky workarounds (send it to rsyslog and get it to ship; follow a journal with ncat and ship that; some others). Centralising journald logs is my current ops problem du jour.
Does that imply that Linux was implemented poorly? I always found the concept of socket files to be mysterious (coming from a Windows background).
Socket file descriptors were in UNIX before Linux was written. They were eventually copied into Windows via "Winsock", and Windows shares the same basic idea of using ReadFile to access the sockets.
(The "reinvent it badly" is systemd replacing syslogd without re-implementing functionality that was important to a big subset of admins, because the people who wrote systemd are focused on the personal workstation case to the exclusion of all others)
But more fundamentally than the desktop / server focus, they either don't understand or don't value what makes UNIX great.
systemd is in many respects a major departure from UNIX philosophy, and I predict it will mark an inflection point in the quality and usability of Linux distros that adopt it, hence my efforts to migrate my own systems to FreeBSD.
For those wondering what I'm banging on about, please read The Art of UNIX Programming, which should have been called The Philosophy of UNIX:
Edit: what I mean by fundamental is that it shouldn't matter whether the tools are intended for desktop or server use; they should be designed to interoperate seamlessly using text protocols via pipes, sockets and files. That way they can be composed, filtered and transformed in ways not yet dreamed of by their creators. The systemd folks are falling into the Microsoft and Apple trap by trying to anticipate how their software will be used, instead of building it so it can easily be hacked upon for uses they themselves haven't dreamed of (which oddly seems to include 'servers').
To put it bluntly: systemd has neither the hacker nature nor the UNIX nature, and history has been unkind to OSs with neither. I'm betting against it being a Good Thing in the long run.
Yeah. As much as I hate ESR (I hate to beat that dead horse, but I don't want anyone getting the impression that I like him) he got it right here in this case, and Systemd definitely does the Wrong Thing.
It's not a Good Thing now, though: It's an attack surface.
He's an okay programmer, but he's got a frankly wacko political agenda (Ayn Rand, Libertarian, racism, paranoia etc.), he's mismanaged The Jargon File by putting in some of his own phrases which nobody else uses, and he's just an all-around unpleasant person.
Honestly, he's kind of worse than RMS, who is merely unpleasant to be around...
I agree on Rand and Libertarianism - although (as an Objectivist myself) it may interest you to know he's not an Objectivist, as he objects to several aspects of Rand's philosophy. I still have his suggested reading in that area on my TODO list.
But racism, and paranoia? I'd (seriously) like to see evidence of those if you have them. Nothing I have seen him write or do suggests he treats people in any way other than as individuals, on their merits alone.
Did you miss the blog article where he said a friend had told him that women were trying to have sex with OS leaders, so they could accuse them of rape?
Sounds pretty paranoid to me.
I might have been wrong about the racism stuff. I swear I saw some stuff, but I can't find it now, so I might have been confused with somebody else.
But at the end of the day, he's still an unpleasant person.
Yeah, that's the one problem I've seen with journald that needs a proper solution. The other features it brings I've loved on my desktop, but I wouldn't want it for long-term data/logging yet.
When I did this, I found that I needed to be pretty careful when rotating logs. The online backup API fits the need well, but if you're using a wrapper lib around SQLite you most probably won't be able to access it.
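For what it's worth, Python's sqlite3 module does expose the online backup API in recent versions as Connection.backup; a rough sketch of a rotation step, with hypothetical file and table names:

  import sqlite3

  src = sqlite3.connect("app-log.db")
  dst = sqlite3.connect("app-log.archive.db")
  with dst:
      src.backup(dst)            # consistent copy even while writers are active
  dst.close()
  with src:
      src.execute("DELETE FROM log")   # 'log' is a hypothetical table name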
On the other hand I've used SQLite as a more or less drop-in replacement for a full-blown client-server relational database. It was effectively a read-only file archive but way more complex than you expect to get from a flat file. From a series of flat files, maybe.
I run the backend/website of my side business on sqlite. It is one of the best technology decisions I have made. It performs reasonably, is super straightforward (at my day job we have a team of postgres people to keep our dbs running smoothly, but for my little side business I don't have those resources); backups are dead simple. I love sqlite.
This used to be how I felt, since the performance criticism of SQLite is vastly overblown. However, the safety criticism of a dynamically typed database is vastly underblown.
Since it competes with fopen(), you get about as much structure and validity.
I'm currently developing a text format called WSL[1]. By nature text files don't have indices, but it is strongly typed, supports standard relational integrity constraints, and indices can be automatically created when reading the file.
There is currently only a simplistic Python library, which reads databases at about 1MB/s. On the plus side it's dead simple to use: only a single library call to parse a file into schema, tables, and indices.
There is also a C library in development which lexes at about 300-600 MB/s in a single thread (depending on how many columns are actually needed and thus have to be written to per-column lexeme buffers) and which I hope will have a release next month.
What's the safety issue? Will your data corrupt, or are your types just not 100% guaranteed?
Because I can live with the latter: weak types suck in programming languages, but are okay in DBs, and the types get verified multiple times on their way in and out of the DB in most systems.
Besides, it won't mangle your data. Unlike some DBs that I could name...
I don't think it would mangle your data directly, but it could lead to incorrect results since there is a degree of mystery from query to query. You should definitely base a conclusion on their words and not mine.
Thanks for the link. Yeah, it's as I remember: type affinities will convert to their type if possible, and if not... well, you get out what you put in.
The reason most people don't complain about this is that it's far from a common issue to miswrite your SQL statements so badly that you wind up mixing up columns. And when you do, it's usually detected pretty fast.
Maybe you are thinking about foreign key constraints, which for backwards compatibility are off by default, unless you use a compile-time option to turn them on by default.
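Since enforcement is per-connection, it has to be switched on explicitly after connecting; a small Python illustration (made-up tables):

  import sqlite3

  con = sqlite3.connect(":memory:")
  con.execute("PRAGMA foreign_keys = ON")   # off by default for compatibility
  con.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
  con.execute("CREATE TABLE child (pid INTEGER REFERENCES parent(id))")
  try:
      con.execute("INSERT INTO child VALUES (42)")   # no such parent row
  except sqlite3.IntegrityError as e:
      print("rejected:", e)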
Really? dang that's terrible. I thought they'd stay that way as part of the table schema. Glad I normally use an ORM that'll take care of the type checking ahead of time.
> the performance criticism of SQLite is vastly overblown
That kind of depends on how you use it. Obviously this depends on the row size, constraints, etc., but if you want to write more than a few thousand rows per second for longer periods, the performance limitations of SQLite will become very obvious very quickly.
When your little side business needs to write more than a few thousand rows per second for longer periods, you're either having enough customers to justify a migration, or you have the most inefficient database schema of the decade.
Not every piece of software that needs to store data is a CRUD application and "little side businesses" are not the only use cases for an embedded database. For example, an embedded application logging readings from a number of sensors could easily reach thousands of writes per second. Another example would be a desktop (or mobile) email client such as Apple Mail using SQLite to store data, it could easily reach thousands of writes per second when e.g. downloading the entire mailbox from an IMAP server.
Certainly true. I'm not saying that SQLite is always the best choice. But people like to say "SQLite is not performant" when they really mean "SQLite is not performant for concurrent-write-heavy applications" which is a small minority. In most but not all cases the performance of SQLite is perfectly adequate.
Most applications don't actually need concurrent access. SQLite handles concurrent reads without any issues, with writes requiring exclusive locks. As long as your queries are fast and your write load is minimal, you won't really have any problems.
What you describe is the way SQLite worked originally. With the newer WAL mode, things are slightly better -- you can still only have one active write transaction at a time, but writers no longer block readers.
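Switching is a one-time, persistent setting on the database file; for example, from Python:

  import sqlite3

  con = sqlite3.connect("app.db")
  mode = con.execute("PRAGMA journal_mode=WAL").fetchone()[0]
  print(mode)   # prints "wal" once the database has been switched over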
So I read about WAL mode down here and cheered because it sounded like it would solve the occasional "database is locked" error that the app I am working on is bumping into.
The database is only opened by 2 users on the same machine. One is a normal user, the other one is root for a daemon process. That by itself might be an uncommon scenario.
So I tried it out this morning and found that any writes are invisible unless I restart the app to close the database. For my use case that isn't an improvement as writes made by either user should be visible by the other user. Even tried it with setting read_uncommitted to true, but that did not help either.
Of course it is possible I am still doing something wrong, but at this moment it doesn't look like the WAL journal mode is an option for my app.
A pity, as I expected (without WAL) to be able to read while another process is writing; a delayed read would be fine, but instead there's a "database is locked" error that pops up to the user.
Yeah, they both can write, although almost all of the writes are done by the daemon process and the normal user (the GUI process) reads and processes the results. The 'database is locked' problem seemed to happen most while the daemon user is writing and the normal user is reading.
For the moment I added a patch to my apps whereby the applications handle the locking themselves at a slightly higher level, as I got a bit tired of the problem.
This is done via a separate lock file that is opened exclusively before any write action and closed after the write. By doing that I can simply delay the reads for a bit when the GUI process tests to open the lock file and that appears to have cured most problems.
It's a tiny bit more advanced than the above, but that's basically it, and it appears to have cured most issues.
edit: might have misread your question, was it about the WAL journal mode? Yes the processes do commit the transactions they write. I need the results immediately, not after sqlite decides to process the WAL journal.
When staying in normal journal mode the app sees the data just fine and the data is committed directly in that case. Updates/Deletes are all pretty much instant and any queries results are correct.
There might be an issue with the database drivers I depend on (FireDAC); in that layer I even go as far as closing the tables on each query/update after a commit.
The problem with normal journal mode is the lock error popping up.
When I switch to WAL journal mode the data no longer appears to be written directly even when turning autocommit back on.
So while WAL mode appears to fix the lock issue, the data only gets committed on closing the database connection.
As a result the GUI process can't interact with the daemon process anymore as it only sees old data.
Opening and closing the database on each insert/update/delete to force the data to be written simply isn't an option.
The reason this was happening was not because of SQLite, but due to how the FireDAC driver handles the locking.
That driver had a setting "BusyTimeout" which supposedly takes care of a lock waiting time.
According to the documentation it has a default timeout setting of 10 seconds. That clearly did not work, I even had set it manually, still to no effect.
The other day I figured to try and set this via an SQLite pragma setting... (busy_timeout)
I've not seen a "database is locked" issue since then and I've completely removed my manual locking layer, so "case solved" and it certainly wasn't SQLite to blame.
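For anyone else hitting this, the pragma in question looks like this (shown from Python here; that module also takes an equivalent timeout argument on connect):

  import sqlite3

  con = sqlite3.connect("app.db")
  con.execute("PRAGMA busy_timeout = 10000")   # wait up to 10s on a locked db
  # roughly equivalent: sqlite3.connect("app.db", timeout=10)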
> Most applications don't actually need concurrent access
That's an interesting claim. I would rephrase that to, "Are there more HTTP calls that use concurrent connections to a DB, or standalone applications that do not?" I would wager the former.
Even on most websites, I suspect the need for concurrent, long-lived write transactions is much rarer than people assume. If your write transactions are short-lived, then sequential execution is a reasonable approximation of (slow) concurrency, at which point it's a question of load whether that's good enough. But the window in which it's not good enough is very slim - hardware simply isn't all that concurrent in the first place, and as you scale, some sharding strategy is required anyhow.
So the more plausible limitation is long-lived write transactions; e.g. where a write cannot be committed until after some other confirmation occurs, possibly over the network. That simply won't work well at all in sqlite - not that it's a great strategy to use on other DBs...
> The SQLite website (https://www.sqlite.org/) uses SQLite itself, of course, and as of this writing (2015) it handles about 400K to 500K HTTP requests per day, about 15-20% of which are dynamic pages touching the database. Each dynamic page does roughly 200 SQL statements. This setup runs on a single VM that shares a physical server with 23 others and yet still keeps the load average below 0.1 most of the time.
I think it's fair to assume that the sqlite site could be redesigned to meet most of its functionality as a largely static site, but that would come at a loss of functionality. And obviously it's a form of dogfooding, but that's not objectionable, right?
SQLite is not such a great choice for the server side. SQLite typically gets used on the client side, either as a "caching db" for offline work, or for client-side-only programs without a server backend.
It describes the inside of the engine and how to do things. There are exact steps that need to be done, the way the document reads, to make it happen.
There is also a big section on how to corrupt the database. It's a heads-up that if you decide to take shortcuts, there will not be a happy ending.
Concurrent use is a little more complicated than with something like MySQL, but there is much more engine on the MySQL side. If you have multiple concurrent users with high transaction levels, SQLite may not be your best first choice.
SQLite has always supported multiple readers, which is the most common concurrent access pattern. Before the WAL (write-ahead log) was introduced, a writer would block readers (and vice versa), but with a WAL, a single writer and multiple readers do not block each other.
SQLite does not have multi-writer concurrency (usually MVCC, as in Oracle/MySQL/PostgreSQL, or optimistic transactions as in Backplane). If you need those, SQLite is not for you.
SQLite works great as the database engine for most low to medium traffic websites (which is to say, most websites)...Generally speaking, any site that gets fewer than 100K hits/day should work fine with SQLite.
Do people agree with this? I was under the impression you should not use SQLite for production websites for some reason. Django has this to say, for instance [1]:
When starting your first real project, however, you may want to use a more robust database like PostgreSQL, to avoid database-switching headaches down the road.
I'm rather skeptical, because I've found Postgres to be pretty simple to run on a project of this size as well. Takes like 10 minutes to install it. Maybe bump it up to 30 minutes if you want to do something a little tricky with user accounts, like run the web server with a DB account that has limited permissions, which SQLite can't do at all anyways.
On small projects, DB admin doesn't seem to be much more complex than using a SQLite DB. By the time you get the DB load high enough that you really want to pay attention to administering it, SQLite has probably given up the ghost long ago.
Don't get me wrong, SQLite is great for what it does. I don't see the upside to it on this though. Even if you know for sure your site will never hit high traffic, it just isn't that hard to run a conventional DB. And if it does, it's a lot easier to pay somebody to set up your DB server right than to convert over to a conventional DB and then get it set up right.
The upside is it doesn't need a central server. And 10 minutes to install the server (if you know how to do it) vs 0 minutes is a pretty big deal.
A project running on sqlite can be quickly taken to just about any box and run without any infrastructure dependencies.
You can easily run a hundred instances on a single machine, for dozens of simultaneous users, without any setup or coordination. Computing resources are only required for access, not for availability.
As SQLite themselves state, they're running a 500K hits/day site on SQLite just fine. They also point out that their site is not particularly write heavy, which is a somewhat important point to be making with SQLite specifically.
What does Django mean by a "real project"? If this is a project you intend to scale beyond the scope of SQLite, then starting from something that will scale that way in the first place will alleviate later growing pains. If it's your personal site and will always and forever be run on a VPS with 1GB of RAM? There's no reason not to just stick with SQLite -- and you get the benefits of not having to maintain a "real" db service.
We built a SQL database backend-as-a-service based on SQLite, we like it so much. The typeless schema and lack of some constraints can be a challenge, but there are workarounds (check constraints). High-write uses are the only problem with SQLite; our service provides application-level caching via an X-Query-Cache header. In that case you're basically serving from redis.
Our goal is to provide a SQL backend to every person or device on the planet. To get that level of scalability and manageability (i.e. disk usage and CPU usage) you pretty much have to use an in-process database. SQLite is the best there is, as others in this thread have noted.
Sure. I think the warning about not using SQLite in production is more of a rule of thumb (and a good one) for those who don't really understand the tradeoffs they're making when using it. If one takes the time to both understand the needs of their application and the limits of SQLite, there's no reason not to use it in production where it is appropriate.
Getting a site from zero to 100K hits/day is also a fairly monumental task, and if you don't succeed at that, you don't have to move to a different database.
That's the theory, but in practice it's not always so good.
I tried moving from MySQL to Postgres. Somehow my unique constraints in Django weren't unique in the database, so it threw errors when I tried dumping to Django fixtures.
I don't. I've corrupted SQLite DBs enough to not have warm and fuzzy feelings about it like I used to have.
I think it's only a good choice when you just need a database for your app that will barely be using it, and if you didn't use it you'd be writing to a file instead. And, that's basically what the SQLite docs say.
However, even then, I think it can be short-sighted. I've used webapps before that used SQLite and I thought to myself: if they'd only used MySQL or PostgreSQL and then provided access to it, I could have used it.
Be aware though, if you decide to use a scalable DB like PostgreSQL, it will require a port to be open for the DB, even if only locally. If you're trying to minimize how people can access your data, you don't want a port open/an extra port open, and you're not going to hit it very hard, SQLite's probably your best choice.
Yeah, I changed my wording to "scalable". And I appreciate the developers and community around SQLite. It has its uses, and I appreciate it. However, I think it could be better with concurrency.
It does. The default PostgreSQL installation binds to port 5432 (I can't recall if that's loopback only, or global), but it can easily be disabled and use AF_UNIX sockets only.
Can you talk at all about how the SQLite databases got corrupted? I'd be interested to know the circumstances. (The answer 'I have no idea, it was just one of those things' is perfectly acceptable.)
No, I tried to run a site with much less traffic than this on sqlite. It threw database lock exceptions all the time. My writes must have been throwing it off? It was pretty frustrating; I should have just used Postgres from the start.
One of the reasons I don't use sqlite in my projects is the lack of an easy unaccent solution, to make queries that automatically suppress accents. All my projects are in French, and if you want to make a search function in French you absolutely need that. Almost nobody will search for "éléphant" with the accents, especially now that Google, Facebook etc. do not require it.
I guess the stored rows contain accents, not just the query. So you run something along the lines of `unaccent(text) like unaccent(query)`; or, more realistically, you `create index on table(unaccent(text))`.
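SQLite has no built-in unaccent, but you can register an application-defined function for it; a hedged sketch using Python's sqlite3.create_function and Unicode decomposition (table and data made up):

  import sqlite3, unicodedata

  def unaccent(text):
      if text is None:
          return None
      decomposed = unicodedata.normalize("NFKD", text)
      return "".join(c for c in decomposed if not unicodedata.combining(c))

  con = sqlite3.connect(":memory:")
  con.create_function("unaccent", 1, unaccent)
  con.execute("CREATE TABLE animals (name TEXT)")
  con.execute("INSERT INTO animals VALUES ('éléphant')")
  print(con.execute(
      "SELECT name FROM animals WHERE unaccent(name) LIKE unaccent('elephant')"
  ).fetchall())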
...Which is pretty hard to notice unless you really screw up. It's never been a problem for me, but I only work with SQLite infrequently. Does it come up often in your work?
If you just look at its capability, yes, it can work great for a website.
However, sqlite is limited in what it can do, so when you need to go beyond what it can do then it can be a pain.
I'd say sqlite is a DB choice for people who understand what each DB option gives them. If you were new and needed a default choice, then pg, or mysql, or sqlserver are going to be pretty flexible long term. You also are going to get a lot more technical info on the web about how to use it with whatever web framework you have chosen. However, I have used it for websites where I have a pretty good idea about my data needs. works fine.
I use it more in the "competes with fopen" case though. Super great as a settings / info / persistence store
I used SQLite for teaching last year because it was the only thing that I could get IT to install between when I took over the databases unit and the start of term.
While it was broadly a success, I consider the following major problems when teaching to beginners:
* very loose syntax. CREATE TABLE PERSON ( ID BANANA BANANA BANANA ); is legal :)
* no type-checking: you can insert strings into an INTEGER column and vice versa - while you're trying with a straight face to teach students that one of the advantages of a proper database is that it can enforce some consistency on your data.
* in the same vein - foreign key constraints are NOT enforced by default.
* misusing GROUP BY produces results, but not the ones you want. I'd much rather any use of aggregates that is forbidden by the standard also gave an error, to discourage students from thinking "it produces numbers, therefore it must be ok".
This year, I'll try with MariaDB. I consider SQLite an excellent product for many things and use it extensively myself, but as a teaching tool its liberal approach to typing is a drawback.
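To make the first two points above concrete, here's a small Python demonstration (made-up tables); both statements are accepted without complaint:

  import sqlite3

  con = sqlite3.connect(":memory:")
  con.execute("CREATE TABLE PERSON ( ID BANANA BANANA BANANA )")   # accepted
  con.execute("CREATE TABLE grades (score INTEGER)")
  con.execute("INSERT INTO grades VALUES ('not a number')")        # also accepted
  print(con.execute("SELECT score, typeof(score) FROM grades").fetchall())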
> People who understand SQL can employ the sqlite3 command-line shell to analyze large datasets.
And a bit further down:
> SQLite database is limited in size to 140 terabytes [...] if you are contemplating databases of this magnitude [use something else]
Yeah no. "Large datasets" here means a few megabytes. I figured that out the hard way:
I had a database of about 70 megabytes and ran a query with "COUNT(a)" and "GROUP BY b" on it. This makes it write multiple gigabytes to /tmp until it goes "out of disk space" (yeah /tmp on my ssd isn't large).
I heard nothing but awesome and success stories about SQLite until a few weeks ago when this fiasco happened. I still like SQLite for its simplicity and last week I used it again for another project, but analyzing "large" datasets? Maybe with a simple SELECT WHERE query, but don't try anything more fancy than that when you have 100k+ rows.
Of course, "large" is not about the data necessarily, but about the analysis of the data. It sounds like you presented sqlite with a query which would need more than the resources of your computer in order to execute as you specified. This can be really easy to do accidentally on any DB engine with something like a cartesian product.
It is true that sqlite doesn't have as good of a query plan optimizer as larger RDBMSes, and is a little lower-level, but the tradeoff of having the simplicity is that you must understand a little more of the internals to design more complex queries.
Have you considered that you might have used it wrong? If everyone else says it works great and I use it and have an issue, my first thought would be "I must have done something wrong", not "this sucks, everyone else must be wrong".
Sound logic, but I didn't think to myself "gee everyone else is wrong". I just noticed SQLite did something I've never seen another database do and figured it's not made for this.
I find this happens quite often - once you stray from the most common path for $AWESOME_TOOL, you start finding little oddities here and there. It's not that 'everyone else is wrong', it's just that you're off the beaten track.
I'm sure if you posted the EXPLAIN here, someone might be able to figure out why it did that. Were you able to test the exact same query on a different database?
I was going to, but something else got in the way and I ended up not running the query again.
Over the years I've had my share of complex queries with subqueries and aggregates, both for class and for my own projects, but never have I encountered 70MB exploding into multiple gigabytes (I don't even know what its final size would have been). I guess I could have used EXPLAIN and dug into it, but never having had this before I figured it was SQLite not being made for it.
Perhaps this is a bug. If you post the query and database schema on the sqlite mailing list or here we can take a look easily enough. If you can upload the database somewhere we can try to reproduce it.
I've used SQLite with a 5GB database and tables up to about 15m rows, and while I've certainly managed to chew up a lot of disk (same as SQL Server or Postgres), it's not the day to day experience, even with quite complex queries. All you can do is EXPLAIN QUERY PLAN, or perhaps look into putting temp files in RAM, if that is more plentiful:
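The temp-files-in-RAM option mentioned above is a pragma (and the SQLITE_TMPDIR environment variable can point temp files at a bigger disk instead); a quick sketch:

  import sqlite3

  con = sqlite3.connect("big-analysis.db")   # hypothetical database
  # scratch space for GROUP BY / ORDER BY goes to memory instead of /tmp
  con.execute("PRAGMA temp_store = MEMORY")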
Just last week I played with a project that involved inserting 40M words (and their tuples, triplets, quadruples, ...) into an in-memory SQLite database. Even though the db was about 600MB, a query which grouped the tuples by their frequency finished in under 5s.
I am just an average, non-programming geek, and I love SQLite. I use it from the command line to track my blood pressure, my comic book collection, and my book collection.
It also gave me the chance to learn SQL for fun.
Sadly, it is not often looked upon as an end-user tool.
If you like running SQLite, I recommend looking into double-entry bookkeeping with Ledger. I recommend it because it gives a good introductory course in journaling and accounting using double-entry bookkeeping.
Looking at the front page right now, out of the 30 stories maybe 2-3 of them would only be of interest to programmers versus people interested in tech in general.
I use SQLite to store all of my simulation data (~10s of GBs). It's remarkably versatile, and the fact that there are good libraries for Python and C++ to interface with and query SQLite dbs makes it a cinch to use for data analysis.
I've seen so many people struggle with custom binary formats; I imagine there are countless research hours lost in figuring out how to work with these obscure formats. I've advocated to all students I work with to make use of SQLite to store simulation data for their thesis projects and my experience is that they're quick to pick it up and figure out how to do some pretty complex querying.
It's one of those things that I don't understand about academia: there are so many standards and well-established tools in the tech/IT sector that we don't take advantage of. SQLite and JSON are the two that I constantly advocate to everyone I work with.
We use SQLite as a data integration tool. We connect to a third-party system's esoteric database using an ODBC driver. Then export tables to a SQLite database. This process can sometimes take a few minutes but is generally quite quick. Then the SQLite database is compressed and uploaded to cloud blob storage. Effectively at this point it is a "snapshot" of the third party system's state. Our cloud system is then tailored with SQLite queries to know how to use and understand that foreign schema. By doing it this way we avoid needing to know several dozen SQL dialects for esoteric database engines that "never won the race in the 1990s" (think Progress, Ingres, Paradox, etc). It means we only need to know SQLite - a current, OSS and well supported variant of SQL. Epic cost and time savings are the net result.
Implementing client/server where you only need an embedded DB comes at price. It bloats and complicates the installer, increases attack surface, conflicts with other software for listening TCP port number, interferes with firewalls, consumes more resources, slows the startup, etc…
When people say SQLite is everywhere, they mean it. Heck you're likely using it right now as you browse HN since Firefox, Chrome, Opera, etc all use it.
I used SQLite to analyze web server logs at my last job (devops at Xero). SQLite supports in memory databases which are very fast. I'd parse a bucket of logs into a table, then run some queries against them and write the results into Graphite. The results ended up on the ops wall, and generating another data point was one more SQL query in a config file away. Wonder if they're still using it.
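A rough sketch of that workflow in Python, with a made-up schema and a stand-in parsed_log_rows list where the log parsing would go:

  import sqlite3

  parsed_log_rows = [("/home", 200, 12.5), ("/home", 500, 80.0)]  # stand-in data

  con = sqlite3.connect(":memory:")
  con.execute("CREATE TABLE hits (path TEXT, status INT, ms REAL)")
  con.executemany("INSERT INTO hits VALUES (?, ?, ?)", parsed_log_rows)
  for path, count, avg_ms in con.execute(
          "SELECT path, COUNT(*), AVG(ms) FROM hits GROUP BY path"):
      print(path, count, avg_ms)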
I use SQLite almost all the time now on my desktop apps especially, for logging and other data intensive tasks that require only a single read/write thread. It replaced text logging for me as searching and segregating log messages is now a breeze.
I've also used it as a main data store for single user Win32 apps.
In my early days of web app programming, I had an app that created a brand new SQLite data file for EACH customer that logged in and created an account on the web app. I thought it would be the most secure way to separate datasets and protect privacy for each user whilst negating the multiple write lock issues on the same SQLite database. Tip: Don't even bother to do this! The eventual data maintenance headache was far worse... :)
My personal favourite use for SQLite was for my blog. I wanted to use flat-files for storing individual entries, but I still wanted to present tag-views, and per-month entries.
My solution was to create a simple SQLite database, import all the entries into it, and then generate the views by SELECTing from that store.
Populating the database, even if it got thrown away immediately afterwards, was more efficient than trying to store all the entries in RAM.
Yes, temp storage is another 'use case' for me too. On one particular web app I designed, there is a requirement for searching across multiple tables for text data. I simply transpose the columns I need from the different tables into a memory SQLite database 'on the fly' and perform a full text search on that for lightning quick responses and no need to do multi table joins.
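A minimal sketch of that pattern, assuming the bundled SQLite was compiled with the FTS5 extension (most current builds are); the table and columns are made up:

  import sqlite3

  con = sqlite3.connect(":memory:")
  con.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
  con.execute("INSERT INTO docs VALUES ('welcome', 'full text search inside sqlite')")
  print(con.execute("SELECT title FROM docs WHERE docs MATCH 'search'").fetchall())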
I love SQLite up until you need to modify the schema. That's when you find that upgrading a database in place is almost impossible. Rebuilding a whole table just to rename a column is completely impractical and makes maintaining applications really cumbersome.
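For reference, the rename-a-column workaround is to rebuild the table inside a transaction and swap it in; a rough sketch with a hypothetical table, using Python with manual transaction control:

  import sqlite3

  con = sqlite3.connect("app.db", isolation_level=None)  # manage transactions manually
  con.execute("BEGIN")
  con.execute("CREATE TABLE users_new (id INTEGER PRIMARY KEY, full_name TEXT)")
  con.execute("INSERT INTO users_new (id, full_name) SELECT id, name FROM users")
  con.execute("DROP TABLE users")
  con.execute("ALTER TABLE users_new RENAME TO users")
  con.execute("COMMIT")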
I've used SQLite for single-user analytics in the past and it's fine up to a point. It's slightly more SQL-literate than MySQL - it supports CTEs but not window functions - and it has an okay GIS extension. I've also been pleasantly surprised by performance in some cases.
I run a number of 1%-write, 99%-read web apps with decent traffic on SQLite. Works like a charm. It is low maintenance and creating a "backup" is simply copying the file.
I really enjoy using sqlite. Not everything needs a client server model, and having your entire database located in a single file makes a lot of things way easier.
"Application file format" is the thing which intrigues me. I think people don't think about it as an option enough, it should be used more frequently. File format versions backed by migrations, trivial inspection of data, transactional for free, you can keep recent history of changes, hierarchy etc. It's pretty good.
What would be the arguments against using MySQL for a business website, even a small one? Sure, SQLite probably does the job as well as MySQL, but I don't have any problems with MySQL and it's commonly the default option when choosing an RDBMS. Just curious.
I use sqlite for an intermediate representation of a report that is rendered to multiple worksheets. I find having the data in SQL lets me perform all types of transformations that are not easily handled outside of SQL.
They are semi-regularly discussed here, and are known as syntax diagrams or railroad diagrams. You can find a good list of generation tools at https://en.wikipedia.org/wiki/Syntax_diagram
I've never had the occasion to use an SQL database. But say I was writing a game using C++ - at what point would I go from managing a bunch of maps or vectors of entities to using a SQL database?
If I was writing a ray tracer and needed to store vertices, would it make sense to use a SQL database? How about for a list of objects? Or textures?
In general I often need to filter on objects, update object state, generate new objects, remove some others, etc. but I never know when I should stop thinking containers and start thinking "aha! time for SQL"
It's not really appropriate for any of those things.
You should use a database to store data that you want to keep after the program terminates, not so much transient things like in-memory data structures. It's also best used for relational data: stuff that is logically linked together.
For developing a game, maybe storing item tables with items and stats or the player's inventory might be good candidates. Sqlite in particular is good for this because it's easily embedded and a lot of games use it from what I know.
This is oversimplifying a good bit, but it's hard to completely describe the scope of relational DBs.
>> For developing a game, maybe storing item tables with items and stats or the player's inventory might be good candidates
Doubtful. Player inventory is not going to be large enough to bother, and item tables you'll want to be in-memory anyway, so you might as well just read them from CSV, JSON, XML etc (and that way you can easily edit them, too).
I would say that SQLite only makes sense when your dataset is too big to be entirely loaded into memory in a cooperative environment (i.e. assuming that your app is not allowed to hog the entire memory). I'd say that starts at tens of megabytes.
People definitely use SQLite for tracking assets in the games they create. eg which resource file (bmp/snd/wav/jpg/etc) belongs to which objects, plus some string data (eg character description, stats), etc.
Saying that from seeing links to our site (sqlitebrowser.org) from game developers & users on Steam, and also people asking us questions about various database files they're trying to figure out (as an end user).
The various SQLite encryption options around seems to make a difference too, for game developers wanting a simple(-ish) way to "hide" the info from players. Embedding an encryption key isn't a fantastic approach, but it seems to be "good enough" sometimes.
It could be worth it if you need to do lots of relational queries with complex inventory management. I was imagining e.g. an RPG that would ship the stats for all its items in a sqlite data file and then you can store the player's stats and inventory with foreign keys pointing to the item table. You're gonna have to store that data somehow and if you've got enough items and/or complex enough inventory management it seems like maybe you might want to consider sqlite as it already exists and provides a lot of relevant features. I don't consider size so much as whether or not there is a need to persist data and the complexity of relationships; size is more of a factor in "should I use sqlite or should I use a beefier database like Postgres."
I know sqlite is used heavily on iOS and Android and a lot of people use it as a glorified serialization format. Probably not the best in most cases but hey sqlite is so lightweight that it doesn't have much downside. I tend to use it as intended as a lightweight database myself but hey if it works it works.
It is far easier to store such structures as object graphs in-memory (i.e. your "foreign key" is a pointer/reference to the actual object). The navigation patterns would mostly be looking up properties on the item referenced by inventory, so it's not like you need to do joins etc (but even if you did, a join on in-memory object graph is still pretty easy and blazing fast).
For C++ especially, I would recommend looking at Boost multi_index library. This gives you the ability to do fast lookups on a variety of keys across the same data.
Pretty much the only benefit I can see from SQLite in those small dataset scenarios is when you need persistence and the ability to change subset of data in an atomic way (if you only need to save the entire in-memory dataset atomically, you can always just do the rename trick to ensure atomicity with far less overhead). Well, and, I guess, optimization of complicated queries - but I'm somewhat skeptical about the ability of their optimizer to use indices in a query that's really complicated; and simple ones are trivial to do explicitly.
The main idea of a database is storing data on disk, and serving as a poor man's way of sharing data across processes that are running at the same time.
If you don't find yourself needing to share data between two processes at the same time or storing data on the system, you may never need it.
SQLite can actually be used as a decent save system for a video game since you can just store into it and read from it. You could actually do a "Mass-Effect Style" storage system where you can store items from Game 1 and read it on Game 2. You could actually have your studio share a SQLite DB and have your games reference whether a user has played another one of your games.
Instead of thinking "time for SQL" you should consider using a real object database (like Realm). In my experience that will map much better to the data structures in your program.
I'm looking at a project right now where I'm planning to use SQLite as a high-level solution to file locking (i.e. create a record in the DB to "lock" a file, delete it when you're done, and don't create a record if a record for that file is already in the DB). Sound like an appropriate use of SQLite? Is there a better, more direct solution? (I understand there are platform-specific utilities but I would want something portable.)
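To be concrete, what I have in mind is roughly this (names are made up, and it's only a sketch): a table with a uniqueness constraint turns "create the lock record" into an atomic test-and-set.

  import sqlite3

  con = sqlite3.connect("locks.db", timeout=10)
  con.execute("CREATE TABLE IF NOT EXISTS locks (path TEXT PRIMARY KEY, owner TEXT)")
  con.commit()

  def try_lock(path, owner):
      try:
          with con:
              con.execute("INSERT INTO locks VALUES (?, ?)", (path, owner))
          return True
      except sqlite3.IntegrityError:
          return False          # another process already holds the lock

  def unlock(path):
      with con:
          con.execute("DELETE FROM locks WHERE path = ?", (path,))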
It's not a good idea because different processes shouldn't write to the same sqlite database.
Unless you design it as a service, where one process is using sqlite to store its locking data and you use it from other processes by communicating with this service - then it's ok.
Most higher-level languages have flock or an equivalent library that is cross-platform (win/linux/mac). If you need exclusive locks, I haven't seen a file-access interface in languages above C that doesn't already offer exclusive locking.
> SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time.
This is the one that usually gets me. For whatever reason, I tend to prefer side projects that are "take a dataset and make a tool out of it". It often ends up with simultaneous bulk writes when the dataset is updating.
I'm a big sqlite fan. Just throwing this out as a limitation for anyone deciding if it's appropriate for their project.
I always wondered if using SQLite as the backend for distributed map/reduce jobs would be efficient. Each machine holds part of the data in an SQLite file.
It would not solve the usual sort/group by problems that require cross-machine communication, but would take full advantage of SQLite's optimizations for other problems.
The list of check-ins is computed by a single query. But that query then tosses the list over the wall to another subsystem which generates content for each check-in. And several queries are required for each check-in to extract the relevant information needed for display.
The timeline example above is an information-rich page. Perhaps it could be generated using fewer than 200 SQL statements. But SQL against an SQLite database is so cheap that it has never really been a factor. You can see at the bottom of the page that it was generated in about 25 milliseconds. Profiling indicates that very few of those 25 milliseconds were spent inside the database engine.
Perhaps the take-away is that when the SQL engine is in-process and queries do not involve a server round-trip, the "n+1 query problem" is not really a problem.
> [Is 200 SQL statements] a ridiculously high number for a single page?
It is, but it does happen, usually due to design decisions.
When I was still young enough to do PHP (about a decade ago), one of my largest projects was a domain-specific CMS for code collaboration. It was all working fine on my development system, but in production, every page took at least 4-5 seconds to load. I inspected the DB queries that were used to build a single page, and found a lot of duplicate
SELECT * FROM {table} WHERE id = {id};
because of how the model classes were built. Since it was too late to change the architecture, I sent these types of statements through a simple cache and brought the number of queries per page down from a few hundred to below 10.
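The idea, sketched here in Python rather than the original PHP, was something along these lines: memoize the SELECT-by-id results for the duration of a single page render.

  # per-request cache, keyed on (table, id); thrown away when the page is done
  _query_cache = {}

  def fetch_by_id(con, table, row_id):
      # table names come from trusted model code here, not from user input
      key = (table, row_id)
      if key not in _query_cache:
          _query_cache[key] = con.execute(
              "SELECT * FROM %s WHERE id = ?" % table, (row_id,)).fetchone()
      return _query_cache[key]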
More or less the Kibana dashboard's way of doing things, which destroys the Elasticsearch machine and forces setting up a cluster with gigabytes of RAM and multiple CPUs even if it sits doing nothing between refreshes. The dashboard could serialize the queries, and that web service could combine them into one, get the name, date and age, and demux the result to the individual widgets.
Fossil SCM is a DVCS built around SQLite: one executable, very few dependencies, and atomic transactional safety. Workflow-wise it's more like CVS or SVN properly converted into a DVCS. Great for small teams: integrated web server, wiki, issue tracking. http://fossil-scm.org