I teach SQL to non-techie students. I used to give the option of doing either MySQL or SQLite, but not only did I underestimate how different the syntaxes were, I also underestimated how non-trivial it is for students to successfully install and run both the MySQL server and client. These are students who can't even use a spreadsheet well, not that that makes a huge difference in understanding databases.
I've moved everything to SQLite and couldn't be happier. Not only is it easier to distribute assignments (e.g. a single SQLite file, instead of CSVs that need to be manually imported), it does everything I need it to do to teach the concepts of relational databases and join operations. This typically just needs read-only access, so our assignments can involve gigabytes of data without issue.
"I used to give the the option of doing either MySQL or SQLite"
SQLite is a good choice for absolute beginners.
Later, when teaching "real" multi-user RDBMSes, although MySQL may be more popular, it makes more sense to teach PostgreSQL as the "default open source database". Both will do the job, but PostgreSQL has gotten more things right from the beginning, which is especially important when learning. Think PHP vs. Python: PHP may cut a few corners, but it's not an ideal language for teaching general concepts.
SQLite teaches virtually all of the necessary concepts to jump right into a multi-user RDBMS, even most of the syntax, with the exception of being more lax about datatypes. My audience is not people intending to build applications, but people who want to do data analysis beyond what they can do in a typical spreadsheet. I don't think the PHP vs. Python analogy is particularly accurate. I don't feel that PHP is easier to learn than Python unless the main goal is to produce something public-facing on the web. Never mind that the syntax is significantly different.
I'm not intimately familiar with Go, but I figure its clean and simple design makes it a good first language, and in particular a good preparation for C, because of its focus on memory layout.
Rust? NO! One of the worst choices for a first language. Similar to C++, Rust is really heavy on concepts.
If it's just to learn SQL, then would a LibreOffice database do? Perhaps they removed it, but you used to be able to use (or perhaps just import?) a spreadsheet as a DB and run SQL against it. It definitely did joins and ORDER BYs using standard SQL syntax.
LibreOffice now uses Firebird under the hood, I think.
Firebird has more complete SQL support than SQLite. It also has the advantage of being usable either in a manner analogous to SQLite or as a full-blown server database.
I actually think SQLite could probably be used from beginner to advanced level, depending on what you're teaching (relational theory, for example), but perhaps I'm not aware of its limitations in that regard.
That being said, have you considered using Docker (or other popular container tech) to make installation of MySQL easier?
And I should also add: what about cheap cloud SQL instances/services as well? Perhaps the students could learn a bit about how simple it is to use some cloud resources too, without the management overhead?
The one thing missing from SQLite is stronger type casting...however, SQLite does have a solid, cross-platform GUI in DB Browser [0]. It's not as nice as Sequel Pro and some of the commercial clients for MySQL. Postgres, unfortunately, doesn't have nearly the variety of reliable clients that MySQL and SQLite do.
However, SQLite follows most of the standard syntax that Postgres does, so having students move right into CartoDB, which lets you run raw Postgres, has never been a problem. It's just the self-hosting that's a pain :).
One more thing that I wish I could have from Postgres as a teacher: how it returns an error when you include non-aggregated columns in the SELECT clause. In MySQL and SQLite, selecting extra columns won't throw an error, and the results look close enough to correct to be fairly dangerous for novices. Better to just throw an error as Postgres does.
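For anyone curious, here's a minimal sketch of the behavior using Python's built-in sqlite3 module (table and data are made up): SQLite happily returns a value from some arbitrary row for the bare column, where Postgres would refuse the query outright.

  import sqlite3

  con = sqlite3.connect(":memory:")
  con.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INT)")
  con.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                  [("north", "apples", 10), ("north", "pears", 99)])
  # 'product' is neither grouped nor aggregated; SQLite picks a value
  # from some row instead of raising an error, as Postgres would.
  print(con.execute(
      "SELECT region, product, SUM(amount) FROM sales GROUP BY region"
  ).fetchall())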
Hipp based the syntax and semantics on those of PostgreSQL 6.5. In August 2000, version 1.0 of SQLite was released, with storage based on gdbm (GNU Database Manager).
SQLite is quite possibly one of the most useful pieces of software ever created. It's small, relatively fast, and unbelievably solid. It's up there with bash, curl, grep, emacs, and nano: tools that are just so good at their job that we don't even notice how amazing they are.
I mean, really. SQLite is remarkable, impressive, and used everywhere, and we never talk about it. Emacs is a remarkably impressive piece of engineering, bash is the world's default shell for a reason, Nano is the newbie's text editor, and, well, just imagine for a second what would happen if grep or curl stopped working.
Clearly you've just arrived from some wonderful alternate universe where bash means something different than it does on Earth. Welcome traveler!
bash is the world's default shell for a reason
Here on Earth that reason is network effects ("If I write it in bash, it will run anywhere!"). Bash is an bad language. If you've mastered bash you can have the honorable feeling of mastering a difficult, ugly, but practical skill (see also: knife-fighting, driving a motor vehicle, running for office). But there's no need to be mean to SQLite by comparing the two.
I spent some time making it work on modern machines, and it's... I don't really want to use the word 'bad'; because I don't think it is. I kept finding issues which I would spend ages tracing through the code muttering about ancient C which didn't understand alignment and stuff, and then discover that the code was actually doing everything right and the problem was at my end. I couldn't find a single bug in it.
But incomprehensible --- oh god yes. The way it handles memory is really bizarre. The parser is really bizarre (there's the famous Tom Duff quote: "Nobody really knows what the Bourne shell's grammar is. Even examination of the source code is little help."). And then they interact!
Funny, I instead had that reaction to nano being there after I read "good at their job." I mean, that's one way of looking at nano, I suppose, depending on how you define "job"...
Partly because listing emacs or vim would launch a flame war, but Nano's intent is to be a small text editor, friendly enough for newbies, to handle small jobs, like writing an email, or editing /etc/fstab. And it does that pretty well.
I forgot I did that. It was because Emacs is really technically impressive. It's a bit ugly in places, but on the whole, it's remarkably well designed.
Before bash was the world's default shell, it was csh, then tcsh. At my university, if you switched your shell to bash, the sysadmins would put you under extra scrutiny, suspecting you were a Linux user and thus likely to cause trouble.
We switched to a default shell with incompatible syntax not because of network effects but because it was much better than tcsh — initially just for scripting, later for interactive use as well.
zsh already existed at that point, btw.
As for mastering bash? Nobody masters bash. Brian Fox hasn't mastered bash.
Presumably the original sentence was something like "bash is an abysmal language" and OP forgot to change the article when they replaced "abysmal" with "bad". This happens to me regularly.
Yeah, just tough to read, really jarring. I can read right through typos but for some reason this particular grammatical error is arggghhh inducing ;-)
To be fair, we talk about it every time a link from sqlite.org is posted here (/testing.html comes up frequently). What I also like about this project is how well written their documentation is. Thanks to it, it's very easy to read and understand how a particular command or property works and how to use it.
$ curl --version
curl 7.47.1 (x86_64-w64-mingw32) libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8 libidn/1.32 libssh2/1.7.0 librtmp/2.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp scp sftp smtp smtps telnet tftp
Features: IDN IPv6 Largefile SSPI Kerberos SPNEGO NTLM SSL libz TLS-SRP
$ grep --version
grep (GNU grep) 2.24
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Mike Haertel and others, see <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.
$ uname -a
MINGW64_NT-6.3 redacted 2.5.0(0.295/5/3) 2016-03-20 18:54 x86_64 Msys
$
I've run into a few issues with SQLite when I get to more complex cases. Right joins are a great example. I have to do a lot of searching, stackoverflowing and talking to peeps in #sqlite on freenode to try to figure out how to rewrite more complex queries to work with SQLite.
I still like SQLite, and most people don't need anything beyond simple joins, but when you do need some more advanced SQL, it can be a bit challenging.
I found it useful for logging. Logging tends to have structured data associated with it. Trying to re-parse log files back into metadata gets tiresome and is error-prone.
I'm a pretty big fan of line-separated JSON records... since JSON serializers on most platforms escape linefeeds (\n) inside strings by default, you can separate records with a linefeed.
This can be streamed and even gzipped in said stream, meaning you get compression and a pretty easy format you can use in most platforms these days with very little intermediate processing or extensions.
With JSON you can have additional metadata on each record, and not have it affect the mainline... now, this cannot be queried directly, but it can be very easily streamed/imported elsewhere.
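A minimal sketch of that kind of line-delimited logging in Python (file name and fields are made up); json.dumps escapes any embedded newlines, so one record per line stays safe:

  import json, time

  def log_event(fh, **fields):
      record = {"ts": time.time(), **fields}
      fh.write(json.dumps(record) + "\n")  # exactly one JSON object per line

  with open("app.log.jsonl", "a") as fh:
      log_event(fh, level="info", msg="line one\nline two", user=42)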
For certain tasks, I like to work on streams of JSON records too. jq(1) makes it really easy. However, requiring records to be line-delimited is a point of failure when accepting input from external sources. There should be no difference between:
{"foo": 1}{"bar": 2}
and
{"foo": 1}
{"bar": 2}
or any variations of common whitespace between the objects. Just parse objects incrementally, one at a time. Most JSON libraries support this.
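In Python, for instance, the standard library can do this with raw_decode, which consumes exactly one value and reports where it ended; a rough sketch:

  import json

  def iter_json_objects(text):
      decoder = json.JSONDecoder()
      pos = 0
      while pos < len(text):
          while pos < len(text) and text[pos].isspace():
              pos += 1           # skip whitespace between objects
          if pos >= len(text):
              break
          obj, pos = decoder.raw_decode(text, pos)
          yield obj

  print(list(iter_json_objects('{"foo": 1}{"bar": 2}\n{"baz": 3}')))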
As for general logging: I prefer it to be structured text written to stderr, and have a PEG parse it for me. stderr is pretty much a catch-all log stream, and some 3rd party software insists on writing to it (most do however offer a way to override this behavior, e.g., chromium-headless, libxml2, &c). Having a PEG for those cases means I can still get JSON for everything if I need to, even if the data in the log is unexpected. If I need to, I can then modify the PEG for that use case.
Of course, it's a case-by-case, pragmatic decision. If I only want to log certain data and disregard stderr, or if I can guarantee everything that gets written to stderr will be valid JSON, writing logs as JSON may be better.
Not meaningfully more complicated. You know when you've reached the end of your object if you are actually parsing it, so you also know when the next one can begin.
Actually, no, one is not necessarily parsing it. Xyr point was that using line delimiters allows one to retain one-log-record-per-line semantics and use the log stream with text processing tools that understand those semantics but that do not understand JSON.
When you're working with a platform that has native JSON parsing, it generally doesn't support streams as described... so you either count braces and quotes, create a stream processor or use a delimiter... a delimiter is an easy enough solution, also allows for easier viewing in a text editor. It's about being pragmatic.
Honestly, I'd say that logging should be text. Yes, SQLite DBs are hard to corrupt, but it can happen, especially in the sort of catastrophic failure you would want a log for. And when data corrupts, it's generally easier to extract some degree of data from a text file than from a binary.
The big problem with text logs is the moment something you log doesn't fit what you were expecting. A message has a newline? All of a sudden you either have to escape characters or you have to handle the data being put on the next line rather than in the next log message. How do you detect when that happens? A message doesn't fit the format properly? OK, so let's encode the data, base64 maybe? Now you can't grep the logs for information anymore; it's an opaque format with metadata, so you might as well use SQLite or some other structured format.
As someone else said, JSON is great for logging. Newlines get escaped, and you just write line-separated JSON entries to a log file. JSON can be easily read by humans, and is trivial to parse in most programming languages.
I like SQLite, but I don't think logging is a good use case.
Except if one line of JSON gets corrupted ... all your log storage goes to the trash.
Heard about people logging JSON? (Which may be truncated, because log lines can be truncated when write is called with size > PIPE_BUF in a concurrent environment.)
I know JSON is the new XML, but dinosaurs like me have learnt the hard way that logging should not be stored as a document but as a journal of chunks that may be truncated, and that relying on the atomicity of the system read/write/close/open/seek/tell/unlink calls is a damned good idea, because the day you need logs is usually the day a major crash happens.
Hence a day when log corruption is more likely to happen.
True, when you have no crashes, you love the JSON format. But it's even truer that when an incident happens, you want a resilient system that can still log in a reliable way.
The log format, as described above, is one JSON object per line. So if one line gets corrupted, this only affects that one line.
And this is possible because the only place where a newline can appear in JSON is whitespace (where it can be safely replaced with a space; although most JSON serializers provide enough control over the output that you can just make sure that it never appears there in the first place).
Well, what a nice way to make grep lose orders of magnitude in speed; grep is fast because it does not split lines.
And if \n is prohibited in strings, it is NOT JSON per the ECMA spec anymore; it is something else.
Additional complexities that do not seem meaningful, but under congestion they add up to make your life hell. But well, energy is cheap, VMs are cheap, cloud operational expenses cost so much less than coders, why bother?
With this kind of reasoning we will end up with internet appliances so broken and so sloppily coded that we will experience a massive DDoS made of connected toasters and poorly coded cameras.
"\n" (that is, ASCII character "\", followed by ASCII character "n", producing escape sequence "\n" for newline) is not prohibited inside a string literal in that model. Only actual newlines (that is, ASCII characters 10 and 13) are prohibited, and that is already a part of JSON spec.
Like I said, the only place where JSON permits actual newlines is as part of insignificant whitespace. But because it's insignificant, newlines can be replaced with any other whitespace character, and the resulting JSON will have the same exact meaning.
And why would grep lose "order of magnitudes in speed"? If you were actually using grep, it'd work exactly the same as it does for any text file. The only thing that JSON does is add some structure to the contents, so that it can be easily converted to some tabular format that's more convenient for structured queries, aggregation etc. But you can still treat it as plain text for all purposes.
The days that I frequently need logs are the days that I need to prove that a particular application did a particular thing -- handled a particular transaction, sent a particular message, responded to/triggered a particular event, and so forth. Faults in these areas are applications software faults (often not even fatal ones), and are unlikely to affect the logger process at all, especially since it is insulated from the applications software by each running under the aegis of its own dedicated unprivileged user account.
Why don't you think sqlite is a good fit for logging?
Given the extensive testing, including crash testing, it's plausibly more robust than plain text - because you're almost surely not using any kind of fsyncs in your plain-text logger, so that text file isn't as incorruptible as you may think. And you may write a buggy logger, or use a buggy JSON implementation, or write incorrect error-recovery code when reading the file.
I'm skeptical that robustness is an argument in favor of plain text logging over sqlite.
Writing to files in general is fairly difficult if you care about not losing any data under any circumstances, because you have to use the right syscalls for the semantics of your filesystem, which may sometimes be why you're looking at corrupted files in the first place.
Correctly implemented transactions can prevent you from dealing with that. Maybe that's worth giving up easily readable, searchable and processable text files, maybe not.
This is a bad idea. What you want from text logs is the possibility to start anywhere in the file, look for a nearby newline and know it is the start of a record.
This is a very important property. It's also what makes UTF-8 resilient.
Why can't I have both? I am not against text logs, but it is good to have options. When a binary format is an option, I'd say that netstrings are probably a better option.
For literal binary data you need to store the length information separately.
Netstrings are one option for ephemeral streams. But they are a bad tradeoff for persistent information: you have to read from the beginning of the stream until you find the relevant information. And a single corrupted byte destroys all the information after it.
Better options for persistent data are (pointer+length)s or memory-pools + ranges.
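For reference, netstrings are just length-prefixed records ("<length>:<payload>,"), which is why they carry arbitrary binary data but force sequential reads; a rough Python sketch:

  def encode_netstring(payload: bytes) -> bytes:
      return str(len(payload)).encode() + b":" + payload + b","

  def decode_netstrings(data: bytes):
      pos = 0
      while pos < len(data):
          sep = data.index(b":", pos)
          length = int(data[pos:sep])
          payload = data[sep + 1:sep + 1 + length]
          yield payload
          pos = sep + length + 2   # skip payload plus the trailing comma

  framed = encode_netstring(b"line one\nline two") + encode_netstring(b'{"x":1}')
  print(list(decode_netstrings(framed)))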
Don't worry, you can just have journald send loglines across the network to a centralised logserver...
... oh, wait, that hasn't been solved yet. Instead, there's a variety of hacky workarounds (send it to rsyslog and get it to ship; follow a journal with ncat and ship that; some others). Centralising journald logs is my current ops problem du jour.
Does that imply that Linux was implemented poorly? I always found the concept of socket files to be mysterious (coming from a Windows background).
Socket file descriptors were in UNIX before Linux was written. They were eventually copied into Windows via "Winsock", and Windows shares the same basic idea of using ReadFile to access the sockets.
(The "reinvent it badly" is systemd replacing syslogd without re-implementing functionality that was important to a big subset of admins, because the people who wrote systemd are focused on the personal workstation case to the exclusion of all others)
But more fundamentally than the desktop / server focus, they either don't understand or don't value what makes UNIX great.
systemd is in many respects a major departure from UNIX philosophy, and I predict it will mark an inflection point in the quality and usability of Linux distros that adopt it, hence my efforts to migrate my own systems to FreeBSD.
For those wondering what I'm banging on about, please read The Art of UNIX Programming, which should have been called The Philosophy of UNIX:
Edit: what I mean by fundamental is that it shouldn't matter whether the tools are intended for desktop or server use; they should be designed to interoperate seamlessly using text protocols via pipes, sockets and files. That way they can be composed, filtered and transformed in ways not yet dreamed of by their creators. The systemd folks are falling into the Microsoft and Apple trap by trying to anticipate how their software will be used, instead of building it so it can easily be hacked upon for uses they themselves haven't dreamed of (which oddly seems to include 'servers').
To put it bluntly: systemd has neither the hacker nature nor the UNIX nature, and history has been unkind to OSs with neither. I'm betting against it being a Good Thing in the long run.
Yeah. As much as I hate ESR (I hate to beat that dead horse, but I don't want anyone getting the impression that I like him) he got it right here in this case, and Systemd definitely does the Wrong Thing.
It's not a Good Thing now, though: It's an attack surface.
He's an okay programmer, but he's got a frankly wacko political agenda (Ayn Rand, Libertarian, racism, paranoia etc.), he's mismanaged The Jargon File by putting in some of his own phrases which nobody else uses, and he's just an all-around unpleasant person.
Honestly, he's kind of worse than RMS, who is merely unpleasant to be around...
I agree on Rand and Libertarianism - although (as an Objectivist myself) it may interest you to know he's not an Objectivist, as he objects to several aspects of Rand's philosophy. I still have his suggested reading in that area on my TODO list.
But racism, and paranoia? I'd (seriously) like to see evidence of those if you have them. Nothing I have seen him write or do suggests he treats people in any way other than as individuals, on their merits alone.
Did you miss the blog article where he said a friend had told him that women were trying to have sex with OS leaders, so they could accuse them of rape?
Sounds pretty paranoid to me.
I might have been wrong about the racism stuff. I swear I saw some stuff, but I can't find it now, so I might have been confused with somebody else.
But at the end of the day, he's still an unpleasant person.
Yeah, that's the one problem I've seen with journald that needs a proper solution. The other features it brings I've loved on my desktop, but I wouldn't want it for long-term data/logging yet.
When I did this, I found that I needed to be pretty careful when rotating logs. The online backup API fits the need well, but if you're using a wrapper lib around SQLite you most probably won't be able to access it.
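For what it's worth, Python's sqlite3 module does expose the online backup API in recent versions as Connection.backup; a rough sketch of a rotation step, with hypothetical file and table names:

  import sqlite3

  src = sqlite3.connect("app-log.db")
  dst = sqlite3.connect("app-log.archive.db")
  with dst:
      src.backup(dst)            # consistent copy even while writers are active
  dst.close()
  with src:
      src.execute("DELETE FROM log")   # 'log' is a hypothetical table name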
On the other hand I've used SQLite as a more or less drop-in replacement for a full-blown client-server relational database. It was effectively a read-only file archive but way more complex than you expect to get from a flat file. From a series of flat files, maybe.
I run the backend/website of my side business on sqlite. It is one of the best technology decisions I have made. It performs reasonably, is super straightforward (at my day job we have a team of postgres people to keep our dbs running smoothly, but for my little side business I don't have those resources); backups are dead simple. I love sqlite.
This used to be how I felt, since the performance criticism of SQLite is vastly overblown. However, the safety criticism of a dynamically typed database is vastly underblown.
Since it competes with fopen(), you get about as much structure and validity.
I'm currently developing a text format called WSL[1]. By nature text files don't have indices, but it is strongly typed, supports standard relational integrity constraints, and indices can be automatically created when reading the file.
There is currently only a simplistic Python library, which reads databases at about 1MB/s. On the plus side it's dead simple to use: only a single library call to parse a file into schema, tables, and indices.
There is also a C library in development which lexes at about 300-600 MB/s in a single thread (depending on how many columns are actually needed and thus have to be written to per-column lexeme buffers) and which I hope will have a release next month.
What's the safety issue? Will your data corrupt, or are your types just not 100% guaranteed?
Because I can live with the latter: weak types suck in programming languages, but are okay in DBs, and the types get verified multiple times on their way in and out of the DB in most systems.
Besides, it won't mangle your data. Unlike some DBs that I could name...
I don't think it would mangle your data directly, but it could lead to incorrect results since there is a degree of mystery from query to query. You should definitely base a conclusion on their words and not mine.
Thanks for the link. Yeah, it's as I remember: type affinities will convert to their type if possible, and if not... well, you get out what you put in.
The reason most people don't complain about this is that it's far from a common issue to miswrite your SQL statements so badly that you wind up mixing up columns. And when you do, it's usually detected pretty fast.
Maybe you are thinking about foreign key constraints, which for backwards compatibility are off by default, unless you use a compile-time option to turn them on by default.
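Since enforcement is per-connection, it has to be switched on explicitly after connecting; a small Python illustration (made-up tables):

  import sqlite3

  con = sqlite3.connect(":memory:")
  con.execute("PRAGMA foreign_keys = ON")   # off by default for compatibility
  con.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
  con.execute("CREATE TABLE child (pid INTEGER REFERENCES parent(id))")
  try:
      con.execute("INSERT INTO child VALUES (42)")   # no such parent row
  except sqlite3.IntegrityError as e:
      print("rejected:", e)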
Really? dang that's terrible. I thought they'd stay that way as part of the table schema. Glad I normally use an ORM that'll take care of the type checking ahead of time.
> the performance criticism of SQLite is vastly overblown
That kind of depends on how you use it. Obviously this depends on the row size, constraints, etc., but if you want to write more than a few thousand rows per second for longer periods, the performance limitations of SQLite will become very obvious very quickly.
When your little side business needs to write more than a few thousand rows per second for longer periods, you're either having enough customers to justify a migration, or you have the most inefficient database schema of the decade.
Not every piece of software that needs to store data is a CRUD application and "little side businesses" are not the only use cases for an embedded database. For example, an embedded application logging readings from a number of sensors could easily reach thousands of writes per second. Another example would be a desktop (or mobile) email client such as Apple Mail using SQLite to store data, it could easily reach thousands of writes per second when e.g. downloading the entire mailbox from an IMAP server.
Certainly true. I'm not saying that SQLite is always the best choice. But people like to say "SQLite is not performant" when they really mean "SQLite is not performant for concurrent-write-heavy applications" which is a small minority. In most but not all cases the performance of SQLite is perfectly adequate.
Most applications don't actually need concurrent access. SQLite handles concurrent reads without any issues, with writes requiring exclusive locks. As long as your queries are fast and your write load is minimal, you won't really have any problems.
What you describe is the way SQLite worked originally. With the newer WAL mode, things are slightly better -- you can still only have one active write transaction at a time, but writers no longer block readers.
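Switching is a one-time, persistent setting on the database file; for example, from Python:

  import sqlite3

  con = sqlite3.connect("app.db")
  mode = con.execute("PRAGMA journal_mode=WAL").fetchone()[0]
  print(mode)   # prints "wal" once the database has been switched over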
So I read about WAL mode down here and cheered because it sounded like it would solve the occasional "database is locked" error that the app I am working on is bumping into.
The database is only opened by 2 users on the same machine. One is a normal user, the other one is root for a daemon process. That by itself might be an uncommon scenario.
So I tried it out this morning and found that any writes are invisible unless I restart the app to close the database. For my use case that isn't an improvement as writes made by either user should be visible by the other user. Even tried it with setting read_uncommitted to true, but that did not help either.
Of course it is possible I am still doing something wrong, but at this moment it doesn't look like the WAL journal mode is an option for my app.
A pity, as I expected (without WAL) to be able to read while another process is writing; a delayed read would be fine, but instead there's a "database is locked" error that pops up to the user.
Yeah, they both can write, although almost all of the writes are done by the daemon process and the normal user (the GUI process) reads and processes the results. The 'database is locked' problem seemed to happen most while the daemon user is writing and the normal user is reading.
For the moment I added a patch to my apps whereby the applications handle the locking themselves at a slightly higher level, as I got a bit tired of the problem.
This is done via a separate lock file that is opened exclusively before any write action and closed after the write. By doing that I can simply delay the reads for a bit when the GUI process tests to open the lock file and that appears to have cured most problems.
It's a tiny bit more advanced than the above, but that's basically it, and it appears to have cured most issues.
edit: might have misread your question, was it about the WAL journal mode? Yes the processes do commit the transactions they write. I need the results immediately, not after sqlite decides to process the WAL journal.
When staying in normal journal mode the app sees the data just fine and the data is committed directly in that case. Updates/Deletes are all pretty much instant and any queries results are correct.
There might be an issue with the database drivers I depend on (FireDAC); in that layer I even go as far as closing the tables on each query/update after a commit.
The problem with normal journal mode is the lock error popping up.
When I switch to WAL journal mode the data no longer appears to be written directly even when turning autocommit back on.
So while WAL mode appears to fix the lock issue, the data only gets committed on closing the database connection.
As a result the GUI process can't interact with the daemon process anymore as it only sees old data.
Opening and closing the database on each insert/update/delete to force the data to be written simply isn't an option.
The reason this was happening was not because of SQLite, but due to how the FireDAC driver handles the locking.
That driver had a setting "BusyTimeout" which supposedly takes care of a lock waiting time.
According to the documentation it has a default timeout setting of 10 seconds. That clearly did not work, I even had set it manually, still to no effect.
The other day I figured to try and set this via an SQLite pragma setting... (busy_timeout)
I've not seen a "database is locked" issue since then and I've completely removed my manual locking layer, so "case solved" and it certainly wasn't SQLite to blame.
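For anyone else hitting this, the pragma in question looks like this (shown from Python here; that module also takes an equivalent timeout argument on connect):

  import sqlite3

  con = sqlite3.connect("app.db")
  con.execute("PRAGMA busy_timeout = 10000")   # wait up to 10s on a locked db
  # roughly equivalent: sqlite3.connect("app.db", timeout=10)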
> Most applications don't actually need concurrent access
That's an interesting claim. I would rephrase that to, "Are there more HTTP calls that use concurrent connections to a DB, or standalone applications that do not?" I would wager the former.
Even on most websites, I suspect the need for concurrent, long-lived write transactions is much rarer than people assume. If your write transactions are short-lived, then sequential execution is a reasonable approximation of (slow) concurrency, at which point it's a question of load whether that's good enough. But the window in which it's not good enough is very slim - hardware simply isn't all that concurrent in the first place, and as you scale, some sharding strategy is required anyhow.
So the more plausible limitation is long-lived write transactions; e.g. where a write cannot be committed until after some other confirmation occurs, possibly over the network. That simply won't work well at all in sqlite - not that it's a great strategy to use on other DBs...
> The SQLite website (https://www.sqlite.org/) uses SQLite itself, of course, and as of this writing (2015) it handles about 400K to 500K HTTP requests per day, about 15-20% of which are dynamic pages touching the database. Each dynamic page does roughly 200 SQL statements. This setup runs on a single VM that shares a physical server with 23 others and yet still keeps the load average below 0.1 most of the time.
I think it's fair to assume that the sqlite site could be redesigned to meet most of its functionality as a largely static site, but that would come at a loss of functionality. And obviously it's a form of dogfooding, but that's not objectionable, right?
SQLite is not such a great choice for the server side. SQLite typically gets used on the client side, either as a "caching db" for offline work, or for client-side-only programs without a server backend.
It describes the inside of the engine and how to do things. There are exact steps that need to be done, the way the document reads, to make it happen.
There is also a big section on how to corrupt the database. It's a heads-up that if you decide to take shortcuts, there will not be a happy ending.
Concurrent use is a little more complicated than with something like MySQL, but there is much more engine on the MySQL side. If you have multiple concurrent users with high transaction levels, SQLite may not be your best first choice.
SQLite has always supported multiple readers, which is the most common concurrent access pattern. Before the WAL (write-ahead log) was introduced, a writer would block readers (and vice versa), but with a WAL, a single writer and multiple readers do not block each other.
SQLite does not have multi-writer concurrency (usually MVCC, as in Oracle/MySQL/PostgreSQL, or optimistic transactions as in Backplane). If you need those, SQLite is not for you.
SQLite works great as the database engine for most low to medium traffic websites (which is to say, most websites)...Generally speaking, any site that gets fewer than 100K hits/day should work fine with SQLite.
Do people agree with this? I was under the impression you should not use SQLite for production websites for some reason. Django has this to say, for instance [1]:
When starting your first real project, however, you may want to use a more robust database like PostgreSQL, to avoid database-switching headaches down the road.
I'm rather skeptical, because I've found Postgres to be pretty simple to run on a project of this size as well. Takes like 10 minutes to install it. Maybe bump it up to 30 minutes if you want to do something a little tricky with user accounts, like run the web server with a DB account that has limited permissions, which SQLite can't do at all anyways.
On small projects, DB admin doesn't seem to be much more complex than using a SQLite DB. By the time you get the DB load high enough that you really want to pay attention to administering it, SQLite has probably given up the ghost long ago.
Don't get me wrong, SQLite is great for what it does. I don't see the upside to it on this though. Even if you know for sure your site will never hit high traffic, it just isn't that hard to run a conventional DB. And if it does, it's a lot easier to pay somebody to set up your DB server right than to convert over to a conventional DB and then get it set up right.
The upside is it doesn't need a central server. And 10 minutes to install the server (if you know how to do it) vs 0 minutes is a pretty big deal.
A project running on sqlite can be quickly taken to just about any box and run without any infrastructure dependencies.
You can easily run a hundred instances on a single machine, for dozens of simultaneous users, without any setup or coordination. Computing resources are only required for access, not for availability.
As SQLite themselves state, they're running a 500K hits/day site on SQLite just fine. They also point out that their site is not particularly write heavy, which is a somewhat important point to be making with SQLite specifically.
What does Django mean by a "real project"? If this is a project you intend to scale beyond the scope of SQLite, then starting from something that will scale that way in the first place will alleviate later growing pains. If it's your personal site and will always and forever be run on a VPS with 1GB of RAM? There's no reason not to just stick with SQLite -- and you get the benefits of not having to maintain a "real" db service.
We built a SQL database backend-as-a-service based on SQLite, we like it so much. The typeless schema and lack of some constraints can be a challenge, but there are workarounds (check constraints). High-write uses are the only problem with SQLite; our service provides application-level caching via an X-Query-Cache header. In that case you're basically serving from redis.
Our goal is to provide a SQL backend to every person or device on the planet. To get that level of scalability and manageability (i.e. disk usage and CPU usage) you pretty much have to use an in-process database. SQLite is the best there is, as others in this thread have noted.
Sure. I think the warning about not using SQLite in production is more of a rule of thumb (and a good one) for those who don't really understand the tradeoffs they're making when using it. If one takes the time to both understand the needs of their application and the limits of SQLite, there's no reason not to use it in production where it is appropriate.
Getting a site from zero to 100K hits/day is also a fairly monumental task, and if you don't succeed at that, you don't have to move to a different database.
That's the theory, but in practice it's not always so good.
I tried moving from MySQL to Postgres. Somehow my unique constraints in Django weren't unique in the database, so it threw errors when I tried dumping to Django fixtures.
I don't. I've corrupted SQLite DBs enough to not have warm and fuzzy feelings about it like I used to have.
I think it's only a good choice when you just need a database for your app that will barely be using it, and if you didn't use it you'd be writing to a file instead. And, that's basically what the SQLite docs say.
However, even then, I think it can be short-sighted. I've used webapps before that used SQLite and I thought to myself: if they'd only used MySQL or PostgreSQL and then provided access to it, I could have used it.
Be aware though, if you decide to use a scalable DB like PostgreSQL, it will require a port to be open for the DB, even if only locally. If you're trying to minimize how people can access your data, you don't want a port open/an extra port open, and you're not going to hit it very hard, SQLite's probably your best choice.
Yeah, I changed my wording to "scalable". And I appreciate the developers and community around SQLite. It has its uses, and I appreciate it. However, I think it could be better with concurrency.
It does. The default PostgreSQL installation binds to port 5432 (I can't recall if that's loopback only, or global), but it can easily be disabled and use AF_UNIX sockets only.
Can you talk at all about how the SQLite databases got corrupted? I'd be interested to know the circumstances. (The answer 'I have no idea, it was just one of those things' is perfectly acceptable.)
No, I tried to run a site with much less traffic than this on sqlite. It threw database lock exceptions all the time. My writes must have been throwing it off? It was pretty frustrating; I should have just used Postgres from the start.
One of the reasons I don't use sqlite in my projects is the lack of an easy unaccent solution, to make queries that automatically suppress accents. All my projects are in French, and if you want to make a search function in French you absolutely need that. Almost nobody will search for "éléphant" with the accents, especially now that Google, Facebook etc. do not require it.
I guess the stored rows contain accents, not just the query. So you run something along the lines of `unaccent(text) like unaccent(query)`; or, more realistically, you `create index on table(unaccent(text))`.
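SQLite has no built-in unaccent, but you can register an application-defined function for it; a hedged sketch using Python's sqlite3.create_function and Unicode decomposition (table and data made up):

  import sqlite3, unicodedata

  def unaccent(text):
      if text is None:
          return None
      decomposed = unicodedata.normalize("NFKD", text)
      return "".join(c for c in decomposed if not unicodedata.combining(c))

  con = sqlite3.connect(":memory:")
  con.create_function("unaccent", 1, unaccent)
  con.execute("CREATE TABLE animals (name TEXT)")
  con.execute("INSERT INTO animals VALUES ('éléphant')")
  print(con.execute(
      "SELECT name FROM animals WHERE unaccent(name) LIKE unaccent('elephant')"
  ).fetchall())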
...Which is pretty hard to notice unless you really screw up. It's never been a problem for me, but I only work with SQLite infrequently. Does it come up often in your work?
If you just look at its capability, yes, it can work great for a website.
However, sqlite is limited in what it can do, so when you need to go beyond what it can do then it can be a pain.
I'd say sqlite is a DB choice for people who understand what each DB option gives them. If you were new and needed a default choice, then pg, or mysql, or sqlserver are going to be pretty flexible long term. You also are going to get a lot more technical info on the web about how to use it with whatever web framework you have chosen. However, I have used it for websites where I have a pretty good idea about my data needs. works fine.
I use it more in the "competes with fopen" case though. Super great as a settings / info / persistence store
I used SQLite for teaching last year because it was the only thing that I could get IT to install between when I took over the databases unit and the start of term.
While it was broadly a success, I consider the following major problems when teaching to beginners:
* very loose syntax. CREATE TABLE PERSON ( ID BANANA BANANA BANANA ); is legal :)
* no type-checking: you can insert strings into an INTEGER column and vice versa - while you're trying with a straight face to teach students that one of the advantages of a proper database is that it can enforce some consistency on your data.
* in the same vein - foreign key constraints are NOT enforced by default.
* misusing GROUP BY produces results, but not the ones you want. I'd much rather any use of aggregates that is forbidden by the standard also gave an error, to discourage students from thinking "it produces numbers, therefore it must be ok".
This year, I'll try with MariaDB. I consider SQLite an excellent product for many things and use it extensively myself, but as a teaching tool its liberal approach to typing is a drawback.
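To make the first two points above concrete, here's a small Python demonstration (made-up tables); both statements are accepted without complaint:

  import sqlite3

  con = sqlite3.connect(":memory:")
  con.execute("CREATE TABLE PERSON ( ID BANANA BANANA BANANA )")   # accepted
  con.execute("CREATE TABLE grades (score INTEGER)")
  con.execute("INSERT INTO grades VALUES ('not a number')")        # also accepted
  print(con.execute("SELECT score, typeof(score) FROM grades").fetchall())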
> People who understand SQL can employ the sqlite3 command-line shell to analyze large datasets.
And a bit further down:
> SQLite database is limited in size to 140 terabytes [...] if you are contemplating databases of this magnitude [use something else]
Yeah no. "Large datasets" here means a few megabytes. I figured that out the hard way:
I had a database of about 70 megabytes and ran a query with "COUNT(a)" and "GROUP BY b" on it. This makes it write multiple gigabytes to /tmp until it goes "out of disk space" (yeah /tmp on my ssd isn't large).
I heard nothing but awesome and success stories about SQLite until a few weeks ago when this fiasco happened. I still like SQLite for its simplicity and last week I used it again for another project, but analyzing "large" datasets? Maybe with a simple SELECT WHERE query, but don't try anything more fancy than that when you have 100k+ rows.
Of course, "large" is not about the data necessarily, but about the analysis of the data. It sounds like you presented sqlite with a query which would need more than the resources of your computer in order to execute as you specified. This can be really easy to do accidentally on any DB engine with something like a cartesian product.
It is true that sqlite doesn't have as good of a query plan optimizer as larger RDBMSes, and is a little lower-level, but the tradeoff of having the simplicity is that you must understand a little more of the internals to design more complex queries.
Have you considered that you might have used it wrong? If everyone else says it works great and I use it and have an issue, my first thought would be "I must have done something wrong", not "this sucks, everyone else must be wrong".
Sound logic, but I didn't think to myself "gee everyone else is wrong". I just noticed SQLite did something I've never seen another database do and figured it's not made for this.
I find this happens quite often - once you stray from the most common path for $AWESOME_TOOL, you start finding little oddities here and there. It's not that 'everyone else is wrong', it's just that you're off the beaten track.
I'm sure if you posted the EXPLAIN here, someone might be able to figure out why it did that. Were you able to test the exact same query on a different database?
I was going to, but something else got in the way and I ended up not running the query again.
Over the years I've had my share of complex queries with subqueries and aggregates, both for class and for my own projects, but never have I encountered 70MB exploding into multiple gigabytes (I don't even know what its final size would have been). I guess I could have used EXPLAIN and dug into it, but never having had this before I figured it was SQLite not being made for it.
Perhaps this is a bug. If you post the query and database schema on the sqlite mailing list or here we can take a look easily enough. If you can upload the database somewhere we can try to reproduce it.
I've used SQLite with a 5GB database and tables up to about 15m rows, and while I've certainly managed to chew up a lot of disk (same as SQL Server or Postgres), it's not the day to day experience, even with quite complex queries. All you can do is EXPLAIN QUERY PLAN, or perhaps look into putting temp files in RAM, if that is more plentiful:
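The temp-files-in-RAM option mentioned above is a pragma (and the SQLITE_TMPDIR environment variable can point temp files at a bigger disk instead); a quick sketch:

  import sqlite3

  con = sqlite3.connect("big-analysis.db")   # hypothetical database
  # scratch space for GROUP BY / ORDER BY goes to memory instead of /tmp
  con.execute("PRAGMA temp_store = MEMORY")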
Just last week I played with a project that involved inserting 40M words (and their tuples, triplets, quadruples, ...) into an in-memory SQLite database. Even though the db was about 600MB, a query which grouped the tuples by their frequency finished in under 5s.
I am just an average, non-programming geek, and I love SQLite. I use it from the command line to track my blood pressure, my comic book collection, and my book collection.
It also gave me the chance to learn SQL for fun.
Sadly, it is not often looked upon as an end-user tool.
If you like running SQLite, I recommend looking into double-entry bookkeeping with Ledger. I recommend it because it gives a good introductory course in journaling and accounting using double-entry bookkeeping.
Looking at the front page right now, out of the 30 stories maybe 2-3 of them would only be of interest to programmers versus people interested in tech in general.
I use SQLite to store all of my simulation data (~10s of GBs). It's remarkably versatile, and the fact that there are good libraries for Python and C++ to interface with and query SQLite dbs makes it a cinch to use for data analysis.
I've seen so many people struggle with custom binary formats; I imagine there are countless research hours lost in figuring out how to work with these obscure formats. I've advocated to all students I work with to make use of SQLite to store simulation data for their thesis projects and my experience is that they're quick to pick it up and figure out how to do some pretty complex querying.
It's one of those things that I don't understand about academia: there are so many standards and well-established tools in the tech/IT sector that we don't take advantage of. SQLite and JSON are the two that I constantly advocate to everyone I work with.
We use SQLite as a data integration tool. We connect to a third-party system's esoteric database using an ODBC driver. Then export tables to a SQLite database. This process can sometimes take a few minutes but is generally quite quick. Then the SQLite database is compressed and uploaded to cloud blob storage. Effectively at this point it is a "snapshot" of the third party system's state. Our cloud system is then tailored with SQLite queries to know how to use and understand that foreign schema. By doing it this way we avoid needing to know several dozen SQL dialects for esoteric database engines that "never won the race in the 1990s" (think Progress, Ingres, Paradox, etc). It means we only need to know SQLite - a current, OSS and well supported variant of SQL. Epic cost and time savings are the net result.
Implementing client/server where you only need an embedded DB comes at price. It bloats and complicates the installer, increases attack surface, conflicts with other software for listening TCP port number, interferes with firewalls, consumes more resources, slows the startup, etc…
When people say SQLite is everywhere, they mean it. Heck you're likely using it right now as you browse HN since Firefox, Chrome, Opera, etc all use it.
I used SQLite to analyze web server logs at my last job (devops at Xero). SQLite supports in memory databases which are very fast. I'd parse a bucket of logs into a table, then run some queries against them and write the results into Graphite. The results ended up on the ops wall, and generating another data point was one more SQL query in a config file away. Wonder if they're still using it.
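A rough sketch of that workflow in Python, with a made-up schema and a stand-in parsed_log_rows list where the log parsing would go:

  import sqlite3

  parsed_log_rows = [("/home", 200, 12.5), ("/home", 500, 80.0)]  # stand-in data

  con = sqlite3.connect(":memory:")
  con.execute("CREATE TABLE hits (path TEXT, status INT, ms REAL)")
  con.executemany("INSERT INTO hits VALUES (?, ?, ?)", parsed_log_rows)
  for path, count, avg_ms in con.execute(
          "SELECT path, COUNT(*), AVG(ms) FROM hits GROUP BY path"):
      print(path, count, avg_ms)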
I use SQLite almost all the time now on my desktop apps especially, for logging and other data intensive tasks that require only a single read/write thread. It replaced text logging for me as searching and segregating log messages is now a breeze.
I've also used it as a main data store for single user Win32 apps.
In my early days of web app programming, I had an app that created a brand new SQLite data file for EACH customer that logged in and created an account on the web app. I thought it would be the most secure way to separate datasets and protect privacy for each user whilst negating the multiple write lock issues on the same SQLite database. Tip: Don't even bother to do this! The eventual data maintenance headache was far worse... :)
My personal favourite use for SQLite was for my blog. I wanted to use flat-files for storing individual entries, but I still wanted to present tag-views, and per-month entries.
My solution was to create a simple SQLite database, import all the entries into it, and then generate the views by SELECTing from that store.
Populating the database, even if it got thrown away immediately afterwards, was more efficient than trying to store all the entries in RAM.
Yes, temp storage is another 'use case' for me too. On one particular web app I designed, there is a requirement for searching across multiple tables for text data. I simply transpose the columns I need from the different tables into a memory SQLite database 'on the fly' and perform a full text search on that for lightning quick responses and no need to do multi table joins.
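A minimal sketch of that pattern, assuming the bundled SQLite was compiled with the FTS5 extension (most current builds are); the table and columns are made up:

  import sqlite3

  con = sqlite3.connect(":memory:")
  con.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
  con.execute("INSERT INTO docs VALUES ('welcome', 'full text search inside sqlite')")
  print(con.execute("SELECT title FROM docs WHERE docs MATCH 'search'").fetchall())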
I love SQLite up until you need to modify the schema. That's when you find that upgrading a database in place is almost impossible. Rebuilding a whole table just to rename a column is completely impractical and makes maintaining applications really cumbersome.
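For reference, the rename-a-column workaround is to rebuild the table inside a transaction and swap it in; a rough sketch with a hypothetical table, using Python with manual transaction control:

  import sqlite3

  con = sqlite3.connect("app.db", isolation_level=None)  # manage transactions manually
  con.execute("BEGIN")
  con.execute("CREATE TABLE users_new (id INTEGER PRIMARY KEY, full_name TEXT)")
  con.execute("INSERT INTO users_new (id, full_name) SELECT id, name FROM users")
  con.execute("DROP TABLE users")
  con.execute("ALTER TABLE users_new RENAME TO users")
  con.execute("COMMIT")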
I've used SQLite for single-user analytics in the past and it's fine up to a point. It's slightly more SQL-literate than MySQL - it supports CTEs but not window functions - and it has an okay GIS extension. I've also been pleasantly surprised by performance in some cases.
I run a number of 1%-write, 99%-read web apps with decent traffic on SQLite. Works like a charm. It is low maintenance and creating a "backup" is simply copying the file.
I really enjoy using sqlite. Not everything needs a client server model, and having your entire database located in a single file makes a lot of things way easier.
"Application file format" is the thing which intrigues me. I think people don't think about it as an option enough, it should be used more frequently. File format versions backed by migrations, trivial inspection of data, transactional for free, you can keep recent history of changes, hierarchy etc. It's pretty good.
What would be the arguments against using MySQL for a business website, even a small one? Sure, SQLite probably does the job as well as MySQL, but I don't have any problems with MySQL and it's commonly the default option when choosing an RDBMS. Just curious.
I use sqlite for an intermediate representation of a report that is rendered to multiple worksheets. I find having the data in SQL lets me perform all types of transformations that are not easily handled outside of SQL.
They are semi-regularly discussed here, and are known as syntax diagrams or railroad diagrams. You can find a good list of generation tools at https://en.wikipedia.org/wiki/Syntax_diagram
I've never had the occasion to use an SQL database. But say I was writing a game using C++ - at what point would I go from managing a bunch of maps or vectors of entities to using a SQL database?
If I was writing a ray tracer and needed to store vertices, would it make sense to use a SQL database? How about for a list of objects? Or textures?
In general I often need to filter on objects, update object state, generate new objects, remove some others, etc. but I never know when I should stop thinking containers and start thinking "aha! time for SQL"
It's not really appropriate for any of those things.
You should use a database to store data that you want to keep after the program terminates, not so much transient things like in-memory data structures. It's also best used for relational data: stuff that is logically linked together.
For developing a game, maybe storing item tables with items and stats or the player's inventory might be good candidates. Sqlite in particular is good for this because it's easily embedded and a lot of games use it from what I know.
This is oversimplifying a good bit, but it's hard to completely describe the scope of relational DBs.
>> For developing a game, maybe storing item tables with items and stats or the player's inventory might be good candidates
Doubtful. Player inventory is not going to be large enough to bother, and item tables you'll want to be in-memory anyway, so you might as well just read them from CSV, JSON, XML etc (and that way you can easily edit them, too).
I would say that SQLite only makes sense when your dataset is too big to be entirely loaded into memory in a cooperative environment (i.e. assuming that your app is not allowed to hog the entire memory). I'd say that starts at tens of megabytes.
People definitely use SQLite for tracking assets in the games they create. eg which resource file (bmp/snd/wav/jpg/etc) belongs to which objects, plus some string data (eg character description, stats), etc.
Saying that from seeing links to our site (sqlitebrowser.org) from game developers & users on Steam, and also people asking us questions about various database files they're trying to figure out (as an end user).
The various SQLite encryption options around seems to make a difference too, for game developers wanting a simple(-ish) way to "hide" the info from players. Embedding an encryption key isn't a fantastic approach, but it seems to be "good enough" sometimes.
It could be worth it if you need to do lots of relational queries with complex inventory management. I was imagining e.g. an RPG that would ship the stats for all its items in a sqlite data file and then you can store the player's stats and inventory with foreign keys pointing to the item table. You're gonna have to store that data somehow and if you've got enough items and/or complex enough inventory management it seems like maybe you might want to consider sqlite as it already exists and provides a lot of relevant features. I don't consider size so much as whether or not there is a need to persist data and the complexity of relationships; size is more of a factor in "should I use sqlite or should I use a beefier database like Postgres."
I know sqlite is used heavily on iOS and Android and a lot of people use it as a glorified serialization format. Probably not the best in most cases but hey sqlite is so lightweight that it doesn't have much downside. I tend to use it as intended as a lightweight database myself but hey if it works it works.
It is far easier to store such structures as object graphs in-memory (i.e. your "foreign key" is a pointer/reference to the actual object). The navigation patterns would mostly be looking up properties on the item referenced by inventory, so it's not like you need to do joins etc (but even if you did, a join on in-memory object graph is still pretty easy and blazing fast).
For C++ especially, I would recommend looking at Boost multi_index library. This gives you the ability to do fast lookups on a variety of keys across the same data.
Pretty much the only benefit I can see from SQLite in those small dataset scenarios is when you need persistence and the ability to change subset of data in an atomic way (if you only need to save the entire in-memory dataset atomically, you can always just do the rename trick to ensure atomicity with far less overhead). Well, and, I guess, optimization of complicated queries - but I'm somewhat skeptical about the ability of their optimizer to use indices in a query that's really complicated; and simple ones are trivial to do explicitly.
The main idea of a database is storing data on disk, and serving as a poor man's way of sharing data across processes that are running at the same time.
If you don't find yourself needing to share data between two processes at the same time or storing data on the system, you may never need it.
SQLite can actually be used as a decent save system for a video game since you can just store into it and read from it. You could actually do a "Mass-Effect Style" storage system where you can store items from Game 1 and read it on Game 2. You could actually have your studio share a SQLite DB and have your games reference whether a user has played another one of your games.
Instead of thinking "time for SQL" you should consider using a real object database (like Realm). In my experience that will map much better to the data structures in your program.
I'm looking at a project right now where I'm planning to use SQLite as a high-level solution to file locking (i.e. create a record in the DB to "lock" a file, delete it when you're done, and don't create a record if a record for that file is already in the DB). Sound like an appropriate use of SQLite? Is there a better, more direct solution? (I understand there are platform-specific utilities but I would want something portable.)
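To be concrete, what I have in mind is roughly this (names are made up, and it's only a sketch): a table with a uniqueness constraint turns "create the lock record" into an atomic test-and-set.

  import sqlite3

  con = sqlite3.connect("locks.db", timeout=10)
  con.execute("CREATE TABLE IF NOT EXISTS locks (path TEXT PRIMARY KEY, owner TEXT)")
  con.commit()

  def try_lock(path, owner):
      try:
          with con:
              con.execute("INSERT INTO locks VALUES (?, ?)", (path, owner))
          return True
      except sqlite3.IntegrityError:
          return False          # another process already holds the lock

  def unlock(path):
      with con:
          con.execute("DELETE FROM locks WHERE path = ?", (path,))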
It's not a good idea because different processes shouldn't write to the same sqlite database.
Unless you design it as a service, where one process is using sqlite to store its locking data and you use it from other processes by communicating with this service - then it's ok.
Most higher-level languages have flock or an equivalent library that is cross-platform (win/linux/mac). If you need exclusive locks, I haven't seen a file-access interface in languages above C that doesn't already offer exclusive locking.
> SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time.
This is the one that usually gets me. For whatever reason, I tend to prefer side projects that are "take a dataset and make a tool out of it". It often ends up with simultaneous bulk writes when the dataset is updating.
I'm a big sqlite fan. Just throwing this out as a limitation for anyone deciding if it's appropriate for their project.
I always wondered if using SQLite as the backend for distributed map/reduce jobs would be efficient. Each machine holds part of the data in an SQLite file.
It would not solve the usual sort/group by problems that require cross-machine communication, but would take full advantage of SQLite's optimizations for other problems.
The list of check-ins is computed by a single query. But that query then tosses the list over the wall to another subsystem which generates content for each check-in. And several queries are required for each check-in to extract the relevant information needed for display.
The timeline example above is an information-rich page. Perhaps it could be generated using fewer than 200 SQL statements. But SQL against an SQLite database is so cheap that it has never really been a factor. You can see at the bottom of the page that it was generated in about 25 milliseconds. Profiling indicates that very few of those 25 milliseconds were spent inside the database engine.
Perhaps the take-away is that when the SQL engine is in-process and queries do not involve a server round-trip, the "n+1 query problem" is not really a problem.
> [Is 200 SQL statements] a ridiculously high number for a single page?
It is, but it does happen, usually due to design decisions.
When I was still young enough to do PHP (about a decade ago), one of my largest projects was a domain-specific CMS for code collaboration. It was all working fine on my development system, but in production, every page took at least 4-5 seconds to load. I inspected the DB queries that were used to build a single page, and found a lot of duplicate
SELECT * FROM {table} WHERE id = {id};
because of how the model classes were built. Since it was too late to change the architecture, I sent these types of statements through a simple cache and brought the number of queries per page down from a few hundred to below 10.
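The idea, sketched here in Python rather than the original PHP, was something along these lines: memoize the SELECT-by-id results for the duration of a single page render.

  # per-request cache, keyed on (table, id); thrown away when the page is done
  _query_cache = {}

  def fetch_by_id(con, table, row_id):
      # table names come from trusted model code here, not from user input
      key = (table, row_id)
      if key not in _query_cache:
          _query_cache[key] = con.execute(
              "SELECT * FROM %s WHERE id = ?" % table, (row_id,)).fetchone()
      return _query_cache[key]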
More or less the Kibana dashboard's way of doing things, which destroys the Elasticsearch machine and forces setting up a cluster with gigabytes of RAM and multiple CPUs even if it sits doing nothing between refreshes. The dashboard could serialize the queries, and that web service could combine them into one, get the name, date and age, and demux the result to the individual widgets.
Fossil SCM is a DVCS built around SQLite: one executable, very few dependencies, and atomic transactional safety. Workflow-wise it's more like CVS or SVN properly converted into a DVCS. Great for small teams: integrated web server, wiki, issue tracking. http://fossil-scm.org