Writing a SQLite clone from scratch in C (cstack.github.io)
574 points by ingve 11 months ago | hide | past | web | favorite | 146 comments



A bit shameless to ask a question under the top HN story, but I've always wondered why SQL engines don't give access to their SQL AST for generating queries. Of course, one can combine a query from strings with placeholders, and it doesn't cost much compared to the actual query execution, but it feels so hacky. Why not provide something like:

    Query *q = db_select(db);
    db_add_from(q, "persons");
    db_left_join(q, "companies", db_expr(DB_EQ, "per_id", "com_id"));

    Query *q2 = db_create_table(db);
    db_add_column(q2, "id", DB_TYPE_INT | DB_COL_NOT_NULL | DB_COL_PRIMARY);
    db_set_table_name(q2, "persons");
... and so on? That way we could combine and generate queries for specific engines without that ugly string joining/parsing.


> A bit shameless to ask a question under the top HN story, but I've always wondered why SQL engines don't give access to their SQL AST for generating queries.

A number of reasons, from the PostgreSQL POV (I work on it):

- our internal parser representation isn't stable and frequently changes from major release to major release, occasionally even in minor releases.

- we'd need a lot of additional verbose error checking code to make sure only sensible AST can be submitted

- there are two forms of parser AST, "raw" and "analyzed". The former is produced without access to the database; the latter has had validity checks, type lookups, etc. done. The former is what you could validly produce remotely, but it's also more cumbersome to deal with.

I think there's arguments for supporting a useful different representation than SQL, but I think doing this with the already existing AST isn't going to fly. And a new proper language / representation would be a lot of work...

Edit: formatting


> And a new proper language / representation would be a lot of work...

I don't think a new language would be needed, and most of the work has already been done in projects like LINQ. You'd just need to expose a stable AST front-end, which would be a lot easier than a whole new parser. I mean, it's just relational algebra right? ;-)


> You'd just need to expose a stable AST front-end, which would be a lot easier than a whole new parser.

"Just". You still need a parser for that AST if it comes from the client, and you need to convert it into the actual internal representation, with validation and everything.

> I mean, it's just relational algebra right? ;-)

It's really not. A lot of SQL can't conveniently be expressed in relational algebra (good luck with recursive CTEs), and a lot of other queries aren't as efficiently representable.


> "Just". You still need a parser for that AST if it comes from the client, and you need to convert it into the actual internal representation, with validation and everything.

Yes, just: http://lambda-the-ultimate.org/node/5375

> A lot of SQL can't conveniently be expressed in relational algebra (good luck with recursive CTEs), and a lot of other queries aren't as efficiently representable.

Clearly some language is being used to reason about these things, and evidently, this language is reflected in some SQL-like surface syntax. It would be no more a burden to provide an extensible AST than an extensible SQL-like language. In fact, it'd be considerably simpler since tokenizing, parsing and verification would all be simpler.


In MySQL's new protobuf-based protocol you can express expressions as Protobuf messages https://dev.mysql.com/doc/internals/en/x-protocol-messages-m...

This isn't usable for all sorts of statements since, among other things, it's hard to maintain a stable API for all aspects. But it might be useful for common cases. If there's more you need, I'd be interested in a feature request! (I'm working in MySQL's clients team.)


That's awesome! Do you expect significant latency improvements to come from this?


There shouldn't be much of a difference in performance. There are some areas on the server we can improve in the longer run, and the hope is that clients can build requests more efficiently. In a typical situation the parsing of the query hopefully isn't the bottleneck anyway, but rather the actual execution, or the processing of the result in the application ;-)


This is essentially what an ORM does, e.g. SQLAlchemy [1]. Except instead of providing a slightly nicer API to generate query strings, an ORM tends to be more tied to the host language and use its object system to abstract out most of the SQL complexity and make declaring SQL tables/queries look like normal operations on the data themselves instead of the database. Django's ORM [2] for example autogenerates joins for you under certain query conditions, which is super handy.

[1] http://docs.sqlalchemy.org/en/latest/orm/tutorial.html

[2] https://docs.djangoproject.com/en/1.11/topics/db/queries/


I'm aware of SQLAlchemy Core/ORM (and ORMs in general); my question is more about why we have to concatenate strings in these projects' internals instead of using some API to pass query structures without the intermediate composing/decomposing. For the small queries that ORMs often issue, it feels like needless and annoying overhead.

For now, even '?' placeholder escaping is actually manually printf'ed under the driver's hood, afaik, so in the end we send plain text to the server. SQLite probably avoids that via the sqlite3_bind() calls, which allow passing SQLITE_STATIC string args if you want, but that's the benefit of an in-process engine.


You need some way of serializing your data structures to a wire format, and SQL has worked ok so far.

Besides a potential performance gain in parsing on the server side, I'm not quite sure there's much else you would gain; there might not be that much difference between serializing those data structures to "plain text" and serializing them to any other format you come up with.


See the part about support costs in another subthread.

But even more, I think it's "in the nature of programming" that in a given system some parts are regularly structured and externally visible, while other parts seem like they ought to be visible and changeable from outside but aren't, and can't easily be made so.

This might be because they use intermediate variables, or because the implementation has changed over time, or for a variety of other reasons.

And this ultimately comes down to the difference between the way you'd intuitively think feature X should be implemented and the real, much messier way it's actually implemented: the difference between internals and an API.

Having internals and externals be equally easy to work with was the dream of object orientation, but sadly that has remained a fantasy, and it unfortunately looks like it will remain so.


I highly doubt that placeholders are interpolated into the SQL statement. Doing so would defeat the purpose of having them in the first place. Rather, they are transferred out-of-band to the SQL "engine", as it were, and used as straight-up variables.


I checked PostgreSQL just now, and it seems libpq can send them OOB via PQexecParams() and network-order integer buffers. But that's not the case with OpenResty's pgmoon driver, which doesn't have such an interface. E.g. the Lapis framework built on top of it simply escape_literal()-izes all arguments and interpolates them right into the query text. So, YMMV with specific (or generic) drivers.

Placeholders are there for those who would otherwise concatenate queries by hand; having them properly escaped in the protocol shouldn't defeat the client-side security purpose.


You can never properly escape user input, so this escape_literal() is a ticking timebomb.


How come it's impossible to properly escape user input? I get that it's hard if the format is complex, but I don't see what makes it categorically impossible, especially for a format as simple as an SQL string: you just double every single quote and then surround the whole thing with single quotes.
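The quote-doubling rule can be sketched in C; sql_quote() is a hypothetical helper, not any driver's real API, and it deliberately ignores charset-level subtleties:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: escape a string for use as an SQL literal by
 * doubling every single quote and wrapping the result in quotes.
 * Assumes standard quoting rules and a sane connection encoding.
 * Caller frees the returned buffer. */
char *sql_quote(const char *s)
{
    size_t len = strlen(s), quotes = 0;
    for (size_t i = 0; i < len; i++)
        if (s[i] == '\'') quotes++;

    char *out = malloc(len + quotes + 3);  /* 2 outer quotes + NUL */
    if (!out) return NULL;

    char *p = out;
    *p++ = '\'';
    for (size_t i = 0; i < len; i++) {
        if (s[i] == '\'') *p++ = '\'';  /* double it */
        *p++ = s[i];
    }
    *p++ = '\'';
    *p = '\0';
    return out;
}
```

So sql_quote("O'Brien") yields 'O''Brien'. The catch is multi-byte encodings where a byte sequence can smuggle in an unescaped quote, which is one reason out-of-band parameters are still preferable.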


> ...every single quote...

And then you come across \u0027 and you're screwed.


Do you know any decent server that interprets 6-char \uxxxx sequences as part of its top-level query syntax? I mean, how do you safely pass a '\u0027' literal to it anyway? If you can't, then its literals and escaping are broken by design, so it's good to know that from the start.

Edit: removed double backslash from literal to not confuse it with host language's escaping.


If you use a broken RDBMS, maybe. The standard is quite clear, and that certainly doesn't work in PostgreSQL.


Depends. Server-side prepare works the way you described, while client-side prepare interpolates placeholder values. Both are widely used.


Having the interpolation done in one central place and coded by someone experienced sounds as good as OOB to me.


With that central place being?


Lapis db:query("? ?", a, b)

Edit: this call is the most low-level db interface in Lapis apps.


I know a lot (a lot) of non-technical SQL users. SQL is a widespread abstraction from the underlying databases. It's very powerful.

What you describe is useful from a programmer's point of view, but maybe the wrong abstraction. You could write such a thing which actually generates SQL from an API like you've defined. You could write an alternate ODBC engine.

But there is a lot of overhead for database authors in actually providing an API like you describe. It's far simpler to provide something SQL-compliant with an ODBC or JDBC driver, and now you can plug in almost anywhere.

I have used a database engine that exposes primitives to build filters and similar programs to what you describe, and selectively push them to the server. It's not a very deep abstraction.


Part of the reason is support. Long ago Britton Lee[1] in fact provided an AST API mainly for tool vendors. They stopped doing that when one of their customers called them asking for help with the "GUIDE BY" clause. Nobody at BL knew what a "GUIDE BY" was since it was something implemented by the frontend, but that didn't matter to the customer.

[1] https://en.wikipedia.org/wiki/Britton_Lee%2C_Inc.


Product Manager for MySQL here. Yes, I agree with you that the reason is most likely support.

Consider SQL; it's declarative and highly expressive. Your users can craft queries that you did not consider, and you have to make sure they work consistently between versions. It's a very hard problem.

Creating more public APIs? It's a hard sell that vendors are sensitive to. Reducing the surface area makes it easier to add future improvements.


KDB lets you do this.

    q)-3!parse"select last id from t"
    "(?;`t;();0b;(,`id)!,(last;`id))"
Indeed you can use this to modify a user-entered query easily:

    p:parse"select last id from t";
    @[p;2;'[1 rotate;,];enlist(=;`user;enlist u)]


This is worse than a useless comment. It is downright harmful to getting people to use KDB. Maybe you should explain it to people?

For those reading:

parse is a function that takes a string and returns the parse tree, and -3! (gotta love APL) takes a tree and turns it back into a string.

The parsed syntax it spits out allows you to compose pieces of queries or modify them (there are easier ways, such as the fact that queries return tables which can always be inputs to other queries).

Maybe the author would like to add more context?


> This is worse than a useless comment.

Oh no, please don't attack others like this in comments here. Even if you clearly mean well (which I'd say is the case here), it can only cause harm.

geocar's comments are consistently excellent (including for explaining the niceties of kdb and array languages to the community) and having fewer of them would make this place worse. Not everything needs to be spelled out, and besides, sometimes there just isn't time to do so. It can be a lot more work!


As a long-time K evangelist (since before KDB 2, even) I understand that, but this post just played into all the bad stereotypes of KDB+ being gibberish and line noise. Even for somebody versed in K it took me a minute to figure out -3!. He isn't a@kx.com posting to the K mailing list.

I did vote the comment up before responding to him, but I'm not sure why really.


Well, "companies" is a string, isn't it? Ultimately, there will be strings down there.

Working with a pure AST is a huge pain. One can trivially construct something that cannot realistically be represented in the SQL dialect of choice.

LINQ is (IMO) one of the best ways to do query composition, and with the nameof() operator you can really get away without a single hardcoded string in your application-level code.


>LINQ

Ehhh...something as simple as a left-outer join is near ungrokkable for me in LINQ. Cross apply / lateral join on a table-valued function? Forget about it.

Not to mention - due to the way LINQ works a lot of errors that seem like they should be caught at compile time don't get caught until run time (e.g. you wrote something in a where clause that can't be translated to SQL).


LINQ is like flying a plane. It will take you where you need to go extremely fast, but it will also crash you into the ground if you don't know what you're doing. Personally I get immense value from it daily, but I also know when not to use it.


I didn't know about these shortcomings (I don't use LINQ much). Could you give examples for each one?


Here's a left outer join (taken from https://stackoverflow.com/questions/3404975/left-outer-join-...)

    from c in categories
    join p in products on c.Category equals p.Category into ps
    from p in ps.DefaultIfEmpty()
    select new { Category = c, ProductName = p == null ? "(No products)" : p.ProductName };
You're doing something called a 'group join' (second line) 'into' a new identifier (ps) above. I'm guessing it stuffs each matching result in 'p' into a list, which sorta makes sense. The second part mystifies me though: somehow 'from p in ps.DefaultIfEmpty()' flattens the list back out into a denormalized result set while simultaneously preserving entries in 'c' with no matches.

What does 'p' point to now? Why is it seemingly created twice? Is 'ps' still available? For a simple left join this is way, WAY too convoluted IMO.

The second thing I'm talking about you can read about by finding out when EF throws a 'NotSupportedException' at runtime. (e.g. https://stackoverflow.com/questions/25337974/entity-framewor...). Basically not everything you can do in LINQ can be translated to SQL. But it won't tell you at compile time. You have to wait till run time, which is garbage IMO. The example I linked is trivial but it's easy to miss this kind of thing when you're optimizing ORM use (and trying to move as much computation to the DB as possible).


Dynamically adding filters or parts of a query, letting different parties construct the query, doing transformations, replacing objects... that would be great. And having it in a well-known, standard format, supported across languages, with execution plans exposed in a similar way: that would be the dream. It would really simplify life.


I started to write something a bit like this a while ago for SQLite and C++: https://github.com/Gloopio/SQLighter. We also have a more complete, full-featured version that we use internally, although I wasn't sure whether enough people would want it for it to be turned into an open source project.


IIRC, stored procedures let you do something like this. If you compile your query as a stored proc, there's usually an API using which you can pass values to it as parameters in their "native" form, i.e. without having to convert them to strings first.


Yes please! A couple years ago I wrote a sql-like layer to federate queries across a couple different databases, with the intent of unifying the data we had spread out over postgres, cassandra and S3. The best decision I made was not writing a real query parser and just using s-expressions, which is pretty much what you're asking for. It was great.


Generating an AST client side, then serializing it, can take a substantial amount of CPU usage especially in languages like JavaScript or Python. If you do include the ability to make an AST, it would be good to have a facility for parameterized queries so that you only have to build and serialize the AST once.


If you prefer Java to C, I recommend taking a look at the source for H2, a pure Java SQL database engine. It is surprisingly easy to follow.

I've contributed to the H2 source code a few times, and in the process I got to learn much about how a database engine is implemented.

H2 home: http://www.h2database.com/

H2 Github repo: https://github.com/h2database/h2database


Allen Holub did something like that for his patterns book, btw :)


Hi, TUM has a lecture for this in which the students build a database in C, learning the fundamentals along the way.

I attended a few years back and learnt an awful lot. This is the course page:

https://db.in.tum.de/teaching/ws1516/imlab/index.shtml?lang=...

Haven't found the accompanying slides yet, but they must be somewhere on their website.


Updated lecture with slides https://db.in.tum.de/teaching/ss17/moderndbs/


Thanks for sharing this!


The language for accessing a relational database is always SQL. So much so that a lot of people don't make any distinction between SQL and a DBMS. Aren't we in the same situation as JavaScript on the web? I've always wondered why there is no effort like WebAssembly for relational databases. I would love to see a more vibrant scene with different syntaxes, paradigms and ideas.

What would be the rust equivalent of a relational language, the Go equivalent?

As an example, I had a lot of pleasure reading this blog post: http://www.try-alf.org/blog/2014-12-03-what-would-a-function...


For Rust it would literally be Iterators.


Off topic: I appreciate how fast that site renders.


To be honest, it always shocks me how long the average modern webpage takes to load. All the gradual build-up of elements, the time it takes, and how it's still unexpected: you think you can click somewhere, and 0.1 seconds before you click, something else renders and moves everything up or down, making you click the wrong thing, which loads another page, etc. I have a relatively new Mac with 8GB RAM; it should just work, but no...


pingdom's insight is interesting:

  Other 	43.1 % 	20.36 KB
  Script 	30.9 % 	14.62 KB
  Image 	15.5 % 	7.33 KB
  CSS 	5.3 % 	2.52 KB
  HTML 	5.1 % 	2.40 KB
Split out between:

  Image 	25.0 % 	2
  Script 	25.0 % 	2
  Other 	25.0 % 	2
  HTML 	12.5 % 	1
  CSS 	12.5 % 	1
So just 8 requests to build the page. 14.05 KB of the javascript is google analytics, so likely cached in the browser already.

Nice, small, simple = quick :)


I'm also shocked, about two things: the pure speed, and the fact that I'm surprised at how fast it loads.

Being surprised by web speed in 2017 sounds like heresy.


Still waiting for SQLite4 with its awesome LSM engine :P (shameless plug: I ported the engine over to Windows https://github.com/maxpert/lsm-windows and it works :D). I wish somebody would do a detailed architecture writeup of SQLite4; that kind of knowledge is scarce and valuable.


FWIW, I'm presenting a talk on exactly this topic tomorrow at 12:30 at the http://www.db-tech-showcase.com/dbts/tokyo - unfortunately I do not think it will be recorded.


Slightly off topic, but thank you for your work on sqlite and fossil. I've used both daily for years and am always impressed by how consistently fantastic they are.


LSM engines tend to have great average but inconsistent performance. How does sqlite4's LSM fare?

Last time I looked at benchmarks, sqlite4 with an LMDB backend was faster on average and had much lower variation in performance. Is that still the case?


That's pretty cool really.


Is there anything even close to SQLite in terms of reliability? It's pretty much the standard for embedded systems.


If it's for embedded systems, Polyhedra from Enea is used in quite a few highly reliable and rugged systems.


Interbase is similar in purpose and has a good reputation. Not free or open source though.


There is Firebird, which is essentially an open-source InterBase (IIRC it was forked from the IB6 source). It is probably more widely used than the new commercial versions of IB. One interesting feature is that it is probably the only open-source DBMS that not only works with multiple engines accessing the same database stored on a Windows share, but where that is actually a supported use case.


what about wiredtiger?


This comes right at a time when I started attempting something similar (though I don't care that much about documenting it)

You might want to read some overviews on:

  * postgres atomicity: https://brandur.org/postgres-atomicity

  * postgres disk format: http://rachbelaid.com/introduction-to-postgres-physical-storage/
I have not read much on replication, though. Does anyone have any pointers?


The typical approach to replication is to somehow record the changes made to the database and somehow push them to slave replicas. There are different approaches to both "somehows".

For the format of the changes themselves, there is a continuum of levels of abstraction, ranging from propagating the user's commands to slave servers directly (this is essentially what MySQL does) to propagating committed writes to the on-disk files (which is what default pgsql replication does). Both extremes have their tradeoffs: in the SQL-level case the user has to make sure that the results of transactions are deterministic (which is more complex than just not calling functions like rand() and now()), and in the physical-page case all involved servers need to have exactly the same on-disk layout (i.e. the same CPU architecture, and you have to copy the actual on-disk files for the initial synchronisation; dump and restore usually does not work).

For the actual propagation there are two approaches: write the change records somewhere and let the slaves read them at whatever pace they can, or push the changes to the slaves before the transaction is reported as committed to the client.

The full spectrum of PostgreSQL replication solutions in fact completely fills the space of these two choices, and you should probably select carefully what matches your use case best. (The two significant questions to ask are whether you can live with replicating the whole database cluster as one indivisible unit, and whether you can live with possibly stale state on read-only replicas. When both answers are yes, streaming replication is what you want. If the answer to the second one is no, then you should probably redesign your application so that the data requiring this kind of guarantee is as small as possible.)


> Does anyone have any pointers?

Sure, here you are:

    void *x = malloc(16384);
(Couldn't resist)


Welcome to systems programming, home of bad jokes and worse problems...


"I called the janitor the other day to see what he could do about my dingy linoleum floor. He said he would have been happy to loan me a polisher, but that he hadn't the slightest idea what he had done with it. I told him not to worry about it - that as a programmer it wasn't the first time I had experienced a buffer allocation failure due to a memory error."

---

"Q: Why did the concurrent chicken cross the road?

A: the side other To to get"

(These were copied from the S.O. website)


One piece of advice I'll never forget: Never grep a yacc by the i-node.


I thought replication pointer would come from strdup() at least.


Memcheck: 16384 bytes probably lost.


The most common strategy for database replication is to copy over an initial snapshot and then synchronize the replica by replaying transaction logs. So essentially replication is continuous incremental recovery on another server.
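That "continuous incremental recovery" idea can be sketched with a toy replica; the SET record format and the Replica struct are invented purely for illustration:

```c
#include <stdio.h>

/* Toy replica: state is a fixed array of integer "rows"; the log is a
 * sequence of "SET <slot> <value>" records. Replaying records from the
 * last applied position brings a replica up to date, and the same code
 * path serves both crash recovery and replication catch-up. */
#define SLOTS 8

typedef struct {
    int rows[SLOTS];
    long applied;   /* how many log records this replica has applied */
} Replica;

/* Apply one log record; returns 0 on success, -1 on a malformed record. */
int apply_record(Replica *r, const char *record)
{
    int slot, value;
    if (sscanf(record, "SET %d %d", &slot, &value) != 2) return -1;
    if (slot < 0 || slot >= SLOTS) return -1;
    r->rows[slot] = value;
    r->applied++;
    return 0;
}

/* Replay every record the replica hasn't seen yet. */
void catch_up(Replica *r, const char *log[], long nrecords)
{
    while (r->applied < nrecords)
        if (apply_record(r, log[r->applied]) != 0)
            break;  /* stop on a corrupt record */
}
```

A real system streams the log over the network and tracks the replay position durably, but the shape is the same.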


This strikes me as an interesting way to define a syllabus for comp sci / internships.

What other "build X from scratch" projects would be needed to cover much of what we think a grad should know?

- build a language (compiler) from scratch

- build a graphics engine from scratch

- build a network stack (and firewall ?)

Err?


I built a webserver from scratch as a term project at university. Due to the fact that it actually did something self-contained and useful, I got away with turning in a C99 project for a C++ course :) (and probably also because I wrote my own talloc/apr_pool-style malloc wrapper to base it on; when presenting, that was about as far as I got, and then I was told, "well, this is a C++ course, but you got an A, now go away")

For another MSc-level course the whole term project assignment was: "choose or design some language and try to somehow conjure up a compiler, and if required a VM, for it". "Somehow conjure" is significant because nobody said you had to do it from scratch; hacking tcc/gcc/lcc/llvm to accept your language was fine, and basing your language on top of some Smalltalk VM (preferably ST/X ;)) or Common Lisp was essentially encouraged.


Adding to the 'from scratch' list...

- build a simple virtual machine (pair with compiler)

- build a distributed storage system (perhaps like S3)

- build a traditional file system, perhaps as a FUSE (Filesystem in Userspace)

- build a job scheduler (bonus: distributed cluster work scheduler)

- build a memory allocator and/or a garbage collector

- build a web server

- build a crypto package (for learning only; don't really do this!)


Realistically as an undergraduate people might have time for one "build X from scratch", apart from tiny examples. My own attempt was a filesystem.


- invent a wheel from scratch

... oh, nevermind, sorry, I failed to notice that you said coursework/internship, not portfolio.


This sort of question could make a really cool Ask HN topic!


Collaborative text editor (e.g. Etherpad/Google Docs)


Very nice! If you wanted it to save to disk instead of living wholly in memory, how would you do that in C?


I think that topic is larger than the presentation thus far. Hopefully memory-mapped flash will obviate the need for a lot of this, but:

   o you need an on-disk structure (i.e. a btree) to let you search your tables (indices)
   o that has to be fronted by an in-memory cache
   o your on-disk structure should maintain consistency even if the power fails and some writes get lost
     - often, but not always, this is facilitated by keeping a separate write-optimized structure called a write-ahead log
     - if you use a WAL, you'll also need to implement the replay mechanisms to get the primary indices back up to date after a failure
   o because disk operations are expensive and high latency, you'll need to start managing concurrency explicitly

Back ends are a lot of work. Unfortunately this is a place where the lack of decent concurrency mechanisms in your language and OS interface can really cause a lot of headaches.

Pedagogically, I guess you would start with a completely synchronous, non-fault-tolerant btree? Or maybe just the log, and introduce trees as a read accelerator?
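The WAL ordering rule in the list above (the log record must hit stable storage before the primary structure changes, so replay can redo it after a crash) can be sketched like this; the helper name and record format are made up:

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Append one record to the write-ahead log and force it to stable
 * storage. Only after this returns 0 may the caller touch the btree
 * pages themselves: if we crash after the fsync but before the
 * in-place write, recovery replays the log and redoes the change. */
int wal_append(const char *wal_path, const void *rec, size_t len)
{
    int fd = open(wal_path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) return -1;
    if (write(fd, rec, len) != (ssize_t)len) { close(fd); return -1; }
    if (fsync(fd) != 0) { close(fd); return -1; }  /* the crucial barrier */
    return close(fd);
}
```

A real engine keeps the log fd open, batches records per commit, and checksums them so a torn final record can be detected during replay.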


What are some good readings / places to start on this topic?


That's a really good question. Unfortunately, everything I learned was by working with people who knew more than I did; it's pretty sad that a lot of systems work is that way.

This book looks to be pretty comprehensive, and explicitly discusses on-disk structures, write-ahead logging, and recovery.

  http://infolab.stanford.edu/~ullman/dscb.html
Relational languages and transactions get a lot more play in the academy... I think it's because you can make sense of them in some abstract way without getting sucked into involved discussions about caching heuristics, OS write guarantees, scheduling, and fussy performance characteristics.


I like http://dataintensive.net/, though it's a slightly different focus.


> Very nice! If you wanted it to save to disk instead of living wholly in memory, how would you do that in C?

The idea would be to map your memory space to disk space (using the random access provided by the C stdlib); the problems would then be:

1. Caching - I guess you'd then have to provide your own cache implementation. And I would guess this opens a can of sync problems...

2. Finding a way to efficiently write the data on disk (i.e. which data should stay contiguous? how much "slack" space should I leave in "extents" (Oracle slang for a container of many data blocks)? should you do a column store?)

3. Compacting the data (removing slack)

etc etc.

I think it's not easy at all !


mmap!

I mean, not really, but it's a surprisingly viable starting point for simple problems. I've shipped it. Some production-grade software still uses mmap, albeit with a bunch of additional complexity to make sure it works okay in weird cases.

See this old comment thread I dug up that might be relevant: https://news.ycombinator.com/item?id=3982514

It seems like Redis might currently make use of mmap for some of its data, but I couldn't find an up-to-date source for that, just some old blog posts by the developer.
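For the curious, the naive version boils down to something like this; a minimal POSIX sketch where map_file and flush_range are made-up helper names, with error handling kept to a minimum:

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Naive mmap-as-storage: map the whole file read-write and shared, so
 * ordinary pointer writes mutate the on-disk pages. Real engines layer
 * caching, logging, and careful sync ordering on top of this. */
char *map_file(const char *path, size_t size)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return NULL;
    if (ftruncate(fd, (off_t)size) != 0) { close(fd); return NULL; }

    char *base = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the fd is closed */
    return base == MAP_FAILED ? NULL : base;
}

/* Push dirty pages out to the file. Note this is a durability point,
 * not a write-ordering barrier, which is the crux of the objections
 * elsewhere in the thread. */
int flush_range(char *base, size_t size)
{
    return msync(base, size, MS_SYNC);
}
```

With this, the "database" is just a struct you lay out inside the mapping; the hard part, as the replies point out, is crash consistency.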


The trouble is that naive mmap doesn't give you the control over write ordering that you need to implement reliability.

Sadly, about 50% of database design is trying to ensure the thing has a decent chance of starting up again if it crashes or loses power. This mostly consists of fighting file systems and disk caches.


I used to work on a system that used mmap for basically everything, including a proprietary database that processed financial transactions. The original production system ran on a commercial Unix. (It's been about 15+ years, but I think it was AIX.)

At one point, we ported it to a different Unix platform which had slightly different mmap semantics. It looked like everything worked, except for one minor detail: the data was never actually synced to disk. Ever. Until shutdown. Unfortunately, since the system ran 24/7, it effectively never synced. First time the system had a power failure, there was massive data loss on startup.

Oops.

(We were able to recover through log replay...)


> (We were able to recover through log replay...)

I call that a win! What was the logging mechanism--something bespoke? What was your solution?


It was bespoke: a proprietary system that essentially used an append-only file containing each transaction request and response.


Very nice. Good planning.


That sounds like a terrible idea even if it were 100% reliable. When you shut down it will have to flush the entire memory at once. If you're syncing to a hdd it can mean several minutes of downtime.


A correctly working system will "flush behind", so there should be no difference between a clean shutdown and a power loss. If there's several minutes of unsynced data that's data that will be lost on power off.


Just going to chip in here to mention LMDB, which essentially provides an mmap btree database with single writer + multi-reader snapshot concurrency, ACID transactions, and is extremely robust.

If you think something may benefit from a shared memory data store, lmdb may be worth considering as a fast, reliable, high concurrency alternative to reinventing wheels with raw mmap + manual sync.


Not the OP here, but why the downvotes? Is this person off-base, or is that a poor solution?


The poster's comment reads a bit like an advertisement, but LMDB would be a good solution for persisting data, in my opinion. It has a pretty simple C API and it's been described by its authors as "crash-proof".


There's `msync`


This is what msync is for. You can sync page by page and ensure ordering yourself.


You think you can, but you can't! Linux always syncs the whole file; see https://lwn.net/Articles/502612/ for a discussion of why fixing this horrible bug is too dangerous.


"You think you can, but you can't!"

Welcome to systems programming....


This is not true after Linux 3.16, commit 7fc34a62ca44 ("mm/msync.c: sync only the requested range in msync()").


Why not really? From the sqlite docs:

  Beginning with version 3.7.17 (2013-05-20), SQLite has the option of accessing disk content directly using memory-mapped I/O and the new xFetch() and xUnfetch() methods on sqlite3_io_methods.


I would say it's less that you can't or shouldn't use mmap, and more that it's not as simple as throwing mmap at it, if you care about crash recovery/not corrupting data (which most databases should care about). But that certainly doesn't mean that mmap can't be a part of the solution.


C++ solution (the solution you didn't ask for, sorry) for saving to disk from memory (as some sort of periodic expensive sync) would be to write a stream operator for your objects and stream them to disk. Loading would involve reading them back; would only work on the same OS and architecture.

Alternative approach: For Windows, you could just use disk but pass FILE_ATTRIBUTE_TEMPORARY to CreateFile to force it to be in memory if the file is small enough.


LMDB?


Following with interest. I would also LOVE to see a SQLite clone in pure Go. That would be amazing.


Sort of related:

  https://github.com/elliotchance/c2go
> The goals of this project are:

> .. The ultimate milestone is to be able to compile the SQLite3 source code and have it working without modification. This will be the 1.0.0 release.

So if that project does work out, there will be a Go SQLite implementation of sorts. No idea how messy/usable the result would be as a base for development though.

Note - haven't personally been tracking the project at all. More just that your statement prompted remembering it. :)


It's not a clone per se, but there's https://github.com/cznic/ql


I think this one is sqlite: https://github.com/cznic/sqlite


Thanks geodel. Almost a win here, but if you look in sqlite/internal/bin/ you'll see cznic embeds the _entire_ SQLite c source.


I like short variable/type names much as the next guy, but this project takes that to a difficult-to-read extreme.


What use case do you have such that the current crop of SQLite go bindings do not suffice?


Use case is mostly to learn how to be a Go ninja by building something as powerful as SQlite 100% in Go. For anything production grade the existing Go drivers or wrappers for SQLite are perfectly good. It would be nice to get a single binary, no dependency Go program though.


Especially given how battle tested (and unit tested!) SQLite is, while a clone.. would not be.


On a related note (and this seems a good forum to ask), are there any typed databases available similar to sqlite?


I'd just write protobuf "blobs" or another serialization format in an SQLite text column. If you need to query across the typed data, you could build an index in another column from the data in your protobuf blob.


What do you mean by typed? :-?


SQLite will ignore most type information, instead choosing to focus on "type affinity": https://www.sqlite.org/datatype3.html


I didn't know SQLite was dynamically typed. That's a strange choice for a database, to say the least.

I would look into Berkeley DB (BDB) which is one of the most widely-used embedded databases, albeit not a SQL-based one. Otherwise take a look at the list of embedded databases on Wikipedia: https://en.wikipedia.org/wiki/Embedded_database


Initially (v < 3), all data fields in sqlite were text internally - type information was accepted for SQL compatibility, but ignored. This is actually an improvement.

I'm not sure of the rationale; perhaps it's easier to keep the code / database files portable amongst different systems this way, and since it was designed for embedded use, many systems (and thus incompatibilities) were envisioned?


Berkeley DB just stores byte arrays, so it is not 'typed' either.


Yes, SQLite with type-check constraints.


Could you expand on this? I've not come across this before, and I'd be very interested in using it, if it's easy...


SAP (formerly Sybase) SQL Anywhere


I'm very interested in seeing DROP implemented in this. It will be cool to see the author handle fragmentation.


I don't know why but unfortunately the various parts are effectively unreadable on an iPhone.

The screen zooms in and out as you scroll, so that when you scroll the text is a little too tiny and when you stop the text is cut off on the left edge.

I'm going to give it a try on my iPad. Just a heads up.

Edit: works fine on an iPad. Interesting.


But the reference SQLite implementation is a single c file...


It's actually written as several files and the "amalgamation" you download from the website represents those files concatenated together.

See this explanation: http://sqlite.org/amalgamation.html


> The amalgamation file is more than 180,000 lines long and over 6 megabytes in size.


Yes, but the article also says that file is generated by a makefile from more than a hundred .c and .h files, and that it's done for ease of distribution and because optimizers do better with a single compilation unit.

It's not like the authors spend a lot of time working on the amalgamation and that's the way the project is developed.


To be fair, only 125,367 lines of that is actual C.


Does anyone know what software they used for creating those colored diagrams?


They are copied from http://sqlite.org/arch.html, which is built from (see http://sqlite.org/download.html) data in Fossil at https://www3.sqlite.org/cgi/docsrc/timeline, which leads me to e.g. https://www3.sqlite.org/cgi/docsrc/artifact/d22a2c9642d584b8, which seems to contain the right strings, so presumably is the file describing https://www.sqlite.org/images/arch2.gif and its copy on https://cstack.github.io/db_tutorial/assets/images/arch2.gif

That's fig source. So, I guess this was made by some GUI that uses fig as its native file format. My guess would be xfig (http://mcj.sourceforge.net)


This is exactly what I wanted. Thanks for sharing.


Isn't SQLite an application calling for a clean implementation in Rust?


Why? SQLite is one of the highest quality codebases out there! For fun, sure. But I think you are underestimating the size of the SQLite project and its tests. Just because the library is small it doesn't mean it's simple.


Its massive testing means that it would be easier to implement in another language; you'd be pretty sure of conformance with a port.

That said, it's one of the most reliable projects in existence so why rewrite it?


To reboot WebSQL?

IIRC Mozilla's main gripe with it was the fact that there was only one implementation, and no standard beyond "Whatever Sqlite3 does".


How would a rewrite change that? Addressing that objection would require some sort of open standards process and commitment to follow it, which is a rather orthogonal concern and would be somewhat expensive as well.


You'd also need to turn "whatever sqlitev3.x does" into a spec, indeed which is a lot of work that will never happen as there's only one implementation that passes the test suite.


Same reason as always: https://www.cvedetails.com/vulnerability-list/vendor_id-9237... I'd guess around half of those wouldn't happen in rust.


It would be a colossal waste of time. At 125KSLOC code + 91MSLOC tests, the engineering effort is near-prohibitive. There is no single area where SQLite3 would be severely lacking. Just look at SQLite4 and say if you care about any of the differences.


It's been well-tested enough not to matter. Plus that testing would need to be duplicated. Very costly. Uch.


I think you could repurpose the original test suite as is. Rust can interface with C seamlessly.


I wish someone would show how to rewrite Redux for React Native.


I can't tell if this was a joke or not, I'm assuming that it is.

If not: Redux works the same way for React Native as it does for React or anything else.



SQLite has a very rigorous testing methodology, developed to comply with avionics standards. Cloning the tests would be a decade's worth of work. But cool idea nonetheless. https://sqlite.org/testing.html



