
LearnDB: Learn how to build a database - niklabh
https://learndb.net/
======
rodw
For complicated reasons I was involved in a successful project to develop an
open-source, Oracle-SQL-compatible, transactional-integrity-preserving,
extensible database designed for in-memory performance in Java. In the process
of this development "suddenly" the reasons for a whole bunch of performance
and syntax oddities in databases like Oracle and PostgreSQL became very clear.

I was shocked to learn that was 15 years ago when I looked up the link to
share, but if you are interested in the topic of "how to implement a database"
it may be worth a look.

For what it is worth it was listed (not by me) on the C2 Wiki on the "Programs
to Read" Page, where it was described as "[A] database written in Java with
good unit tests and ShortMethods." [1]

Both statements are true: for a complete working example of a production
database (it supported a commercial product for at least 10 years), it is
actually a pretty accessible and well-documented code base.

The project is called AxionDB and can be found at [2].

[1] [http://wiki.c2.com/?ProgramsToRead](http://wiki.c2.com/?ProgramsToRead)
[2] [http://axion.tigris.org/source/browse/axion/](http://axion.tigris.org/source/browse/axion/)

~~~
emmanueloga_
On the subject of readable database source code:

* LMDB is legendary for its performance, but its source code is kind of hard to read. I found there's a pretty decent port to Java which looks a lot more readable [1].

* LMDB doesn't support SQL. But there's this SQL implementation: Apache Calcite [2]. Creating a DB using lmdbjava + Calcite sounds like an interesting project.

1:
[https://github.com/lmdbjava/lmdbjava/tree/master/src/main/ja...](https://github.com/lmdbjava/lmdbjava/tree/master/src/main/java/org/lmdbjava)

2: [https://calcite.apache.org/](https://calcite.apache.org/)

~~~
snaky
Maybe it would be better to start with
[https://github.com/leo-yuriev/libmdbx](https://github.com/leo-yuriev/libmdbx)

~~~
emmanueloga_
This looks super interesting! Although it still looks more complicated to
understand than lmdbjava. I feel any C codebase will be more difficult to
follow, since there's gonna be a lot more code related to memory management
that happens automatically in a GC language like Java.

~~~
snaky
JVM and .NET ports -
[https://github.com/castortech/mdbxjni](https://github.com/castortech/mdbxjni)
and
[https://github.com/wangjia184/mdbx.NET](https://github.com/wangjia184/mdbx.NET)
are listed there as well.

~~~
sorokod
Wrt JVM, not a port, but a JNI interface.

------
Mr_P
How is this #2 on Hacker News? There's literally nothing here about how to
actually build a database (yet?)

Instead, there's just a key-value store implemented on top of a JavaScript
hashmap and a filesystem.
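
For readers who want a picture of what that amounts to, here is a minimal sketch of a store of that shape: an in-memory hash map, with every entry also written to its own JSON file. It is in Python rather than the project's JavaScript, and it is not the project's actual code; the class name `TinyKV`, the directory layout, and the file naming scheme are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path
from urllib.parse import quote


class TinyKV:
    """Hypothetical key-value store: a dict cache plus one JSON file per key."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.cache = {}  # in-memory hash map

    def _path(self, key):
        # Percent-encode the key so it is safe to use as a file name.
        return self.directory / (quote(key, safe="") + ".json")

    def set(self, key, value):
        # Update the hash map, then write the value to the key's own file.
        self.cache[key] = value
        self._path(key).write_text(json.dumps(value))

    def get(self, key, default=None):
        # Serve from memory if possible, otherwise fall back to the file.
        if key in self.cache:
            return self.cache[key]
        path = self._path(key)
        if not path.exists():
            return default
        value = json.loads(path.read_text())
        self.cache[key] = value
        return value


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = TinyKV(tmp)
        db.set("user/1", {"name": "Ada"})
        # A second instance has an empty cache, so this read hits the file.
        reopened = TinyKV(tmp)
        print(reopened.get("user/1"))
```

Nothing in a sketch like this deals with crashes in the middle of a write, concurrent writers, or lookups by anything other than the exact key, which is roughly the gap between a key-value store and a database.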

~~~
capkutay
This has to be a joke? This looks more like 'how to use a k-v store 101 for
non-programmers'

I don't want to put more content under a terrible post, but the best resource
for this material is Jennifer Widom's MOOC.

[https://cs.stanford.edu/people/widom/DB-mooc.html](https://cs.stanford.edu/people/widom/DB-mooc.html)

~~~
pedrosorio
I took this back in 2011 and don’t remember there being much on “building a
database”. Is there more advanced content now?

------
dymk
Why would a pedagogical project choose JavaScript of all languages?

Unless the project _specifically_ needed to leverage features of the language,
or a web browser, it's an incredibly poor choice for building anything with
well-maintained abstractions. Or anything at all, really, when the language is
covered in warts.

I imagine the author hasn't yet discovered for themselves why it's a poor
choice, given only a key-value store has been implemented (using
JSON.stringify, no less)

~~~
codezero
JavaScript is extremely accessible and the warts can easily be worked around.

I spent a lot of my life being a JavaScript hater and after working with it
for four years I can definitely see how it is really a simple starting point
for a lot of concepts. Sure it has warts. What doesn’t?

Have you worked with a JS language for any extended period of time? What about
that made you not even want to consider it for anything? I’m really curious,
I’m not trying to set you up - I’m really not experienced enough to do so :)

Edit: to be clear, because of general curiosity and because I hate bringing
work home, most side projects I work on are in some other language.

~~~
dymk
> and the warts can easily be worked around

Not really. The author doesn't leverage any kind of static type system, which
would at least mitigate some of the warts. He's chosen Node.js instead of
leveraging the browser, so unfortunately there's no free visualization layer
for doing anything interesting with.

> Have you worked with a JS language for any extended period of time?

Most of my day job is working in a _really, really big_ JavaScript codebase. I
can say with confidence that the language is something that we're absolutely
stuck with, and I still have no idea why somebody would implement a
pedagogical database with it (well, then again, they haven't; they've
implemented a key-value store, which is trivial).

To offer an alternative, they could have chosen something boring but
everywhere like Java, which is just as accessible to those with less
experience. Then they'd have the possibility of doing fine-grained
parallelism and file access, and of designing abstractions that fit within a
statically typed language.

~~~
codyb
No stupid questions time, but given all languages, would an array based
language like J be a decent choice for building a database from scratch?

I mean, tables seem like doubly indexed arrays.
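
To make the "doubly indexed array" picture concrete, here is a minimal sketch in Python (not J) of a table held as a list of rows, where a cell is addressed first by row number and then by column position. The `Table` class and its method names are hypothetical, chosen only for illustration.

```python
class Table:
    """Hypothetical table: a list of rows, each row a list of cell values."""

    def __init__(self, columns):
        self.columns = list(columns)
        # Map each column name to its position within a row.
        self.col_index = {name: i for i, name in enumerate(self.columns)}
        self.rows = []

    def insert(self, *values):
        if len(values) != len(self.columns):
            raise ValueError("expected %d values" % len(self.columns))
        self.rows.append(list(values))
        return len(self.rows) - 1  # row number of the new row

    def cell(self, row, column):
        # The "double index": first by row number, then by column position.
        return self.rows[row][self.col_index[column]]

    def where(self, column, predicate):
        # Lookup by value has no shortcut here: every row is scanned.
        i = self.col_index[column]
        return [row for row in self.rows if predicate(row[i])]


if __name__ == "__main__":
    people = Table(["id", "name"])
    people.insert(1, "Ada")
    people.insert(2, "Bob")
    print(people.cell(1, "name"))
    print(people.where("id", lambda v: v > 1))
```

The analogy holds up for reads by position. A lookup by value turns into a full scan in this layout, which is the problem that indexes exist to solve.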

In my mind it wouldn't be readable, but it'd maybe be fewer lines of code than
in other languages?

I have never thought about building a database that takes the CAP theorem and
ACID compliance into consideration; it sounds super tough. I'm usually happy
enough when I learn something neat about Postgres.

I thought this would be neat but it's just a kv store in JavaScript.

------
dominotw
I didn't realize how little I knew about how databases work until I watched
this lecture series.
[https://www.youtube.com/channel/UCHnBsf2rH-K7pn09rb3qvkA](https://www.youtube.com/channel/UCHnBsf2rH-K7pn09rb3qvkA)

Does anyone have a DB internals book recommendation that isn't boring as hell?

~~~
johan_larson
You might try this collection of papers about DB technology.

[http://www.redbook.io/](http://www.redbook.io/)

Fair warning: this is not for beginners. If you find the going too hard, start
with a DB textbook.

Also, that site doesn't seem to include the papers themselves. But most of
those papers are very famous, so if you search for the titles, you should find
copies. Worst case, you might need to visit a university library.

~~~
interesthrow2
> Fair warning: this is not for beginners. If you find the going too hard,
> start with a DB textbook.

Which one can teach how to build a DB system from scratch? I'm not talking
about SQL theory or implementing a SQL parser, but the actual persistence and
indexing part.

~~~
johan_larson
Both of them, really. There's a lot to know if you want to build a modern
database system. But no work I am aware of addresses the specific question of
how to build a database system from scratch.

A reasonable approach might be to start with this high-level paper, and follow
the papers it references until they get specific enough to address your
questions:

    
    
        Joseph M. Hellerstein, Michael Stonebraker, James Hamilton. 
        Architecture of a Database System. Foundations and Trends in Databases, 1, 2 (2007).
    

For the sort of low-level issues that you are interested in, it might be
useful to study a well-regarded persistence package, such as Berkeley DB.

[https://en.wikipedia.org/wiki/Berkeley_DB](https://en.wikipedia.org/wiki/Berkeley_DB)

Another system that might be worth studying is SQLite.

[https://www.sqlite.org/index.html](https://www.sqlite.org/index.html)

~~~
SQLite
If the Hellerstein/Stonebraker/Hamilton paper is the kind of overview you are
looking for, then a better link for the SQLite equivalent is

[https://sqlite.org/arch.html](https://sqlite.org/arch.html)

------
hugofirth
Not a lot of content yet, but building databases (even toys) is an incredibly
interesting exercise which I encourage many more people to try out, so +1!

For anyone else who is interested in learning how to build a database, can I
thoroughly recommend following along with Andy Pavlo's Advanced Database
Systems course from CMU[1]. Every lecture is accompanied by reading lists,
notes, and assignments. What's more, I find Andy's style to be very easy to
parse even on complex topics.

Even if you think you know a fair bit about this domain, you will likely learn
a lot!

[1][https://www.youtube.com/playlist?list=PLSE8ODhjZXjYplQRUlrgQ...](https://www.youtube.com/playlist?list=PLSE8ODhjZXjYplQRUlrgQKwIAV3es0U6t)

------
alphabettsy
Why the votes? There’s nothing there.

------
truth_seeker
Since you are creating a new file for each entry, I think BTRFS would be a
good choice.

[https://en.wikipedia.org/wiki/Btrfs](https://en.wikipedia.org/wiki/Btrfs)

------
pictur
just vote for the title.. this place is really very funny..

------
theyoungwolf
why the hell would you share this when it's nowhere near done

