When I looked up the link to share, I was shocked to learn that it was 15 years ago, but if you are interested in the topic of "how to implement a database," it may be worth a look.
For what it's worth, it was listed (not by me) on the C2 Wiki's "Programs to Read" page, where it was described as "[A] database written in Java with good unit tests and ShortMethods."
Both statements are true: for a complete, working example of a production database (it supported a commercial product for at least 10 years), it is actually a pretty accessible and well-documented code base.
The project is called AxionDB and can be found at .
* LMDB is legendary for its performance, but its source code is kind of hard to read. I found there's a pretty decent port to Java which looks a lot more readable.
* LMDB doesn't support SQL. But there's this SQL implementation: Apache Calcite. Creating a DB using lmdbjava + Calcite sounds like an interesting project.
It's been a very long time, so I'm not sure I can name many specific examples, but IIRC some of the topics that became very clear were things like:
* the importance of the order in which JOINs and WHEREs are applied (to limit the number of rows being accessed per step)
* the relationship between columns that are selected and those that appear in WHERE and ORDER BY clauses.
* the value of tables with a small number of columns (limiting data that must be read per row)
* the cost/complexity of variable-width columns (VARCHAR vs a fixed-length string), again because of the time and complexity required to do something like "skip ahead three rows" in a data file
* the behavior of CLOB/BLOB types (stored external to the "main" table content)
* the various types of transaction isolation levels and the conditions in which it becomes hard or impossible to guarantee isolation without table or row locking
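To make the "skip ahead three rows" point concrete, here is a minimal Python sketch. It assumes a made-up row layout (an integer id plus a name), not AxionDB's actual file format: with fixed-width rows the byte offset of row n is simply n * row_size, while with length-prefixed variable-width rows every earlier row has to be read to find where row n starts.

```python
import io
import struct

# Assumed row layout for illustration only: a 4-byte little-endian int id plus a name.
# Fixed-width version: the name always occupies exactly 16 bytes, like CHAR(16)
# (struct pads short values with NUL bytes and silently truncates long ones).
FIXED = struct.Struct("<i16s")
# Variable-width version: id, then a 2-byte length prefix, then the name bytes.
VAR_HEADER = struct.Struct("<iH")


def write_fixed(buf, rows):
    for row_id, name in rows:
        buf.write(FIXED.pack(row_id, name.encode()))


def read_fixed(buf, n):
    # O(1): the offset of row n is pure arithmetic.
    buf.seek(n * FIXED.size)
    row_id, raw = FIXED.unpack(buf.read(FIXED.size))
    return row_id, raw.rstrip(b"\0").decode()


def write_var(buf, rows):
    for row_id, name in rows:
        data = name.encode()
        buf.write(VAR_HEADER.pack(row_id, len(data)))
        buf.write(data)


def read_var(buf, n):
    # O(n): every preceding row must be visited to learn where it ends.
    buf.seek(0)
    for _ in range(n):
        _, length = VAR_HEADER.unpack(buf.read(VAR_HEADER.size))
        buf.seek(length, io.SEEK_CUR)
    row_id, length = VAR_HEADER.unpack(buf.read(VAR_HEADER.size))
    return row_id, buf.read(length).decode()


rows = [(1, "ada"), (2, "grace"), (3, "edsger"), (4, "barbara")]
fixed_file, var_file = io.BytesIO(), io.BytesIO()
write_fixed(fixed_file, rows)
write_var(var_file, rows)

print(read_fixed(fixed_file, 3))  # (4, 'barbara')
print(read_var(var_file, 3))      # (4, 'barbara')
print(len(fixed_file.getvalue()), len(var_file.getvalue()))  # 80 45
```

The fixed layout wastes bytes on padding (80 bytes versus 45 here) and truncates names longer than 16 bytes, but it can jump straight to any row; the variable layout is compact but has to scan. Real engines usually soften this with per-page slot directories or offset indexes, which is part of the complexity the bullet refers to.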
Also, there's a post at http://heyrod.com/articles/radio-blog/pleasures-of-profiling... that describes a session of performance-tuning on the index implementation that gets into some of the nitty-gritty implementation topics. (But a lot of the issues addressed there were an artifact of Java's primitive vs. object representation, which has been reduced by things like generics.)
The more we used HSQL, the more it became clear that it was more like a SQL parser wrapping a simple key/value store than a full-on ACID database. We created AxionDB precisely because we needed the kinds of capabilities you mention and at the time HSQL did not provide them and wasn't remotely architected to support them.
To be honest, though, creating a moderately robust RDBMS from "scratch" turned out not to be the most ambitious or complex part of the overarching project that spawned AxionDB. The harder part was trying to use Java's primitive, built-in HTML renderer to create something approximating a fully featured browser. The effort and complexity behind something like Gecko, WebKit, Edge, Blink, etc. is very easy to underestimate. It's a hard problem, made much harder by having to tackle the kinds of content you find "in the wild." Frankly, building a database was a much more straightforward problem than that.
I don't want to put more content under a terrible post, but the best resource for this material is Jennifer Widom's MOOC.
Build a working Tetris game from literal logic gates up.
Unless the project specifically needs to leverage features of the language or a web browser, it's an incredibly poor choice for building anything with well-maintained abstractions. Or anything at all, really, when the language is covered in warts.
I imagine the author hasn't yet discovered for themselves why it's a poor choice, given that only a key-value store has been implemented (using JSON.stringify, no less).
Have you worked with a JS language for any extended period of time? What about it made you not even want to consider it for anything? I’m really curious; I’m not trying to set you up. I’m really not experienced enough to do so :)
Edit: to be clear, out of general curiosity and because I hate bringing work home, most of the side projects I work on are in some other language.
Not really. The author doesn't leverage any kind of static type system, which would at least mitigate some of the warts. He's chosen Node.js instead of leveraging the browser, so there's no free visualization layer for doing anything interesting with, unfortunately.
> Have you worked with a JS language for any extended period of time?
I can say with confidence that the language is something that we're absolutely stuck with, and I still have no idea why somebody would implement a pedagogical database with it (well, then again, they haven't; they've implemented a key-value store, which is trivial).
To offer an alternative, they could have chosen something boring but everywhere like Java, which is just as accessible to those with less experience. Then they'd have the possibility of doing fine-grained parallelism, file access, and designing abstractions that fit within a statically typed language.
I mean, tables seem like doubly indexed arrays.
In my mind it wouldn't be readable, but it might take fewer lines of code than other languages?
I have never thought about building a database that takes into consideration CAP theorem and ACID compliance, sounds super tough. I'm usually happy enough when I learn something neat about Postgres.
Maybe you’re right, but this page doesn’t actually teach you databases, and most of those courses don’t actually teach you programming.
At least not efficiently.
I have worked with JS for a long time, and I don’t hate it, but it’s such a terrible language and environment that the most popular part of it is literally a strict syntactical superset of it.
I don’t particularly like TypeScript, by the way, and I don’t think it’s really that useful, but you can’t deny that most people do.
Of course, on the client side, all the innovation and all the talent lie with JS, and there are some advantages to using JS for your whole stack. Those advantages end at the DB, though, at least in my opinion.
This isn’t a problem: you can use Prisma for Postgres, and there are decent drivers for mssql, but I’d never advise people to use NoSQL unless they had a very specific reason for doing so, and I can’t think of one.
js really is the most accessible
One of the best examples of this is CS50x from Harvard, which uses it (and other cloud services) to give students an IDE, automatic helpers, debuggers and code checkers.
But I guess it depends on the circumstances, and since I don’t know yours, perhaps JS is a good choice.
Even a simple function in JS taking one variable abstracts a good deal of that away from you because it handles things like memory allocation for you.
A lot of modern hopeful programmers who make it into our interviews can barely explain what the new keyword of an OOP language actually does, and way too many have no idea what a stack is.
This isn’t useful when you write CRUD web applications for a few thousand users, which is arguably what a lot of modern programming is, but it’s extremely useful if you ever want to build something original.
Which is actually the key issue. You seem to think building a water pump isn’t a beginner project, but why isn’t it? It’s one of the most basic programs you could write. In fact, it’s so basic that you could solve it mechanically, without the use of programming, if you have running water to power the timed open/close mechanism of the pump and pull the water.
By contrast, a website is infinitely more complex. You only think it isn’t because other programmers have built most of your tools for you. Which is great; you should stand on the shoulders of giants every time you can. What isn’t going to be great is when you’re tasked with solving a problem no one has solved for you first.
The abstractions are what's important, memory layout is an implementation detail. Even when you're writing C or assembly, you're still thinking in terms of data flow and logic, you're just having to do a lot of the work manually. Learning the abstractions without the baggage of the implementation detail will get 90% of people 90% of the way. Once you have that solid foundation, you can go on to learn assembly or C if you need it because it's not that big a leap from a coding mindset to a hardware control mindset. But to go straight from no coding experience to hardware control is going to be more difficult for a new student to wrap their head around.
I’m not saying you shouldn’t use node packages. But learning to do that as your first steps into programming is robbing you of learning how to use code to solve problems.
I don't think the JS part is the issue. Node.js does I/O, files, and so on. The issue is that the article doesn't teach how to build a database at all.
It doesn't explain how to efficiently persist and fetch data from a file, indexing strategies with trees, concurrent file access, locking, basic transactions... that's what I expect from a tutorial about how to build a basic DB system.
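For a taste of the "persist and fetch data from a file" part, here is a minimal Python sketch of an append-only log with an in-memory hash index, the approach popularized by Bitcask. It is a swapped-in illustration, not what the article does, and the record format and the LogStore class are hypothetical. Each put appends a record and remembers its byte offset; a get seeks directly to that offset instead of re-reading and re-parsing the whole file.

```python
import os
import struct
import tempfile

# Record layout assumed for this sketch: key length, value length (4 bytes each), key, value.
HEADER = struct.Struct("<II")


class LogStore:
    """Append-only key/value log plus an in-memory index mapping key -> file offset."""

    def __init__(self, path):
        self.path = path
        self.index = {}
        if os.path.exists(path):
            self._rebuild_index()
        self.file = open(path, "ab")

    def _rebuild_index(self):
        # Scan the whole log once on startup; later records for a key win.
        with open(self.path, "rb") as f:
            while True:
                offset = f.tell()
                header = f.read(HEADER.size)
                if len(header) < HEADER.size:
                    break  # reached end of file
                klen, vlen = HEADER.unpack(header)
                key = f.read(klen).decode()
                f.seek(vlen, os.SEEK_CUR)  # skip the value; only its offset matters
                self.index[key] = offset

    def put(self, key, value):
        k, v = key.encode(), value.encode()
        offset = self.file.seek(0, os.SEEK_END)  # new records always go at the end
        self.file.write(HEADER.pack(len(k), len(v)) + k + v)
        self.file.flush()
        os.fsync(self.file.fileno())  # ask the OS to persist before updating the index
        self.index[key] = offset

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        # Opens its own read handle, so it works even after close().
        with open(self.path, "rb") as f:
            f.seek(offset)  # one seek instead of scanning the file
            klen, vlen = HEADER.unpack(f.read(HEADER.size))
            f.seek(klen, os.SEEK_CUR)
            return f.read(vlen).decode()

    def close(self):
        self.file.close()


path = os.path.join(tempfile.mkdtemp(), "data.log")
db = LogStore(path)
db.put("user:1", "ada")
db.put("user:2", "grace")
db.put("user:1", "ada lovelace")  # an update is just a newer record
db.close()

db = LogStore(path)  # reopening rebuilds the index from the log
print(db.get("user:1"))  # ada lovelace
print(db.get("user:3"))  # None
db.close()
```

What it leaves out is most of that list: there is no compaction of stale records, no range queries (which is where tree-based indexes come in), no locking for concurrent writers, and no transactions.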
Why choose a language plagued with warts and a half-baked ecosystem for something pedagogical? A language with all sorts of peculiarities from the '95 browser era, and then not even leverage browser technology? A language that gives you very uninteresting, coarse control over resources and has no concept of parallelism?
EDIT: I meant to add: But I think this might be beside the point. If you are looking for a language that a large number of people can more or less read and write, JS is a pretty good choice. If your objective is teachability/readability rather than production-quality performance or capabilities, I think JS is probably on the shortlist of candidate languages.
A lot of people without a formal educational background start with established scripting languages.
Using coffeescript or typescript would be a worse choice for learning, even if they counter some disadvantages of the base language in question.
Does anyone have a DB internals book recommendation that isn't boring as hell?
Fair warning: this is not for beginners. If you find the going too hard, start with a DB textbook.
Also, that site doesn't seem to include the papers themselves. But most of those papers are very famous, so if you search for the titles, you should find copies. Worst case, you might need to visit a university library.
Which one can teach how to build a DB system from scratch? I'm not talking about SQL theory or implementing a SQL parser, but the actual persistence and indexing parts.
A reasonable approach might be to start with this high-level paper and follow the papers it references until they get specific enough to address your questions:
Joseph M. Hellerstein, Michael Stonebraker, James Hamilton. Architecture of a Database System. Foundations and Trends in Databases, Vol. 1, No. 2 (2007).
Another system that might be worth studying is SQLite.
Transaction Processing: Concepts and Techniques (The Morgan Kaufmann Series in Data Management Systems)
It's specific to MS SQL Server, but I found the chapters on log file management, indexes, and isolation levels to be accessible and enjoyable.
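For a feel of what "log file management" is about, here is a deliberately tiny Python sketch of a redo-only log: a transaction's writes are appended and forced to disk together with a commit record, and recovery replays only transactions whose commit record survived. It is an assumption-laden simplification (no undo, no checkpoints, no log sequence numbers, one writer at a time), not the design SQL Server uses or either book describes; RedoLog, commit, and recover are hypothetical names.

```python
import json
import os
import tempfile


class RedoLog:
    """Toy redo-only log: a transaction's writes, then a commit record, forced to disk."""

    def __init__(self, path):
        self.path = path

    def commit(self, txid, writes):
        with open(self.path, "a", encoding="utf-8") as f:
            for key, value in writes.items():
                f.write(json.dumps({"tx": txid, "key": key, "value": value}) + "\n")
            f.write(json.dumps({"tx": txid, "commit": True}) + "\n")
            f.flush()
            os.fsync(f.fileno())  # the commit counts as durable only once this returns

    def recover(self):
        """Rebuild state by replaying only transactions whose commit record reached disk."""
        pending, state = {}, {}
        if not os.path.exists(self.path):
            return state
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                try:
                    rec = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn final write from a crash: ignore the rest
                if rec.get("commit"):
                    state.update(pending.pop(rec["tx"], {}))
                else:
                    pending.setdefault(rec["tx"], {})[rec["key"]] = rec["value"]
        return state  # anything still in `pending` never committed and is discarded


path = os.path.join(tempfile.mkdtemp(), "redo.log")
log = RedoLog(path)
log.commit(1, {"a": 1, "b": 2})
log.commit(2, {"a": 10})
# Simulate a crash mid-transaction: tx 3's write reached the log, its commit record did not.
with open(path, "a", encoding="utf-8") as f:
    f.write(json.dumps({"tx": 3, "key": "b", "value": 99}) + "\n")

print(log.recover())  # {'a': 10, 'b': 2}
```

Real systems also follow the write-ahead rule (log records reach disk before the data pages they describe) and use checkpoints to bound how much log has to be replayed; neither appears here.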
For anyone else who is interested in learning how to build a database, I thoroughly recommend following along with Andy Pavlo's Advanced Database Systems course from CMU. Every lecture is accompanied by reading lists, notes, and assignments. What's more, I find Andy's style very easy to parse, even on complex topics.
Even if you think you know a fair bit about this domain, you will likely learn a lot!