
Ask HN: CTM gave CS the “kernel language”. Is there a “kernel database”? - bordercases
In "Concepts, Techniques, and Models of Computer Programming" (CTM), Peter Van Roy takes the approach of teaching programming not as one paradigm or another, but rather as a hierarchy of features in which each new feature set, or stratification, maps the whole set to a new paradigm. (He calls this abstract set of features the "kernel language".) Thus you build up something like Object Oriented Programming from simple parts, the lowest of which is Declarative/Functional Programming. Then you can negotiate the tradeoffs effectively.

Currently I'm diving into databases, and I'm nothing but confused. It seems easy to get lost in the myriad of technologies and their features. I'm having a difficult time negotiating the tradeoffs, particularly without a real, central model of what a database is or what its design space is. Is there an equivalent to a kernel language in the database world?

The cover graphic of the MIT OCW course on databases (labeled 6.830 / 6.814) seems to give away that there is a basic set of problems that databases need to solve. So I think that a "kernel description" of databases can exist. I'm wondering if anyone here has come across one.
======
nostrademons
I think you're looking for a set of roughly-orthogonal concepts that together
make up a "database", right? Basically, you're trying to understand the
different dimensions under which database products might make different trade-
offs, defining the design space? If so, take a look at these concepts:

 _Schema_. This defines how the raw bytes of a data record are mapped into
semantically-relevant pieces of data for your program. The relational model
(behind SQL) is the most commonly used schema for commercial database
products, but also check out things like XML databases, JSON data-stores,
serialization formats like Protobuf or Thrift, etc.

[https://en.wikipedia.org/wiki/Data_model](https://en.wikipedia.org/wiki/Data_model)

 _Indexing_. This defines what data structures are used to quickly _find_
records on disk. Almost all the major products use B-trees, but these are not
the only options: you can also have hash indexes, sorted string tables,
bitmaps, bloom filters, etc.

[https://en.wikipedia.org/wiki/Database_index](https://en.wikipedia.org/wiki/Database_index)
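
To make the indexing trade-off a bit more concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the records live in an in-memory list, `build_hash_index` and `SortedIndex` are hypothetical names, and a sorted list searched with `bisect` only stands in for an ordered structure such as a B-tree (it is not one). The point is that a hash index answers equality lookups, while an ordered index can also answer range queries.

```python
import bisect
from collections import defaultdict


def build_hash_index(records, key):
    """Hypothetical hash index: maps a key value to the positions holding it."""
    index = defaultdict(list)
    for pos, record in enumerate(records):
        index[key(record)].append(pos)
    return index


class SortedIndex:
    """Hypothetical ordered index: sorted (key, position) pairs searched by bisection."""

    def __init__(self, records, key):
        self._records = records
        self._entries = sorted((key(r), pos) for pos, r in enumerate(records))
        self._keys = [k for k, _ in self._entries]

    def lookup(self, value):
        """Return all records whose key equals value."""
        return self.between(value, value)

    def between(self, low, high):
        """Return all records with low <= key <= high, in key order."""
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_right(self._keys, high)
        return [self._records[pos] for _, pos in self._entries[lo:hi]]


records = [("Bob", 45), ("Jill", 19), ("Jim", 7), ("Ann", 19)]
by_age = SortedIndex(records, key=lambda r: r[1])
print(by_age.between(10, 50))
```

A real engine would keep such structures on disk in pages and update them incrementally; this sketch rebuilds nothing and ignores inserts entirely.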

 _Concurrency control_. This defines how the database deals with multiple
clients trying to modify the same piece of data at once. Options include row-
level locks, table-level locks, MVCC, STM, CRDTs, etc.

[https://en.wikipedia.org/wiki/Concurrency_control](https://en.wikipedia.org/wiki/Concurrency_control)

 _The memory hierarchy_. This defines where the data is stored, physically. Is
it on disk, like most data-warehousing stores or distributed filesystems? Is
it in memory, like memcached or Redis? Is it on Flash memory? Is there some
sort of caching scheme where parts of the data are in memory and parts on
disk?

[https://en.wikipedia.org/wiki/Memory_hierarchy](https://en.wikipedia.org/wiki/Memory_hierarchy)

 _Distribution_. How is data split across multiple machines, and then how are
failures in the network handled?

[https://en.wikipedia.org/wiki/CAP_theorem](https://en.wikipedia.org/wiki/CAP_theorem)
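
As a small illustration of the "how is data split" half of that question, here is a sketch of hash partitioning in Python. The names `shard_for` and `partition` are made up for this example, and the scheme (a stable hash of the key modulo the number of shards) is just one simple option; it says nothing about replication or about handling network failures, which is where the CAP trade-offs come in.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard from a stable hash of the key (hypothetical scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(records, num_shards):
    """Split (key, value) pairs into num_shards lists, one per shard."""
    shards = [[] for _ in range(num_shards)]
    for key, value in records:
        shards[shard_for(key, num_shards)].append((key, value))
    return shards


shards = partition([("bob", 1), ("jill", 2), ("jim", 3)], num_shards=4)
print([len(s) for s in shards])
```

One known weakness of plain modulo partitioning is that changing the number of shards moves most keys, which is part of why other schemes exist.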

Hope this helps. Most data-management solutions (even ones that often aren't
considered "databases", like memcached or Redis or flat CSV files or MapReduce
or the Google Search indexes) can be mapped onto this space.

~~~
bordercases
Yes, this is close to exactly what I was looking for. Between this and a
database management systems book (as per paperwork's suggestion) I think I
would get both "a lay of the land" and "a cathedral to build on" when it comes
to understanding database systems. Thank you.

~~~
kephra
Old but classic is C.J. Date's "An Introduction to Database Systems". This book
covers most concepts required for databases, and explains why relational
databases were an improvement over the earlier key-value, hierarchical, and
network databases.

One could consider relational language as a database kernel language.

~~~
bordercases
OK, I thought relational modelling was more niche than that, thanks. But as
long as we're on the topic of the kernel language model, I'm getting reminded
of the fact that every time we add a stratification to the KL, we don't get a
_better_ language; instead we get a _different_ language, with its own set of
strengths and weaknesses.

For example, if the relational language is the database kernel language
(model?), why does the "degradation" towards key-value pairs benefit a cache
like Redis?

I'm all for principles books on databases right now, so thanks, I'll take a
look.

EDIT: oh, CJ Date wrote "Database in Depth: Relational Theory for
Practitioners", which seems newer. Do you know how they compare?

~~~
pjungwir
I just finished _Database in Depth_. :-) It is short and readable. It's
targeted to working database programmers with a few years of SQL already. I
learned a bit, but it was a bit of a bait-and-switch: the book is really a
polemic about why SQL is not relational enough and people should use the
author's own Tutorial D. Every page belabors that point. More examples are
written in Tutorial D than SQL. I was hoping for some math & theory, and there
was some theory, but there was little effort to connect it to SQL except to
show SQL's shortcomings. For instance I never once saw the term "outer join"
(because Date wants to forbid NULLs). If you are interested in database theory
and trying new things, it might be very interesting. If you are a working
programmer who wants to solidify the foundations, there are probably better
choices (I hope).

~~~
dragonwriter
On the kernel language topic, though, Date and Darwen's work [0] on the
specifications for an ideal class of database languages unifying OO and
Relational models (D; of which Tutorial D is one realization intended
principally for pedagogical purposes, hence the name) and their lower level
work on an abstract relational algebra (A) to underlie D is probably useful in
that regard, as frustrating as some of their work may be to people who just
want to get down to using current SQL-based relational DBs.

[0] available at
[http://www.thethirdmanifesto.com/](http://www.thethirdmanifesto.com/)

------
paperwork
I've wondered this as well. However, a couple of notes. CTM is indeed an
awesome book, but the idea of a language kernel is older. I believe academic
papers often start with a minimal language, like the lambda calculus, then
show how it can be extended with one feature or another. If I recall
correctly, I don't think CTM spends any time showing you how to implement
these features using a parser/interpreter/compiler/etc.

If you are looking for a theoretical model of relational databases, you may
look up relational algebra. I'm sure others will chime in about how SQL is not
a very good representation of relational algebra; however, if you are not
aware of it, RA is certainly something to study.
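
For a feel of what relational algebra looks like in practice, here is a minimal sketch in Python. It is an assumption-laden toy, not a faithful formalization: relations are represented as lists of dicts, the function names `select`, `project`, and `natural_join` are chosen for this example, and `project` removes duplicates to approximate set semantics.

```python
def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep only the named attributes, dropping duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


def natural_join(left, right):
    """Natural join: combine rows that agree on all shared attribute names."""
    result = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[a] == r_row[a] for a in shared):
                result.append({**l_row, **r_row})
    return result


employees = [
    {"name": "Bob", "dept": "ops"},
    {"name": "Jill", "dept": "eng"},
    {"name": "Jim", "dept": "eng"},
]
departments = [{"dept": "eng", "floor": 3}, {"dept": "ops", "floor": 1}]

on_third_floor = project(
    select(natural_join(employees, departments), lambda r: r["floor"] == 3),
    ["name"],
)
print(on_third_floor)
```

The operators compose because each one takes relations and returns a relation, which is roughly the property that makes the algebra feel like a "kernel".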

I suspect that you may be referring to a kernel database, the way there are
simple implementations of operating systems--written expressly to study how
OSes work. I wish something like that existed for databases, but I'm not aware
of it.

Finally, there are several text books on relational database management
systems. I haven't used one in a while so can't recommend a specific one.
However, these books often show you how to implement various parts of a
database, such as B-trees for storage, indexes, how to design a pipeline to
implement selection/projection/group by, etc. Make sure that the book is
showing you how to implement a DBMS, not how to design a datamodel.

------
bonobo3000
This may be useless, but here is how I would do it. It's kind of related to the
"kernel" idea in that I try to discover the heart of the problem a technology
solves, what the traditional problems are and how a given technology overcomes
them.

Figure out the problem something solves, try a bunch of simple solutions and
see why they fail, ask questions about why things are done a certain way.

An example is much more helpful:

Q. What's a database?

A. A thing to store data.

Q. Why not use a simple text file?

A. Well, there is no way to enforce a schema that way. What's to stop me from
adding a line "a,b,c,d" when the schema of the database has only 3 fields?

It's probably not good for binary data either.

OK, so we'll use some specialized format.

Q. What's an index? Why do they use indexes?

A. <do research - find they are B-trees that allow you to maintain an order
over data>

Q. So what happens if multiple people try to write at the same time?

A. We could globally lock the whole table, so only one write is allowed at a
time.

We could lock on the specific row.

With something like CRDTs, the order of writes doesn't matter.

What if we didn't lock at all?

Data can become corrupt. This is where consistency models like ACID etc. come in.
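
The "global lock" answer above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Table` and `run_writers` are hypothetical names, the table is an in-memory dict, and a single `threading.Lock` plays the role of the table-level lock. Each write is a read-modify-write, which is exactly the kind of operation that can lose updates when nothing serializes the writers.

```python
import threading


class Table:
    """Hypothetical in-memory table guarded by one table-level lock."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self._lock = threading.Lock()

    def update(self, row_id, fn):
        # The read and the write happen under the lock, so no other
        # writer can slip in between them.
        with self._lock:
            self._rows[row_id] = fn(self._rows[row_id])

    def get(self, row_id):
        with self._lock:
            return self._rows[row_id]


def run_writers(table, row_id, writers=4, increments=500):
    """Start several threads that each increment the same row many times."""
    def work():
        for _ in range(increments):
            table.update(row_id, lambda value: value + 1)

    threads = [threading.Thread(target=work) for _ in range(writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table.get(row_id)


print(run_writers(Table({"hits": 0}), "hits"))
```

A row-level variant would keep one lock per `row_id` instead of one for the whole table, letting writers to different rows proceed in parallel at the cost of more bookkeeping.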

So my advice is think deeply about the problem being solved, what the
difficult sub-problems are and then see what kinds of trade-offs different
databases use to deal with them.

The more questions you answer, the more questions you will have, but about
increasingly detailed slices of the DB/framework/etc. This is learning.

------
shotgun
Although tangential to your question, I feel you might enjoy presentations by
Rich Hickey wherein he describes traditional database architectures and
presents the whys and wherefores behind Datomic. For example:

[1]
[https://www.youtube.com/watch?v=Cym4TZwTCNU](https://www.youtube.com/watch?v=Cym4TZwTCNU)
[2] [http://www.infoq.com/presentations/datomic-functional-
databa...](http://www.infoq.com/presentations/datomic-functional-database)

------
wcarss
This is a very naive (and perhaps unsatisfying) answer, and I haven't read CTM
so I may misunderstand the real concept of the kernel language, but you may be
looking for either the Relational Calculus or Algebra[1][2] or, for less math
and more application, straight up database-independent SQL[3].

These pretty much describe the underlying set of the things you can do with
database systems, and different databases provide different implementations of
them with different tradeoffs and abstractions.

    
    
        [1] https://en.wikipedia.org/wiki/Relational_calculus
        [2] https://en.wikipedia.org/wiki/Relational_algebra
        [3] https://en.wikipedia.org/wiki/SQL

~~~
frik
BeFS (BeOS) and "Cairo"/WinNT4 NTFS (with Object extension) offered a database
API on top of the filesystem. And IBM stored its system files in a
database-like file store too, in its mainframes.

~~~
protomyth
and the Newton had a db / query API on top of an object store (soup)

------
bhntr3
Van Roy's book is an awesome way to explain computer languages and computing
itself. I'm not sure there's anything so approachable yet mind blowing for
DBs.

I think Foundations of Databases, "The Alice Book", lays out the mathematical
basis well. The relational algebra might be the kernel language. I think
Ullman's books on databases provide more implementation details. Caveat: I
haven't finished either. The Alice book is pretty dense.

I'd describe it this way myself:

A database is a persistent repository of facts about some domain.

The simplest set of facts might be a set of values. You can't do much
interesting with that. This might be main memory or a hard disk.

Beyond that, the facts might be tuples (K, V) where K is some name or key and
V is some unstructured value. Key value stores and filesystems are like this.
You can index K and provide fast lookups.

If the tuple has more structure, it might belong to a relation. A relation
might be described as a set of ordered tuples where each value is selected from
some domain. (Name, Age, HourlyWage) might be the heading of a relation where
Name comes from Strings, Age comes from Ints and HourlyWage comes from Floats.

So, we might have a couple facts like: (Bob, 45, 14.50) (Jill, 19, 112.95)
(Jim, 7, 1337.0)

Now, the repository of facts can provide a solver that answers questions. You
define some constraints like HourlyWage > 100.0 AND Age < 10 and it finds the
facts that match the constraints. It's more complicated than that and the
Alice book explains the mathematical foundations well.

And the database remains a repository. You can add data to it. The repository
may enforce the relations and reject facts that don't fit the domain of their
values. Maybe it has uniqueness constraints on certain fields, etc. etc.
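
Here is a small sketch of that "repository of facts" picture in Python, using the three example facts from above. The `Employee` tuple and the `Repository` class are hypothetical names invented for this illustration; the domain check is a plain `isinstance` test, and the "solver" is just a linear scan with a predicate, which is far simpler than what a real query engine does.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    name: str
    age: int
    hourly_wage: float


class Repository:
    """Hypothetical fact store that enforces the domain of each field."""

    def __init__(self):
        self._facts = set()

    def add(self, name, age, hourly_wage):
        # Reject facts whose values fall outside the declared domains.
        if not isinstance(name, str):
            raise TypeError("name must come from Strings")
        if not isinstance(age, int):
            raise TypeError("age must come from Ints")
        if not isinstance(hourly_wage, float):
            raise TypeError("hourly_wage must come from Floats")
        self._facts.add(Employee(name, age, hourly_wage))

    def query(self, constraint):
        """Return the facts that satisfy the constraint, sorted for stable output."""
        return sorted(fact for fact in self._facts if constraint(fact))


repo = Repository()
repo.add("Bob", 45, 14.50)
repo.add("Jill", 19, 112.95)
repo.add("Jim", 7, 1337.0)

matches = repo.query(lambda f: f.hourly_wage > 100.0 and f.age < 10)
print(matches)
```

Storing the facts in a set means adding the same fact twice has no effect, which loosely mirrors the set-based view of a relation.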

Beyond that it's mostly additional features. More complicated relational
operators. Better query abstraction. Optimizations to solve queries more
efficiently (query planning, etc.) Atomicity, Consistency, Isolation and
Durability guarantees. Clustering and networked operations. So on and so forth
in the spirit of CTM.

My intuition is that the mathematical foundation is fairly pure. So even "non-
relational databases" could be viewed as implementing some non-standard form
of the relational model.

------
brudgers
Off the top of my head, I'd say the kernels of database systems are the CAP
theorem on the operational side and ACID on the design side. More abstractly,
what really matters is the _particular_ business logic [what problem are we
trying to solve?] a database models, not the implementation details [what code
are we using?].

That said, the bigger abstraction is information storage and retrieval. For
some purposes a printed book on a shelf is most appropriate, and it's helpful
to think of the continuum of storage technologies as including hard copy
simply because it filters out the noise of implementations: key-value stores,
versus document stores, versus relational stores, versus column stores, versus
CPU caches. The what and why and when come before the how and where.

For traditional relational databases, I found _Database Systems: The Complete
Book_ very informative. But my intuition is that RDBMSs are increasingly less
likely to be the best choice for a lot of common applications.

Good luck.

------
Animats
I thought this was going to be about why an OS needs a database for its own
data, instead of a collection of flat files without locking or atomic updates.

~~~
bordercases
This is an interesting idea, I wonder if it has any merit.

------
boyaka
Warning: I am neither a skilled nor experienced programmer. I would say the
kernel of programming languages is more practically something like C,
assembly, or machine language. Kernel implies containing the core
functionality, the building blocks. Van Roy's kernel language does seem to
contain core ideas for ideal programming languages, even though it doesn't
seem to translate well to existing hardware / machine language.

After reading a bit about it I started to think of it as pseudocode for
programming languages, and I thought I'd share some results from my searching
based on that idea [1].

"In terms of the kernel language itself, it is in fact its own programming
language, a sort of "runnable pseudocode" that is executable in the Mozart/Oz
platform."

Here's an interesting link I found by searching for database pseudocode [2].
This delves into the topic of "NoSQL", which might be helpful in your database
basics research, but it also covers other database basics.

I am also not very skilled nor experienced with databases, but based on your
request, one database that comes to mind for me is Berkeley DB. It is the
building block for lots of other technologies, and is also considered one of
the origins of the NoSQL concept. Here's a recent relevant discussion that
probably made me think of it [3] (that and it is the basis for a distributed
filesystem I'm studying). Also, check out the Wikipedia for it, and (maybe
obvious to you already) searching site:news.ycombinator.com on Google can get
you a lot more interesting discussions on any topic.

Here's one more random link I found [4], just because it has a lot of
summaries on different types of NoSQL databases.

Again, this is all far from expert advice, which is probably really what you
need, but I thought I'd just share with you what I found in my attempt to
understand this "kernel language" concept and apply it to databases.

[1] [http://michaelrbernste.in/2013/02/23/notes-on-teaching-
with-...](http://michaelrbernste.in/2013/02/23/notes-on-teaching-with-the-
kernel-language-approach.html)

[2] [http://www.jeffknupp.com/blog/2014/09/01/what-is-a-nosql-
dat...](http://www.jeffknupp.com/blog/2014/09/01/what-is-a-nosql-database-
learn-by-writing-one-in-python/)

[3]
[https://news.ycombinator.com/item?id=10148242](https://news.ycombinator.com/item?id=10148242)

[another old one]
[https://news.ycombinator.com/item?id=3613638](https://news.ycombinator.com/item?id=3613638)

[4] [http://bigdata-madesimple.com/a-deep-dive-into-nosql-a-
compl...](http://bigdata-madesimple.com/a-deep-dive-into-nosql-a-complete-
list-of-nosql-databases/)

