

A Brief History of Databases (2014) - benbreen
http://avant.org/media/history-of-databases

======
jasode
This is not a quality article. As a sibling comment already noted, the
author's purple prose[1] is tedious. As for the content, I can't figure out
the logic of the particular technologies he's highlighting.

Between RDBMS and NoSQL on the timeline, he doesn't mention object-oriented
databases (Poet, GemStone, etc.) and he skips the rise of data warehouses
(OLAP, star schemas, etc.).

Lastly, NoSQL is driven by at least 2 orthogonal forces. #1) the easy-to-use
API that maps "documents" to programming language objects (e.g. JavaScript
JSON) without using mappers such as ORMs. #2) the horizontal scale-out of data
nodes (HBase, Cassandra, etc.).

The author is talking about NoSQL in terms of #2 (big data) even though I
suspect that the motivation behind the vast majority of NoSQL use in GitHub
projects is #1 (no joins means easy programming, and I'm not saving terabytes
of data anyway).
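
As a rough illustration of force #1, the sketch below stores a nested
"document" and reads it back as a plain object, with no mapper and no join.
The `store` dict and the `save`/`load` helpers are hypothetical stand-ins for
a document database's API, not any real driver.

```python
import json

# Hypothetical in-memory stand-in for a document collection.
store = {}

def save(doc_id, doc):
    """Serialize the whole nested object as one JSON document."""
    store[doc_id] = json.dumps(doc)

def load(doc_id):
    """Read the document back as ordinary dicts and lists."""
    return json.loads(store[doc_id])

# A blog post with its tags and comments embedded, instead of three joined tables.
post = {
    "title": "Hello",
    "tags": ["db", "nosql"],
    "comments": [{"author": "ann", "text": "Nice"}],
}

save("post-1", post)
loaded = load("post-1")
```

The appeal is that `loaded` has the same shape as the object the program
already works with, which is the "easy programming" argument rather than a
scaling one.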

One can do horizontal scale-out using a traditional RDBMS but it's complicated.
That's what Facebook did with MySQL -- they created a custom app layer for
sharding. One can also favor a document-oriented programming interface without
aspirations of horizontal terabyte scale by using a NoSQL store like MongoDB.
A lot of blog engines programmed in a weekend use MongoDB as a backend because
the programming API is easy and productive.
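
A minimal sketch of what an app-layer sharding scheme might look like,
assuming simple hash-based routing on a user ID. The shard names and the
`shard_for` and `route_query` helpers are made up for illustration; this is
not Facebook's actual design.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to database hosts.
SHARDS = ["db0", "db1", "db2", "db3"]

def shard_for(user_id, shards=SHARDS):
    """Hash the key so the same user always lands on the same shard."""
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]

def route_query(user_id, sql):
    """Return (shard, sql); the caller would run sql against that shard only."""
    return shard_for(user_id), sql

shard, sql = route_query(42, "SELECT * FROM posts WHERE user_id = 42")
```

Part of the "complicated" is visible even here: with modulo routing, changing
the number of shards remaps most keys, and any query that spans users has to
be fanned out and merged by the application.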

[1] [http://en.wikipedia.org/wiki/Purple_prose](http://en.wikipedia.org/wiki/Purple_prose)

------
Someone1234
That article's writing style isn't for me. It is more like reading poetry than
a history of anything. For example:

> The history of data processing is punctuated with many high water marks of
> data abundance. Each successive wave has been incrementally greater in
> volume, but all are united by the trope that data production exceeds what
> tabulators (whether machine or human) can handle.

I thought it would just be a history, not some philosophical exploration of
databases and what they mean to humanity. If that's what you're into, then read
the article; if you want a history, then look elsewhere.

~~~
theallan
Any recommendations?

~~~
BlackJack
[http://redbook.cs.berkeley.edu/bib4.html](http://redbook.cs.berkeley.edu/bib4.html)

It's the list of papers in the book 'Readings in Database Systems'. Most are
classics in the field. Not easy to read, but it sure is fun.

------
kittyfoofoo
“Poor Faulkner. Does he really think big emotions come from big words? He
thinks I don’t know the ten-dollar words. I know them all right. But there are
older and simpler and better words, and those are the ones I use.”

------
temuze
There are so many things wrong with the last sentence of this article:

> "Is anybody actually handling data big enough to merit a change to NoSQL
> architectures?"

1) NoSQL doesn't necessarily mean it automatically scales well. It means what
its name implies - storing data in a non-relational database. There are plenty
of crappy NoSQL databases out there.

2) You can scale relational databases really freaking well if you know what
you're doing. Scalability and "big data" are more about how you use your tools
than choosing the right tools.

~~~
daleharvey
Another mistake that is constantly repeated when people discuss `NoSQL` data
stores.

Performance / scale is not the only reason to choose an alternative ('NoSQL')
data store; they all have varying feature sets / capabilities and small or
large points of interest which may be perfect or terrible depending on your
use case.

(Author of a datastore which most certainly performs terribly compared to the
equivalent in a SQL store)

~~~
_dark_matter_
It is widely accepted in the database community that a primary reason people
choose NoSQL data stores (read: Hadoop) is that they're free, compared to the
costly enterprise alternatives. Just get your cluster, download the files,
install, and go.

~~~
daleharvey
Sure, that's another reason people may use alternative data stores. Again, it's
not 'the' reason; it's one of various reasons.

------
JBiserkov
Ctrl+F("datomic") in article -> 0

Ctrl+F("datomic") in comments -> 0

Fixed: Datomic is a distributed database designed to enable scalable, flexible
and intelligent applications, running on next-generation cloud architectures.
Read more at
[http://www.datomic.com/rationale.html](http://www.datomic.com/rationale.html)

~~~
leetNightshade
What does Datomic have to do with a brief history of databases? Datomic only
came out in 2012. Your comment seems like a shameless plug that has no place
here.

------
hxrts
Hi HN,

Editor-in-chief of avant.org here (& editor of said piece). Very
surprised/pleased to see this link pop up while browsing the front page.

The author had a good deal of additional material that we cut down to form
this brief survey, and I'm sure all of you have some great resources as well.
If so, post them here. I'd love to share with our readers!

We are a (soon to be) non-profit that publishes critical, cross-disciplinary
essays, frequently about science and technology. If that's your thing,
consider finding us on twitter: @avantdotorg

Also, a few other pieces that have cropped up on HN before if you're so
inclined:

• [http://avant.org/media/stealth-infrastructure](http://avant.org/media/stealth-infrastructure)

• [http://avant.org/media/75k-futures](http://avant.org/media/75k-futures)

Looking forward to your comments! Always a thrill to have one of our pieces
circulate here.

------
alecco
IBM System R (1974) jumping straight to NoSQL. This is ridiculous.

~~~
hxrts
What do you think are some of the important milestones in between? We've
gotten similar criticism before but it would be helpful to fill in those gaps.
The piece was edited for length, hence the 'brief,' and we knew there would be
quite a bit missing.

~~~
alecco
Oh, nothing. Well, maybe just that little thing, indexes.

But who needs indexes when we have MapReduce and Mongo. That is awesome and
makes everything fast.

------
astine
Completely skipped graph and object-oriented databases. Makes it sound like
the NoSQL folks invented distributed data stores.

The fact is that most of the innovations that happened during NoSQL already
existed and really deserve more attention than, "Oh, and NoSQL happened."

------
dnfehren
A similar, though more scholarly, work
[http://scholarworks.umass.edu/cpo/vol1/iss1/4/](http://scholarworks.umass.edu/cpo/vol1/iss1/4/)

------
gjmulhol
The key thing to remember is that MongoDB is web scale. MySQL is not web
scale.

------
sixdimensional
Unfortunately, because of its brevity, the article attempts (but ultimately
fails) to give enough emphasis to the details/reasons behind a key point. It
may be that this article was written for a more general audience.

The key point I'm referring to is that many of the developments of modern
databases are really expansions on ideas that were invented back in the 1960s
but have been re-invented in the distributed world, to handle higher data
volumes and advances in hardware/software capabilities.

For example, the article overlooks (and most seem to overlook) a critical
technology known as multivalue databases, which are mid-1960s technology,
predate relational systems, and share a lot in common with more recent NoSQL
systems in terms of the persistence paradigms. The main difference is that
multivalue databases were centralized systems and not distributed - but data
volumes were a lot smaller back then compared to the volumes we have now.
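
To show the resemblance, here is a small sketch of a multivalue-style record,
where a single field can hold several values, much like an array inside a
document. The delimiters follow the conventional Pick attribute and value
marks, but the record layout and the `parse_record` helper are assumptions
made for illustration, not code from any real Pick system.

```python
AM = chr(254)  # attribute mark: separates fields within a record
VM = chr(253)  # value mark: separates multiple values within one field

def parse_record(raw):
    """Split a delimited record into a list of fields, each a list of values."""
    return [field.split(VM) for field in raw.split(AM)]

# One customer record: a name, two phone numbers in a single field, a city.
raw = AM.join(["Ada Lovelace", VM.join(["555-0100", "555-0101"]), "London"])
record = parse_record(raw)

# The same data in the nested shape a document store would use.
as_document = {"name": record[0][0], "phones": record[1], "city": record[2][0]}
```

The repeating phone field lives inside the record itself rather than in a
separate joined table, which is the same persistence idea the document stores
later popularized.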

I bet most would be surprised to know that Pick/D3, a legacy multivalue
database, was an integrated operating system, database and programming
language, and that it contained a SQL-like query language (without joins,
aggregates, etc. - very simplistic) as well as attempts at natural language
query-like utilities. Pick/D3, in particular, lost out to the relational
players in the 70s and beyond for a multitude of reasons. That said, it and
technology like it (MUMPS, for example) are still alive and running systems
today. InterSystems Caché, for example, had interesting concepts that mixed
object-relational mappings and SQL queries on top of a system that started as
something resembling a persistent multi-dimensional array, MUMPS.

In the context of the history that I understand, everything is an evolution in
thinking for data storage, data volumes, persistence, networking, data access
techniques, etc.

NoSQL, for instance, is a terrible misnomer - many such systems suffered
in adoption because they lacked a good general-purpose query language
(although many now provide their own query languages to make working
with the data they store easier). That said, having a structure which is
non-relational in form has never meant you could not use SQL with it - just that
some of the relational constructs do not fit as easily into those realms.

We need look no further than Hadoop to understand how relational paradigms
can be extended to any form of data/system with the right combination of
technology components (think Hive, Spark, etc.).

We are in a world now where we have a multitude of data engine paradigms,
where no one model truly and completely dominates the others - they are
complementary and each has strengths and weaknesses. And we continue to get
closer and closer to combinations of techniques that allow for general
purpose, distributed by default, data systems/processing/storage.

