
Does the Database Community Have an Identity Crisis? - luu
http://www.bailis.org/blog/how-to-make-fossils-productive-again/
======
gfody
RDBMSs are as powerful as ever and are still improving rapidly. If you look at
"what's new" in Oracle 12c [1], SQL Server 2016 [2], VoltDB [3] and Vertica [4],
you can see they're not in maintenance mode. I think calling them fossils that
should be burned down to fuel the next thing is way off.

IMO the main thing RDBMSs have failed to do is to capture the new generation
of developers who don't have the desire or the patience to learn SQL,
relational concepts, various modeling techniques, and whatever sharding or
shared-nothing architectural fundamentals they need just to scale out. This
generation wants to "npm install magic-database", write everything in
JavaScript, and push to the cloud without ever having to think about
scale-out. This isn't necessarily a problem with RDBMSs, but RDBMSs could
address it.

[1] [https://docs.oracle.com/database/121/NEWFT/chapter12102.htm](https://docs.oracle.com/database/121/NEWFT/chapter12102.htm)
[2] [https://msdn.microsoft.com/en-us/library/bb510411.aspx](https://msdn.microsoft.com/en-us/library/bb510411.aspx)
[3] [https://docs.voltdb.com/ReleaseNotes](https://docs.voltdb.com/ReleaseNotes)
[4] [https://community.dev.hpe.com/t5/Vertica-Blog/What-s-New-in-Vertica-7-2-2/ba-p/235496](https://community.dev.hpe.com/t5/Vertica-Blog/What-s-New-in-Vertica-7-2-2/ba-p/235496)

~~~
Retra
> IMO the main thing RDBMSs have failed to do is to capture the new generation
> of developers who don't have the desire or the patience to learn [...]

As one of this 'new generation of developers' (who happens to also work on
industry database internals), I would say that I've been enormously
unimpressed by the amount of specialized knowledge needed to implement an
optimized, portable, flexible, scalable database. Do you know what tables in
your DB need indexes and why? Do you know how to optimally set up your
partitions? Do you know why your one query runs fast for a week and then
suddenly runs slow, and what you can do about it?

The database API is just too low-level for most people's comfort zone. Most
people who want to use a database are just trying to stuff organized bits
somewhere and be able to get them out quickly. Databases are smart, but the
admin still has to learn and learn and learn -- all this esoteric,
domain-specific knowledge -- just to store some damn bits.

Learning to use a database feels like learning machine code in that if you
suck at it, you're going to pay the price of not having the best performance.
Databases market themselves as data storage applications, but they are complex
programming APIs at heart. People turn to databases to quickly solve data
storage problems, not because they want to interface with some unsexy, cryptic
API.

~~~
gfody
Knowing what indexes you need and why is far from being some kind of esoteric
specialized knowledge of database internals. That and some basic relational
modeling techniques are the very minimum prerequisites to being able to
leverage an RDBMS competently. Learning these things is not as difficult as
trying to use an RDBMS without learning them.

I can't say I get the comparison to machine code. SQL is probably the highest
level language there is.
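The sketch below makes "knowing what indexes you need and why" concrete. It uses Python's built-in `sqlite3` module as a small stand-in for a full RDBMS. The `orders` table, its columns, and the `query_plan`/`demo` helpers are hypothetical. The code asks SQLite's `EXPLAIN QUERY PLAN` how a lookup by `customer_id` will run before and after adding an index that matches that access pattern: first a full table scan, then an index search. The exact plan wording varies between SQLite versions.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the human-readable 'detail' text of SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row ends with a text description of one step of the plan.
    return [row[-1] for row in rows]

def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: orders that are usually looked up by customer.
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        [(i % 100, i * 1.5) for i in range(1000)],
    )
    lookup = "SELECT total FROM orders WHERE customer_id = ?"
    before = query_plan(conn, lookup, (42,))  # no usable index yet: full scan
    # Index chosen from the access pattern: equality filter on customer_id.
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    after = query_plan(conn, lookup, (42,))   # now an index search
    conn.close()
    return before, after

if __name__ == "__main__":
    before, after = demo()
    print("before:", before)
    print("after: ", after)
```

The point is the one made above. Choosing `customer_id` came from knowing how the data is queried, not from database internals.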

~~~
tmptmp
I agree.

> Learning these things is not as difficult as trying to use an RDBMS without
> learning them.

Very aptly put. But I guess mastering SQL is a rather difficult task as SQL is
a very high level language (may be even higher than Haskell minus the great
type system of Haskell). It may be because SQL is so high-level that it seems
difficult to many people. No wonder ORM crap sells so well: it sells almost
impossible dreams (read: snake oil) to naive people who don't understand much
about SQL, data modeling, database systems and the OS.

Also, relational modeling requires a lot of deep thinking to get an effective
schema. The magic art of getting indexes right relies, to a large extent, on
your understanding of the particular domain and the basics of relational
modeling; database internals don't come into the picture much. Of course,
squeezing out further performance and saving time/space here and there
requires some knowledge of database internals, but once you reach that level
of optimization, NoSQL-based solutions will likely require you to know the
details of their implementations as well.

Take-home lesson I learnt: these things (very high performance, efficiency,
scaling) don't come for free.

~~~
tome
> SQL is a very high level language (may be even higher than Haskell minus the
> great type system of Haskell).

I have a solution for that :)

[https://github.com/tomjaguarpaw/haskell-opaleye/](https://github.com/tomjaguarpaw/haskell-opaleye/)

~~~
tmptmp
Looks great, will try it out.

------
genericpseudo
The central tension alluded to here is "database people" vs "big data people".
Which is a real dichotomy: as a "big data person" I've got a lot of leverage
out of going "c'mon, we don't need Hadoop, we just need a relational database
here, let's use Postgres", an option that gets dismissed on _cultural_ rather
than _technical_ grounds. Me, I'm happy that I get to do less work and look
clever while I'm doing it, but...

Marketing and cultural positioning matter. That really isn't news. But when
people on both sides have their identity invested in denying that, it's
difficult to overcome.

~~~
_pmf_
> The central tension alluded to here is "database people" vs "big data
> people".

Then there's the group who think they're big data people, but actually fall
within the domain of SQLite.

~~~
gglitch
Which, in fairness to and out of love for SQLite, is an enormous domain.

~~~
genericpseudo
I may have perpetrated things like this. (20GB working set for a recommender?
No problem.)

------
mwhite
It seems like NoSQL could be left behind, except for the 0.001% of use cases
that actually require it and can't easily be replaced with extensions or
(hopefully) automatable configurations of Postgres. But that would require
application-level abstractions, and the database community doesn't value those
enough. Case in point: SQLAlchemy [1] isn't highlighted on the homepage of
every RDBMS project, despite the awesome power and flexibility it gives the
developer.

Specifically, a JSON column should be used to store everything other than
primary keys and foreign keys, and views and indexes should be automatically
created based on the schema defined in the application (i.e., get the schema
from the ORM at deploy time and post the data to a schema/migration management
system) using something like
[https://github.com/mwhite/JSONAlchemy](https://github.com/mwhite/JSONAlchemy)

It is entirely possible to implement the CouchDB or MongoDB API on top of
Postgres JSON, for instance.
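Here is a toy illustration of the claim. It assumes a drastically simplified slice of a MongoDB-style API: `insert_one` and an exact-match `find`, translated into a parameterized `WHERE` clause over a JSON column. The `JsonCollection` class is hypothetical. A real compatibility layer would also need query operators, projections, updates and more. SQLite's `json_extract` again stands in for Postgres JSON.

```python
import json
import sqlite3

class JsonCollection:
    """Hypothetical document-store facade over one JSON column (exact-match find only)."""

    def __init__(self, conn, name):
        # The collection name is interpolated into SQL, so accept plain identifiers only.
        if not name.isidentifier():
            raise ValueError(f"bad collection name: {name!r}")
        self.conn, self.name = conn, name
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {name} (id INTEGER PRIMARY KEY, doc TEXT NOT NULL)"
        )

    def insert_one(self, doc):
        cur = self.conn.execute(
            f"INSERT INTO {self.name} (doc) VALUES (?)", (json.dumps(doc),)
        )
        return cur.lastrowid

    def find(self, query=None):
        clauses, params = [], []
        for path, value in (query or {}).items():
            # Dotted paths like "address.city" are allowed; each part must be an identifier.
            if not all(part.isidentifier() for part in path.split(".")):
                raise ValueError(f"bad field path: {path!r}")
            if isinstance(value, (dict, list)):
                raise TypeError("only scalar equality is supported in this sketch")
            # {"address.city": "Paris"} -> json_extract(doc, '$.address.city') = ?
            clauses.append(f"json_extract(doc, '$.{path}') = ?")
            params.append(value)
        where = " WHERE " + " AND ".join(clauses) if clauses else ""
        sql = f"SELECT doc FROM {self.name}{where} ORDER BY id"
        return [json.loads(doc) for (doc,) in self.conn.execute(sql, params)]

if __name__ == "__main__":
    users = JsonCollection(sqlite3.connect(":memory:"), "users")
    users.insert_one({"name": "ada", "address": {"city": "London"}})
    users.insert_one({"name": "bob", "address": {"city": "Paris"}})
    print(users.find({"address.city": "Paris"}))
```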

[1] [http://www.sqlalchemy.org/](http://www.sqlalchemy.org/)

~~~
oneweekwonder
I have been interested in a CouchDB API for Postgres, but thus far I could not
find anything.

One option is PouchDB, with levelDOWN* to push it to levelUP* to store it in
Postgres. But that level of abstraction feels like just too much.

I also found another project that basically keeps a copy in CouchDB and syncs
it over to pg, but things like attachments don't work.

* I'm not 100% sure about the projects and how to accomplish it.

------
escherize
I was hoping this would be about PLace-Oriented Programming (PLOP), which
confuses identity with state. It's a problem for OOP in general, where the
identity of objects is predicated on their particular state. Rich Hickey's talk
[http://www.infoq.com/presentations/Are-We-There-Yet-Rich-Hickey](http://www.infoq.com/presentations/Are-We-There-Yet-Rich-Hickey)
goes over the idea in more detail.

~~~
seanmcdirmid
Huh? The identity of an object allows for mutable state; it doesn't change
with its state. I am the same person I was yesterday even if I'm in a
different place or I cut my hair. State is predicated on identity, not the
other way around. (And given identity, you can have any kind of mutable state
even in a language that doesn't support mutation: an identity can be used as a
key in an immutable map, so changing the map changes the state while the
identity provides a constant frame of reference.)
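Here is a minimal Python sketch of that parenthetical. It uses read-only dicts (`types.MappingProxyType`) as a stand-in for a persistent immutable map. The names `with_state` and `SEAN` and the "place"/"hair" fields are illustrative only. Nothing is mutated. Each change produces a new map, while the key `SEAN` remains the constant frame of reference across both snapshots.

```python
from types import MappingProxyType

def with_state(world, identity, **changes):
    """Return a new read-only world in which `identity` has updated state.

    The old world is left untouched; only the identity -> state mapping differs.
    """
    updated = dict(world)  # shallow copy of the outer map
    updated[identity] = MappingProxyType({**world[identity], **changes})
    return MappingProxyType(updated)

SEAN = "sean"  # the identity: a stable key that never changes

yesterday = MappingProxyType({SEAN: MappingProxyType({"place": "home", "hair": "long"})})
today = with_state(yesterday, SEAN, place="office", hair="short")

if __name__ == "__main__":
    print(dict(yesterday[SEAN]))  # {'place': 'home', 'hair': 'long'}
    print(dict(today[SEAN]))      # {'place': 'office', 'hair': 'short'}
```

A real persistent map would share structure between versions instead of copying. The copy here just keeps the sketch short.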

~~~
aarpmcgee
"I am the same person I was yesterday even if I'm in a different place or I
cut my hair."

Think so?

~~~
seanmcdirmid
My identity is constant even if my state isn't.

------
filereaper
'The article titled “Architecture of Database System” should be considered
harmful'

Not sure why that article is considered harmful, I read through all of it, it
was a fantastic database resource.

~~~
makomk
Traditional database design is considered unfashionable these days. You're
meant to use more modern designs that eschew old-fashioned ideas like SQL and
not causing massive data loss.

------
Dowwie
"And it works on real problems, like combating human trafficking"

well since he put it _that_ way, why not?!

------
sgt101
I have a counter argument.

There is now too much data system innovation: there are scores of projects,
each of which implements a particular idea, and very few of which have a large
enough community to move beyond version 0.3.

We can't put our pipelines onto these because the cost of understanding and
adopting them overwhelms the benefit of ditching our internal code. We are in
the same situation as the NPM / JavaScript folks, but our data is an
enterprise asset and we have to consider the impact of a project just going
away.

The vendors have held us to ransom for years, and we've had to break out
because it had got to an industry-destroying level. The bills are genuinely
_material_ to the stock price, and time and again it turns out that despite
handing over $100m, the features and machines you need are not included and
you need to send _just_ another $5m over. The money stopped being used for R&D
when the consolidation of the 2000s happened. The vendors set themselves up as
vertically integrated solution providers, with the technology as a lock-in
factor rather than a competitive differentiator. This would have worked if
nothing in the economy ever changed again, but oddly it turns out that our
needs are radically different when competing / collaborating with Facebook &
Google vs. "old corp who has gone bust now", who were our traditional enemy.

So we're at a fork. The old route of using a trusted technology partner has
gone; they have all betrayed their customers every quarter for the last 20
years. Not only can't we trust them, we can absolutely predict when and how
they will screw us. We flag it for every project to our execs; it's a built-in
assumption. On the other hand, the big hope of open source alternatives is not
arriving in the way that Linux arrived, or maybe it's arriving in the way that
Linux on the desktop arrived.

The solution is that the middle tier of corporates has to get more real about
open source. At the moment there's a handful of companies in the Forbes 500 who
are significantly involved. CIOs and CFOs don't see the benefit, and there is
no business case at the moment. But if there were 500 open source programs of
more than $1m each running in the corporate world rather than 20, things would
be different.

There needs to be a standardization effort and a coordination layer. We also
need to think through the value chain. At the moment, leaving it to the market
and "revenues from professional services" is not working so well, and I am
very cautious about the trend toward paid-for bits and bobs on top of open
source.

The worst outcome would be to go through all this and find that we are back to
vendor hell but chained up in a different basement wondering where the
bastards in suits have gone. Instead we're going to be guessing that the guy
in a cool t-shirt and designer jeans who's fiddling with a blow-torch is going
to want something soon.

It's up to us to work out a new model, but what?

~~~
sqldba
The problem is anti-competition through bundling. If you're big enough on the
MS side you'll get SQL and all its doodads for free.

Great for my job but an impossible fight to get anything else in there.

~~~
sgt101
Agree - it's a notional form of free that includes your CFO paying through the
nose without realizing it.

The Office-up strategy is really good for MS. I've always wondered why they
haven't made more of Exchange as a data platform, given that they have a near
monopoly on it in the corporate world and there is vast latent value in the
data it holds.

Or maybe that's why!

------
kempe
RDBMSs are great for a lot of situations. Alternatives like NoSQL, which
stands for "Not Only SQL", can be handy in other situations, for example in how
easy it is there to recursively find relations n levels deep (x^n), which are
way more annoying to write in T-SQL. Maybe the identity crisis is because the
RDBMS is no longer the only approach?
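For comparison, standard SQL can express this kind of n-hop traversal with a recursive common table expression. T-SQL supports these too, written as plain `WITH` without the `RECURSIVE` keyword. Below is a minimal sketch in Python's `sqlite3`. It assumes a hypothetical `follows(src, dst)` edge table and hypothetical helper names, and finds everyone reachable from one person within a given number of hops.

```python
import sqlite3

def reachable_within(conn, start, max_depth):
    """Map each person reachable from `start` in 1..max_depth hops to the fewest hops needed."""
    sql = """
        WITH RECURSIVE reach(person, depth) AS (
            SELECT dst, 1 FROM follows WHERE src = ?       -- anchor: direct edges
            UNION                                          -- UNION drops duplicate rows
            SELECT f.dst, r.depth + 1
            FROM follows AS f JOIN reach AS r ON f.src = r.person
            WHERE r.depth < ?                              -- depth bound also stops cycles
        )
        SELECT person, MIN(depth) FROM reach
        WHERE person <> ?
        GROUP BY person
        ORDER BY person
    """
    return dict(conn.execute(sql, (start, max_depth, start)).fetchall())

def make_demo_graph():
    """Hypothetical 'follows' edges, including a cycle a -> b -> c -> d -> a."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE follows (src TEXT, dst TEXT)")
    conn.executemany(
        "INSERT INTO follows VALUES (?, ?)",
        [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")],
    )
    return conn

if __name__ == "__main__":
    conn = make_demo_graph()
    print(reachable_within(conn, "a", 2))  # {'b': 1, 'c': 1, 'd': 2}
```

The `max_depth` bound keeps the recursion finite when the graph contains cycles, as the demo data does.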

------
throwaway_exer
I read the link.

As a working DBA, it's breathless drivel.

Unless you're an academic trying to get a grant or tenure, of course.

~~~
koverstreet
What a singularly useless response.

Disagreeing is fine, but "breathless drivel"? Come on, this isn't reddit.

------
formula1
Very cool article. Tbh, I think instruction is not nearly as helpful as
action. We all can complain about what other people should do, but what can we
do to show what we do? Nonetheless, I think it's an important article, since
databases come and go with few sticking.

