
Databases have failed the web - ash
https://josephg.com/blog/databases-have-failed-the-web/
======
agentultra
I serve many, many millions of rows, collect real-time statistics, push real-
time updates, and maintain data integrity and consistency... all from a single
database server. If our workloads require it we're prepared to scale out
horizontally. I'm really looking forward to PostgreSQL 10's new parallel query
features for some of our analytics work.

Stack Overflow runs everything across what, 4 MS SQL Servers in total?

How has that "failed the web?"

~~~
dpark
The argument is not that DBs failed the web for scalability, but that they
failed the web for features. I don't fully agree but it is an interesting
argument.

DBs have auth systems already, but we mostly bypass those and reimplement auth
in the stateless layer. DBs have programming capabilities but we mostly bypass
that and build business logic in the stateless layer. DBs are great at storing
and maintaining relations and relationships but much of the other
functionality is little-used.

~~~
okket
> DBs have auth systems already, but we mostly bypass those and reimplement
> auth in the stateless layer.

This is sane and good design. The db auth layer is for db management, not
frontend auth (are you seriously advocating this?).

> DBs have programming capabilities but we mostly bypass that and build
> business logic in the stateless layer.

This is just (initial) design / programmer laziness. I bet if you start to
scale up, you will think a lot about moving your business logic further to the
real data for safety and for performance reasons. This is the point when you
realise that db systems are very different and switching from one to another
is not that easy...

~~~
fragmede
> The db auth layer is for db management, not frontend auth

Yes, but _why_? The database already has a table of database users, and the
different permissions they're allowed. (Well, hopefully your webapp doesn't
just do everything as the root DB user with * privs.) Is there that much to be
gained from the database having a _second_ table of users that you've created,
and then you get to reimplement a bunch of logic in your app server on top of
the database so that you have... users and permissions?

~~~
taeric
You gain making users of your system a first class and distinct part of your
application from those that maintain it.

Or, would you rather swap a regular user entry into and away from a
maintenance role when you hire/fire someone? (Certainly doable, but seems
extreme.)

You also gain the ability to severely lock down destructive permissions to the
database behind fairly rigid authentication rules.

~~~
dpark
> _Or, would you rather swap a regular user entry into and away from a
> maintenance role when you hire /fire someone? (Certainly doable, but seems
> extreme.)_

How is this more extreme or more cumbersome than deleting an admin account?

For that matter, you could also just give admins a separate admin account even
if they are also users. When I worked in "IT support", I had a separate admin
account even though I had a user account in the same system.

~~~
taeric
You probably already have a separate system that is regionally and logically
separated from your application to control access to company resources.

The argument seems to be that the logical separation is not desired. But that
implies that if I were an employee of Google, I should have a specially
flagged Google account. And if I were to leave, I would lose that flag. Which
means it should probably not be a simple flag, anymore, but a date gated
property which may go off an on several times.

And already, that sounds more complicated than just keeping your application
accounts distinct and completely separate from your organizations accounts.

~~~
dpark
> _You probably already have a separate system that is regionally and
> logically separated from your application to control access to company
> resources._

The separate system your company's employees are tracked is is also likely not
integrated with your app database's auth system. I kind of doubt that the
various Postgres/MySQL/Whatever databases at Google have accounts in them for
all of Google's employees who might need access. Instead they build more
tooling on top of the same accounts the app uses to access the DB. Or maybe
they create JIT accounts for admin functionality. As it stands today, though,
most places don't use the DB auth at all really.

> _But that implies that if I were an employee of Google, I should have a
> specially flagged Google account._

No. You have two accounts if you work at Google. Your personal account is not
your corp account. That doesn't mean your corp account doesn't exist in the
same systems as your personal account, though.

> _And already, that sounds more complicated than just keeping your
> application accounts distinct and completely separate from your
> organizations accounts._

Maintaining two entirely separate auth systems does not sound less complicated
to me.

~~~
taeric
I wasn't planning on really diving on this. However, two scenarios easily come
to mind. Either selling a product to another company, or getting acquired. In
either case, it would be nice to not have to merge two identity management
systems just because corporate structure changed.

~~~
dpark
You're arguing for a separation of customer and admin accounts. These
scenarios are both _easier_ if admin and user are rolled together. If
administration is tied into your corporate user system, then transferring a
product means it's a pain to enable administration in the destination
environment.

What you're actually arguing for is keeping app auth separate from corp auth.
And that's pretty reasonable, but app admin is app auth, and must move with
the app.

More generally, though, anyone buying your app/company will want it integrated
with whatever auth they already have anyway, so you can look forward to a big
user migration regardless.

~~~
taeric
Apologies, I was always arguing for a separation of corporate and customer
accounts. My assertion is essentially that all administration of the
application is done by corporate users, not application users.

Now, it is not uncommon to actually do this in three parts. A set of users
that covers administration. Often made by tools that guarantee corporate level
access to these accounts. And then a separate system built on top of them that
does the application.

If you have truly "admin" users in the system; then yes, they should be
separate from corp. And should be a flaggable part of a normal user's account.
Mixing these with who can create the tables that are required for the system
to run seems awkward, at best.

Consider, why not support all users being able to just login to the systems
that run the application? It is ultimately the same argument, no?

~~~
dpark
> _Consider, why not support all users being able to just login to the systems
> that run the application? It is ultimately the same argument, no?_

I don't know what this means.

I don't think there's a good argument for having multiple auth systems in
general, much less three. It's a lot of redundancy for little reason. It ends
up complicating every app as they end up needing to support multiple systems
to support both end users and administrators.

I wouldn't be surprised to find that Google has a single auth system for all
users, including employees, and that they've wired up their databases to honor
the single auth source.

~~~
taeric
My point is this doesn't end at the database. So, you can successfully let the
database manage users and access permissions to the data in the database. But
then, why not let the operating system that the database is running on control
that?

And how is that a different argument than what you are proposing here?

~~~
dpark
All I was saying is that we already have auth systems and don't use them. The
DB has auth. As you noted, your OS has auth. And yet we insist on
reimplementing auth for our users again and again (often with terrible flaws).
It does feel like something is wrong with this.

------
pbnjay
I really disagree with this - separation of concerns is incredibly important.
Why does the database server need to do everything my application server does
now?

I don't want to have to know about how my database works internally just to
implement a new feature in my application. I don't want to worry about a
junior dev corrupting data while building a login page.

My "simple old-school" database is reliable and consistent BECAUSE it's been
tested for years. My application server is not because it's solving a new
problem (And that's OK, I won't lose customer data if someone can't login).

~~~
dom0
Fully agreed. The observation that SQL still works as well as it does today as
it did in the 70s suggests that it's a good and valueable abstraction.

~~~
geon
> SQL still works as well as it does today

Or... It sucks as badly now as it did back then.

~~~
dasil003
Well, no. There were a zillion databases competing and SQL is what survived.
The NoSQL movement has created a bunch of useful DBs for specific scaling
workloads, and of course there are graph DBs and other special purpose DBs,
but SQL still dominates, and with good reason: it's the right balance of
performance, flexibility, structure and integrity that makes it an excellent
hedge for the early phases of application development when you're not yet sure
what exactly the workload is going to look like.

~~~
geon
HTML and CSS "won" on the web. No one in their right mind would call them good
or even suitable for their main use case.

~~~
dasil003
Of course they are. Most things on the web are documents. The fact that it is
abused for SPAs means that yes they are contorted and abused, but that is
inevitable because Gall's Law.

It's easy sit in an armchair and point out all the warts on specific tech, but
if you were to build a better solution to broad technology segments like this,
and magically get adoption, you'd find that you only understood 5% of the
application space, and your better version is subject to whole swaths of
criticism that you didn't anticipate. Better to find one small/simple thing
you can do well and build from there.

------
thehardsphere
The conceit of this article seems to be that all web applications are merely
CRUD apps that do nothing but talk to the database, thus databses are "failing
the web" for... not being more than just a database.

I see a number of problems with this:

1\. Not every web app is merely CRUD. Some of us work on apps that do quite a
bit more than CRUD, and the CRUD part is relatively small and boring.

2\. There's very little respect here for the idea that good utilities should
do one thing and do that thing very well. Lots of people need relational
databases. Not everybody needs a relational database that can automagically
filter out XSS attacks and serve HTTP responses.

~~~
josephg
Author here.

I think CRUD apps are the most boring example. But think about something like
Slack. There's some data and queries, sure - but the app also needs realtime
feeds. It needs to have triggers running notification tasks off changes made.
There's lots of stuff that need to happen _as a result_ of data changes.

Doing all that stuff purely at the application layer is an architectural mess.
What you really want is an event log of messages, with workers and search
indexes listening on _that_. Maybe later you'll want to add full text search
via elasticsearch - and that needs to be kept in sync via the event log as
well.

Rolling your own version of all this stuff correctly is really hard. Either
you do it inside postgres using triggers and stored procs, or you use kafka
and invent your own complicated system to handle conflicting writes. Or you
add hooks in all the places in your app where _you_ do writes, but that
introduces its own set of consistency problems. Samza, Datomic, Bottled Water,
Couchdb and firebase all solve at least some of this stuff better than classic
SQL stores. Its time for relational dbs to get on board.

Being no more than a database isn't good enough anymore.

~~~
cookiecaper
Your complaint is that SQL databases aren't message brokers? SQL servers have
been abused as message brokers for decades. It's not the use case they're
designed for, and while a plugin that integrates this type of functionality
into PgSQL would be interesting, there's no reason to expect them to assume
that problem space.

If you want the database to handle the incoming message streams, then set your
consumers to call a stored procedure. That should alleviate your concern with
conflicting data models.

------
masklinn
> Access control on modern databases is too course. You want to whitelist
> which queries a user is allowed to make and you want fine-grained
> permissions around updates

Only allow direct access to stored procs, not queries. Or restrict access to
specific _views_ and use rules
([https://www.postgresql.org/docs/current/static/sql-
createrul...](https://www.postgresql.org/docs/current/static/sql-
createrule.html)) but intuitively that seems more dangerous (with CTE, I
believe SQL is turing-complete) and completely unnecessary.

> Databases only talk custom binary TCP protocols, not HTTP. Not REST. Not
> websockets. So you need something to translate between how the server works
> and how the browser works.

[https://postgrest.com/](https://postgrest.com/)

> You want to write complex logic for user actions

[https://www.postgresql.org/docs/current/static/plpgsql-
struc...](https://www.postgresql.org/docs/current/static/plpgsql-
structure.html)

> with custom on-save triggers

[https://www.postgresql.org/docs/current/static/plpgsql-
trigg...](https://www.postgresql.org/docs/current/static/plpgsql-trigger.html)

> and data validation logic.

[https://www.postgresql.org/docs/current/static/ddl-
constrain...](https://www.postgresql.org/docs/current/static/ddl-
constraints.html)

~~~
spacemanmatt
I gotta plug [http://openresty.org/](http://openresty.org/) whenever I see
postgrest mentioned. OpenRESTy is built on nginx, and is fairly battle-tested.

------
shanemhansen
I think databases have been so wildly successful because they were actually
based on some reasonable research about data access models and set theory. I
encourage anyone who wants to criticize the relational model to start with
E.F. Codd's "A relational model of data for large shared data banks".

------
sbuttgereit
The PDP-11 were considered minicomputers, not mainframes, like the IBM
System/360\. The notion of midrange computing still exists with IBM's iSeries
(AS/400), as well mainframes like IBM's zSeries. One could argue that some of
these definitions get mushy, but I haven't ever heard disagreements about
where the PDPs, AS/400s, and System/360s of the world live in this hierarchy.

This seeming error was an early red flag in this article; having some good
facts but having an incomplete picture. While some of the historical
observations of what computing was is warranted, this same sort of not having
a complete enough picture to draw correct conclusions shows up again in some
of the main theses presented.

Consider the statement about the coarseness of permissions. In many modern
database systems this simply isn't true. What is true is that is the fully set
of security features offered database systems are 1) not widely understood; 2)
not bothered with by developers that choose to implement security elsewhere.

Clearly the author spent some time with article and it is well structured,
it's simply a matter of the author not having spent as much time getting to
the heart of the matter as he/she should have.

~~~
majewsky
> What is true is that is the fully set of security features offered database
> systems are 1) not widely understood; 2) not bothered with by developers
> that choose to implement security elsewhere.

That statement applies to the full feature set of modern RDBMSes, not just the
security features in particular.

I'm always surprised that I seem to be about the only one (or at least one of
very few) on my devops team of ~30 who's comfortable with issuing SELECT
statements on the production DB to investigate issues (and sometimes UPDATEs
to clean up a mess that users made). And that's not even touching all the
crazy stuff (stored procedures, triggers, access permissions), just CRUD (or
in this case, INSERT-SELECT-UPDATE-DELETE).

------
jandrewrogers
In the mid-1990s, some of the most complex web applications _were_ implemented
inside the database! I was the technical lead for a production web app written
in... 250,000 lines of PL/SQL. This model had some significant advantages and
worked surprisingly well considering that Oracle was not designed with that
(ab)use case in mind.

So why didn't this model become common given that it was relatively elegant
and capable? A few reasons:

\- It required developers to be sophisticated at both using the database and
creating web front-ends, since they were inextricably mixed. Even today, most
developers are strictly one or the other, not both.

\- The tooling inside the database was not designed for this use case, so
while the architectural model was elegant, the development environment was
piggybacking on functionality designed for reporting systems to drive
interactive websites. This got better with time but by then no one cared. For
a minimal website, hacking together a couple Perl scripts had a lower learning
curve but was less capable.

\- At the time, only a couple databases had the level of sophistication and
features to make this feasible. Like Oracle. The upfront licensing costs were
outrageously high, so there was no cheap way to bootstrap or incrementally
grow into your application.

A 2017 version of this would work very well, database engines have much more
sophisticated capabilities than back then that would make the development and
operations experience pretty efficient and nice. As a practical matter no one
designs web apps this way any more so there is no market for it.

~~~
josephg
> As a practical matter no one designs web apps this way any more so there is
> no market for it.

Well, the problem is that I'd say PL/SQL isn't a good choice of language to
implement a web templating engine. And the web framework selection is ...
lacking. And its hard to hire engineers who know PL/SQL.

But even if the database has a native JS engine or something there's also a
separation of concerns problem here. But if the database simply exposed an
event stream and versioning information from queries, we could write our
applications in whatever language we want. They can listen on the database's
event stream and invalidate caches or eagerly re-render the app as data
changes. And if you have a kafka-style event log they can live on other
machines and run in separate processes.

From any angle I think the samza/datomic model is simply a much nicer
architecture to do this sort of thing. More ergonomic than the pl/sql
approach, and certainly much nicer than what we have now using databases as
dumb stores.

------
Spooky23
I think the author failed to educate himself.

Databases are a miracle product. If you think of an application as a car, the
database is the engine.

The idea that you have a platform that can do everything without the
abstraction of a separate data storage/query platform, that exists too. I'd
argue that FileMaker, Lisp, MUMPS, and a few others basically do this in
different ways. I used to be a DBA at a company where the entire company ran
on Informix 4GL code (which was sort of like the Informix version of PL/SQL)
within the database. Also a similar approach.

But... they also have significant drawbacks. You're permamently married to
that app/database stack. If any component of the system doesn't scale...
you're fucked.

By chunking out the solutions to include databases, app tiers, etc, you gain
complexity but lose a lot of risk. If you cannot afford Oracle anymore, you
can invest in labor to move to Postgres. If you're hitting a limitation with
MySQL, you can move to Oracle. If you wrote your app in PHP, it goes viral,
and you cannot scale it, you can migrate to a Java Application Server layer.

~~~
josephg
Post author here.

> Databases are a miracle product. If you think of an application as a car,
> the database is the engine.

Yeah - you're totally right. Thats kind of why I'm complaining about them -
because we use them so much I think the potential benefits from improving more
them are massive.

The modern web stack is a mess, and changes all the time. But there's a
handful of features we need in just about every application - login, update
feeds, data-based rendering (computed views), etc. But stored procs aren't
good enough - they're inaccessible, hard to debug and usually require their
own (non-standard, barely known) language to develop in. So lots of developers
write functions outside of the database through multiple queries, and in doing
so will often break the database's transactional consistency model.

Databases have managed to stick around with us throughout the evolution of the
web. I'm being critical, but I'm doing it out of love. I'm saying that
databases have the potential to provide so much more to the modern web
ecosystem than just data storage. They could do access control, and run stored
proc-equivalent functions or computed views from whatever languages my app is
written in. If I need some complicated data flow where data needs to be
updated in realtime in a secondary db (like elasticsearch), that should be
dead simple to do too. I didn't get into it in that article but I believe we
should be able to extend our databases to share data flow logic between
applications. Or just build a lot of this stuff into databases so its reusable
between projects.

------
ericHosick
> All because we're programming against a frozen database spec.

Relational databases, unlike XML, JSON, Key/Value stores and ORMs, do not pre-
suppose document structure. On top of that, it is very easy to create new
relations (entities) using Views. On top of that, you get a real algebra to
play with: relational algebra.

SQL makes it crazy easy, in real time, to see your data in any hierarchal
manner you like (via denormalized entities).

The one thing SQL 'lacks', and JSON shines at, is a way to return data in a
hierarchical format (aka: to return JSON directly). I have 'lacks' in quotes
because there are SQL solutions that can consume/spit out JSON.

An interesting idea then is to provide a way to easily convert between SQL and
JSON. To that end, there is an open source project
[https://github.com/erichosick/sql-json](https://github.com/erichosick/sql-
json) that attempts this. The results are promising but there is a lot of room
to grow.

~~~
josephg
Hi, post author here.

My point about the frozen spec doesn't have anything to do with the data
format. Its that modern databases don't expose the API that modern web
applications actually want, like:

\- Integrated access control

\- Realtime updates (sorta, kinda)

\- Resumable event logs

\- The ability to do computed views in _my language_ (eg, make a computed view
using a nodejs react static rendering process)

Because databases lack these features, modern web developers often end up
rewriting them at the application level, for every app.

------
cookiecaper
I don't really understand his point. He says that SQL has been around for 40+
years, as if that's a _bad_ thing, and doesn't really say anything else.

I guess he is complaining about the idea that the user's input doesn't get
insert _directly_ into the database, i.e., the connection would be
browser->db, and instead we need an application component that reads from the
browser and shapes to the DB.

At this point, we must ask if the author has ever worked with one of the
thousands of applications that implements their "API" through stored
functions. I won't condemn this method wholesale, but people have mostly moved
on to more flexible representations for good reason.

The whole article seems predicated on a belief that most work should be on the
consumer workstation, which is why he starts by talking about how much better
it is that people now have desktop computers instead of clients that connect
to a mainframe in the basement.

The author apparently doesn't grasp that the web acts just like those old
terminal clients he refers to in beginning of the article. There's a big
server running the application in someone's datacenter, and _your browser_ is
a thin client over the top of it, an interface into its inner workings.

He blames this on databases. (???) The browser doesn't provide the mechanisms
to directly connect to arbitrary protocols.

I don't really think there's much substance here.

------
ElatedOwl
I'm admittedly bias, I love SQL.

That said, I think having the database server worry about being a database is
a good thing. In my career I've had a few projects that I've been too
ambitious with; by trying to do too many things it failed to do any of them
well.

>Databases only talk custom binary TCP protocols, not HTTP. Not REST.

Let's pretend that the database can now talk via HTTP/Rest. Is the database
now responsible for handling business rules? Is it responsible for per row
authorization/authentication? How does this impact performance? What if we
want to export the data in another format, say into an excel spreadsheet,
should it be responsible for that as well and the formatting? Where is the
line drawn?

>protect against SQL injection attacks.

I mean, how would the database know the difference between a legitimate
request that should be allowed and one that shouldn't? This is the point of
parameterization.

>Check for XSS

I think it's plenty legitimate for a database to return some HTML data, how
would the database know when it's malicious or not?

\-------------------

In full I think the grievances the author raises are with middleware, not a
problem of the database.

------
cr0sh
I haven't read all the comments here yet, but I'm going to throw this out
anyway...

My confidence in the author took a hit at the point he called a PDP-11 a
"mainframe". One would think a computer scientist would know what computers
fall into what "generation". I guess "history of computation" is just not
taught as part of such a degree anymore...?

But...I decided to read on, thinking maybe things would get better, and to
give him the benefit of doubt. I think somewhere in there was maybe a few
points to think about, but ultimately it almost looks like he has some weird
problems with "separation of concerns", and maybe doesn't understand why that
would be a bad thing for scaling...

...which again, I find odd for a computer scientist.

Furthermore, he seems to ignore the great amount of improvements and changes
which have occurred in the database software/engine and server world; today's
DBs and DB systems are -nothing- like they were back in the early 90s when I
started my career (as a fresh high-school graduate). Yeah, we still used
VT-100 terminals (then quickly transitioned to PCs - running VT-100 terminal
emulators, of course), and things were starting to transition to PC apps
communicating to the servers - and I am sure there were SQL injection issues
(and no, we didn't think about that) - but things have, over the decades
(yeesh - getting old here!) have changed for the better!

Could they be better? Certainly! Are there things the DB server could be doing
to make life easier for the app? Yes (and some of this has been implemented -
ie, when developed properly, your queries can be "sanitized" at the DB server
level - but you know, you should still do this at the app and browser level
too - just in case). Realtime updates and notifications? That's pretty much
there as well - but ultimately, a lot still has to be done at other levels.

...and that's not a bad thing, imho.

------
PaulHoule
It isn't that "everybody emulates the VT-100 for some reason" but that the
VT-100 was the first terminal to support the ANSI standard for control codes
and that that standard has been evolving ever since.

~~~
takeda
Yep, I also wanted to add that if you use Mac, or Unix you most likely don't
use VT-100.

Yes, they all support VT-100, but that's because it became a standard, and
essentially any device understands it. For example if you connect to a network
switch you know that it will at least support VT-100 so you have a common
protocol to communicate with it, and be able to switch to a modern protocol if
the device supports it.

I feel like the author is a front end developer and just hates the fact that
JavaScript in the browser can't communicate with a database server directly.

I don't understand though, if there is a need for something like that, and it
is also possible that database could provide a generic enough interface that
would work, then there must be an application that you install together with
the database which would do exactly what he wants.

There's no need to add this functionality to a database, because it only adds
a complexity that majority of people who use a database won't need.

~~~
josephg
> I feel like the author is a front end developer and just hates the fact that
> JavaScript in the browser can't communicate with a database server directly.

Author here. I hate the fact that here in 2017 we still write login code,
again, for every application. I hate the fact that if I want to trigger re-
renders, or run workers off database changes all my options suck.

> it is also possible that database could provide a generic enough interface
> that would work

Exactly. It doesn't have to suck - our databases could provide a nice
interface to do all this stuff cleanly instead. We just need a resumable event
stream, with versions and some other API tweaks and we can do computed views
out-of-process of the database, written in whatever language _I_ want. I'm
firmly of the opinion that all databases should provide a Samza / Storm /
Kafka / etc style event log.

~~~
takeda
If the problem was as simple you make it, you could write your backend code
only once, then re-use it in every project.

Why you are expecting a relational database which has many more use cases than
the web programming (and web is not even a primary one) to solve this
difficult task for small use cases?

Anyway, there were attempts in solving this and those were called frameworks,
which shows that hardly there's a one size fit all. If you use a framework and
have complex project you reach a point where the framework will just stay in
your way.

> Exactly. It doesn't have to suck - our databases could provide a nice
> interface to do all this stuff cleanly instead. We just need a resumable
> event stream, with versions and some other API tweaks and we can do computed
> views out-of-process of the database, written in whatever language I want.

Not really, a database as name suggests is meant for storing data, it's an
already a difficult task by itself (if you want durability and still make it
fast). What you need is a backend code that provides this functionality.

> I'm firmly of the opinion that all databases should provide a Samza / Storm
> / Kafka / etc style event log.

All the technologies you described are stream processors, while they work on
data, they operate completely different than a database and they have
different use cases. Postgres does have LISTEN/NOTIFY but that's meant more
like a side channel for communication.

Perhaps what you need is AQMP?

------
obstinate
To be honest, this article seems pretty insubstantial. To the best of my
ability to discern, the only concrete complaint is that access control is too
coarse on modern databases, although it's not really specified in what way
this is true.

~~~
nileshtrivedi
Agree. Postgres now has row-level security and one could always apply custom
access control using functions/triggers.

------
kuschku
This is an interesting article.

And he gets a lot of things right – having a way to do queries against a
database directly, being able to get changes directly is useful.

And GraphQL, for example, offers exactly that – there’s even a plugin to
directly build a GraphQL API from your postgresql database. Although this
still can’t send changes over the net, so you can’t have an always up-to-date
view of the database.

But while this and Firebase¹ solve the problem of offering an API directly for
a database, he’s missing to address the other task frontend servers do: They
can render stuff on the server, in case you actually do have just a dumb
terminal.

And that’s something that’s very useful for websites, as usually servers are
more powerful than smartphones, and you need to do your computations
somewhere. A web service applying deep dream to an image can’t run it in the
phone, nor in the database – it has to run it on a specialized server

But it’s indeed a good question why there’s little academic research into
changing the way web applications work. We’re already making databases
directly open via GraphQL APIs, we’re separating statically hosted content
into CDNs, so how can we combine this, and work truly "serverless"²?

[1]: Firebase is a great tool for prototyping your app, but if you want to run
it it’s usually too expensive, and relying on proprietary Backend-as-a-Service
technology has proven to be a bad idea already when parse.io shut down.

[2]: Serverless here meaning that you have no specialized application server –
you have a general database able to handle all your applications, a general
CDN, and all special code for the app is handled within the database, or
triggers of it.

~~~
eriknstr
>there’s even a plugin to directly build a GraphQL API from your postgresql
database

Would like a link for this.

~~~
brlewis
[https://github.com/postgraphql/postgraphql](https://github.com/postgraphql/postgraphql)

------
snuxoll
The only part of this I agree with is his comment on database permissions.

Every modern SQL database has a concept of users and permissions that are
divorced from your application, you're left with three options all of which
are flawed.

1\. Handle security inside your application. This is the worst choice if users
need to get a LIST of records they have access to and it's determined by
something more than a simple WHERE owner_id = :user_id. Think multi-tenant
applications where records can belong to a tenant, and users have access based
on a org hierarchy or other criteria. Suddenly you're having to filter a whole
list of records out in your application code, and this makes implementing
pagination awful (do you requery until you fill up a page, or present a
partial page?). You are also taking full responsibility for security, if you
modify your queries to filter records out you open up to human error where
someone forgets to filter this one query.

2\. You implement a method to synchronize your application users with the
database, and use the database engines RLS support to handle access control.
This is probably the best approach for web applications, but the caveats that
come from it still suck. You have to make sure the connection is set as the
user performing the action, this is doable with PostgreSQL, MSSQL and Oracle
at the least (SET ROLE / EXECUTE AS) without destroying your ability to use a
single connection pool - but for all the security you get out of this your web
application user still has all the keys and if that account is compromised or
there is a flaw that can cause your application to not switch security
contexts you just failed at protecting your data.

3\. Just use the database directly, it's handles authentication and you never
change security contexts for a session - you can safely utilize the RLS
functions of your database without any real risk since that database session
is fixed to the user it is handling. Downside, you just lost connection
pooling and while pooling middleware like pgPool can help you still have Y
more connections since you need to maintain a separate one for each user or
tenant at best.

Approach 2 is by far the best we have, and you can make it safer by doing
things like utilizing pg_hba.conf to limit access to the application user to
the servers hosting your applications - but maybe you're using docker and IP
addresses aren't fixed anymore (well, shit!). Also, how are you going to
ensure the database connection is in the correct state when you make a query?
Where are you going to plug that into your request pipeline?

I'd really like to see modern tooling around this problem, I don't know what
exactly it would look like but it would be nice to have SOME improvement in
the area.

~~~
mixmastamyk
Am looking at building a multitenant app with pg, so thanks for this comment.
Know of any resources that discuss these patterns at length?

~~~
snuxoll
A lot of it is technology stack dependent, unfortunately.

Option 1 can be implemented when using an ORM with decent support for
something like Active Record scopes, with LINQ you could have a method on your
model that returns a pre-filtered IQueyrable, with JPA you'd use the criteria
API. If you're doing hand written SQL then have fun adding it to all of your
queries or using some query builder to add the security filters. Either way,
this approach is prone to human error.

Option 2 is a little simpler, assuming you understand the RLS functionality of
your database. At the start of your HTTP request before you touch the database
you SET ROLE to your tenant or user (you'll have to write some code to create
these database users), and in some hook that is called when a connection is
returned to your pool you'll RESET ROLE.

Personally, I've had a hard time finding anything decent on handling security
in multi-tenant applications or where security goes beyond simple ownership
checks. Maybe I'll get around to writing a series of articles on my blog
someday.

------
mybrid
Follow the money. The notion that databases have failed the web was put to
Mike Stonebraker of Ingres fame at a seminar I attended back in the late
1990s. The actual question was this: "Why aren't database vendors building
object oriented databases for the web?"

Object oriented databases store binary objects as opposed to object-relational
databases Mike pioneered with Postgres. The plan was for CORBA, common object
request broker architecture, to shuffle versioned objects around the web and
object oriented databases would store binary code objects.

Mike's answer? "Because the money is in transaction processing. Banks pay
millions of dollars for transaciton systems. Transactions are the meat-and-
potatoes of database sales."

In some sense database companies are wise for ignoring the web. Web companies
are now hell bent on "move fast and break things." A transaction database is
the antithesis of move fast and break. Transactions move deliberately with
consistency. Durability says I can unplug my database at any time and bring it
back up. This is the opposite mind set of move fast and break things.

Here in the valley whole QA departments are being disbanded in favor of LEAN
and move fast and break things. All of this is antithetical to the database
world of transaction processing.

At they say: you get what you pay for. What the enterprise companies are
paying for is exactly what is being delivered. Open Source fits the every
changing, move fast and break things of the web world.

To whit, you might see database vendors move into the web space some day if in
fact companies are willing to pay millions of dollars for them.

~~~
thinkling
Your argument may hold for Oracle and IBM, but the web crowd produced MySQL
and Postgres, investing effectively millions of dollars of developer salaries
and sweat equity, and those systems have evolved more towards features already
in the commercial RDBMSes than towards a new, web-oriented DB flavor.
Apparently web backend developers said "we want more of Oracles features" and
not "we want the DB to handle these web-specific issues". At least, not until
NoSQL systems like Mongo came around.

------
carsongross
Look, kids. You are going to have to have an execution layer you can trust
somewhere, you can't just expose your data store directly to the outside world
or you run into an infinite number of security issues[1]. (Not that this is
stopping people from doing exactly this these days.)

That somewhere is going to be a server side execution environment, and it will
be separate from your data store so you can scale these concerns
independently. Additionally, you are probably going to want a DSL for your
data access as well as a highly tuned indexing system for your data store.

We have systems that do this, they are called databases. Are they perfect? No.
Nothing is. But the idea that they have "failed the web" is so over-the-top
childish that, like the author says at the end of his post, I'm not inclined
to be charitable.

[1] - [http://intercoolerjs.org/2016/02/17/api-churn-vs-
security.ht...](http://intercoolerjs.org/2016/02/17/api-churn-vs-
security.html)

~~~
dragonwriter
> Look, kids. You are going to have to have an execution layer you can trust
> somewhere, you can't just expose your data store directly to the outside
> world or you run into an infinite number of security issues

The security issues your source points to in new, SQL-like (but not SQL)
general query APIs (GraphQL, etc.) are actually already fairly completely
addressed in modern relational databases, which are designed for multiuser
access with users having different permissions. So exposing the datastore
directly solves, rather than causes, the issues it raises with common
generalized near-SQL query APIs.

~~~
carsongross
Yeah, no.

You can't expose generalized query functionality without either encoding the
security constraints in the datastore query layer (which is really, really
hard, we are talking column-level security constraints and worse) or limiting
the expressiveness of your query layer.

That's the trade off and, in most real world systems, the data stores are wide
open to the developer because expressivity wins. Yes, the technology is there
to secure databases at the row level, for example, but people don't use it.
They just use a user/pass with total access.

A lot of folks are tossing HTTP endpoints on data stores without thinking
through the security/expressiveness tradeoff.

~~~
dragonwriter
> You can't expose generalized query functionality without either encoding the
> security constraints in the datastore query layer (which is really, really
> hard, we are talking column-level security constraints and worse) or
> limiting the expressiveness of your query layer.

Column and even row-level security constraints are not "really, really hard"
to use in modern DBs, and you have to do the analysis of what people should be
able to access and effect on that level to build the app, no matter whether
you are implementing the constraints through a separate application layer or
through the DB itself.

~~~
carsongross
A user can only see the salaries of all his direct reports.

Again, there's a reason why almost every web app built in the last twenty
years just used user/pass credentials and didn't implement db security beyond
that: expressiveness.

But I'm not gonna change your mind on it today. Just think about it for a
while.

~~~
dragonwriter
> A user can only see the salaries of all his direct reports.

With an employee-salary table, that's a fairly simple row-level security rule.

> Again, there's a reason why almost every web app built in the last twenty
> years just used user/pass credentials and didn't implement db security
> beyond that: expressiveness.

I agree that there is a reason, but that's not it; there is a combination of
the lack of features in popular (particularly free) databases in the early
part of the period (heck, MySQL/MariaDB _still_ doesn't support row-level
security), inertia, and lack of database knowledge among web app developers.

> But I'm not gonna change your mind on it today. Just think about it for a
> while.

This is not a new issue for me; I _have_ been thinking about it for quite a
long while.

------
mamcx
The traditional RDBMS have failed because are all-or-nothing.

A lot of people here (and elsewhere) think is non-sense to build a full app
with the full business logic inside the "db".

WHY?????

That is a VERY NARROW viewpoint.

But when we say "let's build a full-app inside a Virtual Machine, yeah that
actually is ok!"

And why is ok to build a full app on lisp? Or in a OO language (a grah of
objects)? Or a array language (a array is relation with 1 column!)

If you think that:

print([1, 2, 3])

Id OK. then YOU MUST ACCEPT THAT:

print([Code = 1, 2, 3; Name= Miami, New York, Bogota])

Is ALSO OK.

The relational model is just move from 1-columns arrays to 2 N-Columns (In
rows of columns as internal storage) plus some universal operations.

WHERE THE RDBMS FAILED EVERYONE IS:

Because them (the guys at the DB side) insist in adding: transactions,
triggers, Surrogate-Keys, Inter-Relation dependencies, storage, sub-query
languages, catalogs, views, etc.

So at the end, you get a full half/big semi-OS virtual machine tailored to a
specific niche.

\------

Was only when the artificial divide between the RDBMS and the front-end
language appear (and the death by MS of Fox/Vb to only focus in .NET) that
building database apps start to suck big time.

I have talk about in HN before about this, and instead consider that make even
MORE sense to build the logic inside the DB, however, is necessary to re-think
how it look to make it more useful. Is not a novel concept. The dBase family
was almost that, and the people like me that use it was very happy and
productive.

Why make more sense? Because Program = Data + Algo.

Data is not to be treated as third-class citizen. Must be a first class. The
relational model make it first class (as with lisp model and array).

And what about separation of concerns and all that? That is pure architecture
and is tangential to be or not inside a DB, the same is tangential inside a
VM.

------
lstroud
I would argue that databases tried to do too much for too long. If you want a
database that has a built in web server, rest apis, security, etc...well
Oracle has had that for years. Problem was, that wasn't what people wanted (or
wanted to pay for).

The more recent trends have been to decompose the database into something that
is great at storing and retrieving data...leaving all of the other stuff to
products that do that well.

------
matheweis
It sounds like the author is unhappy with the fact that web apps have a middle
layer that speaks with the database over a terminal emulation.

Ignoring the fact that it happens to make a very nice boundary for an
abstraction layer...

> Databases only talk custom binary TCP protocols, not HTTP. Not REST

I take it the author hasn't heard of SCIM?

------
aug_aug
"Access control on modern databases is too course." \- I think you mean
"coarse" here.

------
brlewis
I agree with the article that modern middleware is too fat, and we could make
more reliable systems by leaning on a full-featured database.

I disagree with the three reasons he gives for why we have fat middleware
today. PostgreSQL solves 1 and 3, and PostgREST solves 2.

------
Animats
And, to fix it, he says, we need _functional programming_ at the database
level! What?

Today's databases are probably the best part of the server side stack. The
parts that talk HTTP, JSON, and do business logic are usually worse.

~~~
spacemanmatt
I find the DB part to be the only sane part of my stack these days. It's funny
though, I got the hang of functional programming by cross-training in SQL
first. Writing attribute expressions is a lot like writing pure functions of
immutable data.

------
mnm1
This rant basically asks: why don't databases do, out of the box, all the
custom functionality we program into our web servers?

In other words, why don't databases program the app for me?

What a waste of bandwidth.

------
elchief
How is a db with row and column security "coarse"? I mean, row+column security
means you can secure down to individual cells.

Sounds like the author doesn't know what he's talking about

------
dan31
No plumbing is required in modern architectures such these of SAP HANA,
Starcounter, or Tarantool. When the application server and database are
combined, the access control is arbitrary, databases talk whatever you want,
no need to torn the code apart into stored procedures, backend and frontend
code. Plus to this, lots of unnecessary moving parts are removed in such
architectures, so that the whole thing runs on 2 servers instead of 20, while
the code is simpler than ever.

The referred article needs clarification. What it really addresses are the
flaws of conventional software architectures. Fortunately, this critique does
not generalize.

------
z3t4
So you want to implement a new feature, and test it locally, then run tests to
make sure you didn't break anything, and when you are done, upload the changes
to the production server. This is sane devops, nothing fancy, yet impossible
with todays databases. You basically have to manually do the same changes you
did in development to the production database, then test if it works on live
data in production. Then your finger might slip when writing an SQL query and
you have to reset all production data from backup.

~~~
majewsky
I for one would not let developers access ny production databases when they
clearly don't have any idea how to copy a snapshot into the QA database.

~~~
z3t4
May I ask how much it would cost, in total man hours, to for example change
the name of a field ? For example make "name" into firstName and lastName.

~~~
majewsky
Anywhere between maybe 1-2 man hours and multiple man months, depending on how
nice the data is right now, and how accurate it need to be split. The optimal
case would be the application already enforcing values to look like e.g.
"last, first", so that the migration can happen via a simple regex.

------
aykutcan
Interesting.

Should humans interact each other over HTTP or REST too ?

------
fs111
failed the web? They made it big in the first place!

