
Simplify: move code into database functions - oskarth
https://sivers.org/pg
======
jconley
Stored procedures, functions, triggers, etc., are very useful tools. They are
used regularly in Microsoft SQL Server and Oracle systems.

But one has to be careful to use them only when warranted. The out-of-the-box
tooling for debugging, maintaining, versioning, and testing code that lives in
your database is not nearly as robust as what you have in most other
development environments.

The key here is to use the tool when it is most useful.

SPROCs are good when:

1. You need extreme data security and you are able to pass authentication
tokens from application to database. You can apply authorization rules when
you read/write the data, rather than transporting more than you need over the
network. This eliminates a number of potential attack vectors. I have worked
on HIPAA-compliant systems for government agencies that held sensitive data
and required all data access to go through stored procedures with proper
access permissions, in addition to an authorization tier in the application
logic. Very common use case.

2. You need to churn through a bunch of data and need very close data
locality to pull that off and meet your performance requirements.

2a. Your database nodes are network IO bound because applications are
requesting more data than they need, and that logic can be performed in the
database.

3. You have a very strong DBA/developer who can manage the data access
layer that your application uses as well as code the database. This can
abstract the "how" of data storage from the application developer, which can
be very useful in a data-centric large application.
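To make point 1 concrete: in Postgres this is usually done with a `SECURITY DEFINER` function, where callers get EXECUTE on the function but no privileges on the tables behind it. A rough sketch (all table, column and role names here are invented):

```sql
-- Application roles get no direct access to the sensitive table.
REVOKE ALL ON patient_records FROM app_user;

-- The function runs with its owner's privileges, so it is the only door in.
CREATE FUNCTION get_patient_summary(p_patient_id integer, p_requesting_role text)
RETURNS TABLE (patient_id integer, summary text)
LANGUAGE plpgsql SECURITY DEFINER AS $$
BEGIN
  -- The authorization check happens next to the data, not in a distant app tier.
  IF NOT EXISTS (SELECT 1 FROM access_grants g
                 WHERE g.role_name = p_requesting_role
                   AND g.patient_id = p_patient_id) THEN
    RAISE EXCEPTION 'access denied for role %', p_requesting_role;
  END IF;

  RETURN QUERY
    SELECT r.patient_id, r.summary
    FROM patient_records r
    WHERE r.patient_id = p_patient_id;
END;
$$;

GRANT EXECUTE ON FUNCTION get_patient_summary(integer, text) TO app_user;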

However, SPROCs are bad if:

1. You have to switch DB providers due to scalability, cost, etc.
The switching cost can be huge if there is a ton of logic buried in your
database.

2. Your database nodes are CPU bound. In this case you'd want to do as little
logic as possible in your DB. This is rarer nowadays, but it used to happen.

3. SQL (or whatever-sproc-language) is not a core language you want your team
to have to become experts in.

4. You don't like adding more code. You'll end up generating and writing yet
another layer of code, this time for stuff that lives inside your database.
This code has to be maintained, versioned, etc.

~~~
dantiberian
_The out-of-the-box tooling for debugging, maintaining, versioning, and
testing code that lives in your database is not nearly as robust as what you
have for most other development environments._

This is the key point for me. I've seen SQL databases with 60k+ LOC in stored
procedures and functions (thankfully they avoided triggers). This code wasn't
versioned, tested, or commented. While it's technically true that you can test
and version SQL code that lives inside a database, I've never seen it done.

This is mostly a tools issue, although it's also partly a cultural one, where
'stuff that goes in the database' isn't seen as subject to the same laws of
entropy and software development that all other code is.

~~~
zamalek
> I've never seen it done.

See my comment[1]. We're doing it and it works beautifully.

[1]:
[https://news.ycombinator.com/item?id=9485502](https://news.ycombinator.com/item?id=9485502)

~~~
dantiberian
Awesome, I'd love to try working in an environment like this to see how it
works and experience the tradeoffs.

------
sitkack
The reasoning always offered for treating the database as a dumb store with no
intelligence has been about portability. Which is fine if you are selling
third party software meant to be installed onto an existing customer's
database. But for all other use cases, not using the features of the platform
is a mistake.

PostgreSQL offers so much!

* Stored procedures in Java, Python, Lua, Perl, JavaScript, etc. [1]

* Multicorn [2], for querying external data sources

* Common Table Expressions [3]

[1]
[https://wiki.postgresql.org/wiki/PL_Matrix](https://wiki.postgresql.org/wiki/PL_Matrix)

[2] [http://multicorn.org/](http://multicorn.org/)

[3]
[http://en.wikipedia.org/wiki/Hierarchical_and_recursive_quer...](http://en.wikipedia.org/wiki/Hierarchical_and_recursive_queries_in_SQL#Common_table_expression)
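To give a taste of [3]: a recursive CTE lets you walk a hierarchy in plain SQL, no procedural code needed. A small sketch, assuming a self-referencing `employees` table:

```sql
-- Walk a reporting hierarchy from the top down.
WITH RECURSIVE reports AS (
    SELECT id, name, manager_id, 1 AS depth
    FROM employees
    WHERE manager_id IS NULL          -- start at the top

    UNION ALL

    SELECT e.id, e.name, e.manager_id, r.depth + 1
    FROM employees e
    JOIN reports r ON e.manager_id = r.id
)
SELECT name, depth FROM reports ORDER BY depth, name;
```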

~~~
woah
Yea, I've never really understood the portability argument. How often do
people switch between sql dbs?

~~~
tl
One of the advantages is the ability to run a different database during local
development / testing from your production systems (say H2 locally vs. DB2
running in production). Personally, I view this to be an anti-pattern, but
some people have to work with systems like this.

~~~
calpaterson
Yes, this is a very bad idea. Due to the way query planners work, it's not
even a good idea to vary the shape of the dataset between testing and
production. If you have a different (bigger, higher stddev, etc.) distribution
of data in column x, the planner will often execute a query differently.

~~~
ams6110
Agreed, unless your need for a DB is quite trivial; then perhaps something like
SQLite for the devs and whatever else for production. However this breaks
down quickly once you grow beyond needing very simple INSERT and SELECT type
operations.

~~~
sitkack
At which point one should just run SQLite everywhere. It is often excellent
for datasets up to about a gig.

------
ak39
It takes guts to write like this. And I fully agree with Derek.

The sheer momentum created by the pontiffs, academics and pundits on this
subject who extolled the virtues of layered architectures moved us into a
blind alley. The OO crowd (barbarians) were at the gates first. After two
solid decades of this slow exodus of logic from RDBMSs to Java, .NET and
whatever other middle-tier architecture we dreamed up, we were inevitably back
to the same problems we thought we were avoiding: remove complexity, improve
maintainability and portability. And we gave up a lot to get that: we gave up
reliability, we gave up performance, we gave up security and we gave up
simplicity. Ah, but no worries, we thought, we could write faster iterating
loops, introduce threads, write custom access control and authorisation,
custom caching solutions ... yeah, we will do all that but do everything to
eschew the simplest of set-based operations on SQL databases! :-(

But there was no lying about how we missed the database in our business tier.
Oh, we missed it. We missed the database so much outside the RDBMS that we
REcreated it! And we christened this compromise of an ingrown toenail an
"ORM". Two decades have come and gone and we are still agonising over this
unwinnable war that Ted Neward infamously called the "Vietnam of Computer
Science".

So, while we jettisoned the relational database onto the scrap heap of
software development tools, ironically the makers of the RDBMS software (both
open source and commercial) continued to advance and mature it. Patiently. We
have CTEs now. We have parameterised tables.

It takes all of 5 minutes to show a developer who has never seen it the impact
of a set-based business rule implementation vs. one in Java etc., and to
realise that we are too far gone in our craft!

Derek is spot on. Respect.

------
erichmond
It's incredible to me that he watched Simple Made Easy (which is an incredible
talk) and his mind went to "stick more stuff in the database". Not because
that approach wouldn't work, but to me a clear sign of complecting would be
conflating the medium of data storage (a database) with the business rules that
govern the data itself. But I guess that's the sign of a good talk: it's
open to interpretation.

It also complicates your scaling story, your caching story, your ability to
change databases and a great number of other things.

If you're building an app that has no chance of ever possibly needing to
scale, or use a different database for caching, searching, or analytic
reasons, then this approach may be optimal, but if any of those levers need to
move at some point, I think you'd be in for a world of hurt.

~~~
jessaustin
I think we're talking relational DBs here, right? In fairness you already have
business rules in the DB, implicitly, just from the layout of your tables. You
probably also have foreign keys. At that point, a stored proc that translates
business needs like "at customer location X, stop service A on this date and
replace it with service B" to the multiple table reads and inserts required,
seems like a good idea. It's actually not that weird to keep most of your devs
out of the database entirely, in which case they probably appreciate being
able to just call a stored proc to get something done.
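For the curious, an operation like that might look something like the following as a PL/pgSQL function; the whole thing runs atomically in one transaction. Table and column names are made up for the sketch:

```sql
CREATE FUNCTION replace_service(
    p_location_id integer,
    p_old_service integer,
    p_new_service integer,
    p_effective   date
) RETURNS void LANGUAGE plpgsql AS $$
BEGIN
  -- End the old subscription...
  UPDATE subscriptions
  SET end_date = p_effective
  WHERE location_id = p_location_id
    AND service_id  = p_old_service
    AND end_date IS NULL;

  IF NOT FOUND THEN
    RAISE EXCEPTION 'no active subscription for service % at location %',
      p_old_service, p_location_id;
  END IF;

  -- ...and start the new one in the same transaction.
  INSERT INTO subscriptions (location_id, service_id, start_date)
  VALUES (p_location_id, p_new_service, p_effective);
END;
$$;
```

The devs then just call `SELECT replace_service(...)` and never touch the tables.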

------
BurningFrog
A serious problem with this approach is scaling.

When your code is in Ruby/JS/whatever on separate servers, and your traffic
increases, you can easily scale up by adding more stateless servers.

Going from 1 to 2 DB servers is much harder, and can easily mean a major
rewrite, during which your site may not really work.

~~~
y0ghur7_xxx
> A serious problem with this approach is scaling.

Stack Overflow runs off one SQL Server. Do you think your app is going beyond
that? If yes, maybe this approach will not work for you. If not, you are in the
99.99999% of all other sites/apps that are not as big as SO, FB or Google, and
you can use this simple way to build your site or app.

~~~
superuser2
StackOverflow's database doesn't need to work that hard because the vast
majority of traffic is read requests for public, essentially static webpages
that get re-rendered and updated a handful of times in their lifecycle. I'd
guess 90% or more could be served by a reverse proxy without ever touching SQL
Server or even the app tier.

It would be easy to exceed its DB load with an application that serves unique
content to each user, is write-heavy, etc.

~~~
y0ghur7_xxx
> It would be easy to exceed its DB load with an application that serves
> unique content to each user, is write-heavy, etc.

Just curious: did YOU ever reach that point? I mean the point where a
well-optimized db engine reached its limit? Can I ask on what website?

------
jalfresi
In most of the examples he gives, these solutions are ensuring the integrity of
the stored data, which I personally have no problem with; that is the job of
the database in my view. If I try to store crap, the database should refuse,
be that via constraints, triggers or stored procedures.

As for returning the data as JSON, again, I see no problem with that; you were
going to get the data out in rows and columns anyway, so what difference does
JSON make, especially if it simplifies things, as he argues.
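For reference, Postgres (9.2+) can hand rows back as JSON directly, e.g. with `row_to_json`; the `customers` table here is assumed:

```sql
-- Returns one row containing a single json value built from the subquery row.
SELECT row_to_json(t)
FROM (SELECT id, name, email FROM customers WHERE id = 7) t;
```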

My main concern is business logic: should business logic be stored in the
database? At this point things become a little grey, primarily due to the
difficulty of maintaining and managing SQL code. I suppose that Postgres is
better than most of the DBs I have had to deal with in the past in this
respect, in that it supports a wider range of languages for this purpose, but
I'd be interested to hear others' opinions and experiences in this regard.

~~~
calpaterson
Sadly, to some extent the data integrity is part of the business logic. For
example, if products can or cannot be sourced from multiple suppliers

------
jos
I find myself a bit thrown by encrypting passwords with a stored database
function... which involves communicating with the database server with the
unencrypted password.

The Microsoft SQL Server and Database Management Certifications and associated
course work strongly discourage sending a password in that manner -- no matter
how secure the connection is.
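For context, the pattern being objected to is the sort of pgcrypto-based flow the article describes, where the cleartext password is sent as a bind parameter and hashed server-side (the `users` table here is assumed):

```sql
-- Requires the pgcrypto extension. Note: the cleartext password travels to
-- the server as a bind parameter before being hashed there.
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Storing a bcrypt hash:
INSERT INTO users (email, pass)
VALUES ('a@example.com', crypt('cleartext-password', gen_salt('bf')));

-- Checking a login attempt (matches only if the hash agrees):
SELECT id FROM users
WHERE email = 'a@example.com'
  AND pass = crypt('cleartext-password', pass);
```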

I'm unable to agree with the "simplify" aspect of the article title. My own
personal experience tells me data should be validated where it is accepted
rather than sent on to the next step -- the contrary may raise unintended
additional security concerns or exploits.

Granted, my view stems from the article author's directive given towards "web
or api developers." There are many situations where software makes a
connection directly to a database rather than indirectly.

------
SchizoDuckie
Historically, I've been against putting logic inside a (MySQL) database since
it's a bitch to version and maintain (especially cross-database-user).

Maybe Postgres is a lot better, but for MySQL I'm staying the hell away from
it.

~~~
olefoo
It does mean you need to treat your data schema and functions as code.

My short checklist for doing database centric development (postgresql
centered):

* Database schema is kept under version control ( essentially all the DDL for a given database is kept in one place and managed as a unit ). You can use `pg_dump -s` to extract the schema from an extant database.

* data types and functions are kept in separate files under version control

* the database is built and filled automatically for tests, for migrations and for upgrades. Shell script, Makefile, Chef; the tool doesn't matter so long as the process is repeatable and automatic.

* data fixtures for testing are kept separate from the code ( own repo usually, since you want to be able to add edge cases and errors to the test database ).

* migrations on the production database are run immediately after a backup.

Nothing too fancy, just basic hygiene. But staying on top of that means you're
ready to do things like build your application onto a new set of machines; or
make drastic changes to how your data is structured.

There are a number of tools that do much of the scutwork for you ( South for
Django, ActiveRecord::Migration for Rails, etc.) but the tools are not a
substitute for understanding what your database is doing.

~~~
geon
> pg_dump -s

I wholeheartedly agree that the schema should be versioned. But the schema dump
is imho unreadable. I edit it manually instead, which has its own drawbacks.

Since I refer to the schema a lot, I think it is worth optimizing for
readability.

------
y0ghur7_xxx
It's 2004 all over again:
[https://web.archive.org/web/20060525094651/http://www.oracle...](https://web.archive.org/web/20060525094651/http://www.oracle.com/technology/pub/articles/odtug_award.pdf)

I love this approach and I used it whenever I could in the last 10 years. It's
just so much easier when all you have is a DB and a JS/HTML/CSS frontend.

------
picardo
Have we learned nothing in the last 10 years? Aside from the siren song of
better performance, there are few other reasons to model your domain in the
database.

Throwing business logic into stored procedures willy-nilly is a terrible idea
for a lot of reasons. First, there is no standard method for testing stored
procedures and functions; you have to put a lot more effort into devising a test
strategy for your database functions than for your controller functions. In
addition, you've now coupled the business logic to the database, and you
cannot move your data around easily. And, lastly, if you have a large
database, adding new functions and triggers can involve some downtime, too.

To sum up, don't replace your Rails app with a database unless you really know
what you're doing.

~~~
ams6110
Most SQL databases that support stored procedures have some kind of unit
testing framework available these days, in fact have had for many years.
Possibly as a third-party product or project, but the tools exist.

If you design your stored procedures right, they can be an abstraction over
your data model. You then can change the data model without the stored
procedure interface necessarily having to change (though in most cases,
changes to the data model are usually done to support new requirements, which
typically involve new or changing business logic as well).
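A tiny illustration of that kind of abstraction: the function's signature stays fixed while the schema underneath is reworked (all names invented):

```sql
-- v1: the address lives on the customer row.
CREATE FUNCTION customer_address(p_id integer) RETURNS text
LANGUAGE sql AS $$
  SELECT address FROM customers WHERE id = p_id;
$$;

-- Later the model is normalized into an addresses table; only the function
-- body changes, and every caller of customer_address() is untouched.
CREATE OR REPLACE FUNCTION customer_address(p_id integer) RETURNS text
LANGUAGE sql AS $$
  SELECT a.line1 || ', ' || a.city
  FROM addresses a
  WHERE a.customer_id = p_id AND a.is_primary;
$$;
```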

~~~
picardo
That's good to know. The question then becomes whether it makes sense to
support yet another testing framework. The pros of performance gains have to
outweigh the cons of additional complexity by a considerable margin to make
database functions a viable choice over application level functions.

------
jqm
The article makes some valid points, but anyone who thinks databases should be
used for everything and the kitchen sink should come fill out an application
where I work. They could then spend the next 10 years debugging
hard-to-troubleshoot database issues instead of easy-to-troubleshoot code
issues. Jamming too much logic into the database is a recipe for disaster in my
experience.

------
AdrianRossouw
I've gone down this road before, and one of my biggest complaints was around
managing and deploying the code built this way.

I just found plain server side code easier to reason about.

------
A_COMPUTER
Having maintained a giant codebase in Oracle PL/SQL, I'm going to have to
respectfully disagree and run screaming in the other direction.

------
lesingerouge
I'm currently reading Martin Fowler's "Patterns of Enterprise Application
Architecture"[0] and I think that one of the main points that stuck with me is
his recommendation for separation between the various layers and functions of
a software system.

Basically the whole system should be a layer cake of smaller modules/systems,
as "opaque" as needed to one another, but interacting with each other solely
through a well defined "internal API".

I think the approach proposed in the article is good when starting or
prototyping projects, but the object oriented and layering approach, even if
it's a bit more challenging to visualize correctly at the start of the
development of an app, is worth its weight in gold as features start piling
up.

[0] [http://www.amazon.com/Enterprise-Application-Architecture-Ad...](http://www.amazon.com/Enterprise-Application-Architecture-Addison-Wesley-Signature-ebook/dp/B008OHVDFM/ref=tmm_kin_title_0?_encoding=UTF8&sr=8-1&qid=1424636059)

later edit: for clarity.

~~~
sitkack
Putting more intelligence in the data layer doesn't break the modularity of
the system, and it doesn't obviate the need for middle layers either. The
biggest thing it does enforce is a semantically clean gateway to the underlying
data store. Databases can do a whole lot declaratively, in a way that is
correct by inspection. Six lines of stored procedure can save hundreds in the
middle tier, while allowing other tools to access the database directly, w/o
forcing everyone through the middle tiers.

------
ams6110
I like most of this thinking, and it's the way I've built systems for years.
Data model and an API for it in the database. Apps built to use that API,
never access tables directly (or at least, that is the ideal).

This lets me build systems using different front-end platforms and languages
as needed and appropriate for the task, calling one tested API to interact
with the data.

One caution I would have is with triggers. Triggers are a way to make things
happen as if "by magic" and I avoid them. I prefer to keep things more obvious
and easier to reason about. If inserting a row into some table should have a
side-effect, I code that explicitly and don't make it happen via a trigger.
One exception might be to auto-generate IDs if the database doesn't have a
type that does that for you.
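On the ID exception: in Postgres a sequence-backed default keeps that "magic" visible right in the table definition, so no trigger is needed (table is invented for the sketch):

```sql
CREATE TABLE orders (
    id         bigserial PRIMARY KEY,            -- sequence-backed default, no trigger
    created_at timestamptz NOT NULL DEFAULT now(),
    total      numeric(10,2) NOT NULL
);
```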

------
alkonaut
> "In 1997, I started in Perl. In 1998, I switched to PHP. In 2004, a rewrite
> in Rails. In 2007, back to PHP. In 2009, minimalist Ruby. In 2012, client-
> side JavaScript. Each time I’d have to re-write all of the logic around the
> database: how to add a new person into the database, how to verify an
> invoice is correct, how to mark an order as paid, etc. But that whole time,
> my trusty PostgreSQL database stayed the same."

And what if during that time instead of switching programming languages, it
was the storage system that had to be switched? You could easily invert the
argument "Don't keep the logic in the DB! I had to switch from Postgres to
MSSQL which meant I painstakingly had to convert all my db-logic, I wish it
had been in the code instead!"

~~~
devdas
Switching programming languages is far more common than switching databases.

~~~
alkonaut
*Citation needed

You wouldn't even need to switch the database; just add a cache between the
storage and the app, or add/switch the ORM, etc.

There is a huge impedance mismatch between table storage and code, but I don't
see how it gets more responsive to changed requirements by using logic in the
DB.

And don't even get me started on the whole debugging/unit testing issue with
logic-in-databases.

------
tunesmith
I think it can be a good idea for certain very predictable applications -
predictable in terms of scaling and future feature enhancements.

But beyond that this strikes me as a bit too complicated. In particular,
coupling business logic with database choice is risky. Martin Fowler, Bob
Martin, and others underscore that the real cost of development is not the
inability to predict the future, but in how difficult or time-consuming it is
to change, when reality changes unexpectedly.

As soon as something happens that would bring repository choice into question,
or would suggest the need to split parts of the business logic up into
different locations, the choice to couple business logic with database
implementation would start to feel really limiting.

~~~
y0ghur7_xxx
> As soon as something happens that would bring repository choice into
> question

I don't know. Once a project is started with Postgres/MySQL/Oracle, how often
do you change the DB?

We have some projects that started with Struts, then we changed the fe to JSP,
then ASP.NET, then HTML/JS + MVC.NET. But the DB was always Postgres. It never
changed. In my experience what changes often is the frontend, not the stuff
that keeps your data.

I would optimize for fe/middleware changes, not db changes.

~~~
tunesmith
Plenty of examples, although they all tend to be higher-scale considerations
that might not apply for your small business or side project:

1) You have "friends" or "friends of friends" modeled in Postgres, but then
you realize that you want/need to do more social graph stuff in particular, so
it makes more sense to use a graph database like Neo4J

2) You're saving a ton of activity data in postgres but your site is slowing
down, and you realize you don't need the archived data, so it makes more sense
to use something like Redis for that chunk of data

3) You're saving a ton of log data but the storage of it is going way up, and
you're only querying it for big data / data warehouse purposes, so it makes
more sense to use something like Cassandra

4) You're wanting to write a new feature set that relies on existing
normalized data but would require new complicated queries and joins and
business logic to deal with it, plus you're running into scaling
considerations, so it makes more sense to convert to a reactive event-based
system, such as micro services consuming a kafka topic, where each micro
service has its own dedicated repository

In each of those cases, if you have used Postgres combined with sprocs, it's
much harder to transition than if you had coded to interfaces and relied on
your db in a dumb sense.

------
webnrrd2k
The biggest reason for me to use stored procedures is that db developers tend
to stick around a business a lot longer than web developers, so putting
most of the business logic into the db is better for the business in the long
run.

Like many developers, I generally develop custom apps specifically for a
particular business. When I develop new apps I start by putting all the
intelligence in the server-side code, so it's easier to change things without
needing to go through someone else.

Once the app is stable, much of the business intelligence gets moved into
stored procedures at the db level.

------
dorfsmay
It depends what you do...

If you need to manipulate a lot of data that ends up travelling on the wire,
you can save a lot of time by running it on the server.

On the other hand, the language on the DB servers are often a bit harder to
reason with.

~~~
yawboakye
_> the language on the DB servers are often a bit harder to reason with_

But can't we say that about every other language? I've been writing a lot of
PL/pgSQL lately and I've enjoyed every bit of it. For PostgreSQL there are also
PL/Python, PL/V8, etc., which let you write the server code in Python and
JavaScript respectively, and out the window goes the obscure-language excuse.
But that's if you're on Postgres.

------
michaelmcmillan
I always enjoy reading about new ways of doing things, so thanks! How would
you test your logic in isolation with unit tests using an architecture like
this?

~~~
vp89
The same way you test any other piece of code.

A stored procedure is just an encapsulation of logic, with inputs and possibly
state changes and/or outputs.

Expected/actual asserts can be coded in SQL.

The code tends to be verbose, but some RDBMS have evolved to include more
powerful language features. It's probably possible in almost all RDBMS, but
certainly easier in some than others.
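A bare-bones expected/actual assert in PL/pgSQL might look like this; the function under test, `order_total`, and its expected value are invented for the sketch (frameworks such as pgTAP dress the same idea up):

```sql
-- A hand-rolled unit test: raise if expected != actual.
DO $$
DECLARE
  expected numeric := 21.00;           -- made-up expected value
  actual   numeric;
BEGIN
  SELECT order_total(42) INTO actual;  -- the function under test

  IF actual IS DISTINCT FROM expected THEN
    RAISE EXCEPTION 'order_total(42): expected %, got %', expected, actual;
  END IF;
END;
$$;
```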

------
jakejake
Before the days of GUIs when regular office workers used the command line, the
database was an all-in-one application. Stored procedures were created by the
DB admin so that the non-techies could run reports with simple commands.

I've found it interesting that all of this functionality has remained in
databases, but is almost never used in modern web applications.

------
emmab
I wrote a gem to help test that unique indices match uniqueness validations in
Rails: [https://github.com/emma-borhanian/schema_expectations](https://github.com/emma-borhanian/schema_expectations)

If you only do rails-side validation, you have an (exploitable) race
condition.
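The race in question: two concurrent requests can both pass an application-side "is it taken?" check before either INSERT commits. Only a database-level constraint closes that window, e.g. (table and index names invented):

```sql
-- Without this, two simultaneous signups for the same email can both pass
-- a Rails-side uniqueness validation and both insert a row.
CREATE UNIQUE INDEX users_email_unique ON users (lower(email));
```

With the index in place, the second concurrent INSERT fails with a unique-violation error instead of silently duplicating.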

------
pw
Is anyone working on better tooling for debugging, maintaining, versioning and
testing code that lives in the database?

------
HarrietJones
So, at one end, we're moving stuff out of middleware and into the database,
and at the other end we're moving all views into the client.

Imagine if SQL returned JavaScript-readable sets of tabulated data... If it
did, you could completely remove all processing on the web server.

------
alfiedotwtf
And now you have two problems.

------
tbarbugli
been there, done that; it killed the productivity of a team and then we stopped doing it :)

~~~
tbarbugli
to expand, DB functions are usually:

* hard to test

* harder to maintain

* vendor specific

* often written in very weird programming languages

