
When "clever" goes wrong: how Etsy overcame poor architectural choices - llambda
http://arstechnica.com/business/news/2011/10/when-clever-goes-wrong-how-etsy-overcame-poor-architectural-choices.ars
======
benjohnson
I've learned to _never_ _ever_ badmouth a weird architecture in a working and
profitable company.

Simply - the people that threw it together probably had significant
constraints: lack of money, lack of knowledge, lack of time. To then come in
and poo-poo their work (that is paying the bills) is bad form.

...

We can all sit back and say it sucks, but I don't necessarily think that
sticking business logic in DB stored procedures is always the wrong thing: If
you think of the DB storage system as separate from the DB procedure system
then there's the split right there - it just happens to be running in the same
process and you just have to be extra careful.

~~~
efsavage
"I've learned to never ever badmouth a weird architecture in a working and
profitable company."

I learned that lesson myself earlier this year. A client had a system that was
just so ... wrong, I found myself thinking I'd made a terrible choice of
project, and initially marvelled at the fact that the company simply survived.

I stuck it out because it was a short-term engagement and eventually learned
that they'd solved an "unsolvable" problem, one that I'd tried myself and seen
dozens of other attempts at, all eventual and apparently inevitable failures.
But this place, by ignorance or genius, seemed to make the wrong decision at
every step, and did it.

Not to say that I agree with "if it ain't broke, don't fix it" all the time,
but when you see something done wrong in a new way, make sure that you extract
what you can from it and you might be surprised at what you'll learn.

~~~
chime
> they'd solved an "unsolvable" problem

Without giving away any proprietary information, could you elaborate on that?
It sounds very interesting.

------
tptacek
A lot of this makes sense (DBAs gating features! _shudder_ ).

But I'm not totally clear on why a switch from Postgres to MySQL was
warranted.

Just because Postgres can do stored procedures well doesn't mean that it's
effectively a stored procedure server. It also happens to handle SQL pretty
well too. :)

~~~
joevandyk
They say they use master-master replication with mysql so they have no single
points of failure. Back then, I think the only option with postgres was to
use slony.

~~~
mcfunley
This is correct, master-master is necessary to shard as we have implemented
it.

~~~
leandrod
Sounds like the <http://mywiki.wooledge.org/XyProblem>

------
eftpotrm
Interesting, but do we have enough information? To quote Spolsky...

 _A 50%-good solution that people actually have solves more problems and
survives longer than a 99% solution that nobody has because it’s in your lab
where you’re endlessly polishing the damn thing. Shipping is a feature. A
really important feature. Your product must have it._

(<http://www.joelonsoftware.com/items/2009/09/23.html>)

This article doesn't go into details about the history of _why_ Etsy's
architecture is the way it is. It's entirely possible that this architecture
enabled them to launch and iterate faster at a critical point in their
history, without which they'd be nowhere.

Do what you need to _now_ that will work. If your project is successful enough
to experience a few orders of magnitude growth it'll need rearchitecting in
some way anyway; build the heavy engineering when you know what you need, not
think you know what might be the pain point.

~~~
div
It doesn't feel like this applies here. 6-7 years ago a decision was made to
go with stored procedures over using an orm or at least implementing db logic
in the application code.

Along with this, they also chose to split the responsibility for code and sql
across two teams.

6-7 years ago, plenty of other options were available. In fact, the sharded
MySQL solution that they use now was already possible back then.

It sounds more like they made some architectural decisions based on the
company's org chart and it came back to bite them in the ass hard.

edit:typos

~~~
SoftwareMaven
The constraint may have been knowledge. They may have made a very reasonable
tradeoff to forego learning "best practices" and "proper architecture" to just
get something out the door. The original developer may have just graduated
with his philosophy degree and never have written more than a bash script.
Hard to say.

Given the size of the company, the architecture didn't grow along org charts
(that is a very real phenomenon in large companies!). Rather, the org chart
grew along architectural lines. Regardless of cause, it is a pretty
significant smell and can tell you something is (or will be) wrong.

------
dclaysmith
Good case of avoiding early optimization but hard to believe in 2005 someone
thought it was a good idea to build your business rules in your database
layer.

~~~
randomdata
I don't find it to be all that surprising. I remember when Ruby on Rails was
just starting to gain in popularity around the same timeframe. One of the big
arguments against it was that it pushed you to put all of your business logic
in the application. "How will you write another application to use the same
database?" they would cry.

The answer turned out to be write one application tier that provides a web API
to the database for all other applications to use. Interestingly, that turned
out to be a pretty popular solution and is now probably the most prevailing
method to get data to your user-facing applications. Looking back, it seems
pretty obvious especially with the rise of mobile apps using that technology.
It wasn't clear to most people in 2005, however.

~~~
raganwald
This seems like an ongoing debate, I remember similar discussions between Java
developers and DBAs before the rise of RoR. The answer back then was to
migrate business logic into business logic servers, with application servers
talking to the business logic servers over a queue. Architectures like that
allow organizations to have heterogeneous application architectures: One
company had .NET for internal apps and Java for customer-facing web systems.

Everything talked happily to a business logic server written in Java, and that
server talked to a couple of different databases and some legacy systems, one
of which was written in MUMPS.

------
olavk
I suspect the real problem was social or political. To do anything that
involved the database, the developers had to request the DBAs to write the
SPs. This kind of bureaucracy invariably leads to developers advocating
NoSql/schemaless databases, storing data structures as XML in a blob field or
whatever else they can come up with to route around the obstacle to
development.

They chose a new architecture and platform which made the broken process
impossible for technical reasons, but I wonder if they would have gotten the
same benefit cheaper if they just had fixed the process in the first place?

But I guess it is easier politically to shift platform than changing a process
which reflects turfs and hierarchies in the organization.

------
kogir
I honestly don't understand all the hate on stored procedures. Tracking
changes is trivial, since they're a part of the schema -- use version control.
It's quite possible to use them with a properly sharded DB architecture, and
still enjoy their performance and security benefits.

As a quick example, how else would you do an upsert (update or create) with a
single trip to the database server?
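
For illustration, some engines can express an upsert as one statement without
a stored procedure. The following is a minimal sketch, assuming Python's
built-in sqlite3 module and SQLite 3.24 or newer (for `ON CONFLICT ... DO
UPDATE`). The `counters` table and the `bump` helper are hypothetical names,
and the sketch says nothing about what other databases offered at the time.

```python
import sqlite3


def bump(conn, name, amount=1):
    """Insert a counter row, or add to it if the name already exists."""
    conn.execute(
        "INSERT INTO counters (name, total) VALUES (?, ?) "
        "ON CONFLICT(name) DO UPDATE SET total = total + excluded.total",
        (name, amount),
    )


# Hypothetical table used only for this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE counters (name TEXT PRIMARY KEY, total INTEGER NOT NULL)"
)

bump(conn, "views")     # first call creates the row
bump(conn, "views", 4)  # second call updates the existing row

total = conn.execute(
    "SELECT total FROM counters WHERE name = ?", ("views",)
).fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM counters").fetchone()[0]
```

Each `bump` call is a single statement sent to the database, so the
"update or create" decision is made by the engine rather than by two
round trips from the application.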

~~~
ars
> Tracking changes is trivial, since they're a part of the schema -- use
> version control.

It's not quite that simple. You can't just "install" the new schema onto an
existing database with data like you can by just copying files. You have to
write your own set of alter statements to keep things synchronized (sometimes
with complicated data migrations to new tables).

It works ok for 1 or 2 databases since you just do it manually - but then you
lose many of the benefits of automatic source control.

And if you have a large number of database servers - or many unrelated
(client) installations - you need a much more complicated system, and it's far
from trivial.
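
To make the "own set of alter statements" point concrete, here is a rough
sketch of a versioned migration runner. It assumes SQLite through Python's
sqlite3 module and uses SQLite's `user_version` pragma as the version counter;
the `MIGRATIONS` list and the `listings` table are hypothetical. Real
migration tooling also has to deal with rollbacks, locking, data backfills and
many servers, none of which is shown here.

```python
import sqlite3

# Hypothetical ordered migrations; each entry should be applied exactly once.
MIGRATIONS = [
    "CREATE TABLE listings (id INTEGER PRIMARY KEY, title TEXT NOT NULL)",
    "ALTER TABLE listings ADD COLUMN price_cents INTEGER NOT NULL DEFAULT 0",
]


def migrate(conn):
    """Apply migrations the database has not seen yet; return how many ran."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    pending = MIGRATIONS[current:]
    for offset, statement in enumerate(pending, start=1):
        conn.execute(statement)
        # PRAGMA does not accept bound parameters, so the integer is formatted in.
        conn.execute(f"PRAGMA user_version = {current + offset:d}")
    conn.commit()
    return len(pending)


conn = sqlite3.connect(":memory:")
first_run = migrate(conn)   # fresh database: every migration runs
second_run = migrate(conn)  # already up to date: nothing to do
columns = [row[1] for row in conn.execute("PRAGMA table_info(listings)")]
```

The version counter lives in the database itself, which is what lets the same
script run safely against servers that are at different points in the schema
history.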

Putting the SQL commands in the code is a lot simpler. There are no security
benefits to stored procedures vs bound SQL statements. Perhaps it's a bit
faster, but I'm not so convinced, you are trading parsing time for execution
time since the stored procedure is now a program instead of a data update
command.
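
As a small illustration of what "bound SQL statements" means here, the sketch
below passes user input as a bound parameter instead of splicing it into the
SQL text. It assumes Python's sqlite3 module; the `users` table and the
`find_user` helper are made up for the example.

```python
import sqlite3


def find_user(conn, name):
    """Look up users by exact name, passing the value as a bound parameter."""
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


# Hypothetical table used only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT NOT NULL)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

normal = find_user(conn, "alice")
# The hostile string is compared as plain data, never parsed as SQL.
hostile = find_user(conn, "alice' OR '1'='1")
```

The second lookup matches nothing because the whole string is treated as a
name to compare against, not as part of the query.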

~~~
wiredfool
Stored procedures have different security semantics than bound sql statements.
See the security definer attributes.

Briefly, a stored procedure can run with the calling user's permissions, or
the definer's permissions. If you set up a function as a security definer, you
can do things with data that protect it from disclosure in a way that you
can't in a sql statement. You can do it similarly with views, but they're more
of a read only case.

------
nateberkopec
What are the institutional reasons that "frequent, small" software releases
are never the first thing companies turn to? I mean, iterative development and
TDD have been around since the 50s, but it seems that every single company has
to rediscover it before they try it.

~~~
brown9-2
I think it is because more "control" and "testing" over changes sounds better,
and safer, at first glance than speed-to-deploy does. It's easier to choose
the option that sounds stable.

But then over time, companies realize that their tendency to add bureaucratic
gate on top of gate to the process has begun to harm them because they can't
release simple updates in anything less than a few months.

Frequent and small deployments sound riskier, and tougher, at first.

~~~
nateberkopec
Hmmm...that last sentence doesn't jibe totally with me. As a business-guy, I'd
rather diversify my risk over many deployments than over one deployment every
six months.

The word "control" probably has more to do with it. The option that makes
management more important (big requirements documents, for example) is
probably the one management will choose.

~~~
joshhart
I can tell you what the problem is: overzealous postmortems.

The head of engineering wants to be able to say what went wrong and how to fix
it next time. That's probably fine, if the solution is better automation and
failure detection. But once you go into postmorteming a situation where you
dropped a few requests to a minor service, the whole process turns into a pile
of shit where the engineers and operations team are scared, management feels
in control, and the product managers are left screwed because engineering and
operations no longer take risks and everything takes forever.

Engineering management doesn't stop and look at the cost of what they're doing
and when software takes forever to release, they don't look in the mirror for
the blame. I received a weekly technical email once containing info that 10
requests were "throttled" during a deployment. That's great! We don't need any
"next steps", our failover worked fine.

------
systemizer
_Sproter also 'created substantial developer friction,' Snyder added, because
it required DBAs to write stored procedures for nearly every piece of site
functionality—and created a bureaucracy developers had to go through to get
functionality made._

Sounds like a horror story.

And to reiterate the last statement of the article: _if you're doing something
"clever" you're probably doing it wrong._ Or in other words, if you don't know
what the hell you are doing, then copy someone who probably knows what they
are doing.

~~~
lurker19
The insane part is that a startup has bureaucracy, not that code is in a
RDBMS.

~~~
MartinCron
Bureaucracy is just process that you don't like, isn't it?

~~~
RyanMcGreal
It goes beyond that. Process is rules that stop people from being able to do
things they shouldn't; bureaucracy is process that stops people from being
able to do their jobs.

~~~
MartinCron
_Process is rules that stop people from being able to do things they
shouldn't_

People tend to disagree on what people should or shouldn't be able to do.
Should a developer be able to release code to production servers? Should a
developer be able to change the database schema? Should a developer be able to
access the database tables through an ORM or ad-hoc SQL or must everything go
through stored procedures?

A healthier way to look at process is that it's about facilitation, not
control.

~~~
artsrc
Bureaucracy is a kind of process group think. It involves adherence to fixed,
externally defined rules. Each contributor does things in a way they think is
cumbersome and difficult to change.

The alternative to fixed rules is to discuss issues and tradeoffs, and trust
developers to make good decisions based on knowledge.

~~~
MartinCron
_and trust developers to make good decisions based on knowledge_

I would add to that, "give individuals enough room to make mistakes that
everyone, especially themselves, can learn from"

------
brown9-2
I'm a little surprised this sentence hasn't gotten more attention or turned
more heads:

 _The presentation side was driven mostly by PHP on Lighttpd web servers,
chosen at the time because the Etsy team felt Lighttpd was less common and
less likely to be hacked._

~~~
rohit89
Me too. Security through obscurity reasoning.

------
spydum
Yeah, I'm going to assume the technology change was not what improved things,
but the engineers who were working on the problem.

At the end of the day, People solve problems, not technology. Someone with
sufficient skill could probably rewrite the lion's share of the environment in
Perl, had they been given the task and enough resources.

It's all about your engineers, and the resources you give them!

~~~
cbs
>Yeah, I'm going to assume the technology change was not what improved things,
but the engineers who were working on the problem.

I agree, this sounded like a technical solution to a social problem. The real
issue sounded like the disconnect between the application folks and the db
folks. If trashing one decent architecture for another was the catalyst they
needed to bring around development culture change, it was a good move, but it
hardly sounds like they had a technology problem.

------
joshu
There is so, so much more to this story that isn't here that it is practically
not worth considering stuff like "mysql vs postgres" etc.

Really.

------
code_duck
To sum up, how Etsy fixed their technical problems: they hired a bunch of
people from flickr.

------
joevandyk
I wonder why the developers couldn't write the stored procedures.

I'm hoping that the postgresql projects for easier master-master replication
get done soon!

~~~
vetler
In my experience it's usually because the DBAs end up "owning" all database
development, both because the DBAs consider it their domain, and developers
don't feel comfortable with PL/pgSQL _and_ they don't want to anger the DBAs.
It shouldn't have to be this way, though.

------
jjm
While I agree that MySQL is great and probably is all most anyone really needs,
it does become harder to shard when you're dealing with large scale. I feel you can
further extend your agile team with the benefits of NoSQL (speed, ease of use
like quickly adding/dropping columns). Some big data sites I know use both.

~~~
nateberkopec
My background is business-first, hacker-second, and I mainly read everything
on HN to pick up little places not to fuck up when architecting a software
business.

The benefit of NoSQL's ease of dropping/adding schema...that's going to be one
of the better ones.

~~~
lurker19
I hope your code is compatible with mixed schemas. Otherwise your NoSQL is
just moving the problem somewhere harder to test.
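
A sketch of what "compatible with mixed schemas" can look like in practice:
once documents of several shapes are sitting in the store, every reader has to
normalize them. The field names and the `schema_version` convention below are
hypothetical, chosen only to illustrate the idea.

```python
def normalize_user(doc):
    """Return a user document in the newest shape, whatever version was stored."""
    version = doc.get("schema_version", 1)
    if version == 1:
        # v1 stored a single "name" string; v2 splits it into two fields.
        first, _, last = doc.get("name", "").partition(" ")
        return {"schema_version": 2, "first_name": first, "last_name": last}
    if version == 2:
        return {
            "schema_version": 2,
            "first_name": doc["first_name"],
            "last_name": doc["last_name"],
        }
    raise ValueError(f"unknown schema_version: {version!r}")


# Documents as they might coexist in a schemaless store.
stored = [
    {"name": "Ada Lovelace"},  # written before versions existed
    {"schema_version": 2, "first_name": "Grace", "last_name": "Hopper"},
]
users = [normalize_user(doc) for doc in stored]
```

Every old shape needs its own branch and its own test, which is the cost the
comment above is pointing at.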

~~~
mattmanser
Can't vote this up enough.

The use of NoSQL is for very specific scenarios, don't make the mistake of
thinking it's a good object store replacement. As a business guy this is
definitely not a decision you should be making, let your team use what they're
happy with, they'll get a lot more done that way.

