Simply put: the people who threw it together probably had significant constraints: lack of money, lack of knowledge, lack of time. To then come in and poo-poo their work (which is paying the bills) is bad form.
We can all sit back and say it sucks, but I don't necessarily think that sticking business logic in DB stored procedures is always the wrong thing: if you think of the DB storage system as separate from the DB procedure system, then there's your split right there; it just happens to run in the same process, so you have to be extra careful.
I learned that lesson myself earlier this year. A client had a system that was just so ... wrong, I found myself thinking I'd made a terrible choice of project, and initially marvelled at the fact that the company simply survived.
I stuck it out because it was a short-term engagement and eventually learned that they'd solved an "unsolvable" problem, one that I'd tried myself and seen dozens of other attempts at, all eventual and apparently inevitable failures. But this place, by ignorance or genius, seemed to make the wrong decision at every step, and pulled it off anyway.
Not to say that I agree with "if it ain't broke, don't fix it" all the time, but when you see something done wrong in a new way, make sure that you extract what you can from it and you might be surprised at what you'll learn.
Without giving away any proprietary information, could you elaborate on that? It sounds very interesting.
While this cleverness probably crimped Etsy at some point in their growth, it also probably saved a lot in hardware costs at a critical time in the business. It's worth noting that they are still in business, that it was possible to throw hardware at the problem to buy time, and that they now have a solution they are comfortable with. While one can always learn from mistakes, I wouldn't argue too hard with that sort of success.
So was using a monolithic master database, despite all of the pain we have migrating off of it now. The way we work with MySQL now is great at scale, but it does require more work to do many things. It relies on denormalization and presumes the existence of some kind of job server to keep things consistent. If we started out trying to build the architecture we have now, we never would have gotten anywhere.
The Python middle layer is a different story. I think it was a blind attempt at scaling to 1000x when we should have been worried about 10x or maybe 100x, and since it wasn't a reaction to a real problem, it was closer to being just totally wrong. Rather than letting us iterate quickly for a while at our own future expense, building, deploying, and dealing with it actually froze new feature development for an embarrassingly long time.
But I'm not totally clear on why a switch from Postgres to MySQL was warranted.
Just because Postgres can do stored procedures well doesn't mean that it's effectively a stored procedure server. It also happens to handle SQL pretty well too. :)
Given their existing investment in Postgresql, I can't imagine they suddenly had more experience internally with MySQL, though the CTO may have been the only person with any horizontal DB scaling experience since Etsy didn't scale before.
I've seen the pattern of "CTO changes tech to something he/she is familiar with" many times. Sometimes that's good; mostly, it annoys the teams who are now less efficient. We certainly don't have enough info here to know. The scaling argument (his knowledge thereof, not Postgresql's "inability") is pretty compelling.
Also, knowing how to scale DB X and not knowing how to scale DB Y often means scaling DB Y is a science project.
I feel bad for baiting people into psychoanalyzing Etsy's team.
You can apparently do that with Galera, but I haven't seen that in production.
What they're referring to is most likely MySQL's half-baked "circular replication". Some people claim it works for them. Personally I tried it once and the usual MySQL replication issues (desync, InnoDB deadlock) would create much more uncomfortable situations than in a Master/Slave setup.
With master/slave you at least stand a chance of having one half of the pair survive through a problem. With circular replication the cluster would lock up hard almost every time for us, and - to add insult to injury - leave the pair inconsistent after recovery.
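For reference, the two-node recipe people usually mean by "circular replication" is just two masters replicating from each other, with interleaved auto-increment keys so they don't hand out colliding IDs. A rough sketch with placeholder hosts and credentials; note that none of this helps with the desync problems above:

    -- On master A: generate keys 1, 3, 5, ...
    SET GLOBAL auto_increment_increment = 2;
    SET GLOBAL auto_increment_offset    = 1;

    -- On master B: generate keys 2, 4, 6, ...
    SET GLOBAL auto_increment_increment = 2;
    SET GLOBAL auto_increment_offset    = 2;

    -- Then point each server at the other and start replicating:
    CHANGE MASTER TO MASTER_HOST = 'the-other-master',
                     MASTER_USER = 'repl',
                     MASTER_PASSWORD = '...';
    START SLAVE;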
PostgreSQL has many, many strengths over MySQL, but historically replication wasn't one of them. Built-in replication didn't come to PostgreSQL until version 9, and until then various poorly scalable trigger based log shipping schemes (like Slony) were used. I wouldn't be surprised if PostgreSQL's built-in replication system with version >= 9 works quite well nowadays, but I think at the time Etsy were considering the switch there was still quite a lot of FUD around replication.
A 50%-good solution that people actually have solves more problems and survives longer than a 99% solution that nobody has because it’s in your lab where you’re endlessly polishing the damn thing. Shipping is a feature. A really important feature. Your product must have it.
This article doesn't go into details about the history of why Etsy's architecture is the way it is. It's entirely possible that this architecture enabled them to launch and iterate faster at a critical point in their history, without which they'd be nowhere.
Do what works now. If your project is successful enough to see a few orders of magnitude of growth, it'll need rearchitecting in some way anyway; do the heavy engineering when you know what you need, not when you merely think you know where the pain point will be.
Along with this, they also chose to split the responsibility for code and SQL across two teams.
6-7 years ago, plenty of other options were available. In fact, the sharded MySQL solution that they use now was already possible back then.
It sounds more like they made some architectural decisions based on the company's org chart and it came back to bite them in the ass hard.
Given the size of the company, the architecture didn't grow along org charts (that is a very real phenomenon in large companies!). Rather, the org chart grew along architectural lines. Regardless of cause, it is a pretty significant smell and can tell you something is (or will be) wrong.
The answer turned out to be to write one application tier that provides a web API to the database for all other applications to use. Interestingly, that turned out to be a pretty popular solution and is now probably the most prevalent way to get data to your user-facing applications. Looking back it seems pretty obvious, especially with the rise of mobile apps built on that approach. It wasn't clear to most people in 2005, however.
Everything talked happily to a business logic server written in Java, and that server talked to a couple of different databases and some legacy systems, one of which was written in MUMPS.
It's incredibly arrogant to find it hard to believe that someone thought stored procedures are a good idea.
SPs are one possible solution among many to some common programming problems.
It's also not an either/or choice. You can have a web API and still use some stored procedures.
I don't work on things big enough to have to think about this, and I'd be curious to see a summary of why (or why not) to put business logic in the DB, and where it SHOULD go.
I remember having that exact argument in 2005. Someone not only thought it was a good idea, but thought it was a good enough idea to aggressively argue with me for.
The whole codebase was this Lovecraftian nightmare where I kept fearing I would go insane if I kept reading.
Like Etsy, I discovered you needed a lot of them, and worse, they weren't very easy to debug or version control. I think I decided to pull them back into the application layer in the face of those problems, and I probably would have avoided the SP path entirely if I'd known about sharding back then.
They chose a new architecture and platform which made the broken process impossible for technical reasons, but I wonder if they would have gotten the same benefit more cheaply by just fixing the process in the first place.
But I guess it is politically easier to shift platforms than to change a process which reflects turfs and hierarchies in the organization.
As a quick example, how else would you do an upsert (update or create) with a single trip to the database server?
For whatever it's worth, you shouldn't need sprocs for security; in fact, sprocs provide one more way for SQL injection to sneak into your code (dynamic queries based on tainted input inside the procedure), something that comes up a lot when DBAs are forced to be a first line of defense against attacks.
There's nothing wrong with stored procedures per se, but the architecture where every database call is a stored procedure matched to a use case from a higher tier is error-prone. PL/pgSQL is not a great programming language.
Its main feature is that it resembles PL/SQL.
Otherwise, you can choose to write stored procedures in Ruby, Perl, Python, Lua, C, R ...
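For instance, here's the same trivial function in PL/pgSQL and PL/Python (the users table and its active column are invented, and plpythonu has to be installed as a language first):

    -- PL/pgSQL: verbose, but the default choice.
    CREATE FUNCTION active_user_count() RETURNS bigint AS $$
    BEGIN
      RETURN (SELECT count(*) FROM users WHERE active);
    END;
    $$ LANGUAGE plpgsql;

    -- The same thing in PL/Python:
    CREATE FUNCTION active_user_count_py() RETURNS bigint AS $$
        return plpy.execute("SELECT count(*) AS n FROM users WHERE active")[0]["n"]
    $$ LANGUAGE plpythonu;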
> There's nothing wrong with stored procedures per se, but the architecture where every database call is a stored procedure matched to a use case from a higher tier is error-prone.
Agreed in general; I prefer to use stored procs tactically. The worst stored proc architecture I've seen is the idea that every table should be wrapped in a stored proc API. It's hideous.
It's not quite that simple. You can't just "install" the new schema onto an existing database with data like you can by just copying files. You have to write your own set of alter statements to keep things synchronized (sometimes with complicated data migrations to new tables).
It works ok for 1 or 2 databases since you just do it manually - but then you lose many of the benefits of automatic source control.
And if you have a large number of database servers - or many unrelated (client) installations - you need a much more complicated system, and it's far from trivial.
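To make that concrete, a single hand-written migration step might look like this (invented names, PostgreSQL syntax); now imagine running it, in the right order, across every server and every client installation:

    -- Add a column, backfill existing rows, then lock it down.
    ALTER TABLE orders ADD COLUMN currency char(3);
    UPDATE orders SET currency = 'USD' WHERE currency IS NULL;
    ALTER TABLE orders ALTER COLUMN currency SET NOT NULL;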
Putting the SQL commands in the code is a lot simpler. There are no security benefits to stored procedures vs. bound SQL statements. Perhaps it's a bit faster, but I'm not convinced: you are trading parsing time for execution time, since the stored procedure is now a program instead of a data-update command.
Briefly, a stored procedure can run with either the calling user's permissions or the definer's permissions. If you set up a function as a security definer, you can protect data from disclosure in a way that you can't with a plain SQL statement. You can do something similar with views, but they're more of a read-only case.
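A minimal PostgreSQL sketch of the idea, with invented table and role names:

    -- Runs with the definer's rights, so the app role can answer
    -- "does this email exist?" without being able to read the table.
    CREATE FUNCTION email_exists(text) RETURNS boolean AS $$
      SELECT EXISTS (SELECT 1 FROM users WHERE email = $1);
    $$ LANGUAGE sql SECURITY DEFINER;

    REVOKE ALL ON users FROM app_role;
    GRANT EXECUTE ON FUNCTION email_exists(text) TO app_role;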
To be fair this is an issue with or without sprocs.
I use MySQL and use INSERT ... ON DUPLICATE KEY UPDATE or REPLACE INTO.
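That is, something like this (hypothetical table):

    -- One round trip: insert a new row, or bump the counter if the key exists.
    INSERT INTO page_views (page_id, views)
    VALUES (42, 1)
    ON DUPLICATE KEY UPDATE views = views + 1;

REPLACE INTO gets you the same single round trip, but beware that it deletes and re-inserts the row, which fires DELETE triggers and clobbers any columns you didn't specify.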
Taking the example further, would you hand write how to verify that email address? Probably not, someone has obviously done that before and you shouldn't reinvent the wheel. Why would you hand write persisting a record to the database then?
I apologize if this comes off as attacking, but stored procedures are one of the greatest time sinks ever created, wasting countless dollars that could have been spent fixing real problems. Like this guy's: http://news.ycombinator.com/item?id=3067740
But then over time, companies realize that their tendency to add bureaucratic gate on top of gate to the process has begun to harm them because they can't release simple updates in anything less than a few months.
Frequent and small deployments sound riskier, and tougher, at first.
Helping people re-think this perverse risk assessment has become a sort of personal mission for me. One of my most rewarding experiences was setting up automated testing and deployment from day one for a new startup.
The word "control" probably has more to do with it. The option that makes management more important (big requirements documents, for example) is probably the one management will choose.
The head of engineering wants to be able to say what went wrong and how to fix it next time. That's probably fine, if the solution is better automation and failure detection. But once you go into postmorteming a situation where you dropped a few requests to a minor service, the whole process turns into a pile of shit where the engineers and operations team are scared, management feels in control, and the product managers are left screwed because engineering and operations no longer take risks and everything takes forever.
Engineering management doesn't stop to look at the cost of what they're doing, and when software takes forever to release, they don't look in the mirror for the blame. I once received a weekly technical email containing info that 10 requests were "throttled" during a deployment. That's great! We don't need any "next steps"; our failover worked fine.
Sounds like a horror story.
And to reiterate the last statement of the article: if you're doing something "clever" you're probably doing it wrong. Or in other words, if you don't know what the hell you are doing, then copy someone who probably knows what they are doing.
People tend to disagree about who should be able to do what. Should a developer be able to release code to production servers? Should a developer be able to change the database schema? Should a developer be able to access database tables through an ORM or ad-hoc SQL, or must everything go through stored procedures?
A healthier way to look at process is that it's about facilitation, not control.
The alternative to fixed rules is to discuss issues and tradeoffs, and trust developers to make good decisions based on knowledge.
I would add to that: "give individuals enough room to make mistakes that everyone, especially themselves, can learn from."
The presentation side was driven mostly by PHP on Lighttpd web servers, chosen at the time because the Etsy team felt Lighttpd was less common and less likely to be hacked.
At the end of the day, people solve problems, not technology. Someone with sufficient skill could probably rewrite the lion's share of the environment in Perl, had they been given the task and enough resources.
It's all about your engineers, and the resources you give them!
I agree, this sounded like a technical solution to a social problem. The real issue sounded like the disconnect between the application folks and the db folks. If trashing one decent architecture for another was the catalyst they needed to bring around development culture change, it was a good move, but it hardly sounds like they had a technology problem.
I'm hoping that the postgresql projects for easier master-master replication get done soon!
It's based on Postgres 9.0 and Postgres-R.
Also, there is a talk "Total Availability With PostgreSQL & Multi Master Replication" on Friday October 21st at PgDay Denver: https://pgday.consistentstate.com/node/51
If you were doing a million increments per second, it would take almost 600,000 years to overflow a 64-bit unsigned int.
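For the curious, the arithmetic (assuming an unsigned counter starting at zero):

    2^64 / 10^6 per second ≈ 1.8 x 10^13 seconds ≈ 585,000 years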
And who's to say we started at zero.
The benefit of NoSQL's ease of dropping/adding schema...that's going to be one of the better ones.
NoSQL is for very specific scenarios; don't make the mistake of thinking it's a good object-store replacement. As a business guy, this is definitely not a decision you should be making. Let your team use what they're happy with; they'll get a lot more done that way.
"I just want to change this tiny thing, what could possibly go wrong?"
"Nein!" shouts the database. "It will violate this and that and the other rule you told me to enforce!"
"Stupid database, always getting in my way. I'm going to install MongoDB/Redis/Couch ..."
12 months later: "Hang on ... which version of the code were these 40,000 records entered under? Does that integer field refer to the protocol 2 or protocol 3 headers? Why is there some XML in the email field?"