I bet you over-engineered your startup

dasil003 · on Feb 13, 2013

The size of the strawman here is gargantuan. It is threatening to topple over and destroy us all in an explosion of hay and burlap.

Google's architecture is not the same as a startup's architecture. Just because there are 1,000 startups using 50 different database technologies doesn't mean every startup uses all of them. Not every startup developer is blindly chasing every fad or new idea that exists. Basically the article is failing to see that the forest is made out of individual trees.

Premature optimization exists across any axis you can define, but when you need a service-oriented architecture you really fucking need it or your site is down, period.

jrockway · on Feb 13, 2013

This has to be the most substance-free article I've ever read. It invents some sort of strawman architecture for an unstated use case, and then claims that it could be difficult to add features to that imaginary architecture.

If the author had programmed before, he'd know that adding features is but one task of a programmer. Other tasks include testing the components of the application, maintaining the production environment, scaling, deployment, monitoring, and so on. If you decide on an architecture that makes testing and deployment easier, you'll probably end up with higher velocity than with an architecture that compromises testability (etc.) in favor of user-facing features.

To convince yourself of this, try to estimate how long the following tasks will take to complete: "The app server seems to randomly segfault about four times an hour. Fix this." and "Add a widget to the homepage to show the most popular queries."

The first one is probably hard to gauge: you might find it in 20 seconds, or it might take a month of instrumenting your app server and collecting data. The second task is more limited in scope and it will take about the same amount of time as other tasks of similar complexity took. If you know what tasks you can take on in the near term, you can better focus on doing work in order of business priority. (Startups have even less time than big companies and are able to reduce scope more aggressively, so this is a bigger win for startups than for big companies who you generally hear pushing this stuff.)

Now let's focus on the actual architecture the author proposes. He implies that it will reduce velocity if you isolate components from each other, because they will eventually need to communicate. This could be true, but he suggests impractical refactorings to make this possible.

Imagine you have some component that depends on the database, but eventually you decide that you need real-time notification of updates coming from a message queue. (Perhaps you've scaled from one application process to two, and the processes don't automatically know about the other's writes as they did when they were both the same process.)

You initially have an interface that looks like:

   interface Whatever {

       void writeData(Data record);
       void postUpdate(Data record);
       void registerCallback(Callback<Data> callback);
   }

And an implementation:

   class WhateverDatabase implements Whatever {

       // The constructor must share the class name; these dependencies are injected.
       WhateverDatabase(DatabaseHandle dbh, @DatabaseUsername String username, ...){ ... }

       public void writeData(Data record) {
           ...
       }

       ...
   }

(You'll always have an interface for anything that's critical to your system so you can easily write fake implementations for unit and integration tests, so assuming the interface exists doesn't take much imagination.)
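For what it's worth, here is a minimal sketch of such a fake. Data and Callback are never defined in this comment, so the versions below are hypothetical stand-ins, and FakeWhatever is a made-up name; treat it as one way to do it, not the way.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for types the thread leaves undefined.
final class Data {
    final String payload;
    Data(String payload) { this.payload = payload; }
}

interface Callback<T> {
    void call(T value);
}

interface Whatever {
    void writeData(Data record);
    void postUpdate(Data record);
    void registerCallback(Callback<Data> callback);
}

// In-memory fake: records writes and delivers updates synchronously.
final class FakeWhatever implements Whatever {
    final List<Data> written = new ArrayList<>();
    private final List<Callback<Data>> callbacks = new ArrayList<>();

    @Override public void writeData(Data record) {
        written.add(record);
    }

    @Override public void postUpdate(Data record) {
        // Notify every registered callback in registration order.
        for (Callback<Data> callback : callbacks) {
            callback.call(record);
        }
    }

    @Override public void registerCallback(Callback<Data> callback) {
        callbacks.add(callback);
    }
}
```

A test can then construct the feature under test with a FakeWhatever and inspect `written` afterwards, with no database anywhere in sight.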

That you then use like:

   class MyFeature {

       private final Whatever whatever;

       MyFeature(Whatever whatever) {
           this.whatever = whatever;
           setupCallbacks(...); // ensure we are notified of changes
       }

       public void theUserDidSomethingCool() {
           whatever.writeData(whatTheUserDid);
           whatever.writeData(whatThisMeansForYourWeekend);
           whatever.postUpdate(transactionSummary);
       }

       private void theDatabaseWasUpdated(Data change) {
           alert("The user did something interesting!");
       }
   }

You're using dependency injection, so MyFeature never knows that Whatever is implemented with WhateverDatabase or that WhateverDatabase requires a DatabaseHandle and the database username. ("Whatever" isn't even a public part of MyFeature's interface.)

Now a million dollars comes your way to make MyFeature work on two machines at the same time. You decide on a message queue on top of the existing database:

    class DistributedWhatever implements Whatever {

        private final MessageQueue messageQueue;
        private final WhateverDatabase whateverDatabase;
 
        DistributedWhatever(MessageQueue messageQueue, WhateverDatabase whateverDatabase) ...

        ...
    }

Now you simply rewrite your injector to create a DistributedWhatever and pass that to MyFeature's constructor instead of the simpler WhateverDatabase. MyFeature doesn't need to know or care about the new architecture. Only the code that directly translates the architecture into terms your application understands needs to change.
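To make that concrete, here is a rough sketch of the swap. Everything not named above is an assumption: the MessageQueue interface and its publish/subscribe methods, the in-process queue, the list-backed stand-in for WhateverDatabase, and the Injector class are all invented for illustration. I also let DistributedWhatever take the plain Whatever interface rather than WhateverDatabase so the example runs without a real database.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for types the thread leaves undefined.
final class Data {
    final String payload;
    Data(String payload) { this.payload = payload; }
}

interface Callback<T> {
    void call(T value);
}

interface Whatever {
    void writeData(Data record);
    void postUpdate(Data record);
    void registerCallback(Callback<Data> callback);
}

// Assumed queue abstraction; a real one would wrap a broker client.
interface MessageQueue {
    void publish(Data message);
    void subscribe(Callback<Data> subscriber);
}

// In-process queue for the demo: every subscriber sees every message.
final class InProcessQueue implements MessageQueue {
    private final List<Callback<Data>> subscribers = new ArrayList<>();

    @Override public void publish(Data message) {
        for (Callback<Data> subscriber : subscribers) {
            subscriber.call(message);
        }
    }

    @Override public void subscribe(Callback<Data> subscriber) {
        subscribers.add(subscriber);
    }
}

// Stand-in for WhateverDatabase: keeps rows in a list, sends no notifications.
final class ListBackedWhatever implements Whatever {
    final List<Data> rows = new ArrayList<>();

    @Override public void writeData(Data record) { rows.add(record); }
    @Override public void postUpdate(Data record) { }
    @Override public void registerCallback(Callback<Data> callback) { }
}

// Writes go to the store; updates and callbacks go through the queue.
final class DistributedWhatever implements Whatever {
    private final MessageQueue messageQueue;
    private final Whatever store;

    DistributedWhatever(MessageQueue messageQueue, Whatever store) {
        this.messageQueue = messageQueue;
        this.store = store;
    }

    @Override public void writeData(Data record) { store.writeData(record); }
    @Override public void postUpdate(Data record) { messageQueue.publish(record); }
    @Override public void registerCallback(Callback<Data> callback) {
        messageQueue.subscribe(callback);
    }
}

// The "injector": the only place that knows which implementation is in use.
final class Injector {
    static Whatever singleProcess() {
        return new ListBackedWhatever();
    }

    static Whatever distributed(MessageQueue queue, Whatever store) {
        return new DistributedWhatever(queue, store);
    }
}
```

Two instances built by `Injector.distributed` with the same queue see each other's updates, and the calling code only ever holds a Whatever.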

(And, of course, your tests can write a fake DistributedWhatever to simulate whatever failure cases you can imagine, so when you deploy to production, you know that your code will mostly work even when things go really wrong.)
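A failure-injecting fake might look something like this. It is only a sketch: FailingWhatever, its failOnWrite parameter, and the trimmed-down MyFeature (which takes its records as arguments so the example compiles) are assumptions made for illustration, as are the Data and Callback stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for types the thread leaves undefined.
final class Data {
    final String payload;
    Data(String payload) { this.payload = payload; }
}

interface Callback<T> {
    void call(T value);
}

interface Whatever {
    void writeData(Data record);
    void postUpdate(Data record);
    void registerCallback(Callback<Data> callback);
}

// Fake that throws on a chosen write, simulating an outage mid-request.
final class FailingWhatever implements Whatever {
    final List<Data> written = new ArrayList<>();
    final List<Data> posted = new ArrayList<>();
    private final int failOnWrite; // 1-based index of the write that throws
    private int writes = 0;

    FailingWhatever(int failOnWrite) {
        this.failOnWrite = failOnWrite;
    }

    @Override public void writeData(Data record) {
        writes++;
        if (writes == failOnWrite) {
            throw new IllegalStateException("simulated failure on write " + writes);
        }
        written.add(record);
    }

    @Override public void postUpdate(Data record) {
        posted.add(record);
    }

    @Override public void registerCallback(Callback<Data> callback) {
        // This fake never delivers notifications.
    }
}

// Trimmed version of MyFeature, just enough to exercise the failure path.
final class MyFeature {
    private final Whatever whatever;

    MyFeature(Whatever whatever) {
        this.whatever = whatever;
    }

    void theUserDidSomethingCool(Data whatTheUserDid, Data whatThisMeans, Data summary) {
        whatever.writeData(whatTheUserDid);
        whatever.writeData(whatThisMeans);
        whatever.postUpdate(summary);
    }
}
```

With `new FailingWhatever(2)` the second write throws, so a test can check that the first record was stored and that no update was posted for the half-finished transaction.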

The point is: while it sounds complex to rewrite where your app's data comes from, it shouldn't be complicated beyond designing the data access layer interface and actually implementing it. Existing features won't care when you change your mind. Adding a message queue where you had a single process writing directly to the database certainly changes the performance characteristics, but you made that decision because the ability to scale across machines and data centers was worth the extra latency and production environment wrangling.

One final thought. If you are complaining about how hard something is to change or use, it's not overengineered, it's underengineered. If every change you make is easy and you never tear your hair out when working on your app, then it's overengineered. And that's often a good thing.

mattgreenrocks · on Feb 13, 2013

There seems to be this persistent stigma that over-engineering is this terrible, terrible thing. 'Architect' has become a dirty word in a world where frameworks promise to solve all your problems for you, and if you're not writing the least amount of code possible, well, you better have a damn good reason for that! (Race to the bottom much?)

We have highly dynamic, expressive languages, and yet we don't want to think about coupling or interface width. But that's what the game is about!

ricardobeat · on Feb 13, 2013

Gist of the post: "we messed up by building three separate apps and had to merge them, so you should merge your stuff too, ok?".

jacques_chester · on Feb 13, 2013

Obligatory "Previous Discussion" link: http://news.ycombinator.com/item?id=4222841

nkohari · on Feb 13, 2013

Like most other architecture-level software problems, it comes down to baking the right layer cake. Small, focused applications at the top that integrate through a shared data access and communication infrastructure give you the best of both worlds.

I've never seen a great reason to create multiple databases that support different apps, unless the applications' data access patterns are completely different and/or their data is completely unrelated (and will remain so). It's dangerous to not have a single source of truth, because read/write replication problems are a special kind of hell.

For example, at Adzerk, our delivery system (that actually renders the code that displays ads) is driven by a read-only, in-process, in-memory database because we need to maximize performance during ad requests. In contrast, our user interface uses a traditional RDBMS. But the former is generated from the latter, meaning we have a single source of truth.
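To be clear, this is not our actual code, just a toy sketch of the general pattern: rows come out of the source of truth and get baked into an immutable in-process lookup table. AdRow, AdSnapshot, and their fields are hypothetical names invented for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row, as it might be read from the relational store.
final class AdRow {
    final int id;
    final String markup;

    AdRow(int id, String markup) {
        this.id = id;
        this.markup = markup;
    }
}

// Read-only, in-process lookup table generated from the source of truth.
final class AdSnapshot {
    private final Map<Integer, String> markupById;

    private AdSnapshot(Map<Integer, String> markupById) {
        this.markupById = markupById;
    }

    // Copies the rows into a private map and wraps it so it cannot be mutated.
    static AdSnapshot buildFrom(List<AdRow> rows) {
        Map<Integer, String> copy = new HashMap<>();
        for (AdRow row : rows) {
            copy.put(row.id, row.markup);
        }
        return new AdSnapshot(Collections.unmodifiableMap(copy));
    }

    // Returns the markup for an ad, or null if the snapshot has no such ad.
    String markupFor(int adId) {
        return markupById.get(adId);
    }
}
```

When the relational data changes, you build a fresh snapshot and swap it in; nothing ever writes to the in-memory copy, so there is still only one place where the truth lives.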

I think if you start writing a gossip API between your services, you're essentially re-solving database clustering. I'd rather leave that up to someone smarter than me.