Of the three startups I've worked with, two were ridiculously over-engineered monstrosities that were way over their time budgets. It was clear the CIO/CTO wanted to do cool, fun stuff rather than build a marketable product.
The other was cobbled together from completely shit code, constantly broke on releases, and was glued together with Perl scripts. They're now publicly traded.
From my own experience, developers, particularly inexperienced ones, have an insatiable need to apply newfound knowledge when learning something new, no matter whether it's a good fit for the problem at hand (myself included... I remember abusing the hell out of recursion, for example).
One of the worst examples I've seen is the functional programming paradigm being crowbarred out of PHP... why?
Most likely related to the inner-platform effect: "...the tendency of software architects to create a system so customizable as to become a replica, and often a poor replica, of the software development platform they are using."
That example is full of state. Nothing prevents you from calling any of those setX methods at a later point in time and changing the object's state. That interface could well avoid state, but Swiftmailer doesn't seem to be implemented that way.
The stateless way would be to return an entirely new Swift_Message instance with each setX method. In languages that aren't built with immutability in mind, like PHP and Java, you end up instantiating and throwing away a lot of objects. Sometimes it doesn't matter, but when it does you use a mutable Builder to create the immutable instances.
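A rough sketch of both styles in Python (not Swiftmailer's actual API; the class and method names are invented for illustration): the immutable message returns a new instance from every "setter", and the mutable builder accumulates state cheaply before freezing it once.

```python
from dataclasses import dataclass, replace

# Immutable message: every "setter" returns a brand-new instance.
@dataclass(frozen=True)
class Message:
    subject: str = ""
    body: str = ""

    def with_subject(self, subject):
        return replace(self, subject=subject)

    def with_body(self, body):
        return replace(self, body=body)

# Mutable builder: mutate freely while assembling, then build the
# immutable result exactly once.
class MessageBuilder:
    def __init__(self):
        self._subject = ""
        self._body = ""

    def subject(self, subject):
        self._subject = subject
        return self

    def body(self, body):
        self._body = body
        return self

    def build(self):
        return Message(self._subject, self._body)

# The purely immutable chain allocates a throwaway intermediate object
# per call; the builder avoids that churn.
m1 = Message().with_subject("hi").with_body("hello")
m2 = MessageBuilder().subject("hi").body("hello").build()
```

The trade-off is exactly the one described above: the chained immutable calls instantiate and discard intermediates, which usually doesn't matter, and when it does the builder concentrates the mutation in one throwaway object.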
Many devs tend to over-engineer things whether they are at a startup or not, simply because they like building things; that's why they are software developers. I don't think this phenomenon is much more complicated than that.
'Over-architected' and 'over-optimized' are the terms I prefer. Whatever terms are used, the criticism should emphasize 'non-pragmatically built' rather than 'startups don't need any design, engineering, or optimization whatsoever'.
Sorry, but I think this is really bad advice; it approaches the problem from the wrong direction. Saying "monolithic" or "services" as if one is good and the other bad, or one is complicated and the other simple, is kind of silly.
For example... which is simpler: writing your own search indexing tool in Ruby on Rails, or installing Solr as a service? MySQL is also a service; for some reason people tend to forget that. Conversely, if your processes aren't yet resource hogs, why not just let them remain general-purpose workers? If you are constantly fiddling with multiple services to make any changes to your app, then yes, you have probably made a bad choice somewhere. But an HAProxy/nginx/Rails/memcache/MySQL/Solr stack is already six services, and not really so complicated to work with. When you write your own services, you should aspire to that level of simplicity.
At the end of the day, the shortest path will be wherever it will be. It's your job as a developer to weigh the pros and cons on a case-by-case basis. The hard part is to test-drive everything so that you can change it later, and to constantly evaluate which options each decision removes from the table (painting yourself into a corner if you are not careful).
Another way of putting it: if you are picking your architecture before you begin, based on some kind of generalized principle, you are already over-engineering.
The solution is not to say "Services suck" or "Monolithic yay!"; it's to realise that you should start with a monolithic app built on a good framework, then see how your app's internals are used and accessed as you grow, THEN split out into logical, well-designed services.
Or in other words: Premature optimisation is the root of all evil.
After reading the "What Happens" section of the OP's article, I can see that he's made the classic mistake of building many things that each do one small thing, but aren't independent.
The message queue to notify component X of changes to data in Y is symptomatic of badly designed systems; if system X cares about changes to data in Y, it should be designed so that (at scale) it caches the data for a suitably short time and otherwise reads through to the canonical source.
This is a common anti-pattern; I've seen it built by smart teams at epic scale (millions of uniques per day), and it is still unmanageable.
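A read-through cache along those lines might look like this (a minimal sketch; the TTL and the fetch callable are placeholders for the canonical source):

```python
import time

class ReadThroughCache:
    """Serve recently fetched values; fall back to the canonical source."""

    def __init__(self, fetch, ttl_seconds=30):
        self._fetch = fetch      # callable that reads the canonical source
        self._ttl = ttl_seconds
        self._store = {}         # key -> (value, expiry timestamp)

    def get(self, key):
        hit = self._store.get(key)
        if hit is not None and hit[1] > time.time():
            return hit[0]        # fresh enough: no queue, no notifications
        value = self._fetch(key)  # stale or missing: read through
        self._store[key] = (value, time.time() + self._ttl)
        return value

# Demo: count how often the canonical source actually gets hit.
calls = []
def fetch_from_y(key):
    calls.append(key)
    return key.upper()

cache = ReadThroughCache(fetch_from_y, ttl_seconds=60)
first = cache.get("a")   # reads through to Y
second = cache.get("a")  # served from cache; Y is not touched again
```

System X stays at most one TTL behind Y with no change-notification machinery at all, which is the whole point: staleness becomes a tunable number instead of a pile of queues.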
Feature toggles, hard and soft failing, together with a baked-in assumption that APIs are asynchronous (that is, unreliable) at as many levels as feasible, is a good architectural move. (And it does not necessitate an abundance of architecture over feature code.)
Loosely coupled components that expect their counterparts to respond slowly, or not at all, are easy to implement and even easier to test. (HTTP, if one wants to use HTTP as the transport medium, learned this, and offers 201 Created and 202 Accepted.)
In my own projects (I work mostly on near-realtime billing APIs) we bake this assumption, and others, into every transaction. We try to be RESTful, transmitting the state along with a URL that can be used to get the canonical representation of any resource at any given moment. Objects in all parts of the system are stateful, relying on the handshakes (202 Accepted / 201 Created, plus 404, 406, 409) to avoid race conditions and to make sure our systems can handle downtime of any component, internal or external.
As a result, we have lightning quick tests, we are very confident in the system's ability to perform, and we have read-through caches which respect the transport medium's headers.
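A sketch of that handshake pattern (illustrative only; the endpoint names and job store are invented, not their actual API): the create endpoint answers 202 Accepted plus a canonical URL, and the caller polls that URL rather than assuming the work finished.

```python
import http
from uuid import uuid4

# In-memory stand-in for a billing system's job store (illustrative).
jobs = {}

def create_charge(payload):
    """Accept work without promising it's done: 202 + canonical URL."""
    job_id = str(uuid4())
    jobs[job_id] = {"state": "pending", "payload": payload}
    return (http.HTTPStatus.ACCEPTED,            # 202: queued, not complete
            {"Location": f"/charges/{job_id}"})

def get_charge(job_id):
    """Read the canonical representation; 404 if unknown."""
    job = jobs.get(job_id)
    if job is None:
        return (http.HTTPStatus.NOT_FOUND, None)
    return (http.HTTPStatus.OK, job)

status, headers = create_charge({"amount": 100})
job_id = headers["Location"].rsplit("/", 1)[-1]
```

Because the caller only ever learns "accepted, here's where to look", it is forced to handle the slow-or-absent counterpart case from day one, which is what makes the components testable in isolation.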
I suspect the OP is right that many do over-engineer the startup, but remember: many startups appear to have an abundance of developer potential, until they don't. (Usually through bad design, not over-engineering; the two ought not to be confused.)
Perhaps I am biased by having seen it happen in a large company: a completely unscalable architecture, and at one point as many "architects" as developers, desperately trying to keep the wheels turning.
Scalability is about more than operations/second or any single metric; it's about making a service/system that can survive at scale, both technically and logistically. A server that goes 10x faster than its competitor but cannot work with other servers to go 100x is not a scalable solution, whether the reason is a low-level networking limitation or a high-level programming or administration limitation.
More semantics, but over-engineering is about more than making things complicated. If a tool doesn't do what it was intended to do, or if you've ignored the proper level of complexity, it's because it was badly engineered, not over-engineered. Good engineering will "over-engineer" things as much as possible within the constraints of the solution.
And if you keep that in mind, then you really can't over-engineer things or make them too scalable.
You're sorta kinda conflating two distinct axes of scalability: vertical and horizontal.
A system which is vertically scalable is one which will run faster (for a given definition of "faster") on chunkier hardware.
A system which is horizontally scalable in theory becomes faster by adding more independent hardware. In this age of spinning up anonymous VPSes by the handful that's an attractive quality.
However, horizontal scalability levies a very heavy architectural tax. Nobody has produced a convincing platform that successfully abstracts away the many, many moving parts and oversight that horizontal scaling requires in the same way that an operating system can abstract away a lot of the complexities of vertical scaling.
So what happens is that you spend less time thinking about the problem domain and more thinking about the solution domain.
I'm not talking about vertical or horizontal; I would put them both in the "technical" category of scaling. I meant that scalability is not simply the ability to grow to meet a certain load; it's the idea that your system can survive at scale, in whatever form that scale takes. It encompasses many things: more servers, bigger servers, more people working on the servers, more users, longer sessions, more activity per user, support for more features, and so on.
It's a semantic point, but it seemed like the author had a very narrow idea of the terms he was using, and was complaining more about his own definitions than the concepts themselves.
And the user cares about all of them, indirectly. They all matter because failing at any one of them is a reason to use something else, whether it's simply slow performance, or a high-performance system so brittle you can't evolve it, or downtime because your high-performance, quickly evolving system requires more admins than you can afford, and so on.
Since we're being pedantic, scalability doesn't mean "becomes faster".
Your definitions are generally right if you replace "becomes faster" with "handles more load". Something that scales is something that can handle additional load without slowing down as much as the next thing, or that has a prescriptive method for preventing such slowdowns (like adding more hardware or moving to a bigger server).
You're generally spot on but for that point though.
You can do it the Microsoft way: build Windows as a GUI for DOS, capture many users, earn a lot of money, then pay the best developers to develop Windows NT and merge it into a new OS (user experience first, architecture later).
Or you can do it the Apple way: make it good from the inside out, including the architecture, then wait loooooooong until users recognize all this, get maaaany users, and earn a lot of money. (Hopefully you have survived until then.)
If you're lucky, you can have both: good architecture inside and a very good user experience...
But anyway I agree with the author, that many many solutions are over-engineered instead of just simple...
Here is my simpler version: function calls are the fastest RPCs. They are faster to run, faster to write, and faster to debug if something goes wrong.
By all means conceive of your application as a bunch of well-engineered little pieces. But for now, write them all as modules that you're making method calls to. Should you some day want to, you're free to replace one by a facade around a remote service. But for now it works and you can move on.
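One way to picture that (a sketch; the inventory module and its names are made up): callers depend only on an interface, so the in-process module can be swapped for a facade around a remote service later without touching them.

```python
# Today: the "inventory" piece is just a module behind a function call.
class LocalInventory:
    def __init__(self):
        self._stock = {"widget": 5}

    def in_stock(self, sku):
        return self._stock.get(sku, 0) > 0

# Someday, if it's ever needed: same interface, but a facade around a
# remote service (the actual HTTP call is deliberately left unwritten).
class RemoteInventoryFacade:
    def __init__(self, base_url):
        self._base_url = base_url

    def in_stock(self, sku):
        # e.g. GET {base_url}/stock/{sku} -- omitted in this sketch
        raise NotImplementedError

# Callers never know or care which implementation they were handed.
def can_order(inventory, sku):
    return inventory.in_stock(sku)
```

Until that someday arrives, every call stays a plain in-process function call: fastest to run, write, and debug.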
Yes, but making those module boundaries loosely coupled and sharing information only through their interfaces will go a LONG way toward enabling you to make the call remote at a later date. The biggest mistakes that prevent remoting a module involve requiring context that is not part of the method call's interface (or having a very chatty back-and-forth, but that is less common).
This isn't so much a symptom of a distributed architecture as of a badly designed one.
Distributed architectures are the only way to go once you reach a certain size both in terms of scale and in terms of team size. You can certainly make do without it (Wikipedia) but you'll have a much more robust product with it (Netflix).
The trick is always using the design appropriate for the current needs. It's good to think ahead, but it mustn't come at the expense of the present.
At the beginning—which is the case for most startups, since few make it to the later stages—it's often a good idea to go with a monolithic codebase based on a lean framework. As you grow, you're going to want to start adding components like a message queue for async work, rethinking your data store for scale, etc. As you grow even further, you're going to want to transition to a distributed architecture. I don't know what comes next… I haven't gotten there yet. But I'm sure as you grow even further, your needs are going to change yet again.
I'm all for simplifying, but what exactly does he mean by "monolithic architecture"? I'm not even sure I get his overall point; the rant seems to go in many different directions.
That phrase "monolithic architecture" makes me think of one-huge-Java-project and that's not exactly "simple" in my mind. Probably not what he meant though..
Generally, I guess separating things is more work and can actually lead to less flexibility when situations arise that you didn't plan for ("maybe bloggers should be able to advertise too"). OTOH, not separating can lead to entanglement, where you can't really change anything because other things depend on it in obvious and subtle ways.
This seems a bit of a straw man. "Distributed systems suck because Developer B has to wait for Developer A to add stuff to System A so System B can have a new feature." What you have there is a totally different problem, unrelated to being distributed or not. Replace the word System with Module and imagine they are in the same codebase: you still have the same problem. This post smacks of "distributed systems are hard and gave us a new set of problems that seem hard, run away!"
The article says you should merge all your databases into one, to avoid setting up a service-api-notification-message-queue mess.
While sharing a database lets you develop the initial system quickly, you'll have problems later on because you've made no distinction between your interface (which other people code against and you commit to not changing too often) and your internals (which you may want to refactor from time to time).
So either you make schema changes at will - in which case other developers do too, and you're spending all your time fixing that instead of developing new stuff - or you rarely make schema changes and do it with advanced warning and approval, in which case the pace of development slows to a crawl because other people are too busy to support the changes you want to make.
With well defined interfaces, only the interfaces have to evolve at a snail's pace; the internals can change as fast as you like, as long as you do it without breaking your interfaces.
Example: if you run an Amazon-style computerized warehouse and an Amazon-style shopping website, you want to know whether an item is in stock. If the website goes directly to the warehouse's database tables, the warehouse schema can't change without worrying about breaking the website. A nice simple how-many-in-stock web service would be a lot easier to maintain.
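A toy version of that boundary (sketched; the table and column names are invented): the website only ever calls `how_many_in_stock`, so the warehouse team can reshuffle their schema behind it freely.

```python
import sqlite3

# Warehouse internals: a schema the warehouse team is free to change.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE stock_v2 (sku TEXT PRIMARY KEY, on_hand INTEGER)")
db.execute("INSERT INTO stock_v2 VALUES ('book-123', 7)")

# The interface: the only thing the website is allowed to code against.
def how_many_in_stock(sku):
    row = db.execute(
        "SELECT on_hand FROM stock_v2 WHERE sku = ?", (sku,)
    ).fetchone()
    return row[0] if row else 0

# The website never mentions stock_v2; when the warehouse renames the
# table or splits it across shelves, only this one function changes.
```

That's the split described above: the interface evolves at a snail's pace while the internals behind it change as fast as you like.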
The thing is, prematurely defining interfaces is the worst kind of premature optimization; interfaces are a lot more permanent than any other part of your code. The most successful companies I've seen are those that define architecture as needed; start with everything sharing a database, then extract parts of that database into services /as you need to to scale up/. At that point you'll have a much better idea of what the use cases for those services are, and can define much better interfaces as a result.
Absolutely - I'm sure you've heard of the idea of technical debt [1,2] and sometimes it makes sense to build up technical debt to get a system off the ground, then worry about maintainability, documentation and whatnot in your copious free time later on.
Steve Yegge's Google+ rant argues that Jeff Bezos forcing Amazon to implement everything as internal services allowed a services platform to emerge, and that platforms lead to success (the “because Bezos is smart” argument is a bit weak, though):
It's hard to produce a counterpoint because it's unclear what the OP's point actually is. What does monolithic mean? I can tell you that I am working at a startup (albeit one on the cusp of being just a 'company') on a monolithic application right now; it isn't even that many LOC, and it is absolutely terrible to work with. We are working hard to split it into services because it is absolutely impossible to develop for. Now, that is for a number of reasons, but the point is that 'monolithic' doesn't actually solve anything; it means you'll be trading one set of problems for another. IME, wrapping things into services isn't that bad, and you can always break the abstraction if you really need to and fix it as you go, whereas with monolithic apps it's harder to realize you've broken an abstraction, for some value of 'monolithic'.
The counterpoint is reliability. If it's ok to have your whole service fall over when any part of it fails, go monolithic. This isn't snark, there genuinely are a lot of cases where this could make sense.
I'm surprised to see only one comment with the word "reliability" in it. I almost laughed when I saw examples from Google and Twitter. The article and the comments, sadly here as well as on the site, betray a shocking unfamiliarity with technical problems that really big systems face, user-facing or not. Making them distributed (which I guess is comparable to these "discrete services" the author mentions) does indeed have its problems, but... well ask Amazon and Google how much they regret making their distributed systems reliable. I bet there's some other path they would rather have traveled in order to simplify their architecture! Snarkiness aside, those are awful, misguided examples, even if the (I think) main point is true, that startups probably don't need to worry about such scaling issues yet.
Furthermore, read onward for some examples from Zemanta itself, which AFAIK is a startup.
I could also get into the experience I had when using Google App Engine for my startup a few years ago. Horribly over-engineered architecture, nothing worked, had a whole bunch of trouble and everything went all manner of bad to worse very quickly.
But hey, we had awesome scalability! Until we got 200 users and everything started falling apart because of the overhead of keeping all the different parts of the system communicating.
PS: the "saving is taking too long" example is actually aimed at Buffer not Twitter or Google :)
This is bad advice for people who care. Granted, for businessmen the most important thing is to cash the check, but even a quick-and-dirty project slapped together will eventually require a rewrite if it takes off. And that rewrite will most likely be painful and introduce all kinds of subtle bugs.