Cogs bad (williamedwardscoder.tumblr.com)
313 points by willvarfar on Feb 22, 2012 | 76 comments

Maybe people succumb to the hype. Maybe they want the latest shiny running their own shiny new thing. I'm sure these things contribute greatly.

But I see another aspect to this whole "We Use Shiny Cogs" movement: high-level vs. low-level. As we make tools that abstract us away from the metal, we are able to spend less time thinking about the electrons flowing across silicon and more time thinking about building something John Q. Public will pay for.

We architect higher and higher abstractions for exactly this reason. And it comes with a price: at some point you stop running things as efficiently as possible and there's waste. If we were all diligent CS students who could write kernels and compilers from scratch, we might spend five years building a very tight, efficient stack for Twitter that could run on a single box (maybe with a hot failover). But we're not. We're a collection of humans with differing levels of understanding and willpower, many of whom just want to Get Shit Done and stop thinking about kernel task queues.

So let's turn his "rich man's problem" around a bit: you build your idea on top of a stack that you understand and that keeps you happy, and when you bring in the capital (through revenue or investment, whatever) you put money into deeper, lower-level engineering. Until then, build your idea the way you know how.

There are many people, such as the OP probably, who take pride in the engineering aspect of what they do. It's my understanding that the term "hacker" as used on HN actually does not include this.

Also, not everybody is working on a startup. Most programmers actually work in the "put money into deeper, lower-level engineering" phase of software development, and it's important that companies have a stream of good engineers to hire.

"As we make tools that abstract us away from the metal, we are able to spend less time thinking about the electrons flowing across silicon and more time thinking about building something John Q. Public will pay for."

That makes it sound like there's a complete dichotomy here. It's not like the mailinator guys are doing assembly whereas the rest of the startup crowd is doing Prolog AI. Never mind that I doubt most employed backend programmers are using the time saved to improve the sheer monetization output of the company (mileage obviously varies for single-person shops).

I do think you hit the nail on the head with your final paragraph. Quite often it's not that the people programming this choose to pick the cogs; quite often they don't know any other way. In which case this isn't really a decision. But it's a sad trend in my opinion. Remember the old saying "Nobody ever got fired for choosing IBM"? I think the level where this kind of thinking is applied has dropped from non-technical upper management down to some actual builders. Smart business decision, maybe. But I do think that a lot of us here aren't just in it for the money.

The new people are not in it to perfect their engineering craftsmanship. Overall product craftsmanship, maybe, but then half their attention is taken up by visual/UX design, so they have less time for beautiful plumbing.

But at the same time engineer craftsmen can specialize in making better cogs. That would be the ideal: better cogs like Stripe, Heroku, Redis, DynamoDB, whatever. Someday most will stop thinking about those problems, just as we stopped thinking about the mechanics of making procedure calls in assembly.

In your story "better cogs" sounds like "more cowbell". It's still a fundamental decision which will not go away: do you solve a problem in-process, or do you communicate with something external (Redis vs. HashMap)? It's not comparable to levels in programming languages.

What about a programming language that integrates fancy cogs so they're conceptually in process? Do I want to know the difference between sorting in the native data-structure and Redis? Hopefully not, the language/compiler could make that call.

The programmer could specify how much data he's sorting, how fast he needs it sorted, and how much data he'll get after each sort, and the language will decide which cog to use for best performance. You don't need to know what Redis, Riak, or native data structures are.

Hmmm ... aren't these cogs called libraries?

But the question was: can I afford to go through a TCP socket (and that's a long way, possibly even to another machine) or not? A language cannot cover this up.

Yes they're probably libraries, but in the language of the future that covers everything up you wouldn't know such distinctions.

Why can't a lib/language cover it up? You tell it how much data will be stored and transferred. It benchmarks various approaches (having separate DB nodes or not) and determines how many users each setup can support. The developer would only need to be aware that 5,000 users can fit onto a $10-per-month box. When he wants more users, he gets more boxes. If he wants more users per box, he must cut expensive features.

Amazon DynamoDB does something along those lines by asking the dev to specify how much data to store and how many transfers to expect. The details of setting up nodes (how many, how they're sharded and hashed) are hidden.

The famous "sufficiently smart compiler"?

It doesn't need to be that smart for these cases. No AI required. Run the app with various cogs, benchmark each run, pick the cog that ran fastest. Compiling for cog picks would take more time. The programmer would be responsible for sending the right dummy data by linking to a data-gen cog and writing tests.

Cogs would come with lots of metadata: use cases, performance profiles, data sizes, etc. They wouldn't need to be marketed, tried by hundreds of hopeful devs, and turned into resume buzzwords. The programmer no longer needs to read HN and care about the shiniest new cogs. Mental overhead is greatly reduced.

It's an interesting thought.

The knee-jerk worry is that optimising well enough for the every-day will make the site go down like a lead balloon when you get a flash crowd, i.e. get hacker-newsed.

I'm trying to think how you could migrate dynamically enough that this isn't seen...

Of course cheap available memristors would change it all .... http://williamedwardscoder.tumblr.com/post/17768181955/compu...

This article, and the Hickey talk to a greater extent, present a coherent but one-sided argument about elevating simplicity and understanding above human concerns in software engineering.

It's true that, as a programmer, you should strive for simple, "correct by inspection" code when possible. And the better a programmer you are, the more you will see and take opportunities to write a bit of code instead of roping in a third-party library, to use a small library instead of a big one, or to use a library instead of another process, thus avoiding large swaths of complexity, the bane of software development. On the flip side, poor engineers may make large errors of judgment in this area.

However, a bias against powerful, off-the-shelf tools or a disdain for the "familiar" over the "objectively simpler" is no better. The line between a one-man (or few-man) project and a bigger project is where this really starts to matter. News flash: You can't get that "my code feels correct" feeling (the one that's supposed to substitute for a formal proof your entire system works) when other people are writing it. When putting together a team, using technologies that are "familiar" doesn't seem so intellectually lazy -- and many popular technologies are actually very understandable and well-engineered. Finally, I'm taken by end-to-end testing and Eric Ries's "immune system" metaphor as a way to ensure correctness of a complicated system in practice.

If you're making something big, you might have to put down the microscope. If you're making a tapestry, you need to have multiple threads entwined and stay cool.

Nicely put.

Do you think `new ConcurrentHashSplayTree<String, Integer>()` is slower and less space efficient than roping in a Redis server as the LRU?

Didn't think so. That was more my rant, though.

I have experienced this dilemma in a different domain: machine learning. You can either go the Hadoop way and commit to simpler algorithms that can be run in a distributed manner, or you can keep pushing the single machine more and more by use of ever more clever algorithms. Unfortunately, it is hardly ever possible to follow the route of push-single-machine-to-max -> throw-more-machines-in. The distributed way and the shared-memory way of doing things often differ in fundamental ways.

Going with big data tool chains from the start is often overkill for small experiments. But once you outgrow one machine, the pain of undoing all the nice (algorithmic) tricks is also quite severe.

Perhaps it is time to accept that we now produce data at such a rate that distributed is going to be the way to process it. But this also means that some of the techniques available for scaling to larger data sets may need to be given up.

I'm hoping the http://graphlab.org/ approach may help bridge the gap a bit here...

Developers (myself included) worry too much about future unlikely pain and suffering, especially the degree of said suffering. Maybe you spend a few weeks here and there rewriting things. Big Deal.

Scalability and performance are complicated. Unless you KNOW your product will have a big splash, premature optimization will kill your productivity. And you will get it wrong. You will get the implementation wrong. You will optimize the wrong things and not really understand your bottlenecks. Especially if you've never scaled anything before and haven't been bitten twice by all the compromises you have to make.

Distributed systems are hard. Multi-threading is hard. Sharding is hard. CAP is a bitch. If you can scale vertically, do it. Avoid the demons of distributed work until you require them.

Most of the services/apps we build today would do just fine with setups like Mailinator or, gasp, ACID-compliant data stores.

"The first enemy of performance is multiple machines. The fewer machines that need to talk to perform a transaction, the quicker it is."

That is not strictly accurate. He's taking one aspect of performance - communication latency - and expanding that to be a universal truth of performance.

Pixar's render farms are good. Google data centers are good.

When you're CPU bound, more CPUs can make you faster. Note the specific use of the word "can," as in "sometimes."

(blog author) conceded

Fair point; someone listing all the exceptions that prove the rule wasn't quite the kind of programmer I was ranting against.

You might find Gunther's "Universal Law of Computational Scalability" enlightening.

For most startups, embracing a cloud architecture just makes sense. You're building an MVP and want to get it out in front of users. Deploying on something like Heroku and using the add-ons is one of the best ways to focus on your product and not on your server. Then when you have a success, step back and evaluate your tech stack.

To me, the primary advantage of all this new-fangled web server technology is the improved developer experience, not the performance implications.

If you can put together an app on Heroku, you can configure a web app server. I don't really see how Heroku saves much work over AWS and Linode. It might take a few minutes or a few hours, but if you know how to SSH in and run "rails server" you've got a platform for your MVP.

You can then easily evolve your app server in whatever direction you need to.

Heroku gives you a load balancer and auto-scaling. Time not spent in having to configure and maintain that piece of infrastructure is time spent in working on the MVP.

Another benefit of writing code targeting platforms like Heroku is that when your app gains traction you can easily scale since the app would've been (forced to be) written keeping horizontal scaling in mind.

What are the odds that your MVP needs load balancing and auto-scaling? It's far more likely that your app could run in a single process on a single box for a very long time.

As always, it's a matter of thinking about that up-front.

At what point is this a project that can be considered self-sustaining? (i.e. pays for its own development). Can you reach that point with a single box?

And if you can, do you have some leeway for growth after that point so you can actually develop a more scalable solution?

For some projects, that results in being OK with a single proc/single box approach. For other projects, you realize that unless you acquire ${LARGE_USER_COUNT} users quickly, you might as well give up on it - so you have to build in some scalability.

There's also expected growth rate - if for whatever reason you think your project might easily grow from single box to "needs more than one" in a short amount of time, you better plan for that.

It's that whole "horses for courses" thing ;)

Devil's advocate: The app doesn't get traction in the first place because it's slow and lacking features (because you spent time scaling instead of coding features).

It is not an all or nothing proposition. Building an app with scaling in mind doesn't mean you go for a complex initial setup. All you need to do is make some simple design decisions so the app can scale to multiple web front-ends if need be. IMHO this is usually the most common first step in scaling.

Also, building your app with a bit of scaling in mind upfront gives you more time to work on your app, because you don't even have to set up and configure your web servers; getting an app running on Heroku is the most painless way of app hosting that I've seen.

I might have misread the article, but does it talk about MVPs? To me it sounds more like it's about Minimum Viable Developers who play the "why don't u just uze ..." whack-a-mole.

It's not about "embracing the cloud", whatever that means, but about being ignorant.

This article starts by saying that there is something badly wrong with modern programmers.

It then details and critiques the way a typical one-man, one-site 'startup' is using discrete 'cogs' to build his system, presumably whilst learning how to market, build customer relationships and develop a beautiful and compelling product that makes enough money to keep him afloat.

I think the author may be missing the point. An elegant and sustainable back-end does not directly correlate with an elegant and sustainable business.

I don't think so.

Is installing and configuring Redis, installing and configuring a client library, and integrating the usage of that client library with your system really a better use of a one-man startup developer's time than writing

  new HashMap
and going back and fixing it later if it matters?
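For scale, here's roughly how little code the "fix it later" can be when the fix turns out to be a bounded LRU cache. This is a hedged sketch, not anyone's actual code: the class name is mine, but the access-order constructor and `removeEldestEntry` hook are standard JDK `LinkedHashMap` features.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A bounded LRU cache in a handful of lines: LinkedHashMap in
// access-order mode evicts the eldest entry once capacity is exceeded.
public class InProcessLru<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public InProcessLru(int capacity) {
        super(16, 0.75f, true); // true = access order, i.e. LRU behaviour
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict once we pass capacity
    }

    public static void main(String[] args) {
        InProcessLru<String, String> cache = new InProcessLru<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // touch "a" so "b" becomes the eldest entry
        cache.put("c", "3"); // evicts "b"
        System.out.println(cache.keySet()); // [a, c]
    }
}
```

No install, no client library, no socket; if it later "matters", the class can be swapped for something heavier.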

I had a three hour argument the other night with a developer trying to implement some sort of complex logic involving caching for a fairly small blacklist file. I eventually had to make him go away and benchmark it, at which point he realised what my original point was: loading that file off disk and parsing it on every request that needed it was actually faster than talking to a cache to get a pre-parsed version.

Overengineering is still overengineering, even when most of the components you use to do it are provided by somebody else.
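The blacklist anecdote is easy to reproduce: a small file re-read on every request mostly hits the OS page cache, so the per-request cost is tiny. A rough harness (file contents and names are made up for illustration):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BlacklistBench {
    // Time a full load-and-parse of a small blacklist file per "request".
    static long perRequestMicros() {
        try {
            Path f = Files.createTempFile("blacklist", ".txt");
            Files.write(f, List.of("spam.example", "junk.example", "bad.example"));

            int iterations = 10_000;
            Set<String> blacklist = Set.of();
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                // Repeated reads of a small file are served from the page cache.
                blacklist = new HashSet<>(Files.readAllLines(f));
            }
            long micros = (System.nanoTime() - start) / iterations / 1_000;
            if (!blacklist.contains("spam.example")) throw new AssertionError();
            return micros;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("load+parse: ~" + perRequestMicros() + "us per request");
    }
}
```

On a typical box this lands in the low microseconds per request, which is hard for any out-of-process cache round trip to beat.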

I see your point, and if all you need is a HashMap then use a HashMap.

But if that data in the HashMap must be available to another process, or you need more "features", Redis suddenly looks like a very easy solution.

The point is that you don't need multiple processes and Redis most of the time. Your site can easily be served from one box and one process.

That depends on what you're creating and how. For better or worse[1], not everything follows the one-enormous-process Java way of doing things.

Btw, Redis is extremely fast to set up, but as I said, if all you need is a HashMap..

[1] - Almost always for the better if you ask me. That Java "enterprisy" way of development is the worst thing I have ever experienced in computing ;)

Even with multiple threads, any sort of concurrent access/modification on the same dataset becomes problematic if you just use a HashMap.

Sites that have any kind of download/upload component, even if they don't have much traffic, will need to serve multiple requests using threads/processes.

That is where we normally use a relational database. However, Redis presents a way to be faster than conventional databases by storing everything in memory. This is as close to a concurrency-safe HashMap as you can get with the least amount of effort.

But Java comes with several concurrency-safe hash maps and trees and such - where have you been?

If you think that blocking TCP calls to a Redis server that serialises everything are going to be faster than a `new HashSplayTree<String>()`, you're absolutely bonkers.

"This is as close to a concurrency-safe HashMap that you can get with the least amount of effort."

No, it's more effort than using a concurrency-safe HashMap in a JVM process.

I think this analysis is not quite as cut-and-dried as the author thinks it is.

1. Latency. Yes, going out-of-process, even to localhost, is very expensive, and the person the author was responding to should realize that. On the other hand, synchronization is also very expensive. How do they compare? I have no idea[0], and I'm not about to guess. The author shouldn't either.

2. Concurrency. Paul Tyma specifically talks about a "synchronized LinkedHashMap" as the implementation of a cache, so I'm going to take him at his word, understanding it might be a simplification. A synchronized Map is a poor implementation for a cache, because reads will block writes when they don't need to. A better implementation would be a ReentrantReadWriteLock protecting an unsynchronized LinkedHashMap. Redis gives you this behaviour for free (even if you don't know of the existence of ReadWriteLock).

3. Memory usage. Let's be honest--Java is a pig for memory[2] compared to C++, and this is nowhere more apparent than indexing and caching the guaranteed 8-bit strings you'd find in an email. If your whole purpose is to fit more lines into your cache it's genuinely worth considering breaking out of the JVM to exploit the smaller memory footprint of C++ strings (and this really only holds for caches).

Was using a LinkedHashMap a good idea for Mailinator? Probably, I definitely don't have any evidence or suspicion to the contrary. Is it sensible to say "COGS BAD! IN-PROCESS GOOD" for every use-case? Not really.

[0] If I had to guess I would imagine that going out-of-process is 1-2 orders of magnitude slower than contended synchronization. Anyone got any figures?

[1] http://docs.oracle.com/javase/6/docs/api/java/util/concurren...

[2] I would be very surprised if Redis was not more than twice as memory-efficient for this task as a LinkedHashMap. I'm pretty sure that one could implement a C++ solution that would be 3-4x as efficient. This is really only of paramount concern in a caching context, but in that context, it's paramount, because the cost of missing the cache is so phenomenal.
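Footnote [0] asks for figures. Absolute numbers are machine-dependent and this is deliberately unscientific (no JIT warmup, names all mine), but a crude harness shows the gap between a synchronized in-process lookup and a loopback TCP round trip is typically a couple of orders of magnitude:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class HopBench {
    // Average cost of a get() on a synchronized in-process map.
    static long mapNs() {
        Map<String, String> map = Collections.synchronizedMap(new HashMap<>());
        map.put("k", "v");
        int n = 100_000;
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) map.get("k");
        return Math.max(1, (System.nanoTime() - t0) / n);
    }

    // Average cost of a one-byte ping-pong over a loopback TCP socket.
    static long socketNs() {
        try (ServerSocket server = new ServerSocket(0)) {
            Thread echo = new Thread(() -> {
                try (Socket s = server.accept()) {
                    InputStream in = s.getInputStream();
                    OutputStream out = s.getOutputStream();
                    int b;
                    while ((b = in.read()) != -1) { out.write(b); out.flush(); }
                } catch (IOException ignored) { }
            });
            echo.start();
            try (Socket c = new Socket("127.0.0.1", server.getLocalPort())) {
                c.setTcpNoDelay(true); // avoid Nagle delays on tiny writes
                OutputStream out = c.getOutputStream();
                InputStream in = c.getInputStream();
                int n = 1_000;
                long t0 = System.nanoTime();
                for (int i = 0; i < n; i++) { out.write(1); out.flush(); in.read(); }
                return (System.nanoTime() - t0) / n;
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("in-process get:     " + mapNs() + " ns/op");
        System.out.println("loopback roundtrip: " + socketNs() + " ns/op");
    }
}
```

Note this still flatters the out-of-process case: a real Redis call also pays for request serialisation and server-side work, not just the hop.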

Isn't a read-write lock only really useful if you have more readers than writers? I would not be surprised if the amount of incoming spam means mailinator has far more writing to do than reading.

Maybe! My point isn't that Mailinator made the wrong choice (in fact I said they probably made the right one)--it's that the decision for a situation we know nothing about is more complex than "Cogs bad".

Read-write locks also involve more overhead, so the work done while holding the lock had best be significantly more costly than what's required to take the lock itself.
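A minimal sketch of the read-write-lock idea from point 2, assuming a plain map underneath (names mine). One caveat worth stating: this only helps if `get()` really is a read; an access-ordered `LinkedHashMap` reorders its internal list on `get()`, so a true LRU would still need the write lock on reads.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Readers share the read lock and don't block each other;
// only writers take the exclusive write lock.
public class RwLockCache<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public V get(K key) {
        lock.readLock().lock();
        try { return map.get(key); } finally { lock.readLock().unlock(); }
    }

    public void put(K key, V value) {
        lock.writeLock().lock();
        try { map.put(key, value); } finally { lock.writeLock().unlock(); }
    }

    public static void main(String[] args) {
        RwLockCache<String, Integer> cache = new RwLockCache<>();
        cache.put("hits", 42);
        System.out.println(cache.get("hits")); // 42
    }
}
```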

I'm not aware that simply going out-of-proc solves any synchronization issues :)

You still contend with those. It might be by simply letting the OS serialize the requests, but going out-of-process is always overhead you pay in addition to synchronization.

You're quite right, but paying the costs of synchronization at a hardware/C level rather than at a Java one defrays the cost of going out of process.

Java synchronization is _that_ bad? (Not trolling - I just spend most of my days in C/C++). Any good performance studies you could point me to?

If I were to replace the "synchronized LinkedHashMap", I'd look at Google Guava's in-memory cache: http://code.google.com/p/guava-libraries/wiki/CachesExplaine...

Here is a talk about it: http://www.infoq.com/presentations/Concurrent-Caching-at-Goo...

Paul is the author of these slides: http://mailinator.blogspot.com/2011/05/notes-from-programmin...

(It's in the blog I told you to go read.)

So, do you think he understands how to do it right?

I think performance is far too often used as a reason to add cogs; other reasons are far better. If you often replace parts, more cogs are great. We have a really low-volume setup at one of our clients that is nevertheless split: one process imports media data from a huge number of content providers and is divided into three parts - an importer that normalizes all data, a queue as a binding, and a re-encoding process. The reason we did this is simple: the queue has been running for two years straight now, the encoding process was deployed once last year (we changed our logging strategy), and the importer process is deployed around 4-5 times each week. Not having to bring the whole machine to a grinding halt on each of those occasions is a major benefit.

It's about daring to KISS. Only a select few dare to stand up against the hype.

> If you can use a local in-process data-store (SQLite, LevelDB, BDB etc) you’ll be winning massively.

Hold on just a sec. SQLite? Isn't that essentially equivalent to saying "If you never have to concern yourself with database locking, you'll be winning massively?" How can we be talking about scalability and SQLite in the same article?

Because if it's in-process, all that means is that you have to ensure you have a single writer thread. There's nothing wrong with that design; it's how a MongoDB node works internally IIRC.

Granular locking is for the convenience of multiple pieces of code that don't know about each others' existence. Within a single process, it's not really that big a problem to ensure that all the relevant pieces of code do know about each other, and collaborate.
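The single-writer discipline can be made explicit in-process too. A sketch, with all names illustrative: every mutation is funnelled through a queue drained by exactly one thread (the SQLite-style single writer), while a `ConcurrentHashMap` gives readers safe visibility without locks.

```java
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

public class SingleWriterStore {
    private final Map<String, String> store = new ConcurrentHashMap<>();
    private final BlockingQueue<Runnable> writes = new ArrayBlockingQueue<>(1024);

    public SingleWriterStore() {
        Thread writer = new Thread(() -> {
            try {
                while (true) writes.take().run(); // the sole mutating thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.setDaemon(true);
        writer.start();
    }

    public void put(String key, String value) {
        try {
            writes.put(() -> store.put(key, value)); // enqueue, don't mutate here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public String get(String key) {
        return store.get(key); // reads never contend with each other
    }

    public static void main(String[] args) throws Exception {
        SingleWriterStore s = new SingleWriterStore();
        s.put("user:1", "alice");
        while (s.get("user:1") == null) Thread.sleep(1); // writes are async
        System.out.println(s.get("user:1")); // alice
    }
}
```

All the code that writes "knows about" the queue, which is exactly the in-process collaboration being described; no granular locking needed.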

If an rdbms suits you, absolutely start with sqlite.

Code cost of migrating to something external when that's worth it? About a connection string...

Well no arguments there. So is your point that small sites should focus on small site performance and not try to optimize prematurely for scalability?

Yes, and furthermore it's the complete ignorance of how technology works that could lead someone to imagine that Redis could be a faster LRU than a `new ConcurrentHashSplayTree<String>()` in a single-process Java server (or equivalent).

Apart from all the subtlety of not throwing away evicted lines until the last mail that uses them is discarded and such.

How can people think that Mailinator is slow because it's not using web 2.0 sauce?

Maybe I'm being the kind of developer that the blogger was talking about, but would it be such a bad idea for a Mailinator like site to use Redis to store the email messages but hosted on the same box as the web server?

At least that way, if you started to hit memory limits it would be relatively simple to scale out to more machines by moving Redis to its own box. It would be a configuration change rather than having to re-architect your custom LRU cache.

Another benefit would be that you could get disk persistence for free while still staying fast. If Mailinator needs to reboot all the emails are lost. That wouldn't necessarily have to happen if he was using Redis.

Running Redis on the same server still adds overhead to the system; storing an object in a map on the Java heap doesn't require any inter-process communication or serialization. Plus it's another point of failure: what if the Redis process stops but the Java process is still running? If Redis is remote, what happens if the network goes down? Keeping everything within the JVM avoids having to think about those failure scenarios.

I don't know how Mailinator is coded, but I'd guess he hasn't written anything too custom for storing key/value pairs in memory; there are plenty of Java caching implementations. The key thing he has done is work out a way of reducing duplication, which Redis won't provide out of the box.

Mailinator is a great example of having the minimal set of features required and no more. There are no logins, no setup steps and no guarantees. This allows Mailinator to run on such minimal hardware. Adding disk persistence would add complication without making the product better; by definition the kind of emails you send to Mailinator aren't important, so if they're lost during the occasional restart it doesn't detract from the service.

Actually the best way would be to write a facade that would let him swap implementations, so he could switch between his custom LRU, Redis, and any mixture he wrote later (say, a custom version of memcache or MemcacheDB).

As for the cogs argument, this is all I have to say: silicon is still cheap, and carbon is still more expensive.
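The facade idea in miniature, as a sketch (interface and class names are mine, not Mailinator's): code against a tiny interface, and swapping the in-process map for Redis or anything else later becomes a one-line change at construction time.

```java
import java.util.concurrent.ConcurrentHashMap;

// The only thing callers ever see.
interface MailCache {
    void put(String key, String value);
    String get(String key);
}

// In-process implementation backed by a concurrency-safe map.
class InProcessCache implements MailCache {
    private final ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
    public void put(String key, String value) { map.put(key, value); }
    public String get(String key) { return map.get(key); }
}

// A RedisCache would implement MailCache the same way, wrapping a
// client library; callers never know which implementation they got.
public class FacadeExample {
    public static void main(String[] args) {
        MailCache cache = new InProcessCache(); // or: new RedisCache(...)
        cache.put("inbox:bob", "1 message");
        System.out.println(cache.get("inbox:bob")); // 1 message
    }
}
```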

I think it makes a lot more sense to focus on 1) what differentiates you, which may well involve lots of custom code, and 2) finding a market that works out for you.

For mailinator, they were already popular, so it makes sense to do some one-off coding to make things faster/more efficient. Perhaps, were mailinator starting up today though, using Redis as a good first step would have beaten whatever they had before, and would have been 'good enough' for longer.

Once you've got to the point where you're getting popular, then worry about making stuff scale up.

Scalability is nice, but what about redundancy? Not every website can go down for 6 hours while a tech repairs a host in a data center.

There are also challenges with putting all components on a single host. It's simple, but the components are not all going to scale the same way. And depending on how the datastore is partitioned, you'll still make remote calls anyway.

I've seen at least as many outages caused by problems in the additional complexity implemented to avoid having a single point of failure as I've seen outages caused by having one.

Plus, given something like DRBD, having a cold spare that's trivial to spin up isn't that hard to do (and has the nice advantage of being relatively storage-technology agnostic).

The not-so-nice disadvantage is that your cold spare can't actually do any work (like serve read traffic), and that if your application itself corrupts data, DRBD will dutifully mirror that corruption. Hopefully the spare can still perform well with cold caches, but I guess a slow site is considerably better than a dead one.

If your secondary is doing work, then you'll get a performance degradation from losing the primary anyway.

The difference here is that once the slave's warmed up you're back to full speed, whereas with a hot-spare-being-read-from the performance degradation lasts until you bring the other box back.

Any such corruption is effectively a buggy update - normal replication will propagate a buggy write just as happily, and even if it crashed the node entirely there's a good chance your application's retry logic will re-run the write against the slave in a moment.

First you fail over to another server, then you repair the original failed server; this takes repair out of the critical path. This generally requires no changes to an app as long as it's crash-safe. There's plenty of software to do this (e.g. Heartbeat, Red Hat Cluster) but because it doesn't work in the cloud people forgot about it.

If you have backups and repeatable deployments, it's not any harder to just spin up another host than it is to replace one of many points of failure in a decentralized architecture.

By the same token, I've inherited projects where the developer did not take scalability into account and avoided "cogs" for quick development turn-around. Putting the "cogs" in afterwards was incredibly painful, and a lot of development effort could have been saved if some thought had been put into the architecture.

Quips about premature optimization often make the assumption that any optimization is premature. If data or usage growth (or uptime guarantees, for that matter) is an inevitability, then it's often worthwhile to have at least some plan to grow your system beyond a single machine.

The trouble is that the Cogs Model (as it is put here) allows you to get shit done with lower skill levels. Sure, it won't be perfect. And yes, if you had better programmers you could do it better. But these are pragmatic times for most of us.

So he's right (in the abstract).

But for the reality most of us operate in, he's wrong, because Done is Better than Perfect.

That said... the "New is Good" thing needs to die. We're not freaking magpies, people.

Is mailinator a profitable business, or a volunteer project accountable to no bottom line? Both are fine, but the difference may inform design choices.

How expensively does the developer value his time? Has he already made an investment in learning certain technology?

You can get these answers by reading the mailinator blog as the article encourages you to ;)

I love posts like this because, love it or hate it, it gives me a checklist of thoughts to compare my choices to.

I see three possible approaches, all with their advantages and disadvantages. (Of course people may fit between them with a mixture of attributes.)

1. Monolithic - Build it all yourself, purpose built and high performance. This is why Mailinator and PlentyOfFish are able to produce high throughput on a surprisingly small number of machines.

2. Confederated - Completely distributed. Each machine is its own monolithic platform with everything from DNS to database, including web server on that node, but a cluster of nodes gives you scalability, and workload is distributed across the cluster. (I'm not aware of any examples of this, which is why I'm building Nirvana.)

3. COGS: You build your cluster of machines by architecting a system whereby you minimize (but not eliminate) single points of failure. You have N web server machines and X database machines and you seek out really high performance open source cogs to keep the number of machines low (e.g.: Redis, MongoDB, etc.)

The COGS approach is often taken with the idea that we need something really fast. MongoDB being fast (and "SQL") are the reasons it's often chosen. Redis being fast is given as a key advantage (which is relevant for an in-memory database, sure). Node.js is often chosen for similar reasons.

But the end result of the COGS approach is a brittle architecture. You may have multiple redundant web servers, but the thing that distributes load is a single point of failure. More specifically, the architecture is complicated: each machine has a different configuration, etc.

With Monolithic, you get performance, and save hosting costs, and you can probably scale pretty well because you know your system really well, and you've squeezed out a lot of the inefficiencies that come from being generic (in the cogs approach) such that you can interoperate.

What I think we should see more of is confederated: no machine is a unique snowflake. Every machine is identical to every other machine. This way configuration becomes dead simple; just replicate your model node, bring it up, and data and load start going to it.

This can be done with cogs, but they have to be fully distributed cogs. An example is Riak (which hit 1.1 yesterday), which is open source, written in Erlang, and probably loses to nodeDB in every single-node benchmark you can come up with (not that the Basho people have designed it to be slow; quite the contrary). But where's the fully distributed web platform for such an architecture? (If you have an answer, please make this question non-rhetorical. I'm putting a lot of time into building one because I couldn't find one.)

An interesting thing about the confederated approach is that, because each machine is identical, each one could be built in a monolithic fashion, and thus super-optimized for its purpose. I'm using a sorta-cogs approach because there are many good Erlang cogs to use in my project.

But I think the big mistake is to focus on single node performance these days. Servers are relatively cheap, and you need more than one anyway for redundancy, so might as well have a cluster and no single points of failure.

> 2. Confederated - Completely distributed. Each machine is its own monolithic platform with everything from DNS to database, including web server on that node, but a cluster of nodes gives you scalability, and workload is distributed across the cluster. (I'm not aware of any examples of this, which is why I'm building Nirvana.)

It looks like Tumblr is doing something like what you're describing, at least for their dashboard. [1]

If you go down to "Cell Design for Dashboard Inbox", it seems like the architecture has one machine (or maybe a cluster) function as an independent platform and maps users into individual "cells". Their "Dashboard Inbox" is the true meat and potatoes of their product.
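The cell idea boils down to routing each user to one self-contained cluster deterministically. A sketch of the simplest possible mapping (the cell names and pure-hash scheme are assumptions for illustration, not Tumblr's actual design):

```python
import hashlib

CELLS = ["cell-1", "cell-2", "cell-3"]  # each cell is an independent full stack

def cell_for(user_id: str) -> str:
    # Deterministic: the same user always lands in the same cell,
    # so that one cell holds the user's entire dashboard inbox.
    h = int(hashlib.sha1(user_id.encode()).hexdigest(), 16)
    return CELLS[h % len(CELLS)]
```

In practice a system like this usually keeps an explicit user-to-cell directory rather than a pure hash, so users can be moved when cells fill up; the hash version just shows the routing idea.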

[1]: http://highscalability.com/blog/2012/2/13/tumblr-architectur...

Website hosting with cPanel is a fairly good example of the confederated system. By default every machine hosts its own mail servers, web servers, databases, and so on.

Surprisingly (or perhaps not), it's actually very effective for cheap hosting. Machines can handle quite a lot, and rather than having many points of failure and failovers, you have one, and only one, single point of failure for the system (excluding network connectivity, power, etc.)

Every couple of years something goes horribly wrong, but it turns out most users can deal with (or rather, have no choice and are willing to accept) five to ten hours of downtime every couple of years.

>2. Confederated - Completely distributed. Each machine is its own monolithic platform with everything from DNS to database, including web server on that node, but a cluster of nodes gives you scalability, and workload is distributed across the cluster. (I'm not aware of any examples of this, which is why I'm building Nirvana.)

A bit late to this conversation, but the Opa platform appears to be an attempt at this:


The front page talks about MongoDB integration, but when it was first announced it sounded like the DB was integrated into the framework. Maybe they've changed it since; I don't know. It popped up on HN a few times last year in case you want to dig up those discussions.

I think the article was talking more about how not every server has to be a web server. For example, LinkedIn's social graph is a specialized server that keeps the entire graph in local RAM. There is value in building a special-purpose server when scalability benefits from it.

For the life of me, I can't edit the above post. 5 attempts, all eaten by the HN monster.

When talking about benchmarks, I meant to say: "and probably loses to nodeDB in every single one-machine benchmark you can come up with (not that the Basho people have designed it to be slow, quite the contrary.)"

Finally the conclusion of the post:

Further, the only inter-machine communication should be at the database layer (very little is needed elsewhere). So, as a result:

1. A confederation of nodes that talk to each other to keep the databases consistent. With Riak this means that one update only affects three nodes (with n=3).

2. The whole layer cake, DNS to database, including web server, application services, queues, etc., is present on every machine. No single point of failure (if you use DNS round robin to spread load; otherwise your load balancer is the SPF).

3. Further, this layer cake, while composed of Erlang cogs, is mostly monolithic as far as the programmer is concerned. You write your code in JavaScript or CoffeeScript and push it into the database. Each node is like a Google App Engine: you write handlers, and those handlers are run concurrently across the system.

4. Want more performance? Add servers, or upgrade some of your servers. Node went down in the middle of the night? Go back to sleep. (Or write a simple client test you can run from your iPhone to make sure there wasn't some sort of cascading failure and the service is still up.)

5. Almost zero operations, zero cluster-architecture engineering work. No worrying about EBS being terribly slow (go with dedicated machines for these nodes, by the way) or any of those hassles. The engineer just writes JavaScript handlers and worries about CSS and JavaScript for the browser, etc.
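Point 1's "one update only affects three nodes" is the usual replicas-on-a-ring scheme: a key's coordinating node plus the next n-1 distinct nodes clockwise form its preference list. A sketch of that idea under assumed node names (this is the general technique, not Riak's actual implementation):

```python
import hashlib

NODES = ["n1", "n2", "n3", "n4", "n5"]  # every node runs the identical stack

def preference_list(key: str, n: int = 3) -> list:
    """The n nodes responsible for a key: walk clockwise from the
    key's position on the hash ring, taking the next n distinct nodes."""
    ring = sorted((int(hashlib.md5(node.encode()).hexdigest(), 16), node)
                  for node in NODES)
    khash = int(hashlib.md5(key.encode()).hexdigest(), 16)
    start = next((i for i, (h, _) in enumerate(ring) if h >= khash), 0)
    return [ring[(start + i) % len(ring)][1] for i in range(n)]

# An update to this key is sent only to these three nodes (n=3);
# the other machines in the confederation never see the write.
replicas = preference_list("user:42/profile")
```

This is why write traffic stays bounded as the cluster grows: adding machines adds capacity without adding coordination cost per update.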

For me, that idea is nirvana. Nirvana may not be right for you... but confederation has to be the future.

The systems I work on are pretty sizable in scale; right now we are doing about 120 million calls a day on one of our core clusters. I don't claim to be the end-all authority on scaling, but I do have some observations based on what I've dealt with.

First off, that type of architecture is brittle to logic changes. If you have a completely static architecture that you don't need to change, it may be fine, but deploying those changes to every machine in the cluster is problematic.

Second, not all components have the same underlying machine requirements. For instance, our nginx servers don't need much RAM, but the HAProxy load balancers with nginx that terminate SSL need a lot of RAM and a good chunk of CPU. Hadoop works better with JBOD (just a bunch of disks), whereas Cassandra seems to work better with a RAID 0 configuration. Certain layers, like nginx on certain paths, have real-time requirements, which means ultra-low latency; other things need to operate against massive data sets and compute answers in a few hours.

So not every machine in your architecture can host all of the services required by every other part of your architecture. A lot of it depends on workload types and what the underlying requirements are for your system. There are many more reasons, but I'll leave it at those for now. Ultimately the post is right that a single machine can work very well, but it's also misguided in just dismissing the HN commenter. There are many cases where distributed architectures are required: guaranteeing robustness and performance in the face of service and machine failures is very difficult, and is essentially impossible on a single machine.

Which approach you apply depends on the unique situation and the unique constraints you have. Using a single model to solve all problems seems to be worse than using no model. Learn multiple models, and learn how and when to apply each (yes, for those of you in the class, I'm taking the model thinking class :) ).

The kind of programmer I was ranting against is the kind who thinks Redis LRU is faster than `new HashSplayTree<String>()`.

Apart from missing the subtle bit about overriding eviction...

Yeah, I don't see a reason why the entire stack has to be on one machine. It simplifies configuration a little, but there's a trade-off in the granularity of resource allocation that comes with the tiered approach.

First, web servers are distributable by design. If the web app in question cannot be distributed, then a reread of Fielding's dissertation is in order:


So if the web tier is designed correctly and is simply a stateless data-transformation engine, then what's left is making your data tier distributable and elastic.
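Concretely, a "stateless data-transformation engine" means a handler that keeps nothing between requests: every fact comes from the data tier, so any web node can serve any request. A sketch (the handler and datastore interface are hypothetical, invented for illustration):

```python
def handle_profile_request(user_id: str, datastore) -> dict:
    """Stateless handler: no module-level caches, no sessions held in RAM.
    Everything needed for the response is fetched from the data tier,
    so the load balancer can route this request to any web node."""
    user = datastore.get(f"user:{user_id}")
    if user is None:
        return {"status": 404}
    return {"status": 200, "body": {"name": user["name"]}}

# Any node with access to the same datastore gives the same answer,
# which is what makes the web tier trivially horizontally scalable.
fake_store = {"user:42": {"name": "Ada"}}
response = handle_profile_request("42", fake_store)
```

The datastore is passed in rather than held as process state, so the only stateful component left to make distributable is the data tier itself, which is exactly the division of labor the comment argues for.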

Even better, we could put an elastic web-service layer between the frontend web tier and the data-storage tier to provide an even better separation of presentation and logic.

In this set-up I believe we have the best of both worlds: the elasticity of the confederated design, because each tier is designed to be fault tolerant and elastic, combined with the resource-allocation granularity we get with tiered clusters.
