
Cogs bad - willvarfar
http://williamedwardscoder.tumblr.com/post/18065079081/cogs-bad
======
delinka
Maybe people succumb to the hype. Maybe they want the latest shiny running
their own shiny new thing. I'm sure these things contribute greatly.

But I see another aspect to this whole "We Use Shiny Cogs" movement: high-
level vs. low-level. As we make tools that abstract us away from the metal, we
are able to spend less time thinking about the electrons flowing across
silicon and more time thinking about building something John Q. Public will
pay for.

We architect higher and higher abstractions for exactly this reason. And it
comes with a price: at some point you stop running things as efficiently as
possible and there's waste. If we were all studied CS students and could write
kernels and compilers from scratch, we might spend five years building a very
tight, efficient stack for Twitter that could run on a single box (maybe with
a hot failover). But we're not. We're a collection of humans with differing
levels of understanding and will power, many of whom just want to Get Shit
Done and stop thinking about kernel task queues.

So let's turn his "rich man's problem" around a bit: you build your idea on top
of a stack that you understand and that keeps you happy, and when you bring in
the capital (through revenue or investment - whatever) you put money into
deeper, lower-level engineering. Until then, build your idea the way you know
how.

~~~
mhd
_"As we make tools that abstract us away from the metal, we are able to spend
less time thinking about the electrons flowing across silicon and more time
thinking about building something John Q. Public will pay for."_

That makes it sound like there's a complete dichotomy here. It's not like the
mailinator guys are doing assembly whereas the rest of the startup crowd is
doing Prolog AI. Never mind that I doubt most employed backend programmers are
using the time saved to improve the sheer monetization output of the company
(mileage obviously varies for single-person shops).

I do think you hit the nail on the head with your final paragraph. Quite often
it's not that the people programming this _choose_ to pick the cogs; often
they don't _know_ any other way. In which case this isn't really a
decision. But it's a sad trend in my opinion. Remember the old saying "Nobody
ever got fired for choosing IBM"? I think the level where this kind of
thinking is applied has dropped from non-technical upper management to some
actual builders. Smart business decision - maybe. But I do think that a lot of
us here aren't just in it for the money.

~~~
Detrus
The new people are not in it to perfect their engineering craftsmanship.
Overall product craftsmanship maybe, but half their attention is taken up by
visual/UX design, so they have less time for beautiful plumbing.

But at the same time engineer craftsmen can specialize in making better cogs.
That would be the ideal, better cogs like Stripe, Heroku, Redis, DynamoDB,
whatever. Someday most will stop thinking about those problems just as we
stopped thinking about the design patterns for making procedure calls in
assembly.

~~~
route66
In your story "better cogs" sound like "more cowbell". It's still a
fundamental decision which will not go away: do you solve a problem in process
or do you communicate with anything external: Redis vs. HashMap. It's not
comparable to levels in programming languages.

~~~
Detrus
What about a programming language that integrates fancy cogs so they're
conceptually in process? Do I want to know the difference between sorting in
the native data-structure and Redis? Hopefully not, the language/compiler
could make that call.

A programmer could specify how much data he's sorting, how fast he needs it
sorted, and how much data he'll get after each sort, and the language will
decide which cog to use for best performance. You don't need to know what
Redis, Riak, or native data structures are.
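
As a rough sketch of what such an API could look like (every name here is
invented for illustration; no such library exists), the caller would declare
the workload and the library would pick the "cog":

    import java.util.TreeMap;

    // Hypothetical auto-selecting store: the caller declares the workload and
    // the library decides whether an in-process structure is enough or an
    // external cog (Redis, Riak, ...) is needed. Illustrative sketch only.
    final class AutoSortedStore<V> {
        private final TreeMap<String, V> local = new TreeMap<>();

        static <V> AutoSortedStore<V> forWorkload(long expectedItems,
                                                  long bytesPerItem) {
            long footprint = expectedItems * bytesPerItem;
            if (footprint > 4L * 1024 * 1024 * 1024) {
                // Past a few GB the library would transparently switch to an
                // external store; that half is deliberately not sketched here.
                throw new UnsupportedOperationException("external cog not sketched");
            }
            return new AutoSortedStore<>();  // small data: plain in-process TreeMap
        }

        void put(String key, V value) { local.put(key, value); }
        V firstValue()                { return local.firstEntry().getValue(); }
    }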

~~~
route66
Hmmm ... aren't _these_ cogs called libraries?

But the question was: can I afford to go through a tcp socket (and that's a
long way, possibly even to another machine) or not. A language cannot cover
this up.

~~~
Detrus
Yes they're probably libraries, but in the language of the future that covers
everything up you wouldn't know such distinctions.

Why can't a lib/lang cover it up? You tell it how much data will be stored and
transferred. It will benchmark various approaches, having separate DB nodes or
not, determine how many users each setup can support. The developer would need
to be aware that 5000 users can fit into a $10 per month box. When he wants
more users, he gets more boxes. If he wants more users per box he must cut
expensive features.

Amazon DynamoDB does something along those lines by asking the dev to specify
how much data to store and how many transfers to expect. The details of
setting up nodes, how many nodes, how they're sharded, hashed, are hidden.

------
dgreensp
This article, and the Hickey talk to a greater extent, present a coherent but
one-sided argument about elevating simplicity and understanding above human
concerns in software engineering.

It's true that, as a programmer, you should strive for simple, "correct by
inspection" code when possible. And the better a programmer you are, the more
you will see and take opportunities to write a bit of code instead of roping
in a third-party library, to use a small library instead of a big library, or
use a library instead of another process, thus avoiding large swaths of
complexity, the bane of software development. On the flip side, poor engineers
may make large errors of judgment in this area.

However, a bias against powerful, off-the-shelf tools or a disdain for the
"familiar" over the "objectively simpler" is no better. The line between a
one-man (or few-man) project and a bigger project is where this really starts
to matter. News flash: You can't get that "my code feels correct" feeling (the
one that's supposed to substitute for a formal proof your entire system works)
when other people are writing it. When putting together a team, using
technologies that are "familiar" doesn't seem so intellectually lazy -- and
many popular technologies are actually very understandable and well-
engineered. Finally, I'm taken by end-to-end testing and Eric Ries's "immune
system" metaphor as a way to ensure correctness of a complicated system in
practice.

If you're making something big, you might have to put down the microscope. If
you're making a tapestry, you need to have multiple threads entwined and stay
cool.

~~~
willvarfar
Nicely put.

Do you think `new ConcurrentHashSplayTree<String,Integer>()` is slower and less
space efficient than roping in a Redis server as the LRU?

Didn't think so. That was more my rant, though.

------
abhaga
I have experienced this dilemma in a different domain: machine learning. You
can either go the Hadoop way and commit to simpler algorithms that can be run
in distributed manner or you can keep pushing the single machine more and more
by use of ever more clever algorithms. Unfortunately, it is hardly ever
possible to follow the route of push-single-machine-to-max -> throw-more-
machines-in. The distributed way and shared memory way of doing things often
differ in fundamental ways.

Going with big data tool chains from the start is often overkill for small
experiments. But once you outgrow one machine, the pain of undoing all the
nice (algorithmic) tricks is also quite severe.

Perhaps it is time to accept that we now produce data at such a rate that
distributed processing is going to be the way to handle it. But this also means that
some of the techniques available for scaling to larger data sets may need to
be given up.

~~~
mjw
I'm hoping the <http://graphlab.org/> approach may help bridge the gap a bit
here...

------
jconley
Developers (myself included) worry too much about future unlikely pain and
suffering, especially the degree of said suffering. Maybe you spend a few
weeks here and there rewriting things. Big Deal.

Scalability and performance are complicated. Unless you KNOW your product will
have a big splash, premature optimization will kill your productivity. And you
will get it wrong. You will get the implementation wrong. You will optimize
the wrong things and not really understand your bottlenecks. Especially if
you've never scaled anything before and haven't been bitten twice by all the
compromises you have to make.

Distributed systems are hard. Multi-threading is hard. Sharding is hard. CAP
is a bitch. If you can scale vertically, do it. Avoid the demons of
distributed work until you require them.

Most of the services/apps we build today would do just fine with setups like
Mailinator or _gasp_ ACID-compliant data stores.

------
VikingCoder
"The first enemy of performance is multiple machines. The fewer machines that
need to talk to perform a transaction, the quicker it is."

That is not strictly accurate. He's taking one aspect of performance -
communication latency - and expanding that to be a universal truth of
performance.

Pixar's render farms are good. Google data centers are good.

When you're CPU bound, more CPUs can make you faster. Note the specific use of
the word "can," as in "sometimes."

~~~
willvarfar
(blog author) Conceded.

Someone listing all the exceptions that prove the rule isn't quite the kind of
programmer that I was ranting against, though.

~~~
jacques_chester
You might find Gunther's "Universal Law of Computational Scalability"
enlightening.

------
luigi
For most startups, embracing a cloud architecture just makes sense. You're
building an MVP and want to get it out in front of users. Deploying on
something like Heroku and using the add-ons is one of the best ways to focus
on your product and not on your server. Then when you have a success, step
back and evaluate your tech stack.

To me, the primary advantage of all this new-fangled web server technology is
the improved developer experience, not the performance implications.

~~~
jcromartie
If you can put together an app on Heroku, you can configure a web app server.
I don't really see how Heroku saves much work over AWS and Linode. It might
take a few minutes or a few hours, but if you know how to SSH in and run
"rails server" you've got a platform for your MVP.

You can then easily evolve your app server in whatever direction you need to.

~~~
jasim
Heroku gives you a load balancer and auto-scaling. Time not spent in having to
configure and maintain that piece of infrastructure is time spent in working
on the MVP.

Another benefit of writing code targeting platforms like Heroku is that when
your app gains traction you can easily scale since the app would've been
(forced to be) written keeping horizontal scaling in mind.

~~~
jcromartie
What are the odds that your MVP needs load balancing and auto-scaling? It's
far more likely that your app could run in a single process on a single box
for a very long time.

~~~
groby_b
As always, it's a matter of thinking about that up-front.

At what point is this a project that can be considered self-sustaining? (i.e.
pays for its own development). Can you reach that point with a single box?

And if you can, do you have some leeway for growth after that point so you can
actually develop a more scalable solution?

For some projects, that results in being OK with a single proc/single box
approach. For other projects, you realize that unless you acquire
${LARGE_USER_COUNT} users quickly, you might as well give up on it - so you
have to build in some scalability.

There's also expected growth rate - if for whatever reason you think your
project might easily grow from single box to "needs more than one" in a short
amount of time, you better plan for that.

It's that whole "horses for courses" thing ;)

------
DanI-S
This article starts by saying that there is something badly wrong with modern
programmers.

It then details and critiques the way a typical one-man, one-site 'startup' is
using discrete 'cogs' to build his system, presumably whilst learning how to
market, build customer relationships and develop a beautiful and compelling
product that makes enough money to keep him afloat.

I think the author may be missing the point. An elegant and sustainable back-
end does not directly correlate with an elegant and sustainable business.

~~~
mst
I don't think so.

Is installing and configuring Redis, installing and configuring a client
library, and integrating the usage of that client library with your system
really a better use of a one-man startup developer's time than writing

    
    
    new HashMap()
    

and going back and fixing it later if it matters?

I had a three hour argument the other night with a developer trying to
implement some sort of complex logic involving caching for a fairly small
blacklist file. I eventually had to make him go away and benchmark it ... at
which point he realised what my original point was - loading that file off
disk and parsing it on every request that needed it was actually faster than
talking to a cache to get a pre-parsed version.

Overengineering is still overengineering, even when most of the components you
use to do it are provided by somebody else.

~~~
gizzlon
I see your point, and if all you need is a HashMap then use a HashMap.

But if that data in the HashMap must be available to another process, or you
need more "features", Redis suddenly looks like a very easy solution.

~~~
jshen
The point is that you don't need multiple processes and Redis most of the
time. Your site can easily be served from one box and one process.

~~~
jasim
Even multiple threads - any sort of concurrent access/modification on the same
dataset becomes problematic if you just use a HashMap.

Sites that have any kind of download/upload component, even if they don't have
much traffic, will need to serve multiple requests using threads/processes.

That is where we normally use a relational database. However, Redis presents a
way to be faster than conventional databases by storing everything in memory.
This is as close to a concurrency-safe HashMap as you can get with the least
amount of effort.

~~~
willvarfar
But Java comes with several concurrency-safe hash-maps and trees and such - where
have you been?

If you think that blocking TCP calls to a redis server that serialises
everything is going to be faster than a `new HashSplayTree<String>()` you're
absolutely bonkers.
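
For what it's worth, the simplest in-process version needs nothing beyond the
JDK. A minimal sketch (the capacity and the synchronized wrapper are
illustrative choices, not a claim about how Mailinator actually does it):

    import java.util.Collections;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // In-process LRU cache: an access-ordered LinkedHashMap that evicts its
    // eldest entry past a fixed capacity, wrapped for basic thread safety.
    // No network hop, no serialisation.
    final class LruCache<K, V> {
        private final Map<K, V> map;

        LruCache(final int capacity) {
            this.map = Collections.synchronizedMap(
                new LinkedHashMap<K, V>(capacity, 0.75f, true) {
                    @Override
                    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                        return size() > capacity;  // drop the least-recently-used
                    }
                });
        }

        V get(K key)             { return map.get(key); }
        void put(K key, V value) { map.put(key, value); }
    }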

------
hythloday
I think this analysis is not quite as cut-and-dried as the author thinks it
is.

1\. Latency. Yes, going out-of-process, even to localhost, is very expensive,
and the person the author was responding to should realize that. On the other
hand, synchronization is _also_ very expensive. How do they compare? I have no
idea[0], and I'm not about to guess. The author shouldn't either.

2\. Concurrency. Paul Tyma specifically talks about a "synchronized
LinkedHashMap" as the implementation of a cache, so I'm going to take him at
his word, understanding it might be a simplification. A synchronized Map is a
poor implementation for a cache, because reads will block writes when they
don't need to. A better implementation would be a ReentrantReadWriteLock
protecting an _unsynchronized_ LinkedHashMap (a minimal sketch of that shape
follows after this list). Redis gives you this behaviour for free (even if you
don't know of the existence of ReadWriteLock).

3\. Memory usage. Let's be honest--Java is a _pig_ for memory[2] compared to
C++, and this is nowhere more apparent than indexing and caching the
guaranteed 8-bit strings you'd find in an email. If your whole purpose is to
fit more lines into your cache it's genuinely worth considering breaking out
of the JVM to exploit the smaller memory footprint of C++ strings (and this
really only holds for caches).
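
A minimal sketch of the shape described in point 2 (sizing and eviction policy
left out; note that an access-ordered LinkedHashMap mutates on get(), so a
true LRU would need the write lock on reads too):

    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // Unsynchronized (insertion-ordered) LinkedHashMap guarded by a
    // ReentrantReadWriteLock, so concurrent readers do not block each other.
    final class RwLockCache<K, V> {
        private final Map<K, V> map = new LinkedHashMap<>();
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        V get(K key) {
            lock.readLock().lock();
            try { return map.get(key); } finally { lock.readLock().unlock(); }
        }

        void put(K key, V value) {
            lock.writeLock().lock();
            try { map.put(key, value); } finally { lock.writeLock().unlock(); }
        }
    }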

Was using a LinkedHashMap a good idea for Mailinator? Probably, I definitely
don't have any evidence or suspicion to the contrary. Is it sensible to say
"COGS BAD! IN-PROCESS GOOD" for every use-case? Not really.

[0] If I had to guess I would imagine that going out-of-process is 1-2 orders
of magnitude slower than contended synchronization. Anyone got any figures?

[1] <http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/locks/ReentrantReadWriteLock.html>

[2] I would be very surprised if Redis was not more than twice as memory-
efficient _for this task_ as a LinkedHashMap. I'm pretty sure that one could
implement a C++ solution that would be 3-4x as efficient. This is really only
of paramount concern in a caching context, but in that context, it's
paramount, because the cost of missing the cache is so phenomenal.

~~~
eridius
Isn't a read-write lock only really useful if you have more readers than
writers? I would not be surprised if the amount of incoming spam means
mailinator has far more writing to do than reading.

~~~
hythloday
Maybe! My point isn't that Mailinator made the wrong choice (in fact I said
they probably made the right one)--it's that the decision for a situation we
know nothing about is more complex than "Cogs bad".

------
Argorak
I think performance is far too often used as a reason to add cogs; other
reasons are far better. If you often replace parts, more cogs are great. We
have a really low-yield setup at one of our clients that is nevertheless split
up: a process that imports media data from a huge number of content providers
is divided into 3 parts - an importer that normalizes all data, a queue as a
binding, and a re-encoding process. The reason why we did this is simple: the
queue has been running for two years straight now, the encoding process was
deployed once last year (we changed our logging strategy), and the importer
process is deployed around 4-5 times each week. Not having to bring the whole
machine to a grinding halt on each of those occurrences is a major benefit.

------
strictfp
It's about daring to KISS. Only a select few dare to stand up against the
hype.

------
mistercow
> If you can use a local in-process data-store (sqllite, levelDB, BDB etc)
> you’ll be winning massively.

Hold on just a sec. SQLite? Isn't that essentially equivalent to saying "If
you never have to concern yourself with database locking, you'll be winning
massively?" How can we be talking about scalability and SQLite in the same
article?

~~~
willvarfar
If an rdbms suits you, absolutely start with sqlite.

Code cost of migrating to something external when that's worth it? About a
connect string...
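
Assuming a JDBC-ish setup (with the sqlite-jdbc and PostgreSQL drivers on the
classpath - an illustration of the claim, modulo SQL dialect differences), the
migration really is roughly one URL:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    // Same application code either way; only the connect string changes when
    // the data outgrows a single box. Hostnames and paths are placeholders.
    final class Db {
        static Connection open(boolean outgrownOneBox) throws SQLException {
            String url = outgrownOneBox
                ? "jdbc:postgresql://db.internal/app"  // external RDBMS later
                : "jdbc:sqlite:app.db";                // in-process file to start
            return DriverManager.getConnection(url);
        }
    }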

~~~
mistercow
Well no arguments there. So is your point that small sites should focus on
small site performance and not try to optimize prematurely for scalability?

~~~
willvarfar
Yes, and furthermore it's the complete ignorance of how technology works that
could lead to someone imagining that Redis could be a faster LRU than a `new
ConcurrentHashSplayTree<String>()` in a single-process Java server (or equiv).

Apart from all the subtlety of not throwing away evicted lines until the last
mail that uses them is discarded and such.
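
A guess at that subtlety (not a claim about how Mailinator is actually
written): if duplicated lines are shared between mails, an evicted line can
only be freed once the last mail referencing it is gone, which a simple
per-line reference count captures:

    import java.util.HashMap;
    import java.util.Map;

    // Reference-counted store of deduplicated lines: intern() when a mail is
    // stored, release() when a mail is evicted; a line is dropped only when
    // no remaining mail refers to it.
    final class LineStore {
        private static final class Entry {
            final String line;
            int refs;
            Entry(String line) { this.line = line; }
        }

        private final Map<String, Entry> lines = new HashMap<>();

        String intern(String line) {
            Entry e = lines.computeIfAbsent(line, Entry::new);
            e.refs++;
            return e.line;
        }

        void release(String line) {
            Entry e = lines.get(line);
            if (e != null && --e.refs == 0) lines.remove(line);
        }
    }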

How can people think that mailinator is slow because it's not using web 2.0
sauce?

------
Simucal
Maybe I'm being the kind of developer that the blogger was talking about, but
would it be such a bad idea for a Mailinator-like site to use Redis to store
the email messages but hosted on the same box as the web server?

At least that way, if you started to hit memory limits it would be relatively
simple to scale out to more machines by moving Redis to its own box. It would
be a configuration change rather than having to re-architect your custom LRU
cache.

Another benefit would be that you could get disk persistence for free while
still staying fast. If Mailinator needs to reboot all the emails are lost.
That wouldn't necessarily have to happen if he was using Redis.

~~~
cmarshall
Running Redis on the same server still adds overhead to the system; storing
an object in a map that's on the Java heap doesn't require any inter-process
communication or serialization. Plus it's another point of failure: what if
the Redis process stops but the Java process is still running? If Redis is
remote, what happens if the network goes down? Keeping everything within the
JVM avoids having to think about those failure scenarios.

I don't know how Mailinator is coded, but I'd guess he hasn't written anything
too custom for storing key/value pairs in memory; there are plenty of Java
caching implementations. The key thing he has done is work out a way of
reducing the duplication, which Redis won't provide out of the box.

Mailinator is a great example of having the minimal set of features required
and no more. There are no logins, no setup steps, and no guarantees. This
allows Mailinator to run on such minimal hardware; adding disk persistence
would add complication without making the product better. By definition, the
kind of emails you send to Mailinator aren't important, so if they're lost
during the occasional restart it doesn't detract from the service.

------
chaostheory
As for the cogs argument, this is all I have to say: silicon is still cheap,
and carbon is still more expensive.

------
davidw
I think it makes a lot more sense to focus on 1) what differentiates you,
which may well involve lots of custom code, and 2) finding a market that works
out for you.

For mailinator, they were already popular, so it makes sense to do some one-
off coding to make things faster/more efficient. Perhaps, were mailinator
starting up today though, using Redis as a good first step would have beaten
whatever they had before, and would have been 'good enough' for longer.

Once you've got to the point where you're getting popular, then worry about
making stuff scale up.

------
buckeyeCoder
Scalability is nice, but what about redundancy? Not every website can go down
for 6 hours while a tech repairs a host in a data center.

There are also challenges with putting all components on a single host. It's
simple, but the components are not all going to scale the same way. And
depending on how the datastore is partitioned, you'll still make remote calls
anyway.

~~~
mst
I've seen at least as many outages caused by problems in the additional
complexity implemented to avoid having a single point of failure as I've seen
outages caused by having one.

Plus, given something like drbd, having a cold spare that's trivial to spin up
isn't that hard to do (and has the nice advantage of being relatively data
storage technology agnostic).

~~~
spudlyo
The not-so-nice disadvantage is that your cold spare can't actually do any
work (like serve read traffic) and that if your application itself corrupts
data, DRBD will dutifully mirror that corruption. Hopefully the spare can still
perform well with cold caches, but I guess a slow site is considerably better
than a dead one.

~~~
mst
If your secondary is doing work, then you'll get a performance degradation
from losing the primary anyway.

The difference here is that once the slave's warmed up you're back to full
speed, whereas with a hot-spare-being-read-from the performance degradation
lasts until you bring the other box back.

Any such corruption is effectively a buggy update - normal replication will
propagate a buggy write just as happily, and even if it crashed the node
entirely there's a good chance your application's retry logic will re-run the
write against the slave in a moment.

------
srdev
By the same token, I've inherited projects where the developer did not take
scalability into account and avoided "cogs" for quick development turn-around.
Putting the "cogs" in afterwards was incredibly painful, and a lot of
development effort could have been saved if some thought had been put into the
architecture.

Quips about premature optimization often make the assumption that any
optimization is premature. If data or usage growth (or uptime guarantees, for
that matter) is an inevitability, then it's often worthwhile to have at least
some plan to grow your system beyond a single machine.

------
swah
This reminded me of <http://teddziuba.com/2010/10/taco-bell-programming.html>

------
malachismith
the trouble is that the Cogs Model (as it is put here) allows you to get shit
done with lower skill levels. sure, it won't be perfect. and yes, if you had
better programmers you could do it better. but these are pragmatic times for
most of us.

so he's right (in the abstract).

but for the reality most of us operate in he's wrong because Done is Better
than Perfect.

that said... the "New is Good" thing needs to die. We're not freaking magpies
people.

------
Drbble
Is mailinator a profitable business, or a volunteer project accountable to no
bottom line? Both are fine, but the difference may inform design choices.

How expensively does the developer value his time? Has he already made an
investment in learning certain technology?

~~~
willvarfar
You can get these answers by reading the mailinator blog as the article
encourages you to ;)

------
nirvana
I love posts like this because, love it or hate it, it gives me a checklist of
thoughts to compare my choices to.

I see three possible approaches, all with their advantages and disadvantages.
(Of course people may fit between them with a mixture of attributes.)

1\. Monolithic - Build it all yourself, purpose built and high performance.
This is why mailinator and plenty-of-fish are able to produce high throughput
on a surprisingly small number of machines.

2\. Confederated - Completely distributed. Each machine is its own monolithic
platform with everything from DNS to database, including web server on that
node, but a cluster of nodes gives you scalability, and workload is
distributed across the cluster. (I'm not aware of any examples of this, which
is why I'm building Nirvana.)

3\. COGS: You build your cluster of machines by architecting a system whereby
you minimize (but not eliminate) single points of failure. You have N web
server machines and X database machines and you seek out really high
performance open source cogs to keep the number of machines low (e.g.: Redis,
MongoDB, etc.)

The COGS approach is often taken with the idea that we need something really
fast. MongoDB being fast (and "SQL") are the reasons it's often chosen. Redis
being fast is given as a key advantage (which is relevant for an in memory
database, sure.) Node.js is often chosen for similar reasons.

But the end result of the COGS approach is a brittle architecture. You may
have multiple redundant web servers but the thing that distributes loads is a
SPF. More specifically, the architecture is complicated - each machine has a
different configuration, etc.

With Monolithic, you get performance, and save hosting costs, and you can
probably scale pretty well because you know your system really well, and
you've squeezed out a lot of the inefficiencies that come from being generic
(in the cogs approach) such that you can interoperate.

What I think we should see more of is confederated- no machine is a unique
snowflake. Every machine is identical to every other machine. This way
configuration becomes dead simple - just replicate your model node, bring it
up, and data and load start going to it.

This can be done with cogs- but they have to be fully distributed cogs. An
example is Riak (Which hit 1.1 yesterday) which is open source and written in
erlang and probably loses to nodeDB in every single single node benchmark you
can come up with (not that the Basho people have designed it to be slow, quite
the contrary.) But where's the fully distributed web platform for such an
architecture? (If you have an answer, please make this question non-
rhetorical. I'm putting a lot of time into building one because I couldn't
find one.)

An interesting thing about the confederated approach is, because each machine
is identical, it could be built in a monolithic fashion. Thus super optimized
for its purpose. I'm using a sorta cogs approach because there are many good
erlang cogs to use in my project.

But I think the big mistake is to focus on single node performance these days.
Servers are relatively cheap, and you need more than one anyway for
redundancy, so you might as well have a cluster and no single points of failure.

~~~
nirvana
For the life of me, I can't edit the above post. 5 attempts, all eaten by the
HN monster.

When talking about benchmarks, I meant to say: "and probably loses to nodeDB
in every single one-machine benchmark you can come up with (not that the Basho
people have designed it to be slow, quite the contrary.)"

Finally the conclusion of the post:

Further, the only inter-machine communication should be at the database layer
(very little is needed elsewhere), so as a result:

1\. Confederation of nodes that talk to each other to keep the databases
consistent. With RIAK this means that one update only affects three nodes
(with n=3).

2\. The whole layer cake, DNS to Database, including web server, application
services, queues, etc, is present on every machine. No single point of failure
(if you use DNS round robin to spread load, otherwise your load balancer is
the SPF).

3\. Further, this layer cake, while composed of erlang cogs, is mostly
monolithic as far as the programmer is concerned. You write your code in
javascript or coffeescript and push it into the database. Each node is like a
Google App Engine, and you write handlers, and those handlers are run
concurrently across the system.

4\. Want more performance? Add servers, or upgrade some of your servers. Node
went down in the middle of the night? Go back to sleep. (Or write a simple
client test you can run from your iPhone to make sure there wasn't some sort
of cascading failure and the service is still up.)

5\. Almost zero operations, zero cluster architecture engineering work. No
worrying about EBS being terribly slow (go with dedicated machines for these
nodes, btw) or any of those hassles. The engineer just writes javascript handlers and
worries about CSS and javascript for the browser, etc.

For me, that idea is nirvana. Nirvana may not be right for you... but
confederation has to be the future.

~~~
NyxWulf
The systems I work on are pretty sizable in scale, right now we are doing
about 120 million calls a day on one of our core clusters. I don't claim to be the
end all authority on scaling, but I do have some observations based on what
I've dealt with.

First off, that type of architecture is brittle to logic changes. If you have
a completely static architecture that you don't need to change, it may be
fine, but deploying logic changes to every machine in the cluster is
problematic.

Second, not all components have the same underlying machine requirements. For
instance, our nginx servers don't need much RAM, but the HAProxy load
balancers with nginx that terminate SSL need a lot of RAM and a good chunk of
CPU. Hadoop works better with JBOD (just a bunch of disks), whereas Cassandra
seems to work better with a RAID 0 configuration. Certain layers, like nginx
on certain paths, have real-time requirements, which means ultra-low latency.
Other things need to operate against massive data sets and compute answers in
a few hours.

So, not every machine in your architecture can have all of the services
required by every other part of your architecture. A lot of it depends on
workload types and what the underlying requirements are for your system. There
are many more reasons, but I'll leave it at those for now. Ultimately the post
is right that a single machine can work very well, but it's also misguided in
just dismissing the HN commenter. There are many other cases where distributed
architectures are required. Guaranteeing robustness and performance in the
face of service and machine failures is very difficult, and is essentially
impossible on a single machine.

Which approach you apply depends on the unique situation and the unique
constraints you have. Using a single model to solve all problems seems
to be worse than using no model. Learn multiple models, learn how and when to
apply each (yes - for those of you in the class, I'm taking the model thinking
class :) ).

~~~
willvarfar
The kind of programmer I was ranting against is the kind who thinks redis LRU
is faster than `new HashSplayTree<String>()`

Apart from missing the subtle bit about overriding eviction...

