
You Are Not Google (2017) - gerbilly
https://blog.bradfieldcs.com/you-are-not-google-84912cf44afb
======
The big issue I think we miss when people say "why are you using Dynamo, just
use SQL" or "why are you using Hadoop, a bash shell would be faster" or "why
are you using containers and Kubernetes, just host a Raspberry Pi in your
closet":

The former examples are all managed! That's amazing for scaling teams.

(SQL can be managed with, say, RDS. Sure. But it's not the same _level_ of
managed as Dynamo (or Firebase or something like that). It still requires
maintenance and tuning and upkeep. Maybe that's fine for you (remember: the
point of this article was to tell you to THINK, not to just ignore any big
tech products that come out). But don't discount the advantage of true
serverless.)

My goal is to be totally unable to SSH into anything that powers my app. I'm
not saying that I want a stack where I don't have to. I'm saying that I
literally cannot, even if I wanted to real bad. That's why serverless is the
future; not because of the massive scale it enables, but because fuck
maintenance, fuck operations, fuck worrying about buffer overflow bugs in
OpenSSL, I'll pay Amazon $N/month to do that for me, all that matters is the
product.

~~~
beering
For anyone who's not used these "managed" services before, I want to add that
it's still a fuck ton of work. The work shifts from "keeping X server running"
to "how do I begin to configure and tune this service". You _will_ run into
performance issues, config gotchas, voodoo tuning, and maintenance concerns
with any of AWS's managed databases or k8s.

> I'll pay Amazon $N/month to do that for me

Until you pay Amazon $N/month to provide the service, and then another
$M/month to a human to manage it for you.

~~~
rlander
Exactly. There's no silver bullet, only trade offs.

In this case you're only shifting the complexity from "maintaining" to
"orchestrating". "Maintaining" means you build (in a semi-automated way) once,
and most of your work is spent keeping the services running. "Orchestrating"
means you spend most of your time building the orchestration and little time
maintaining.

If your product is still small, it makes sense to keep most of your
infrastructure in "maintaining" since the number of services is small. As the
product grows (and your company starts hiring ops people), you can slowly
migrate to "orchestrating".

~~~
adamlett
_There's no silver bullet, only trade offs._

I see this a lot and it bugs me, because it implies that it's all zero sum and
there's nothing that's ever unconditionally better or worse than anything
else. But that's clearly ridiculous. A hammer is unconditionally better than a
rock for putting nails in things. The beauty of progress is that it enables us
to have our cake and eat it too. There is no law that necessitates a dichotomy
between powerful and easy.

~~~
placebo
> _it implies that it's all zero sum and there's nothing that's ever
> unconditionally better or worse than anything else_

Not necessarily - to start with, a good trade-off is unconditionally better
than a bad trade-off.

Also, progress brings with it increasing complexity. Recognizing the best path
involves assessing many more parameters and is far more difficult than
deciding whether to use a hammer or a rock to drive nails. The puzzling
tendency of many people to over-complicate things instead of simplifying them
makes the challenge even more difficult.

By the way, the closest thing I've ever found to a silver bullet in software
development (and basically any endeavor) is "Keep it simple". While this is a
cliche already, it is still too often overlooked. I think this is because it
isn't related to the ability to be a master at manipulating code and logic,
but to the ability to focus on what's really important - to know how to
discard the trendy for the practical, and adventurous methods for focused and
solid ones - basically, passionately applying Occam's razor at every
abstraction layer. If this were more common, I think articles like "You are
not Google" would be less common.

~~~
mcguire
Ever wonder how the Go team seems to get stuff done more efficiently than
other groups? It's not Go, it's that they simplify (perhaps oversimplify).

------
chadash
> _As of 2016, Stack Exchange served 200 million requests per day, backed by
> just four SQL servers: a primary for Stack Overflow, a primary for
> everything else, and two replicas._

This was the most enlightening piece of the article for me. Their Alexa rank
today is 48 (globally, 38 in the U.S.), so whatever your site is, you are
_probably_ not dealing with as heavy a load as them. What techniques do you
have to employ to serve this many requests from a single database server?
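For a sense of scale, a quick back-of-the-envelope conversion of that figure
(the 2x peak factor is my own assumption, not a Stack Exchange number):

```python
# Back-of-the-envelope: what does 200 million requests/day mean per second?
requests_per_day = 200_000_000
avg_per_second = requests_per_day / 86_400   # seconds in a day
peak_per_second = avg_per_second * 2         # assume peak is ~2x average

print(f"average: {avg_per_second:,.0f} req/s")       # roughly 2,315 req/s
print(f"assumed peak: {peak_per_second:,.0f} req/s")
```

A few thousand requests per second is well within reach of careful SQL on
good hardware, which is rather the article's point.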

~~~
wnevets
and they're using that very un-hip Microsoft .NET

~~~
sergiotapia
C# was great in 2008, then it got kind of enterprisey and closed - not it's
ridiculous and getting steam into all avenues of software engineering. It's
arguably the most powerful language in the world.

~~~
manigandham
How is it closed? It's now better than ever with the new cross-platform .NET
Core with all kinds of runtime and language advancements.

~~~
sergiotapia
s/not/now

------
ummonk
Good example for which I’d highlight a particular thing to note: The kinds of
systems that minimize failure probability at a large scale are often not the
kinds of systems that minimize failure at a small scale.

At a large scale (e.g. hundreds or especially millions of hardware nodes) the
most common faults will be due to individual nodes / services / whatever
failing, so you want a complex fault tolerant system to deal with those
faults.

At a small scale (e.g. stuff that can fit on one or several servers) the most
common faults are from the system itself, not from individual nodes. Here,
using a complex system will drive up the likelihood of failures, especially
when you don’t have a large team to manage the system.
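The large-scale half of this can be made concrete with a toy calculation (the
0.1% daily per-node failure rate is an illustrative assumption):

```python
# If each node independently fails on a given day with probability p,
# the chance of at least one failure grows quickly with fleet size.
def p_any_failure(n_nodes, p_node=0.001):
    return 1 - (1 - p_node) ** n_nodes

for n in (1, 10, 1_000, 100_000):
    print(f"{n:>7} nodes -> {p_any_failure(n):.1%} chance of a failure today")
```

At one node, failures are rare events; at a thousand nodes, "something failed
today" is the normal state, which is why the big players need fault-tolerant
machinery that a small shop does not.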

~~~
quickthrower2
Our longest outage was our cloud provider (one of the big guns) turning off
our entire account for 10 hours due to suspicious activity.

~~~
pojzon
If im not mistaken, Azure went down recently in one region for a wooping 24
hours.

That's pretty scary for anyone having at least three nines SLA

~~~
YjSe2GMQ
Grandparent speaks about a different scenario. In a similar vein: imagine your
credit card blocks AWS payments, you don't notice, and then AWS payment
reminders land in your spam folder. Boom, services out.

~~~
eximius
Yea, sure, but you should also consider getting a different credit card AND
email provider.

Why would a credit card suddenly block charges from a well-known company
you've been paying recurringly for X months or Y years?

Why would my email block one of the biggest service providers on the planet?

~~~
Udik
I think the point is that bank payments, email spam filters, the risk of
getting marked as "suspicious activity" as in GP, these become risk factors
with a cloud infrastructure as much as, say, a power outage would be if you
were self hosting.

When you self host something and you have a power outage, that counts as an
interruption of service; when a major cloud provider suspends your account
because of some chain of mistakes on your or their part, this doesn't impact
their SLAs, because the _service_ is technically fine.

------
thaumaturgy
What this article, and the comments on it at the moment, are missing is that
developers choose many of these technologies because they are sexy and will
help the developer get their next job.

Which of the following two developers has the better chance of breaking six
figures next year:

"Used Hadoop, MapReduce, and GCP for fraud detection on..."

...or...

"Used MySQL and some straightforward statistics for fraud detection on..."

 _This_ is a big part of why all these things exist in places where they
shouldn't. As a dev that always goes for the simplest solution first and has
yet to break a hundred k at 40 ... I'm spending my evenings now trying to
figure out how to deploy the latest technologies where they're totally not
needed.

~~~
umanwizard
Maybe I’m just swimming against the tide, but I work at a “big N” company and
I am more impressed by “saved X dollars”, “made process Y faster” or “built
feature Z” than I am with a specific set of technologies.

I’ve interviewed a lot of incredibly bright people that didn’t know any
technologies more modern than C++.

~~~
rjknight
Part of the problem is that many software developers work in organisations
where that kind of information - dollars saved or made, time saved, steps
reduced - isn't shared. In a siloed org, the development team writes code to
satisfy bug reports or feature requests, which come from a business analyst or
product owner, who is the sole conduit to "the business". Why these things are
needed, their relative priorities, or their ultimate impact, are not regarded
as important concerns for the development team.

tldr: software developers often can't measure the impact of their code, so
they fall back on describing the technologies they use, which drives a
counter-productive desire to employ "sexy" technologies.

~~~
Udik
> Part of the problem is that many software developers work in organisations
> where that kind of information - dollars saved or made, time saved, steps
> reduced - isn't shared.

It's not even that they are not shared. It's that the importance of projects
is often measured in dollars spent, complexity, or the number of people
working on them. Which means that the inefficient teams and solutions are
often considered more important - their managers and tech leads have more
people under them, and they are more visible to the rest of the business.

------
kradroy
I have this silent debate with my engineers from time to time when one of them
gets an itch they feel the need to scratch with an industrial strength back-
scratcher. I usually go "lawyer mode" and ask them question after question to
justify their choices. They either forget their itch or realize that rubbing
themselves against a wall will fix it. I understand their desire to put en
vogue frameworks on their resume, but I can't have someone's flight of fancy
fucking up our tech stack.

~~~
threeseed
I actually think it's incredibly insulting when people assume that engineers
are choosing frameworks just for their resume. Most, in fact, are looking to
these tools (e.g. React, Spark, Kafka) because so many other engineers are
using them with success, and so they think they will have equal success.

But then they don't have the context as to why those tools were chosen, and so
often the tools aren't suited. I've never met anyone in 20+ years of working
with thousands of engineers who was doing it for their resume. In fact, the
best thing for your resume is for the project to be a success anyway.

~~~
AnIdiotOnTheNet
> I actually think it's incredibly insulting when people assume that engineers
> are choosing frameworks just for their resume.

Maybe, or maybe it's just realistic in a world where common advice is not to
stay at any company for more than 2 years.

~~~
threeseed
Mostly younger people are choosing to jump around jobs.

The common advice and wisdom is actually not to do this.

~~~
braythwayt
_The common advice and wisdom is actually not to do this._

This is a long thread unto itself, but in an environment where many companies
have zero loyalty to their employees and would rather hire more experienced
people than train and promote their existing employees who already know the
ins and outs of the company...

Job hopping is often the fastest way to more money and more crucially, more
responsibility which means more personal growth.

------
craigkerstiens
Could not agree more with this. Whether it is data warehousing, a maze of
microservices, or machine learning for your basic CRUD app, you should look
into whether you truly need it and whether it helps solve a real problem you
have. A basic stack with Rails/Django and Postgres can get you quite far. This
is often as much as most companies/startups ever need.

Also, I personally love the callout to Joe, who, while being a professor, is
consistently practical about when approaches do or don't make sense and at
what scale they do.

~~~
threeseed
Not sure what reality people are living in.

But a basic stack can only be used for basic systems and basic problems. And
there just aren't many of those going around anymore, as they've all been
solved. Or, more commonly, nobody (including the business) is interested in
delivering something basic. They want to innovate for their users.

~~~
worble
I'd argue the complete opposite: every company and their mother is online
these days, and nearly always they're a) doing something that's been done
before and b) nowhere near the amount of throughput that would require some
kind of overcomplicated setup.

No, your company isn't special, and "innovating for your users" is probably
the worst thing you could do, compared to delivering a good product using
established practices that are tried and tested to deliver results.

------
setquk
Three worries for me:

1\. I know the risk of only being able to peer through the fence at the
distant piece of software that is running your business, unable to gain any
insight while your production application is limping badly and customers are
draining away like water. On top of that, thrashing it out with Mr Clippy
passes for better-than-average support. If my business depends on it, I want
experts at hand who have the tools and the access they need to do their job.

2\. The insane pace of serverless is entirely fad-driven and lacks the quality
engineering required for critical pieces of software. The tools are
universally poor quality: unstable, unreliable, poorly documented and built on
gaining mindshare, making an IPO and selling conference seats. _Best
practices_ never materialise, as the rate of change does not allow the
ecosystem to settle and work the bugs out. The friction is absolutely
terrible, but no one speaks of this for fear of their cloud project being
labelled a failure. Every person I have spoken to for months is hiding little
pockets of pain under a banner of success. Some people clearly will _never_
deliver, and burn mountains of cash hiding this.

3\. Once you enter an ecosystem, you are entirely at the mercy of that
ecosystem, be it a service provider or a tool. Portability is always valuable.
It has cost, scalability, redundancy and risk benefits far beyond the
short-term gain of a single-vendor decision. I'm currently laughing at an AWS
quote for a SQL Server instance with half the grunt, no redundancy and no
insight possible, for only 2x the capex and opex combined of dedicated
hardware including staff. But we can't move to Azure because everything is
tied into SQS.

I can never be behind anything but IaaS myself. This is contrarian, especially
in this arena, but I will put my 25 years of experience on the line every time
and say that it is the right thing to do. IaaS gives you choice and
flexibility, allows you to gain deep insights, protects you from serfdom and
fad technology, and lets you pick and choose mature products rather than what
the vendor sees fit.

This is just another rehash of buying a mainframe. It's just bigger and you
pay hourly to write COBOL.

~~~
johnwyles
1\. You have to pay for those, but the thinking goes that if you are at the
complexity and scale of those problems, you reach out to a TAM to sort them
out for you (i.e. $$$).

2\. Give it time; this will evolve and it is still in its infancy. Like all
things it is buggy in the beginning but as adoption hockey-sticks so too will
the stability, documentation, etc.

3\. Portability is traded for breadth of services and depth is gained through
vendor lock-in and the one-size-fits-all package. Of your concerns I would say
this one will be around for a long time or at least until a "conversion kit"
is built to shoehorn all of your stuff into another provider allowing you to
jump ship or test the waters elsewhere.

I am a veteran like you (20 years), and while you choose IaaS because you like
the control, most want to punt their problems to someone else and pay for
that.

If our goal is to get from LAX to JFK, we could fly our own plane: charting
our own course, looking up weather, doing engine checks, refueling, dealing
with air traffic control - and we'll get there and be in full control the
whole way. However, most will pay to be shuttled onto a commercial airplane.
Then of course there are some who are willing to pay more to have their own
personal pilot get them there in a chartered aircraft, where they are afforded
a more tailored experience.

There are some really great pilots out there you can hire to do it all for you
(or you may in fact be one yourself), but if the goal is simply to go from
point A to point B in as little time as possible, we don't have time to find,
train, and rely on a single pilot - or ourselves - to get there. We will take
the hit and pay others to get us there.

That is what all this serverless nonsense is about, if you ask me. The
tradeoffs of simplicity and handing the busywork off to someone else are more
enticing than the control we'd have in the process. Also, isn't it nice to be
able to say, when you arrive late, that it was the airline's fault? :)

------
tetha
Yup, experiencing that at work quite a bit. People are currently arguing about
big data solutions for a data set of 300 GB or so, with growth in the area of
100ish GB/year. I'm still arguing that this could be an in-memory dataset for
Postgres, and that we wouldn't need any of that with a good schema.
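Rough arithmetic on those numbers (the 768 GB server size is an assumption
for illustration):

```python
# Does "just keep it in memory" hold up over time? Figures from above:
# ~300 GB today, growing ~100 GB/year.
current_gb = 300
growth_gb_per_year = 100
ram_gb = 768  # a commodity high-memory box

years_of_headroom = (ram_gb - current_gb) / growth_gb_per_year
print(f"~{years_of_headroom:.1f} years before the data outgrows RAM")
# and that's before considering compression or pruning old data
```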

Or other people are wondering if we could replace all of our relational
databases with Kafka, while complaining about inconsistent data sets. Well,
let's talk about the advantages of a good relational schema first.

Maybe I'm turning into a grumpy old admin/DBA... but MariaDB or Postgres, used
well, just solve so many problems.

~~~
freehunter
I once asked if Postgres would have any negative performance impact from
holding 1.8m records and once the laughing stopped and they realized I was
serious I got a good lesson on how robust Postgres can be.

~~~
kod
I once worked for a company that maxed out a 32 bit autoincrement primary key
on a mysql table. Relational databases can hold a lot more than people give
them credit for.
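For perspective, a quick calculation of how fast a signed 32-bit key space
runs out (the sustained insert rate here is a hypothetical figure):

```python
# How fast can you exhaust a signed 32-bit AUTO_INCREMENT key?
max_id = 2**31 - 1            # 2,147,483,647 for a signed MySQL INT
inserts_per_second = 500      # hypothetical sustained write rate

days_to_exhaust = max_id / inserts_per_second / 86_400
print(f"max id: {max_id:,}")
print(f"~{days_to_exhaust:.0f} days at {inserts_per_second} inserts/s")
```

At a steady 500 rows/s you'd burn through the key space in under two months,
so maxing it out says more about write volume than about any limit of the
database itself.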

~~~
icedchai
Yep. I worked at a company that had to migrate to 64-bit IDs. This was with
MySQL 5.0 on servers with HDs, not SSD. Some of those "alter table" commands
took all night!

------
staticassertion
This all boils down to "make cost benefit decisions".

The problem is that maybe cost benefit decisions are _really hard_. To
understand the cost of map reduce, for example, I probably have to use it at
least once. High cost just to understand something. And I have to understand
the alternatives, so I'm trying those out too.

You know what's faster? Signaling. When large companies say "this approach
works for us" it's really cheap and easy to say "Well that's probably good
enough then".

Do you lose out on efficiencies by not fully understanding the problem? Duh.
But probably not as much as understanding an entire domain for every technical
decision you make.

The steps outlined, particularly 1,2,3,4, are really expensive for every
single technical solution. Reading multiple technical papers for a db is
probably more costly than just acquiring customers, hitting scaling limits for
your uninformed choice, and moving to a new db based on a somewhat more in
depth approach or hiring a consultant.

Let's assume that if you guessed a technical solution, the cost of 'guessing'
is 0, and the cost of picking the wrong solution is 'low'. Should I waste time
doing anything other than guessing? If the cost of 'do what google does' is 0,
or near 0, and better average chance of working, why not pick it?

Unless the stakes are high, putting that kind of investment into every
technical decision just seems needless. And they arguably have to be very high
to offset the cost, and I would state that determining those costs upfront
itself may be difficult and error prone.

For every mis-guess listed in that article, how many new customers did they
acquire due to making the right 'guess' based on signaling from other
companies? The companies clearly hadn't gone under from those bad decisions,
so would it really have been the right call to have invested in such deep
thinking about the problem early on?

Not sure I believe what I wrote entirely, just thinking out loud.

~~~
wahern
> Let's assume that if you guessed a technical solution, the cost of
> 'guessing' is 0, and the cost of picking the wrong solution is 'low'. Should
> I waste time doing anything other than guessing? If the cost of 'do what
> google does' is 0, or near 0, and better average chance of working, why not
> pick it?

That makes more sense if you s/cost/risk/. The answer to why you shouldn't
just do whatever Google is doing is because trying to copycat the leader is
the one thing that will guarantee your _failure_ in the market place.

Same with AWS. AWS Lambda was built to solve _existing_ problems. Everybody
will use it to solve the same problems it's designed for, but only a tiny
number of people will actually succeed in the marketplace because it just
becomes a numbers game. If you want to _win_ you have to play a different game
-- look for problems where AWS Lambda is a poor fit.

If you're doing the same thing as everybody else you're by definition just
reinventing the wheel. Why bother?

~~~
staticassertion
I mean cost in terms of 'time to make a decision', without considering 'cost
of making the wrong decision', which is closer to risk.

> The answer to why you shouldn't just do whatever Google is doing is because
> trying to copycat the leader is the one thing that will guarantee your
> _failure_ in the market place.

The article seems to demonstrate the opposite - decisions following the
leader, and paying for it later, but surviving in the interim.

> If you're doing the same thing as everybody else you're by definition just
> reinventing the wheel. Why bother?

I don't really agree. When it comes to something like a DB or architecture,
that's probably not where you need to be inventing new solutions - products
are rarely sold based on their technical implementation, and more about the
results they drive.

If you were building a new Google search and you used all of the same tech,
that's probably a bad idea. But if I'm building a product for a completely
isolated domain, who cares if it's solved with the same tools Google uses to
solve search?

Regarding lambda, there's likely tons of room for competitors even within a
domain to implement off of the same technology - again, it's just a tool, and
not likely to be the differentiator.

------
bartread
At some level this is one of the most "no shit, Sherlock" pieces I've ever
read. On the other hand you see it happening absolutely everywhere so it very
much needed to be said.

I don't know why it is but people seem much more interested in the tech than
in the value they can create. Which is odd. Why? Because it's the second part
where you get to exercise the most creativity and do the most interesting
work. How do you think LinkedIn arrived at Kafka, for example?

What's possibly more baffling is that it's not just the devs who buy into the
technology hype: often it's the so-called business types, and those in
management and leadership positions, advocating and egging them on.

~~~
threeseed
If everyone just focused on business value and ignored the technology aspects
then nothing would've been invented.

There would be no Internet, Web, VR, ML, AI etc and we would all be writing
assembler or using punch cards.

~~~
dwheeler
> If everyone just focused on business value and ignored the technology
> aspects then nothing would've been invented. There would be no Internet,
> Web, VR, ML, AI etc and we would all be writing assembler or using punch
> cards.

The Internet, Web, ML, and AI at least were all originally invented using
_research_ funding, not business funding. The Internet (specifically the
TCP/IP suite) was funded by DARPA, the research arm of the US Department of
Defense. So you're right that some people need to focus on something other
than business value, but usually that is someone paid to do research, not paid
to implement a specific solution that needs to be widely deployed within 1-3
years.

But if you're picking a product to use for a particular task, you aren't doing
that kind of research. You are instead doing engineering. That is, you are
determining how to solve a particular problem using the knowledge and
resources (including reusable code) at hand. So you need to _think_ to
determine what collection of resources will actually do the job well, at
reasonable cost and time, etc. Different problem.

~~~
threeseed
All of those I listed are completely different now to when they were first
invented. To the point of being unrecognisable. And all of that is because of
real world innovation from real world engineers solving real world problems.

This anti-innovation mindset is just as damaging overall as everyone using the
latest tools and ignoring the more proven ones.

~~~
Jtsummers
How is this article anti-innovation? It's saying, "Don't adopt a solution that
doesn't actually fit your needs." It's not innovative to use MapReduce to do
payroll reports at the end of a quarter when a simple SQL query can produce
the exact same thing. It's just silly. It's not innovative to select something
that overshoots your needs unless it actually prepares you for your future
needs (ideally known, but at least a high-confidence expectation).

I shouldn't drop a bunch of money on an underutilized system unless it offers
enough benefit.

~~~
mrosett
> It's not innovative to use MapReduce to do payroll reports at the end of a
> quarter, when a simple SQL query can produce the exact same thing.

I really, really hope this is just a random illustrative example.

~~~
Jtsummers
Yes. That’s not a thing I’ve seen in my office or any office I’ve been in. It
was just an example.

------
opportune
I use managed services a lot. And fortunately my data size is big enough for
it to actually make sense to be using these big data tools.

The issue is the quirks. Every managed service has at least a dozen quirks
that you're not going to know about when you visit the flashy sales page on
the cloud provider's website. And for the vast majority of users, they're not
going to have access to the source code to understand how these quirks work on
the backend. So you end up in a situation where yes, it does take way less
time to get 95% of the functionality done, but getting that last 5% can still
take a considerable amount of work.

As an example, I have been using Azure Event Hubs lately. It is supposed to
provide a simple consumer API like Kafka does, but with consumer-group load
balancing across partitions. Awesome - there is a system that automatically
handles leasing across partitions in a way that abstracts this all away from
the client! Except, well, the load balancing is actually accomplished via
"stealing leases" (meaning they are not true leases), so if you use the API
you are meant to use, you will get double reads - potentially very many, if
you want to commit reads after doing more-than-light processing, which can
take time. So you need to use the much more poorly documented, barebones
low-level API and probably still end up writing a bunch of logic to dedupe.

Except, you use this kind of tool to begin with because you want to set up a
distributed consumer group to read from a stream... so now you have a
non-trivial engineering problem: figuring out a way to get a distributed
system to manage deduping in a lightweight way across hundreds of processes
and machines...
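A minimal sketch of the kind of dedupe logic this forces on you - per-process
only, with hypothetical names; a real distributed consumer group would need
shared state (e.g. Redis) rather than a local dict:

```python
import time

class RecentlySeen:
    """Drop duplicate events by id, remembering ids for `ttl` seconds.

    Per-process only: a distributed consumer group would need shared
    state (e.g. Redis with an expiring set-if-absent) instead of a dict.
    """

    def __init__(self, ttl=300.0):
        self.ttl = ttl
        self._seen = {}  # event_id -> time first seen

    def is_duplicate(self, event_id, now=None):
        now = time.monotonic() if now is None else now
        # Evict expired ids so memory stays bounded.
        self._seen = {k: t for k, t in self._seen.items() if now - t < self.ttl}
        if event_id in self._seen:
            return True
        self._seen[event_id] = now
        return False

dedupe = RecentlySeen(ttl=300)
for event_id in ["a1", "b2", "a1"]:   # "a1" redelivered by a stolen lease
    if not dedupe.is_duplicate(event_id):
        pass  # process the event here
```

Even this toy version shows the tradeoff: the TTL bounds memory but also
bounds how late a redelivery can arrive and still be caught.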

~~~
indigo945
Are there any good "introduction to AWS" books or book series that actually
mention those problems and how to work around them? All that I've seen just
parrot the sales pitches, and it would be excellent to know about such
problems beforehand.

------
lloydde
This is great. It reminds me of Adam Drake's 2014 article "Command-line Tools
can be 235x Faster than your Hadoop Cluster" [1], which was the wake-up call I
needed.

1\. [https://adamdrake.com/command-line-tools-can-be-235x-faster-...](https://adamdrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html)

~~~
saberience
That article totally misrepresents the normal use case for a Hadoop cluster,
though. Hadoop clusters are meant for when you have multiple petabytes of
data, and thus network bandwidth becomes the bottleneck for doing batch
processing jobs.

Let me know when your command-line tools run multiple large-scale processing
jobs on petabyte datasets.

His article was him constructing a straw man about why people use Hadoop and
attacking that.

His article was him constructing a straw man about why people use Hadoop and
attacking that.

~~~
henryfjordan
I've worked with these "straw men" you say he constructed; they are absolutely
out there. There was a time in 2014 when Hadoop/MapReduce was the hammer and
every problem out there looked like a nail.

How many people have used Hadoop for a project? How many Petabyte+ datasets do
you think are out there?

Unless you truly believe the answer to those 2 questions are the same, I think
you can see why that article had to be written.

------
nostrademons
The only reason to use Hadoop is if you need a Shuffle phase, i.e. the
_intermediate data_ between Map and Reduce is too big to fit on one machine.
If you have big input but small append-only output, use a work queue (SQS or
MySQL/Postgres will let you set this up in minutes), dump to files, and merge
them with gzcat | sort | uniq | gzip > output.txt.gz. If you have big input
but a small volume of update-based processing (< ~1000 or so outputs/sec),
have your workers update an RDBMS. If you have small input but big output,
build up output file chunks, progressively write them to S3, and delete them
locally. If both your input & output fit on disk, use UNIX pipelines and
command-line tools. If they fit in RAM, just load them with Pandas or
equivalent and manipulate them in your favorite interactive programming
language.
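The merge step above can be sketched in a few lines of Python, assuming each
worker wrote its chunk pre-sorted (function and file names are hypothetical):

```python
import gzip
import heapq
from itertools import groupby

def merge_chunks(chunk_paths, out_path):
    """Merge sorted, gzipped worker outputs into one deduplicated file:
    roughly `gzcat chunk*.gz | sort -m | uniq | gzip`. Assumes each
    worker wrote its own chunk in sorted order."""
    files = [gzip.open(p, "rt") for p in chunk_paths]
    try:
        with gzip.open(out_path, "wt") as out:
            # heapq.merge streams the sorted inputs without loading them;
            # groupby collapses runs of identical lines, like uniq.
            for line, _ in groupby(heapq.merge(*files)):
                out.write(line)
    finally:
        for f in files:
            f.close()
```

Because both the merge and the dedupe are streaming, this handles inputs far
larger than RAM, which is the whole appeal of the pipeline approach.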

------
lukev
This is really a very good article, and I've seen this same behavior manifest
on numerous client projects.

BUT. I have also grown a bit skeptical of the "Just use an RDBMS!" mantra. The
same advice applies: _think about your use case._ Even if your data isn't
exa-scale, a relational DB might not be the best choice.

My current project has data on the scale of hundreds of billions of rows.
Nothing crazy, easily handled by a good Postgres box (by the numbers). And for
certain access patterns, it would be.

Unfortunately, it turns out that a lot of the analytics queries our business
users need end up being joins and aggregations involving full or nearly-full
table scans. PG is not particularly well optimized for this; it gets CPU and
disk-access bound. Queries take tens of minutes, causing timeout errors in our
BA tools.

Loading the data into a Presto (Facebook product) cluster instead dropped
query times for the same aggregation into the tens of seconds range. Sure, our
data doesn't even really count as "big" and Presto is probably overkill if
you're only looking at size, but it is optimized and built for highly
partitioned parallel aggregations.

~~~
DanFeldman
Exactly same case here - we're using Athena (packaged presto from AWS) and
have a Postgres instance w/ identical data in a different data model. Presto
is mindblowingly good at the low latency queries and scales well to anything
nontrivial. It seems opening files for reading is the biggest source of delay
in Presto! Postgres is not handling the scale well, but it was a nice
experiment.

~~~
Erwin
Your Athena data set is immutable S3 blobs of data, however, while Postgres
reads data that could be modified concurrently while it's being read.
Postgres has to deal with a transaction modifying 100,000 rows in a table,
needing that transaction to then see its changes but then aborting, leaving
the data exactly as it was before. Athena/Presto don't have to worry about any
transactions, table files bloated by changes that need vacuuming, etc. etc.

So I don't think it's fair to claim PG does not scale when you are comparing
a transactional database with an analytical one.

~~~
lukev
That's rather the point though. The people saying "YAGNI, just use Postgres"
aren't, a priori, any more correct than people saying "You definitely need
Hadoop if you're gonna scale!"

You need to analyze your own use case critically, understand the tradeoffs of
different tools and choose pragmatically.

------
_bxg1
Programmers often make the fallacy of assuming that only the asymptotic case
should be considered (just look at Big-O notation). But when it comes to
specialized tools and frameworks, unlike sorting algorithms, many use-cases
never break past the scale where the asymptote is the primary thing driving
the curve.

With that in mind, it's my experience that every tool or framework has a
different window of scale in which it's ideal, which has both lower and upper
bounds, and one simply needs to find one's own project along that axis when
choosing a technology. Hadoop may be the best solution as N approaches
infinity - and we as programmers _love_ thinking in terms of infinity - but it
may not start being the best solution until 10x your actual range for N.
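The window-of-scale point can be put in numbers with a toy cost model. Every constant below is invented purely for illustration: a "naive" O(n^2) approach with unit costs against a "scalable" O(n log n) framework whose per-item constant is 1000x higher (startup, I/O, coordination overhead):

```python
import math

# Hypothetical cost models, not measurements.
def cost_naive(n: int) -> float:
    """O(n^2) with unit constants: quadratic, but no overhead."""
    return float(n * n)

def cost_framework(n: int) -> float:
    """O(n log n), but every item pays a large fixed overhead."""
    return 1000.0 * n * math.log2(n)

def crossover() -> int:
    """First power of two at which the framework actually wins."""
    n = 2
    while cost_naive(n) <= cost_framework(n):
        n *= 2
    return n

print(crossover())  # 16384 under these made-up constants
```

Under these made-up constants the asymptotically better tool doesn't pay off until n is in the tens of thousands; with heavier overheads, the crossover moves further out still.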

------
arduinomancer
The thing is, at least from my experience, the engineers working in startups
_know_ the solution they're implementing is overkill. It's just that everything
ends up getting built assuming the startup is going to take off and become the
next huge unicorn.

A lot of the time design meetings in a startup revolve around "Is this
scalable?" or "What happens when we have 10,000 users or 1 million users?".

I think the problem isn't choosing overkill technologies, the problem is
trying to solve a problem that doesn't exist yet and probably never will.

~~~
ska
That's just bad engineering practice unless you can articulate why there isn't
a transition path to more scalable tech when you actually need it. Usually in
practice there is, and certainly for most companies. You can probably point to
counterexamples, but to reuse the OP's term, "you aren't them, either".

~~~
TheHegemon
Unfortunately I've definitely seen cases where scalability wasn't taken into
account and now it's impossible to bolt-on after the fact.

------
holoduke
I have participated in lots of startups. I have 3 successful companies myself.
My background is CS. And what I can tell you is that every single service like
AWS or GCS is absolutely insanely priced. Not affordable unless you have a new
Netflix. Even though it looks like you are saved from a lot of maintenance,
you still have to deal with limitations you sometimes don't have a solution
for. I have seen companies switching to AWS going from 400 dollars a month to
about 5-10k per month. Silly amounts for big organisations, but for startups
it makes a lot of difference.

~~~
freyir
We're still in a boom period, where many start-ups/VCs are operating with deep
pocket books and instructions to scale fast at any cost. During the next bust,
lean will come back in fashion.

~~~
adventured
It's an interesting question, because Fed rates are going to be held
permanently low due to the US Government's debt / interest costs (and the next
~$18 trillion that will be stapled on over the coming ~15 years). Those Japan-
style permanently low rates will always press upwards on risk capital markets
like venture capital. It spurs capital to seek higher returns versus the weak
yields everywhere else (whether treasuries or other). Start-ups will become
more valuable persistently due to this effect, at least for the next 10-15
years prior to the end fallout once the US Government is formally drowning in
debt interest cost (which is when the really bad consequences kick in, like
full blown aggressive currency debasement to chop the debt down and stealth
default; everyone will be taking real haircuts then, anyone with assets in
dollars anyway).

I think it would take a very severe recession, in the 2009-2010 style, to
hammer venture capital down considerably. Instead I expect Japan stagnation to
continue to envelop the US economy, and for the exact same debt-laden reason.
Ever slower growth, low traditional inflation (higher real inflation from
currency debasement QE), debt taking up an ever larger share of capital
available for investment, stagnant productivity, stagnant wages. With the
enormous amount of wealth in the US, trillions of dollars will always be
looking at the venture capital market in a given decade. ~2012-2035 will
probably be the best years for venture capital in US tech history, and broadly
the best context for start-ups, that we'll ever see. It's early in the loose
money sloshing period from perma low rates, but not yet late such that you're
eating an always-on QE real-value debasement hammer constantly (which sucks
down your real value creation, as you run uphill against the Fed while it
tries to debase government debt to keep the US Govt solvent).

------
flexer2
In principle I agree with the author, but some of the patterns these big
companies have introduced are valuable for companies with significantly lower
amounts of traffic. For instance, in the article they reference Kafka. In one
of our products, we use Kinesis, which has similar semantics, for data that is
no more than 25k records per day. However, we find it useful because it
enables us to have multiple consumers that operate independently, plus using
Kinesis Firehose to automatically archive those records off to S3. We just use
a single shard, which is more than enough throughput for us. We don't have any
plans to scale to hundreds of shards, but find what it provides to be very
useful in separating what each process does, and makes the code much simpler.
And if we ever did need to scale, it wouldn't be much work to do so.
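For context on why one shard is "more than enough" at ~25k records/day: Kinesis publishes per-shard write limits of 1,000 records/s and 1 MiB/s, and routes each record by MD5-hashing its partition key into a 128-bit space split across shards. A rough sketch (the average record size is an assumption, and `shard_for_key` only approximates the default even hash-key split):

```python
import hashlib

# Back-of-envelope utilization of a single Kinesis shard.
RECORDS_PER_DAY = 25_000
AVG_RECORD_BYTES = 1_024          # assumed average record size

records_per_sec = RECORDS_PER_DAY / 86_400
bytes_per_sec = records_per_sec * AVG_RECORD_BYTES

SHARD_RECORD_LIMIT = 1_000        # records/s per shard (published limit)
SHARD_BYTE_LIMIT = 1 << 20        # 1 MiB/s per shard (published limit)

utilization = max(records_per_sec / SHARD_RECORD_LIMIT,
                  bytes_per_sec / SHARD_BYTE_LIMIT)
print(f"{utilization:.4%} of one shard")  # well under 0.1%

def shard_for_key(key: str, num_shards: int) -> int:
    """Which shard a partition key lands on, assuming evenly split
    hash-key ranges over the 128-bit MD5 space."""
    h = int.from_bytes(hashlib.md5(key.encode()).digest(), "big")
    return h * num_shards >> 128
```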

------
m0zg
Been saying this for years. People build these ridiculous contraptions:
Spark/Hadoop, key value stores, NoSQL, microservices, distributed filesystems
all over the place, fault tolerance and so on. Then you ask "how much data and
traffic do you have?" Nearly always you get an answer where a single machine
and properly designed software would be more than enough.

I figure, in a way, we should be thankful for this cluelessness. If people had
any clue whatsoever, IT employment would shrink by a factor of at least 3, and
salaries would drop pretty massively as well.

------
cs02rm0
UNPHAT - these are great, but I'd add one more - ask yourself if you're an
engineer before making a technology choice.

The number of projects I've seen where the team had one arm tied behind their
back because they "have" to use Hadoop or NiFi or Lambdas or something else a
manager has decided is the thing everyone's using is just lunacy.
They're all tools which have their place, but you really have to know them to
know when to use them. And as importantly when not to.

I mostly do consulting gigs where projects are 6 weeks to 6 months long and
it's been years since I've seen that not hamper a team.

------
tombert
While I largely agree with the thesis of this article, I actually really like
the entire "scalable microservice" way of designing things.

It's overkill for anything I need to do, but my home server is six 8-core
ODroid devices, orchestrated with Docker Swarm, with most of my services on
there being load-balanced, and glued together with Kafka if they need to talk
to each other.

Do I need my internal video streaming server to be able to scale horizontally?
Of course I don't; there are only ever three people, max, watching things from
there at any point in time. However, I find that, overall, my brain thinks
more-or-less in terms of these microservices, and it doesn't really hurt much
to do it that way.

If I find it pleasant enough to do, why the hell not make it hyper-scalable
and able to reach Google levels?

EDIT: Just a note, I know that Docker Swarm probably isn't quite capable of
Google-size. Still, moving to Kubernetes or something wouldn't be terribly
hard (the reason I didn't use it was because a few years ago I had some issues
with Kubernetes on ARM and Swarm worked outta the box).

~~~
sz4kerto
There are Swarm clusters in production with tens of thousands of hosts.
There's nothing in Swarm that makes that impossible.

~~~
tombert
Fair enough; I didn’t know that Swarm was used for any large projects because
apparently I don’t know how to read or use search engines; I stand corrected!

I guess that backs up my original point even more then; my stupid video
streaming server might actually be able to scale to Google size some day :)

------
karmakaze
There was no section on "You are not Netflix", so I guess microservices are
OK.

Just thought of a naming convention: how many types of microservices do you
run?

    10? deciservices, 100? centiservices, 1000? milliservices.

~~~
Jtsummers
Other direction, deca-, hecto-, and kiloservices.

[https://en.wikipedia.org/wiki/Metric_prefix](https://en.wikipedia.org/wiki/Metric_prefix)

~~~
jake-low
I think OP's point was that if your company provides a service, and runs 10
containers to do it, then each container provides 1/10th of the total
functionality of the service and is therefore a "deciservice". The joke being
that you then can't say you use "microservices" until you've got 10^6 of them.

~~~
karmakaze
I meant division of functionality rather than horizontal scaling. For
instance, all of Netflix consumer facing features making up the 'Netflix app'
is a full 1.0. If we subdivide the functionality into pieces the number of
functional pieces determines the fraction of the whole application it
provides.

But.. if you have 10^6 total instances I don't think anyone would object to
you calling them micro.

------
blunte
I can assure the author that those of us solving problems for small companies
(in terms of employee count, not revenue or assets under management) do not
jump on the typical bandwagons. We tend to be very critical and pragmatic.

I'm not even sure it is engineers at bigger companies who choose to jump on
these bandwagons. I suspect it is often wannabe-technical managers who read
that MegacorpX is using tech Y that sells the idea to upper management, along
with some unfounded beneficial promises, that causes some of the trends we
see.

------
cle
Do you know how many engineers work at Google and Amazon? A lot. Chances are
that many readers of this article actually are Google and Amazon.

My point is: these generalizations in either direction are not helpful. Many
people are operating at Google scale, and many people aren't. Be aware of that
when you read about potential technologies.

~~~
BoorishBears
If you're Google, you know.

So if you're the audience of this article, you're not Google.

~~~
kradroy
I'd dare to say that many groups/products at Google aren't even operating at
Google scale. I'd also dare to say that some Google products operating at
Google scale can be a bad thing (i.e. unprofitable). The latter statement, or
both statements, might be supported by the number of products they kill off.

------
rocketraman
I find the amount of traffic and comments articles with this message
generate quite amusing, given the article can be boiled down to: don't use
tools like Cassandra, Kafka, etc. _until_ you've thought through whether they
are the right tool for your use case. That last part is often forgotten --
these tools may not always be the right tool for the job, but SOMETIMES they
are, regardless of your scale.

Well, duh.

The corollary is: if you're using one of these tools, and you HAVE thought
through your use case and reasons, don't get defensive about it. If challenged
by those with lesser knowledge after reading an article like this, calmly and
rationally explain your reasoning. And like any technology choice, be prepared
for someone else to offer another option you weren't aware of, with _better_
reasons.

Again, duh.

------
dmitryminkovsky
I’m not sure why people frame the use of certain technologies as orthogonal to
non-Google-scale use cases.

For example, one might like using MapReduce or Kafka Streams for their
programming paradigms, not for the redundancy or scale they provide.

Another example is Kubernetes, which makes it trivial to run containers and
attach storage to them.

~~~
freyir
I guess the idea is that for all the people using MapReduce, most would be
better served by a different paradigm than MapReduce. Because in most cases,
there's a better paradigm than MapReduce.

~~~
dmitryminkovsky
> Because in most cases, there's a better paradigm than MapReduce.

You'd think so, right? But what happens to me all the time is I try to use
"small-scale" tools to handle some small problem, and then I find myself
writing glue code that's already implemented in, say, Kafka Streams. So I
might as well just run one Kafka broker locally and write a Kafka Streams
application. Or the same with the Django ORM, which I reach for all the time
just because I don't like writing database access code when I can write up
some models and be done with it. Every time I reach for "small" tools I end up
writing tons of code that's not actually solving my immediate problem or
question.

------
EliRivers
_We like to think that we’re hyper-rational_

Don't we just. Which makes us vulnerable to every shiny gewgaw and every
personal and cultural bias out there.

------
taneq
Great article, and one that I think a lot of people here could learn from.
Everyone gets all excited about fancy architectures and algorithms because
they're fun and cool, but pragmatically, you can buy a LOT of server for the
same price as a couple of engineers messing around for a month.

------
leowoo91
I find it funny with all these cloud providers giving huge free credits to
startups (e.g. AWS giving away $15k) as if they could use that whole amount.
Reality? 10 visits per day.

------
bsaul
Hype driven development is the real curse of the last decade. Not just big
data, but also smaller techs like mongodb, or the countless javascript
frameworks... choosing the right stack for the job could be a consulting job
in itself, even for simple web development.

------
mcguire
" _The thing is there’s like 5 companies in the world that run jobs that big.
For everybody else… you’re doing all this I/O for fault tolerance that you
didn’t really need._"

That's not necessarily entirely true. In the early 2000s, I worked for Cadence
Design Systems, who at the time needed to build and test the plethora of tools
they'd built or acquired on a large variety of systems. I worked on
"GridMatrix", sort of like make but using a large, heterogeneous cluster of
machines, built on top of the Condor and LSF batch scheduling systems---the
same sort of thing underlying MapReduce.

On the other hand, I get where the author's going: cargo-cult design is
rampant in enterprise software development. But it's not just cargo-culting
Amazon or Google; it also involves fashion and resume padding.

And then, there's "eNumerate multiple candidate solutions. Don’t just start
prodding at your favorite!"

Bwahahahahah. <- Unamused laughter.

Our first and only response, as an engineering discipline (if you want to call
it that) is to pick the first idea that comes to mind and beat it to death
with a stick.

------
jchw
How many times can we run this same exact type of article? If it didn't change
anything the first time it ran, what are people expecting to happen now?

Maybe Hadoop isn't necessary for everyone who uses it. They'll learn. And if
they don't, more job security for you.

Otoh there are some crazy advantages of running certain "large scale"
softwares. Not everyone needs the "scale" Kubernetes offers, but managing
tasks declaratively as containers that get scheduled on boxes in a hermetic
fashion? Or maybe continuous delivery - not everyone needs CD, but it
certainly offers many advantages to those who do it.

Most people could probably live off of shell scripts running under cron on a
box with CGI scripts written in Perl without any issues, but that does not
imply there are no advantages to new technology, because not everything is
about scale and fault tolerance.

------
bsder
Most people simply don't realize just how fast PostgreSQL is on a modern
machine (and, by extension, just how bloody fast a modern machine is).

My old laptop could do something like 150K transactions per second in
PostgreSQL. You can scale _really far_ before your database becomes the issue.

~~~
jodrellblank
I don't realise how bloody fast a modern machine is because the software I use
day to day takes _seconds_ to do basic operations like move the cursor, or
scroll a window, or load a list of ten items.

~~~
bsder
Hudson giveth and Nashua taketh away.

Or, for the youngsters: Portland giveth and Seattle taketh away.

------
MrStonedOne
One common mistake I see often is in step 1, understand the problem: assuming
you already understand the problem.

I'm reminded of an issue my old boss had at his house. His power bill was
rising at an alarming rate. He assumed it was the central HVAC and spent a few
hundred getting it tuned up: no effect. He spent a bunch of weekends replacing
weather stripping, adding sealant, etc.: minor effect.

I came in with an app that lets you estimate current power usage by looking at
the power meter[1], and killed circuit breakers one by one while measuring
current usage.

The cause turned out to be a malfunctioning swamp pump he forgot the house
even had.

He assumed he knew the problem by focusing on the common cause of such
problems. He assumed wrong, and because of this, every solution addressed a
different problem than the one he was actually trying to solve.

-

Another example is the story[2] of the old-as-fuck IBM mainframe that was
powering a website that would take on the order of tens of seconds to return
data. Everybody assumed all of the delay was to be blamed on the "legacy
hardware". Every solution pitched involved migrating off of it, a daunting
task no manager wanted to green-light. Finally they brought a consultant in to
figure out what was going on. It turned out the mainframe returned data in 6ms
or less; the cause of the lag was the Java app that would read the output from
the mainframe to transform and send to the browser.

They could have solved the problem years ago if they had just actually tried
to understand the problem.

-

[1] (Power meters have an indicator that ticks every n watt-hours, along with
a decal that tells you what n is, so you can determine current power usage by
measuring the time between ticks. There's a handy-dandy app:
[https://play.google.com/store/apps/details?id=com.sam.instan...](https://play.google.com/store/apps/details?id=com.sam.instantaneous_power_timer))

[2] 7074 says Hello World
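The arithmetic behind footnote [1] is just unit conversion: the decal's constant is watt-hours per pulse (or per disk revolution), so one timed interval gives average power. A sketch (Kh = 7.2 is a common residential meter constant, assumed here):

```python
def watts_from_pulse(kh_wh_per_pulse: float, seconds_between: float) -> float:
    """Average power over one meter pulse interval.

    Energy per pulse in watt-hours -> multiply by 3600 to get
    joules (watt-seconds), divide by the interval to get watts.
    """
    return kh_wh_per_pulse * 3600.0 / seconds_between

# Example: a Kh=7.2 meter whose disk takes 10 s per revolution
print(watts_from_pulse(7.2, 10.0))  # ~2592 W
```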

~~~
baroffoos
Not entirely related, but one problem-solving issue I see often is people
refusing to test their assumptions. Often, while trying to help someone with a
problem, I ask if they tried something and they say

"No that cant be the issue because it shouldn't be affecting it"

Yes, under ideal conditions this wouldn't be an issue, but if the system is
not working as expected, then how can you be so sure that this bit is working
as expected?

------
so_tired
How many of these clever VCs say to themselves, "None of our companies will be
the next FAANG/MAGA"...

Oh wait. They don't say that. They fund, and hope, and pray!

------
fiatjaf
You should be telling "you are not Google" to people who try to build the next
social network or any app that is basically an enormous chicken-and-egg
problem.

Most programmers try to do that. And they keep trying. They're wasting their
lives.

------
lazyant
A lot of these decisions can be made with a bit of "back of the envelope
calculations"; playing with orders of magnitudes based on latency times:

[http://highscalability.com/blog/2011/1/26/google-pro-tip-
use...](http://highscalability.com/blog/2011/1/26/google-pro-tip-use-back-of-
the-envelope-calculations-to-choo.html)

[https://people.eecs.berkeley.edu/~rcs/research/interactive_l...](https://people.eecs.berkeley.edu/~rcs/research/interactive_latency.html)
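A sketch of what those back-of-envelope calculations look like, using approximate latency numbers of the kind the second link visualizes (all figures are order-of-magnitude only, and the specific values here are my assumptions):

```python
# Approximate sequential-read costs, in nanoseconds per MB.
NS_PER_MB = {
    "memory": 250_000,      # ~250 us
    "ssd": 1_000_000,       # ~1 ms
    "disk": 20_000_000,     # ~20 ms
}

def seconds_to_scan(gigabytes: float, medium: str) -> float:
    """Time to sequentially scan `gigabytes` from one medium."""
    return gigabytes * 1024 * NS_PER_MB[medium] / 1e9

# A 100 GB table scanned from local SSD: ~100 s on one machine.
# Slow for an interactive query, fine for a nightly job -- and
# either way the envelope says nothing about needing Hadoop.
print(seconds_to_scan(100, "ssd"))  # 102.4
```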

------
ycombonator
We are a 50-person team here, and the cool kids have created over 1,500 APIs
that are served from over 6,000 container instances. The “system” only serves
< 1MM users. We’re definitely not cargo-culting :)

------
peeters
My general test for most new technologies is that the desire to use it has to
have arisen out of the pain of not using it. If I'm looking at a framework for
any reason other than having shouted "there _has_ to be a better way!" then
I'm not in a position to evaluate its merits, and even if it is the right
choice I'm not in a position to understand why, and will thus be prone to
using it in a stupid way.

It might seem expensive to do things the "wrong" way first every time, but I
really do think it's necessary.

------
bluetidepro
I don't think a lot of people _actually_ think in terms of "they are Google",
but more so in terms of "we hope to one day be Google" so they build (more
complex than needed) systems with that ambition of scale and everything.
Almost as if they are over planning for the future. Not to say that is either
right or wrong, but just something worth noting, in my opinion. I don't think
anyone actually over engineers something thinking they ARE Google when they
are clearly not.

------
csallen
_> just, think for yourself. Is it the best solution to your problem? What is
your problem exactly, and what are other ways you could solve it?_

This is the crux of the issue, psychologically. When making an important
decision in the face of uncertainty, having someone else with a lot of clout
(or a group of people whose collective wisdom you trust) simply _tell_ you
what to do feels amazing. That feeling is hard to ignore in favor of
constructing your own cost-benefit analysis.

------
PeterStuer
I used to consult for a public service department. Their internal
infrastructure was incredibly complex, with hot-failover/redundant/scale-out
hardware (running homegrown custom LoB applications for fewer than 200
employees). This paradoxically led to huge downtime, as no one was smart
enough to configure this mess, and deploying a new service, which should have
been trivial, always ran into worst-case scenarios.

------
m463
I'm reminded of JWZ and his very practical advice on Backups:

[https://www.jwz.org/doc/backups.html](https://www.jwz.org/doc/backups.html)

In particular Addendum B:

RAID is a waste of your goddamned time and money. Is your personal computer a
high-availability server with hot-swappable drives? No? Then you don't need
RAID. RAID is not a backup solution. Even if you use RAID, you still need
backups.

~~~
hddherman
With modern filesystems, however, RAID-like implementations do have their
purpose in ensuring data integrity, as is the case with btrfs and ZFS.

Side note: the linked URL redirects to a questionable image when navigated
from HN. Interesting choice by the site owner.

~~~
m463
That doesn't mean his point is not spot on.

I have more than one friend who has overengineered an industrial solution for
basic computing needs. (And when things break, complexity goes exponential.)

as to URL redirect: I don't see the image/redirect you refer to, even clicking
on the link here on hacker news. Can you elaborate? I take the links I
recommend seriously.

~~~
hddherman
The URL will redirect to
[https://imgur.com/32R3qLv](https://imgur.com/32R3qLv) if navigated from HN,
most likely caused by the Referer request header being present. Doesn't seem
to happen all the time, but try it out in Private Browsing mode and it may
redirect you there.

------
randomsearch
I think this article is an instance of a more general observation: most
programmers should spend more time understanding the needs of the business and
users, rather than learning new technologies. The examples given were large
companies solving their business problems rather than developing technology
for its own sake.

Or to put it another way, form follows function.

------
revskill
In my case, a multi-tenant application, I split the application into multiple
services, one service per module.

It's not about scalability or about anything technical. It's about domain
understanding and data isolation.

There's NO problem here to understand. It's just one way to manage our data.
Or, to put it another way: design for failure.

~~~
EpicEng
>It's just one way to manage our data

Sure, but there are tradeoffs to any approach. More services == more devops ==
more complexity in fault tolerance, communication, data storage/consistency,
etc.

I'm not saying it's a bad model _for you_, obviously I have no idea, but the
point of the article is that some tend to jump toward complexity when they
shouldn't.

~~~
revskill
It's not "more services == more devops". It's like this: only manage the
necessary data.

If I need to change one part of the data, I don't migrate the whole data set.
I just need to change the part that changed.

~~~
EpicEng
I can't imagine a microservice architecture that does not require more devops
work than a comparable and sane monolith. Your second and third sentences
haven't convinced me.

~~~
revskill
Not quite. With a monolith, it's an overhead to migrate data, because one
small mistake will take your whole application down.

So, in the real world, it takes more overhead to manage a monolith than a
microservice architecture.

~~~
iBelieve
With microservices, if you make a mistake and take part of your application
down, aren't you worse off because only part of your application is running
and is now in an undefined state?

~~~
threeseed
You really need to think that comment through first.

If you're a bank then it's fine if your notification microservice goes down
because at least you can still accept payments, handle deposits, transfer
money etc.

In almost no situation is there a case where having no availability is
preferred over partial availability.

------
fizixer
Please don't introduce new acronyms like UNPHAT when we have something like
YAGNI that works just fine (and probably there might be something, a rule of
thumb, before Fowler, or Ron Jeffries, or whoever, coined YAGNI).

Secondly, extreme YAGNI and extreme future-proofing are the two ends of one of
the spectrums to consider when deciding how to approach writing software.
Neither extreme is good. In reality, the best location on that spectrum
depends on the situation at hand.

There is no silver bullet. And I'm sorry, but I didn't learn anything new from
this blog post, nothing that isn't already part of the vast body of software
engineering knowledge.

_edit_: I know UNPHAT is not an "exact synonym" of YAGNI. But what you're
describing is a problem-solving approach. That's probably even less new.
Please look at Polya's 'How to Solve It' if interested.

------
wyldfire
Note that this was from 2017.

------
mherrmann
This, for Docker. Everybody is using it these days, and I understand there are
valid use cases. But it's way overused and very often does not justify the
added complexity. Keep it simple.

------
sebringj
This is true but it's also true that small/medium businesses are riding the
same hype train and they'll gladly pay you to take them along for the ride so
in the end, meh-money.

------
paulsutter
This is one reason Snowflake is killing it. The real market for "big data" is
data warehousing, and users want a data warehouse that's in the cloud and
portable among vendors.

~~~
ddorian43
Only portable between AWS and Azure.

------
adamstac
We did an entire show on this with Ozan Onay back in 2017 ~>
[https://changelog.com/podcast/260](https://changelog.com/podcast/260)

You can read through the transcript too ~>
[https://changelog.com/podcast/260#transcript](https://changelog.com/podcast/260#transcript)

------
sergiotapia
Great article - I'm kind of at a crossroads myself. I'm very productive with
Rails and can scale it probably well beyond anything I'd ever need and can
move away the big chunks to microservices later.

But there's also Elixir which can do away with most of these concerns.

So what do I do? Go with Rails to ship stuff? Or struggle a little bit more
with Elixir to get that much more legroom?

------
zelon88
I wrote an open response to this. Looking back I was a bit off-topic and I
could write it better today but what the hell.....
[https://www.honestrepair.net/index.php/2017/06/08/re-you-
are...](https://www.honestrepair.net/index.php/2017/06/08/re-you-are-not-
google/)

------
peterwwillis
Assuming low-level techs finally get realistic about the complexity they sign
up for... if an executive says "we will only use X technology for _all X
tasks_", they probably picked the Google thing, and you have to deal with it,
or they'll hire someone else who will.

------
TH3R3LL1K
Architects and engineers alike love increasing the complexity of systems just
to use buzzword products.

------
corporateVeal27
This is a good article. I'd say it draws heavily on the same themes as "Hard
Facts, Dangerous Half-Truths and Total Nonsense" by Sutton and Pfeffer.
Drinking Wild Turkey every morning isn't going to turn you into the Southwest
Airlines CEO lol

------
externalreality
This article has been written many times by many authors (even using the same
cargo cult airplane image) and yet is no less true than the first time it was
written.

That said, clamoring toward things like React has yielded a net positive
result. So cults can be a double-edged sword.

------
NikkiA
A place I worked was building a video dating site, we spent thousands upon
thousands ensuring that the site would 'scale' to millions of concurrent
users.

I think we maxed out at somewhere around 60 concurrent before I left. They gave
up on the idea a few months later.

------
gumby
Great advice, and to double-click on it: Google wasn't Google either when they
started, and part of their value to investors was their bare-bones hardware
infrastructure. _That_ part of Google is (philosophically) worth emulating.

------
diehunde
Is this really a thing, though? I would be surprised to find someone who's
able to deploy a Spark cluster for a production application and at the same
time doesn't know that it's not a good solution for their problem.

------
alvalentini
Really interesting article and good points, though I've always had the
opposite problem: people wanting to use technology that is too basic, without
caring about the implications at scale. "As long as it's free".

------
tempodox
Funny how a blogger reminds us to get in touch with reality once in a while,
and then every commenter on here bends over backwards pretending that they
actually _are_ Google.

------
cfv
Oh, I had to spend an hour "fixing" my incorrectly autofilled fields for
Chrome to please stop playing games with my users' data.

I'm acutely aware I'm not Google.

------
bytematic
Usually I implement things for future workloads. Sometimes I like to imagine
that 1 billion people could use each component; then it is truly ready.

------
DEADBEEFC0FFEE
What is meant by "“solve” the problem mostly within the problem domain, not
the solution domain."?

~~~
danans
To use a restaurant analogy, if you are a chef, and you are opening a new
restaurant, you should focus more on the problem of devising the best recipes
and experience for your customers, rather than on whether your kitchen has a
high-end cooking stove.

------
mcs_
You should do a show with this script. Educational and fun. Well done.

------
Hex08
Resume Driven Development

------
pwinnski
You Are Not Google (2017)

~~~
kgwxd
You Are (Still) Not Google

------
newshorts
How many people on this thread actually are google?

------
mrkeen
> I understand that Kafka is still useful for lower throughput workloads, but
> 10 orders of magnitude lower.

Should we stop using MySQL too? That handles more data than I have.

~~~
redhale
I think a good analogy might be using MySQL for storing a single key/value.
You could do it, but why? Instead you could just use an environment variable
or simple file.

------
steelframe
i work for google and now i'm confused

------
robertAngst
One gripe I have with this thinking: until you are, and then you have a
failure.

Things that don't scale or aren't AA quality will show up.

Go viral and fail to serve. 15 minutes of fame squandered on an avoidable
tech mishap.

One time a top comment on a thread was about a misspelling.

------
xxxhenriquexxx
Whatttttt

------
tcbasche
This entire article kind of assumes that you're stuck with your current
workload and will never scale, which is awfully pessimistic. Should we all
just take the attitude that, oh well, we'll never be massive so why bother
trying to reach any sort of throughput?

~~~
Negitivefrags
If you make that assumption you will be right most of the time.

Deal with problems when you have them.

