
How to avoid picking the wrong technology just because it's cool - scarhill
https://blog.bradfieldcs.com/you-are-not-google-84912cf44afb
======
AznHisoka
I used to work at Standard and Poors. They do media monitoring for lots of
financial companies.

When I joined, I thought they were using state-of-the-art tools to monitor millions
of sources.

Turns out they just had a team of thousands in India that manually visited
certain websites to check for press releases, transcripts, earnings reports,
etc and push them to a database if they found them.

~~~
sharemywin
wonder if that could be crawled and save money.

~~~
dismantlethesun
Accuracy. Every solution I've seen that relies on automatic crawling will
eventually have a parsing error when someone changes the sentence structure
of a press release.

It's not so obvious when you're looking at the breaking releases for a few
stocks or companies, but historical records have at least 1 error per stock
per year.

~~~
greggyb
So split your stream:

    1. Data matching expectations (you do have a definition of correct, right?)
    2. Log for manual review -> manual inserts or correction and placed into queue for (1)

Monitor (2). When inserts start trending up, it may be time to update your
processing logic.
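
A minimal sketch of the split described above, not anyone's production pipeline. The record fields (`ticker`, `title`), the `is_valid` rule, the window size, and the alert threshold are all hypothetical placeholders for whatever "definition of correct" you actually have.

```python
from collections import deque
from dataclasses import dataclass, field


def is_valid(record: dict) -> bool:
    """Hypothetical definition of 'correct': upper-case ticker, non-empty title."""
    ticker = record.get("ticker", "")
    title = record.get("title", "")
    return ticker.isalpha() and ticker.isupper() and bool(title.strip())


@dataclass
class SplitStream:
    window: int = 100        # how many recent records the monitor looks at
    alert_rate: float = 0.2  # assumed share of review items that signals trouble
    accepted: list = field(default_factory=list)      # path (1)
    review_queue: list = field(default_factory=list)  # path (2)
    recent: deque = field(default_factory=deque)      # True/False per record

    def ingest(self, record: dict) -> None:
        """Route a parsed record to path (1) or path (2) and remember the outcome."""
        ok = is_valid(record)
        (self.accepted if ok else self.review_queue).append(record)
        self.recent.append(ok)
        if len(self.recent) > self.window:
            self.recent.popleft()

    def resolve(self, index: int, corrected: dict) -> None:
        """A reviewer fixed queued record `index`; it re-enters path (1)."""
        if not is_valid(corrected):
            raise ValueError("corrected record still fails validation")
        self.review_queue.pop(index)
        self.accepted.append(corrected)

    def review_rate(self) -> float:
        """Fraction of recent records that needed manual review."""
        if not self.recent:
            return 0.0
        return self.recent.count(False) / len(self.recent)

    def needs_parser_update(self) -> bool:
        """True when manual work is trending high enough to revisit the parser."""
        return self.review_rate() >= self.alert_rate
```

Typical use would be to call `ingest` for every crawled record and check `needs_parser_update()` on a schedule; a rising rate is the cue to update the processing logic rather than to keep hand-correcting.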

~~~
flukus
I came up with a similar idea for a company several years ago where we had a
team of people doing data entry from faxed documents. I wanted to build
something that would do all the OCR it could and then display it to users to
verify, which should have been a 10 times efficiency increase, not to mention
speed and accuracy.

The idea was rejected; they wanted either a perfect solution or nothing. I
don't know why, but for some reason the idea of computers replacing humans was
acceptable to management, but computers augmenting humans wasn't.

------
Animats
The author's acronym is silly, but it's a real problem. Soylent liked to
blither about their "infrastructure", for a product that sells a few times per
minute. They could be using CGI scripts on a low-end hosting service and it
would work fine.

Wikipedia is some MySQL databases with read-only slaves front-ended by Nginx
caches and load balancers. That seems to get the job done. Wikipedia is the
fifth busiest web site in the world.

Netflix's web site (not the playout system) was originally a bunch of Python
programs.

The article mentions a Postgres query that required a full table scan. If
you're doing many queries that require a full table scan, you're doing
something wrong. That's what indices are for.
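
To make the full-scan versus index point concrete, here is a small sketch using Python's built-in `sqlite3` rather than Postgres (an assumption made so it runs without a server; the idea carries over, and Postgres has its own `EXPLAIN`). The `events` table and its columns are made up for illustration.

```python
import sqlite3

# In-memory database with a hypothetical table of 10,000 rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT)"
)
conn.executemany(
    "INSERT INTO events (user_id, payload) VALUES (?, ?)",
    [(i % 1000, f"event-{i}") for i in range(10_000)],
)


def plan(sql: str) -> str:
    """Return SQLite's query plan for `sql` as one line of text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


query = "SELECT payload FROM events WHERE user_id = 42"

before = plan(query)  # no index on user_id yet: the planner scans the table
conn.execute("CREATE INDEX idx_events_user_id ON events (user_id)")
after = plan(query)   # with the index: the planner searches via the index

print("before:", before)
print("after: ", after)
```

The exact plan wording differs between SQLite versions, but the first plan reports a scan of `events` and the second reports a search using `idx_events_user_id`.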

~~~
squeaky-clean
I remember reading a scaling-out article from some startup. Some of the things
felt a little over-engineered, some were impressive, some seemed wrong. But
then they get to the point where they brag about their scale, and the metric
they used was that they can handle thousands of requests.... per day.

~~~
aqme28
[https://engineering.hellofresh.com/scaling-hellofresh-api-ga...](https://engineering.hellofresh.com/scaling-hellofresh-api-gateway-7d40be55450f)

~~~
tnolet
This is hilarious and sad at the same time. However, most of these write ups
are aimed at attracting talent. Even more, some tech stacks are deliberately
built to attract talent when the core domain is just too simple or boring. "We
serve user subscriptions and recipe data from an SQL database using Rails"
just doesn't sound as snappy as the infra-porn on the blog.

~~~
taneq
Isn't that kind of thing a real red flag for the kind of talent you'd want to
attract? If someone told me they'd built a GPU compute cluster for their phpbb
based social club forum, I'd think they were an idiot and not want to work
with them.

~~~
keganunderwood
I am sorry if I'm completely off base but I'm still thinking about the danluu
page on options v cash which quotes this

[https://news.ycombinator.com/item?id=11200296](https://news.ycombinator.com/item?id=11200296)

Maybe startups don't want the absolute best of the best but rather the best of
the gullible?

Edit: and/or poor

~~~
taneq
So you're thinking that kind of technical mark-missing is the startup
equivalent of the typos and other glaring errors in email scams? They're to
weed out the people smart enough to be a problem?

------
Kiro
Nothing I've built has ever needed anything more than a $5 Digital Ocean
droplet, and one of my services gets around a thousand requests a second at
peak. Purely anecdotal, and I'm not doing anything CPU intensive, but I really
feel startups are overdoing their infrastructure.

~~~
sidlls
It isn't just startups. And I agree with another commenter: it seems like
resume driven development.

~~~
__jal
There's also a weird peer pressure involved. I overheard a conversation a while
back that summarizes it nicely - someone was talking about a scheduling system
they've used for years, and mentioned it was written in Perl. Another
participant guffawed, and after the requisite Perl-bashing, the original
person allowed that, yes, even though it worked fine, they should rewrite it.

No idea what company that was, but I'd love to work in a place where that was
the most pressing concern on my plate.

~~~
thehardsphere
Were all of these people competent?

Honest question. I have only seen the "it's written in X, so therefore it must
be re-written in something nicer even though it is working" thinking from
incompetent people who were just trying to take ownership of something they
didn't quite fully understand. Though I have never seen it in an appropriately
functioning commercial setting; if management is competent, they'll
immediately recognize the high costs with no concrete benefit and say no.

It's one thing to say "we have to re-write this because it uses Java applets,
and Java applets are problematic because Oracle is dropping support for them,
so our customers are going to be screwed soon if we don't do something." It's
another thing to say "we have to re-write this because it's in Perl because
Perl is something I don't like."

~~~
ProblemFactory
I've seen this situation multiple times, and yes the developers involved were
competent. They were even well-meaning, and wanted to build something for the
benefit of the company, not just their resumes.

I think the tendency to over-engineer and over-polish comes mostly from
getting too invested in one particular project or task. The developers have
"professional pride" - they want to deliver software that has good
architecture, high test coverage, easy to understand and maintain code,
reliable, scalable, etc.

This means competent developers are very tempted to continue working on a
project as long as there are possible improvements to it, even if these
improvements do not make business sense. Nobody wants to admit that "cron job
that fails once per month" is a sufficient solution when they can see a better
solution, and go work on the next hacky cron job instead.

------
dismantlethesun
Working in the D.C. area has given me a high tolerance for acronyms and
backronyms (seriously: P.R.O.T.E.C.T. Act stands for "Prosecutorial Remedies
and Other Tools to end the Exploitation of Children Today").

U.N.P.H.A.T does bring a smile to my face for trying, but if the author is
reading, I'd suggest you change it to a prescriptive paragraph where the first
word in each sentence becomes a letter in the acronym (e.g. B.A.M.C.I.S).

====

Here's my best try:

UNPHAT:

Understand the problem.

Nominate multiple solutions.

Prepare by reading relevant research papers.

Heed the historical context.

Appraise advantages versus disadvantages.

Think!

~~~
dismantlethesun
Drat, it's too late to edit my comment and take it out, but "Consider
candidate solution" wasn't meant to be included in the acronym. It was part of
my brainstorming.

~~~
dang
Ok we took that out for you.

~~~
dismantlethesun
Thanks.

------
pram
A lot of this just sounds like Resume Driven Development, not people thinking
they're Google or Amazon.

~~~
OpenDrapery
I was thinking the same thing. I wonder how much of it is due to the way we
make devs work and the hours we make them keep.

For example, I'd be more than happy to use the same old, tried and true,
boring tools to just get the job done, if it meant that I could then go play
golf or otherwise not be in the office.

But if you insist that I be in my seat 8 hours a day regardless of workload,
then goddamn let's take this shiny new tool for a spin!

Do I want my resume to show that I used the same tool for every job for the
last ten years? Or do I want it to show some new hotness?

The industry and employers are as much to blame for this as the engineers, if
not more. When you use middleman firms to find your employees, and all they
understand is buzzwords, well then guess what game the devs are gonna play?

~~~
allcentury
Fellow golfer and tinkerer - we are a product of the job markets. Hot
employers typically want new and shiny on the resume in addition to
fundamentals, seems like everyone is playing the same game...

~~~
collyw
I get the feeling that my skillset is becoming outdated.

Fact is I know Django inside out, and plenty of Python libraries. It's very
rare that I find something that requires me to learn a new language or
tech (I will likely get things done a fair bit faster using the tech that I do
know). Anything else feels like resume driven development.

------
lacampbell
I am fortunate, in that I got a lesson in not over-engineering things very
early in my career.

My first programming job was a 3 month contract at the maintenance department
of an international airport. They had a bunch of information in a large,
unwieldy ERP system and wanted to automatically generate job sheets for the
different maintenance crews. So I did the simplest thing possible - I
generated an excel file from the ERP system, then using that file as input, I
outputted different excel worksheets for the different crews.

It was a very plain GUI app that had one or two buttons. I remember being a bit
worried that it wasn't nearly fancy enough for 3 months' work, but everyone
seemed pretty happy with it.

Later on I found out that - before me - they had hired an experienced software
developer who had worked on the same problem for 6 months, and at the end of
that 6 months had apparently not produced a solution. I had done the dumbest,
simplest thing - not because I had any insight or wisdom, but because it was
really the only thing I had the skills to do. But I delivered.

It was a brilliant, accidental first lesson in not over-engineering.

~~~
gaius
_I generated an excel file from the ERP system, then using that file as input,
I outputted different excel worksheets for the different crews._

As a complete aside, you might be surprised how far you can go with Excel
these days. Do you know it has a built-in in-memory columnar database now? You
can have millions and millions of rows of data in there that you can use in
tables and charts completely independently of the size of the grid. Pull back
a huge chunk of data from the DB and slice and dice it to your heart's content
locally.

I look at people buying expensive "business intelligence solutions" and I
think, it's right there on your PC all along and you don't even know it...

~~~
collyw
The problem is people using Excel for everything that it shouldn't be used
for.

~~~
gaius
"Throwaway" Python code winds up becoming part of real systems all the time,
but we don't blame Python for that.

------
joshribakoff
I've seen this in action. Using code generators to convert XML configuration
to a few API end points. Or using a DSL/rules engine because you don't want to
write code. Or having APIs that hit other APIs ad infinitum when the whole
thing runs on one server because "micro services are the only right way". The
result was we spent time gluing together what was already a monolith
disguised as microservices, rather than adding features the customers wanted.

More recently I had to solve time drift on 1000s of devices. The problem was
someone installed puppet to manage those devices which uses NTP. The devices
are behind firewalls so if they block the puppet master or mess with SSL
puppet doesn't even phone home. Or worse it gets incorrect time from NTP peers
on the network. The solution was to throw out the shiny tool "puppet" and just
call "date". Puppet and NTP are great in theory for getting time down to the
millisecond but totally backfired when some devices were off by over 24 hours.
For our purposes as long as all devices were within 5 minutes we were good.
The irony was after disabling NTP puppet just started it again. And we
couldn't use puppet to fix that since 50% of our users had it blocked. No
other choice but to throw out puppet and start over from scratch. The guy who
spent months setting up puppet was not happy.

~~~
liveoneggs
the real issue is why the firewalls were randomly blocking puppetmaster and/or
ntp and why the puppet ssl stuff stopped working (apparently randomly?)

Everyone involved sounds like they need a lot more experience.

~~~
joshribakoff
With all due respect, you don't know the real issue. Your response is the same
thing the guy who installed Puppet said to me... just have them unblock it.

Our sales pitch is "these devices use plain http and will work behind your
corporate firewall". The blockage wasn't an issue that could be solved, it was
our whole business model to workaround the blocks by using simple http instead
of https, proxying everything through our IP, and things like that.

Even the puppet documentation says not to run a puppet master when you have
devices that are behind firewalls or limited network. The guy who added puppet
apparently didn't read that.

I wasn't the one who decided the business model, just the guy who fixed it to
work as advertised while dealing with the pressure of everything crashing &
burning. You're right, no one had experience, but that's not the point.

My point was that the fancier tools sometimes just add new issues without
solving your real issue. Despite my lack of experience I solved the time drift
using the Linux built-in "date" to set the date and time. It didn't account for
network lag like NTP, and an NTP developer would probably laugh at my
solution, but now all devices are accurate to within a few minutes & that
particular problem was solved. So don't always go for the most complex tool is
all I'm saying.

For what it's worth I do plan to bring back puppet but run it in "puppet agent"
(offline) mode. We'll use custom scripts to copy in new puppet configs so
puppet does not need to phone home.
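
A rough sketch of the "good enough" time check described above. The thread does not say where the reference time came from, so this assumes it is read from the `Date` header of a plain HTTP response the device already receives; the 5 minute tolerance is the one mentioned earlier. The function names are hypothetical, and the `date -u -s` flags assume GNU coreutils `date`. The command is only built here, not executed.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Tolerance taken from the thread: devices only need to agree within 5 minutes.
TOLERANCE = timedelta(minutes=5)


def drift_from_http_date(date_header: str, local_now: datetime) -> timedelta:
    """Return local clock minus server clock, given an HTTP Date header value."""
    server_now = parsedate_to_datetime(date_header)
    return local_now - server_now


def needs_correction(drift: timedelta, tolerance: timedelta = TOLERANCE) -> bool:
    """True when the device clock is outside the acceptable window."""
    return abs(drift) > tolerance


def date_command(date_header: str) -> list[str]:
    """Build (but do not run) a `date` invocation that sets the clock in UTC."""
    server_now = parsedate_to_datetime(date_header).astimezone(timezone.utc)
    return ["date", "-u", "-s", server_now.strftime("%Y-%m-%d %H:%M:%S")]
```

This ignores network latency entirely, which is the trade-off being argued for: it cannot reach millisecond accuracy, but it also has no peers or daemons to misbehave when a device is more than a day off.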

~~~
liveoneggs
I would love to get into it more deeply as you continue to supply details! :)

distributing hardware outside of your network and using puppet in
master/client mode is obviously a bad idea, just like having any dependency is
difficult to manage (sometimes like NTP)

However, clocks will drift. Consider ntpdate in a cron or an easier-to-manage
sntp client vs ntpd, which is a little nutty.

So the point is that a tool like puppet, only properly configured, is probably
a great asset for your use case of distributing hardware, as it can help keep
things working as expected.

~~~
joshribakoff
Yes Puppet solved one problem... How do I add a cron to all devices, and retry
it if it failed without adding it twice to devices where it worked. Puppet is
amazing. It solved that problem....

But then it created a whole new world of problems since it violated our
business model to have it phone home. Thanks for the suggestions on NTP. We'll
likely add features that do require more accurate time in the future & your
suggestions will probably come in handy!

------
wwweston
> Don’t even start considering solutions until you Understand the problem.
> Your goal should be to “solve” the problem mostly within the problem domain,
> not the solution domain.

I'd guess that _this guideline alone_ would stop 2/3 adoptions of JS SPA
frameworks (and 4/5 Angular adoptions!) if followed.

~~~
BigJono
At least the SPA frameworks themselves have a reasonably common legitimate
use. The tooling around them is the major problem. Most projects I've worked
on, even complex ones, could comfortably trim from 100 dependencies down to 10
and have the developers working on them be an order of magnitude more
productive.

People wilfully wrestle with thousands of functions worth of APIs every day
and don't even notice the immense slowdown it's causing them. It's especially
bad in React land, which is ironic seeing as Sebastian Markbage at Facebook
has an excellent talk about reducing API surface area.

------
merb
Rule 1: don't use any modern javascript framework.

------
marktam264
I'm typing this impromptu but this article seems to be a quick-and-dirty informal
incarnation of the Architecture Tradeoff Analysis Method
([https://en.m.wikipedia.org/wiki/Architecture_tradeoff_analys...](https://en.m.wikipedia.org/wiki/Architecture_tradeoff_analysis_method)).

------
yumaikas
I suppose another point worth bringing up is that hardware has made some
pretty strong advances in recent years, especially with SSDs being widely available.
Stuff like that has raised the ceiling on what a single box can do compared to
2000 and earlier, when Google was first building MapReduce.

~~~
jethro_tell
Pretty sure that's in the article is it not?

------
NTDF9
"Don't use tensorflow to predict everything"

------
dlwdlw
Many people may not work at Google scale, but many would probably like to work
at Google.

------
dang
We temporarily replaced this article's baity title with the text's more
accurate self-description.

If someone would care to suggest a good title—i.e. accurate, neutral, and
preferably drawn from the language of the article itself—we can change it
again.

~~~
stickfigure
FWIW, I think the original title was pretty good. I have had the unfortunate
experience of screaming pretty much exactly those words (albeit replacing
"Google" with a different company, the one that one of my CEO's advisors came
from).

EDIT: how about combine the two?

 _You Are Not Google: Another "Don't Cargo Cult" Article_

~~~
nemild
I like your proposed title. "Another 'Don't Cargo Cult' article" on its own
seems dismissive, when the content seems quite useful for many engineers (the
acronym needs more work, though).

~~~
dang
Ok, fair point. I've made up a title, even though we hate to do that, because
I can't find any phrase in the article that neutrally summarizes it.

~~~
ethbro
New title seems fair. (Also, welcome to the dark side of the editing force,
etc etc)

------
komali2
Regarding "UNPHAT": Is this... serious? Does the author genuinely hope that we
will use this acronym as a means to help guide our technology choices?
Or is it not their creation, and just something I wasn't aware of
yet?

Finally, do these forced acronyms ever help anybody else out there? I mean
seriously, the "N" standing for "eNumerate?" The "P" standing for "Paper,"
which barely correlates to the actual meaning "consider a candidate solution."

Seems to me just saying "apply a principle of unfattening your technology
decisions" would be a hell of a lot easier to remember.

~~~
paulddraper
I doubt it. I think it needs to be three or four letters. (E.g. Always Be
Closing. Keep It Simple, Stupid.)

---

Trying my hand:

- understand the DOMAIN

- find the OPTIONS

- research a CANDIDATE

- know the HISTORY

- consider the ADVANTAGES

- apply deliberate THOUGHT

DOCHAT

~~~
jaclaz
Maybe simpler:

DOE

Don't Over Engineer

(which more or less brings us back to KISS principle)

~~~
ChuckMcM
I like DOE because the inverse is E (Engineer). It illustrates an ongoing
challenge in technology where the first question isn't "What capabilities
should our resulting systems have? And what constraints are there on our
implementation?" (which would be engineering a solution) instead we get the
question "What other systems out there seem to solve this problem?", or worse
"What other systems have similar inputs and outputs to the ones we have and
want?"

~~~
jaclaz
Also there is this other question (as I see it):

What CAN this (pre-chosen) _something_ (insert here _hardware_ or _tool_ or
_programming language_ or _library_ ) do?

Let's use ALL (or most) these functionalities! (because we CAN)

Losing sight of the actual question which should be "What is actually needed"?

------
gtirloni
This is just a "my technology stack is better than yours" post like countless
others we see daily. Sorry to dismiss it so abruptly but it gets tiring.

~~~
gtirloni
I think people downvoting tend to ignore the fact that the proposed "optimal"
solutions for non-Google companies were at some point novelties themselves. If
the same logic is applied we'd be using CICS app servers, IMS, and buying
terminals.

At some point things change, the new normal changes, etc. The shift we are
seeing in some areas also contributes to finally accepting the realities of
distributed systems.

~~~
icebraining
The people are probably downvoting because you have missed the point of the
article. Nowhere does it say that one shouldn't change things, or adopt new
solutions.

