Some of these "cloud" places, though, don't have that kind of oversight. They build their systems to just auto-scale up, and up, and up. Soon, there are dozens, then hundreds, then even thousands of these crazy things. If they write bad code, and their footprint goes up every day, they don't even realize it. Odds are, the teams themselves have no idea how many "machines" (VMs in reality) they are on, or how quickly it's changing.
There's no equivalent of my friend to stand there and tell them to first optimize their terrible code before they can get more hardware. The provisioning and deploy systems will hand out capacity on demand, and so they do.
What usually happens is that someone will periodically run a report and find out something mind-blowing, like "oh wow, service X now runs on 1200 machines". How did this happen? That's easy. Nobody stopped them from needing 1200 machines in their current setup. People wring their hands and promise to do better, but still those stupid instances sit there, burning cash all day and all night.
Not only "cloud", but 3rd party APIs and managed services as well: slab-based pricing, pricing on various rates, pricing on data volume, what have you!
I have come to believe that most cloud/API vendors make bill tracking convoluted and hard on purpose, precisely so that people can't get easy visibility and early warning of impending disasters.
Any firm/outfit that says "we want to use the cloud for its elasticity" must be prepared for that elasticity in the billing too. If they are not, they will find themselves in deep trouble.
How do backend developers determine what is and is not reasonable performance for a given service?
I see so many stories from companies of "we switched to the right tool for the job and saved so many CPU cycles!". From the outside it sounds like half of the problem is determining you have a problem in the first place.
Well it depends on what the backend serves. It's either "the frontend is not perceived to be sluggish" or "batch jobs don't overrun".
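Those two checks can be made concrete in a few lines. A minimal sketch (all names and thresholds here are illustrative, not from any particular shop): a p99 latency budget for interactive endpoints and a wall-clock window for batch jobs.

```python
# Toy version of "is this service fast enough?"
# Thresholds are illustrative, not standards.
INTERACTIVE_P99_BUDGET_S = 0.2   # interactive: p99 under ~200 ms
BATCH_WINDOW_S = 4 * 3600        # batch: nightly job fits in 4 hours

def p99(durations_s):
    """99th percentile of a list of request durations (seconds)."""
    if not durations_s:
        return 0.0
    ordered = sorted(durations_s)
    return ordered[min(len(ordered) - 1, int(len(ordered) * 0.99))]

def interactive_ok(durations_s):
    """True if the frontend shouldn't perceive the backend as sluggish."""
    return p99(durations_s) <= INTERACTIVE_P99_BUDGET_S

def batch_ok(started_s, finished_s):
    """True if the batch job did not overrun its window."""
    return finished_s - started_s <= BATCH_WINDOW_S
```

The point isn't these particular numbers; it's that "reasonable performance" only becomes checkable once someone writes a budget down.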
I roll my eyes as I look at my desk: a multimeter, an oscilloscope, and a pile of hardware spanning multiple (more or less buggy) board revisions, full of jumper wires and cables running to serial pin headers...
I shudder to think about all the open tickets about hardware-related issues. Of course, probing boards is only a partial help, since the other half of the hardware lives in drivers that may or may not be based on reverse engineering, incomplete or missing specifications, etc.
The network sometimes acts oddly, and I still don't know why; I can't reproduce it reliably.
Sure, virtualization adds more complexity... but sometimes it's nice to know that hardware is someone else's problem.
A fun comparison would be the same program implemented as a monolith on bare metal vs. microservices in the cloud. The monolith on metal will run in cache and won't make RPC network calls. That must be something like a hundred times faster.
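One way to get a feel for the gap is to time an in-process function call against a request/response round trip over loopback TCP. This is a hedged sketch, not a real benchmark: loopback has no real network hop, no serialization, and no TLS, so it actually understates the cost of a cross-machine RPC.

```python
import socket
import threading
import time

def work(x):
    # Stand-in for a "service" call: trivial computation.
    return x + 1

def bench_local(n=100_000):
    """Seconds per call for an in-process function call."""
    start = time.perf_counter()
    for i in range(n):
        work(i)
    return (time.perf_counter() - start) / n

def _echo_server(srv):
    conn, _ = srv.accept()
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    with conn:
        while data := conn.recv(64):
            conn.sendall(data)

def bench_loopback(n=2_000):
    """Seconds per call for a round trip over loopback TCP."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    threading.Thread(target=_echo_server, args=(srv,), daemon=True).start()
    cli = socket.create_connection(srv.getsockname())
    cli.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    start = time.perf_counter()
    for _ in range(n):
        cli.sendall(b"x")   # "request"
        cli.recv(64)        # wait for the "response"
    per_call = (time.perf_counter() - start) / n
    cli.close()
    srv.close()
    return per_call
```

Even with the deck stacked in loopback's favor, the per-call cost is typically orders of magnitude above a plain function call; add real network, serialization, and retries, and the microservices version pays that tax on every hop.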
We have global warming; programs should be maximally efficient.
Performance numbers everyone should know
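For reference, the numbers that title usually points at are the widely circulated "latency numbers every programmer should know" (popularized by Jeff Dean and Peter Norvig). These are order-of-magnitude values, not measurements of any particular machine:

```python
# Order-of-magnitude latencies in nanoseconds, from the widely
# circulated "latency numbers every programmer should know" list.
LATENCY_NS = {
    "L1 cache reference":                   0.5,
    "branch mispredict":                    5,
    "L2 cache reference":                   7,
    "mutex lock/unlock":                    25,
    "main memory reference":                100,
    "send 1 KB over 1 Gbps network":        10_000,
    "read 4 KB randomly from SSD":          150_000,
    "round trip within same datacenter":    500_000,
    "disk seek":                            10_000_000,
    "round trip CA <-> Netherlands <-> CA": 150_000_000,
}

# The gap behind the monolith-vs-microservices comparison above:
rpc_vs_ram = (LATENCY_NS["round trip within same datacenter"]
              / LATENCY_NS["main memory reference"])  # 5000x
```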
In the 80s, I wrote in assembly. In the 90s, I saw Microsoft release operating systems which were incredibly bloated and memory hungry. At the time, I thought it was insanely inefficient.
In hindsight, they were right and I was wrong. It's true that Microsoft's software was bloated, but they were prophetic enough to see that RAM and storage were going to get cheap. The hardware became powerful enough to make the bloat of the operating system unnoticeable.
No it didn't. Lots of things that were instant on computers of the past are slow on modern machines: basic things like displaying the contents of a directory, etc.
What changed is public perception - people accept bloated and terrible software as something normal.