
Amazon's Cloud Is One of the Fastest-Growing Software Businesses in History - adventured
http://www.businessweek.com/articles/2014-07-15/amazons-cloud-is-one-of-the-fastest-growing-software-businesses-in-history#r=read
======
deeths
AWS is doing some great stuff, but this is a really misleading article.

Ignoring the obvious issue that AWS is a service and not a software business
(and they have very different growth characteristics), the data on the graph
is incorrect. Perhaps calling it a "tech" company would have been clearer.

AWS's $3.1B revenue in 2013 would be year 3 of the graph. Everything after
that is speculation, but is compared to actual historic data from other
companies (without explanation).

Several of the other companies' lines seem significantly off as well. For
instance, MSFT's first billion-dollar year was 1990. By year 3 they were
making $2.76B, not the just-under-$2B shown. By year 4 they were making
$3.75B, not the under-$3B shown. By year 6, MSFT was making $5.94B, not the
$4.5B shown.
Similarly, the lines for CRM and VMW are showing lower revenue numbers than
the actual ones for the later years. For instance CRM should be $3.72B in year
4, not the $1.75B shown.

If you stop the graph at year 3, where actual AWS data stops, AWS has grown
only a little bit faster than MSFT. Also, the periods graphed for MSFT, CRM,
and VMW all include recessions, whereas AWS is growing during a boom.
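
A quick back-of-envelope check of that year-3 comparison, using the revenue
figures cited above (nominal dollars, not inflation-adjusted):

```python
# Year-3 revenue after each company's first $1B year, as cited above ($B)
msft_year3 = 2.76   # MSFT, three years after 1990
aws_year3 = 3.1     # AWS, 2013 (where the actual data stops)

# AWS is only modestly ahead of MSFT at the same point in nominal dollars
ratio = aws_year3 / msft_year3
print(round(ratio, 2))  # -> 1.12
```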

~~~
eevilspock
Furthermore, if you inflation-adjust all of the numbers, AWS's graph will
deflate further.

------
padobson
I'm not surprised at all. When I design a web app, I'll design it around AWS
infrastructure so that they're the "Ops" in my "DevOps" title. They handle
everything about web app development that I don't care to.

~~~
tluyben2
I wanted to come and type exactly that; when we do apps, web, scaling, CDN,
file storage, trivial SSL, etc., we just prep for AWS. All the scripts are
there, since most people ask for it, and it always works. With their solid
offerings and constant price drops I would see no reason to ever move. It was
all so painful once, and now it's all such a warm bath.

~~~
yry4345
> "I would see no reason to ever move."

(Obligatory) Apart from the single political point of control. AWS is to web
apps what GoDaddy is to DNS and Verisign is to SSL.

~~~
iancarroll
You're right, actually. I've dealt with all three; AWS is definitely the
friendliest (if we're talking about support here), but each of them holds a
near-monopoly. AWS, honestly, is the only one that actually has an advantage:
Verisign is overpriced, and GoDaddy is just a typical domain registrar.

------
ChuckMcM
It isn't particularly surprising, but it would be much more interesting to
look at margins vs revenue. What is the margin on that $5B vs the $3.1B? I'm
guessing that given the recent price cuts between Amazon, Google, Softlayer,
and others, the margins are starting to get squeezed down a bit.

I find that particularly interesting when looking at the relative success of
Docker. The reason is that IBM's Z1 mainframe seems to have better
operational economics near full utilization than rack-and-stack servers do.
You can see how that works by looking at where the OpenCompute stuff takes
out cost: no more sheet metal around servers, shared power supplies, etc.

Makes me think that 'blade' computing should have been 'building sized' from
the beginning :-)

~~~
opendais
Aren't those like $75,000?

I'm curious what you are basing the operational economics statement on.

~~~
ChuckMcM
I currently have about a thousand servers that do Blekko's web crawl in a colo
facility. They are pretty modernish (Westmere vs Ivy Bridge, but similar
configuration), and the operational economics for running them is a couple
million a year fully loaded. My particular application is a 'bad fit' for
services like AWS due to its large storage footprint and the fact that it
constantly churns through petabytes of data, but were we to host web servers
w/ Docker or the equivalent we could probably provide something like 100K web
hosts. Doing that $5-'droplet' style with Digital Ocean would be, say, half a
million a month, so pretty good margins (caveat all the things like racks and
drive replacement and floor space), but when those margins got squeezed,
something like an equivalently sized Z10 could shave operational costs from
both an operator perspective and a power/cooling perspective (total watts per
delivered core-gigabyte-megabit of compute delivered to the Internet).
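
The droplet arithmetic works out roughly as follows (a sketch using the
figures in the comment above; all the numbers are that comment's estimates,
not measured values):

```python
# Figures from the comment above (estimates, not measurements)
colo_cost_per_year = 2_000_000   # ~$2M/year fully loaded for ~1,000 servers
hosts = 100_000                  # web hosts the fleet could plausibly serve
droplet_per_month = 5            # Digital Ocean $5/month droplet pricing

revenue_per_month = hosts * droplet_per_month     # the "half a million a month"
gross_margin = 1 - colo_cost_per_year / (revenue_per_month * 12)
print(revenue_per_month, round(gross_margin, 2))  # -> 500000 0.67
```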

At web scale, costs shift in interesting ways: operator time is much more
valuable than, say, the cost of hardware, and power and cooling overwhelm
floorspace costs, etc. Luiz Barroso is an engineer who gave a talk titled
"Data Center Sized Computers" which alluded to some of those shifts; Urs and
Luiz updated it a couple of times.

As I've gone from NetApp to Google and now to Blekko, it has been interesting
to watch how the multi-core, large-memory, SSD changes have pushed around
system design like this. And I was thinking that back in the day the "big
problem" with blade computing was that, with a 20-blade chassis, an install
with 20 blades was cost effective but one with 21 blades was not. The
virtualization wave made that moot, since you only install full racks/chassis
as demand calls for it, and the current container wave adds single-OS-type
installs on top of that for even better cost savings. Fun times.

~~~
dekhn
Why do you think AWS is a poor fit for your workload? There are customers who
churn petabytes in S3. If I was running a crawl and indexing operation, I'd
put the bytes in blob storage like that, and aggressively negotiate better
pricing with Amazon.

There are some pretty obvious reasons mainframes aren't the server format of
choice for today's cloud. The people who are building to today's cloud grew up
in a world where fast desktop computers running linux were ubiquitous and
cheap (important for poor students learning to ship code on a budget) while
mainframes were something you couldn't easily get access to even while waving
large sums of cash at IBM.

No doubt there could be warehouse-scale systems that were more efficient,
overall, compared to the current designs that the cloud providers use. Every
bit of the stack could be squeezed to provide exactly the hardware you needed
for a particular problem. It doesn't seem like the economic incentives exist
at this time for a provider like Amazon (I imagine they make far more money
on modest VM configs with no GPUs or Infiniband than they do with the high-end
stuff, even if the latter has a higher profit margin, because volume wins in
cloud).

Multi-core-large-memory-SSD is just the latest architectural evolution. We've
just been pushing the bottleneck between the various parts of the computer,
and custom manufacturers have grown rich and then gone out of business
producing a system that had 30% more "X" than what you could buy
top-of-the-line from Dell (Bull will sell you an x86 system with 24T of RAM!).
Right now the pain point is having many memory spaces across machines, rather
than a single super-fast RAM you can access from any processor (the laws of
physics and cache coherency suggest this is a pretty hard problem), because
multicore and SSD got so big and fast.

~~~
ChuckMcM

> Why do you think AWS is a poor fit for your workload?

Every time I've done the math it comes out 10x as expensive, and I've done the
math with pretty much everyone from the CTO down to the guy who is "sure we
can do it cost effectively."

Generally folks who have petabytes of data in S3 don't have large amounts of
read/write change. So a typical file or image sharing site like Imgur or
Github will be 'read mostly'. When you're doing search you crawl billions of
documents, often replacing 30 - 40% in your store, and you are constantly re-
reading them as you index and rank them. Further, as you process the data what
you really want to do is push your computation out _to_ the data rather than
pull it over the wire, mutate it, and push it back over the wire. Processing
through 1 petabyte of data, on a 10gbps backbone where you pull it and then
push it, running full duplex (so you're pushing and pulling at the same time)
takes a million seconds at a gigabyte per second. That is 11 days, 13 hrs, 47
minutes. That is what happens changing 1/3 of a 3PB data set. Pushing the
computation _into_ the data (which is to say running the processors where your
data is actually stored on disk) you can process a petabyte (assuming your
data distribution algorithm is good (and ours is)) in about a day and a half.
Not quite 1/10th the time.
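
The transfer-time arithmetic above checks out; a minimal sketch (the 1 GB/s
figure is the comment's round number for a full-duplex 10 Gbps backbone):

```python
# Moving 1 PB at ~1 GB/s, the comment's round number for a 10 Gbps backbone
petabyte = 10**15            # bytes
rate = 10**9                 # bytes per second

seconds = petabyte // rate   # 1,000,000 seconds
days, rem = divmod(seconds, 86_400)
hours, rem = divmod(rem, 3_600)
minutes, secs = divmod(rem, 60)
# -> 11 13 46 40: the "11 days, 13 hrs, 47 minutes" above, with the last
# 40 seconds rounded up to the next minute
print(days, hours, minutes, secs)
```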

If you want to coordinate 10,000 worker threads which are working through your
data, you need to be able to share messages between them; the messages don't
have to happen often, but their latency adds up if they take too long.

You end up asking for the same system built in the "cloud" that you've built
in a colo: dedicated "fat" machines with lots of memory and disk, all within
easy network 'shouting' distance (aka a non-blocking full-crossbar-bandwidth
network) of each other, without any confounding network traffic going over
your backbone. And when you arrive at that inevitable conclusion, the actual
loaded cost of the machine falls right out the bottom and lands in the
customer's lap, including all the loaded-up margin costs. Three months or so
ago (right after the last price war) we ran all the numbers again; Amazon
would cost about $1.5M/month to let us do what we want to do.

Now, as I point out to cloud sales guys, and to you, this isn't "bad" or
actually a problem; building search engines requires an extraordinary amount
of horsepower to be brought to bear on a very large, very noisy data set.
There is absolutely no rationale for making that configuration cost effective:
it's an outlier, and the number of people who do it you can count on one hand.
But it is the same reason that people don't just pop out the AWS toolkit and
poof, crawl and index billions of web pages :-)

> volume wins in cloud

I don't agree with that. I think what 'wins' in the cloud is the ability to
oversubscribe the hardware. Just like Comcast sells everyone on the street 50
megabits of internet knowing darn well that if more than a handful use that
much at the same time everyone will throttle down, cloud providers sell you an
'instance' which probably spends most of its time not doing anything at all.
And while yours isn't doing anything, someone else's is. _That_ is the 'magic'
that makes this stuff so profitable for Amazon. Not people like me who have
100GB-memory machines running at 85% utilization 24 hours a day[1]. It would
be like everyone on the block signing up for high-speed internet and every one
of us downloading a copy of the entire Internet Archive :-) Not a likely
situation, so rarely considered something the infrastructure needs to support.

[1] To be fair, they don't do that continuously; crawls start and stop and we
switch things around, but when they are in the fetcher/extractor phase and
running at R3, it's a wonder to behold :-)
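
A toy model of that oversubscription point (the utilization figures here are
illustrative assumptions, not Amazon's actual numbers):

```python
# Illustrative assumptions, not real provider figures
avg_instance_busy_pct = 10     # a typical instance is busy ~10% of the time
target_hardware_busy_pct = 80  # how hot the provider will run real hardware

# How many "instances" can be sold per unit of real capacity
oversubscription = target_hardware_busy_pct / avg_instance_busy_pct
print(oversubscription)  # -> 8.0
```

A customer at 85% utilization around the clock leaves almost nothing to
oversubscribe, which is the comment's point about being unprofitable for the
provider.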

~~~
kordless
> what 'wins' in the cloud is the ability to oversubscribe the hardware

We're increasing compute exponentially, so it makes sense we'd want to
oversubscribe it as much as possible. Demand is like a dog nipping at your
heels.

------
beachstartup
a couple of points about aws:

1\. it's not really a software business. their marginal cost of products sold
is nowhere near that of software. the software they develop facilitates the
sales/leasing of hardware and networks.

2\. it's a division of a much larger parent company that poured billions into
it. it's a "business" but not a company, which is what they're comparing it
against.

~~~
dba7dba
I agree with the division-versus-company point. No other startup could've done
what they did: providing the tech knowhow, the logistics (always a big issue
at that scale), and the brand.

EVERYONE willingly gives their credit card info to Amazon.

------
flyt
One of those rare examples where the best product is winning in an entire
category.

~~~
meddlepal
I don't know if they're the best at everything. I've been using Google Compute
Engine extensively lately and I think it is a lot more polished feeling at
times than EC2.

~~~
tszming
I am interested to know which part of the GCE is a lot more polished than EC2.

~~~
crb
One data point: On AWS, a hardware outage takes your machine down (with
notification). GCE offers live migration, which you generally don't even know
has happened. See
[http://www.rightscale.com/blog/cloud-industry-insights/google-compute-engine-live-migration-passes-test](http://www.rightscale.com/blog/cloud-industry-insights/google-compute-engine-live-migration-passes-test)

~~~
toomuchtodo
You should be building your AWS infrastructure around failure scenarios, and
while you should be notified of a failure, your app should continue to chug
along.

/DevOps

~~~
crazypyro
Isn't he just saying that Google does this automatically for you?

------
dictum
Amazon's cloud services are a software business in the same way a utility
company that wrote software to automate some management tasks is a software
business.

~~~
teej
I had to call a human being and use a fax machine to get my power turned on
with the local utility. I can spin up an EC2 cluster with an API call.

~~~
opendais
Heh. I signed up on a website and put in a request to turn my utilities on.
