Hacker News new | past | comments | ask | show | jobs | submit login
The Scale of AWS and What It Means for the Future of the Cloud (highscalability.com)
178 points by StylifyYourBlog on Jan 12, 2015 | hide | past | favorite | 58 comments

I sincerely hope there are a few top cloud providers fighting for market share. An AWS monopoly is not a good thing.

Google has awesome data center technology, and my guess is they have at least as many servers as AWS (even if most of those servers are for internal use). If they are willing to invest for the long term, they can be a credible player.

Microsoft appears to be deeply committed to Azure. As in: we cannot lose, we will spend whatever it takes to make this fly. They will also leverage their somewhat captive enterprise customers with Azure / AD integration.

Not clear to me how the rest of the pack will fare.

We, in our modest way, continue to compete with S3 and Glacier for the purposes of online storage / offsite backups / cloud storage.

Last year (2014) we announced petabyte-scale offsite filesystems[1] that are price-competitive with S3/Glacier.

In fact, we have for many years solved a pain point for (some) customers who run their infrastructure on S3: "my infrastructure is on S3, and my backups are on ... S3 ?" ... and our support of s3tools in our environment makes that very simple.[2]

[1] UNIX based, runs on our ZFS platform with snapshots.

[2] ssh user@rsync.net s3cmd get s3://rsync/mscdex.exe

I've been quite impressed by AWS uptime. I've been looking at Azure - I am cautiously bullish. I'd like to take their services out for a genuine ride, but don't have the time to really invest deeply into exercising them.

They give you $150 credit each month for 3 years upon signing up with BizSpark, with nothing to pay up front. Surely that justifies taking the time to check it out if you're already curious.

$150/month can't even come close to paying me back for the downtime experienced using Azure. Plus their online console is one of the most useless interfaces I've used.

I am a BizSpark member, and my blog: http://www.devfactor.net is hosted there. I CAN say that when I got ~500k page-views in the last week of December Azure held up fine.

In terms of performance, it is pretty good for the price (free).

My only issue has been the downtime. It's actually been down more than my previous Linux host. I hope MS can figure out how to keep it online a bit more often :)

The time I would spend would be the time writing management code, putting the APIs through their paces, really finding the limits (I do that in some areas of AWS today). Unfortunately, I'd have to be doing that at my day job to really invest the quality time.

I will probably shake out an account and see what it's like for a cookie cutter user. I'm sure the experience is vastly different for a power user.

I've heard Cloud called a trillion-dollar market opportunity. This makes me cautiously optimistic that there will be multiple big players - with that much money at stake, most of the large tech companies can't afford to not bet big on it. Google's Cloud offering hasn't had much uptake, yet, but its existence alone is forcing big price drops on AWS and Azure.

> I've heard Cloud called a trillion-dollar market opportunity.

The term 'cloud' covers a LOT of different technology-based offerings/markets, which is why it is going to be so big. Compute is inevitable.

"The cloud will keep getting more reliable, more functional, and cheaper at a rate that you can't begin to match with your limited resources, generalist gear, bloated software stacks, slow supply chains, and outdated innovation paradigms."

If the cloud is so great, how come it's so much cheaper to rent a dedicated server at scale?

That's comparing an orange with the produce section of a supermarket. AWS is much more than virtual machines. For just a box with a set CPU, memory, disk, and bandwidth, you're right that it's cheaper to rent a dedicated server, and cheaper still to co-locate.

Plus even for a single server with a cloud provider you still get advantages like one-click snapshot, monitoring dashboard etc etc.

For production/serious servers, looking at price alone is not the best idea. What about other factors like reliability (uptime), support, etc.?

Because it's a lot more difficult to scale. EC2 allows you to spin instances up and down, balance load between them, cluster databases, etc.

If you know all you need is one server, dedicated is great. If you need flexibility, less so.

Nothing you mentioned, spinning instances up and down, balancing between them, clustering databases, is exclusive to EC2, AWS or "cloud" computing in general.

Scalability is largely a design exercise, not, as much as AWS sales engineers want your CTO/CFO to believe, an infrastructure exercise. At the point where infrastructure becomes an issue, you're building your own AWS.

I'm happy to admit that AWS might make some of this easier, but it's almost certainly going to be more expensive [1], and it's often at the cost of flexibility (lock-in, AWS-specific knowledge).

How are dedicated servers, or even colocated servers, possibly less flexible?

[1] There are exceptions, S3 and Route53 stand out as, at the very least, being cost-competitive to a greater extent than other AWS/Cloud offerings.

I'm unaware of any dedicated provider that will let me take a $20/mo small instance and scale it within minutes to an insanely beefy 8-core instance for a day for about $20, and then back down, with less than 15 minutes of total downtime. Total cost for the month: less than $50.
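The arithmetic in that scenario roughly works out. Here's a back-of-envelope check; the rates are illustrative placeholders taken from the comment, not actual provider pricing:

```python
# Back-of-envelope cost for: small instance all month, swapped for a
# beefy instance for one day. Rates are illustrative, not real pricing.
HOURS_PER_MONTH = 730

small_monthly = 20.00                  # $20/mo small instance
small_hourly = small_monthly / HOURS_PER_MONTH

beefy_hourly = 20.00 / 24              # "8 cores for a day for about $20"

# One 30-day month: small instance for 29 days, beefy instance for 1 day.
cost = small_hourly * 29 * 24 + beefy_hourly * 24
print(f"${cost:.2f}")                  # prints "$39.07"
```

Comfortably under the $50 figure, which is the point: per-hour billing means a one-day burst costs a day's worth, not a month's worth.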

One of the things I feel AWS has succeeded at is putting control of infrastructure in the technology team's hands. In many places, dedi or colo requires contract negotiation with the provider and involves some sort of purchasing dept. There are some places I've worked where getting a dedi/colo could mean weeks or months of paperwork across different teams. With AWS, the tech team can spin up 100 servers with no outside involvement needed after the initial work of getting an AWS account with an $x-xxx k/mo limit.

It's hard to beat the operational flexibility AWS provides, but I can see a few scenarios where creating your own mini private cloud out of dedi/colo servers could be more cost effective.

> I'm unaware of any dedicated provider that will let me take a $20/mo small instance ...

You are looking in the wrong places. Dedicated server providers offer dedicated servers. What you want is a VPS. There are dozens of VPS providers with a multitude of products and billing by the minute or by the month and everything in between.

> In many places dedi or colo requires contract negotiation with the provider and involves some sort of purchasing dept. There are some places I've worked with where getting a dedi/colo could be weeks or months of different teams paperwork.

This is merely a failure on the part of your employer.

> It's hard to beat the operational flexibility AWS provides, but I can see a few scenarios where creating your own mini private cloud out of dedi/colo servers could be more cost effective.

It's more like the other way around. To refute my point, please give some examples where AWS is cheaper AT SCALE.

AWS might make some things easier or more convenient, but in no way does it come cheaper at scale as their PR flak claims.

Cloud computing was supposed to be synonymous with the concept of utility computing. Where you can plug into any provider with your app and it just works and you pay a markup on the kilowatt hour so to speak. Until there's a uniform cloud platform, we're not there yet. I suspect we're at least a generation away.

EC2 doesn't just allow you to scale, it allows you to think of servers as disposable units, the same way people think of processes on bare iron.

I'm late to the AWS party, myself, but have been working on a project recently that leverages some of this. It is an eye-opener: when you have virtual servers that are controllable by API, you really open up new frontiers of designing applications.

On a scale of 1 to 10, where 1 is an on-premises server and 10 is 'the cloud', I think renting a dedicated server can be a 7, which is pretty 'cloudy'.

Exactly. And one thing about cloud compute is that it is cheap only for a subset of use-cases. Try going cloud on a high-bandwidth / low-CPU use-case.

AWS has really taken the IaaS thing seriously. I don't think about failed hard drives, power supplies or switches anymore.

I think about the health of my applications and design them to fail gracefully when failures happen. You can do this with hardware but then you've got to go to the DC or send someone there to fix it. It's not just an API call away from being fixed.

Great way to think about it! Running applications in a highly automated way is just so freeing.

I define my server fleet as an AutoScalingGroup. If any machine fails, a replacement is brought online and begins taking traffic automatically - taking only a few minutes. No operator intervention needed.

As a person who once had to recover failed machines manually and individually, it's a beautiful feeling. Not to mention the enormous flexibility of being able to launch additional machines at any time, or automate the scale-up process.
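The replace-on-failure behavior described above boils down to a small reconciliation loop: compare actual capacity to desired capacity and launch replacements until they match. This is a toy simulation of that logic; the class and the instance IDs are illustrative, not any AWS API:

```python
# Toy model of an auto-scaling group's self-healing control loop:
# fixed desired capacity, failed instances replaced automatically.
import itertools

class AutoScalingGroup:
    def __init__(self, desired_capacity):
        self.desired = desired_capacity
        self._ids = itertools.count(1)
        self.instances = {self._launch() for _ in range(desired_capacity)}

    def _launch(self):
        return f"i-{next(self._ids):04d}"

    def mark_failed(self, instance_id):
        # Health check detected a dead machine; drop it from the fleet.
        self.instances.discard(instance_id)

    def reconcile(self):
        # The control loop: launch replacements until capacity is met.
        while len(self.instances) < self.desired:
            self.instances.add(self._launch())

asg = AutoScalingGroup(desired_capacity=3)
failed = next(iter(asg.instances))
asg.mark_failed(failed)
asg.reconcile()
assert len(asg.instances) == 3 and failed not in asg.instances
```

The real service runs this loop for you continuously, which is exactly why no operator intervention is needed.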

"All 14 other cloud providers combined have 1/5th the aggregate capacity of AWS (estimate by Gartner)", yet the slides say "5X the cloud capacity in use than the aggregate total of the other 14 providers". The "in use" part is important to include in the sentence: it distinguishes capacity actually being used from simply having a lot of customers.

I don't buy networking gear so this claim that it's getting more expensive over time really caught me by surprise. Anyone know why?

Maybe it's because the more people switch to cloud services, the less hardware they need to buy. And if the hardware manufacturers can't achieve economies of scale, the cost of production will rise.

I often wonder about the implications of having the U.S. government (increasingly) running in Amazon's cloud...

You aren't the only one thinking about this. While government certainly does leverage the cloud for certain applications, there are many govt agencies which will never go near it. But, they still want the features of object storage, the clear API, the ability to scale up and distribute the storage easily.

I work for a large company that builds and sells "private cloud" infrastructure, and many governments, banks, insurance companies, etc do not want their data out there on public networks or public cloud, even if it is encrypted. They already have IT staff, they have existing data centers, they can handle dropping in new gear, and in the long run it will be cheaper for many workloads.

The S3 API has become the standard, and other vendors are now implementing it (at least a subset) on their boxes, so you can use applications like Cloudberry against an EMC Atmos or Hitachi HCP, or write 3rd party applications against the S3 API and just change the endpoint as necessary.
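The "just change the endpoint" idea works because S3-style object addressing is independent of the host serving it. A minimal sketch, using plain path-style URLs; the hostnames here are placeholders, not real endpoints:

```python
# Sketch of S3-compatible endpoint swapping: the same bucket/key path
# works against any store that implements the S3 API. Hostnames are
# illustrative placeholders.
from urllib.parse import urljoin

def object_url(endpoint, bucket, key):
    # Path-style addressing: <endpoint>/<bucket>/<key>
    return urljoin(endpoint.rstrip("/") + "/", f"{bucket}/{key}")

public = object_url("https://s3.amazonaws.com", "backups", "db.dump")
private = object_url("https://s3.internal.example", "backups", "db.dump")
print(public)   # https://s3.amazonaws.com/backups/db.dump
print(private)  # https://s3.internal.example/backups/db.dump
```

An application written against the S3 API only needs its endpoint configuration changed to move between public cloud and a private, S3-compatible appliance.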

This will change over the next decade. James is correct, the costs are simply dropping too fast. The existing constituencies like IT departments with existing data centers will increasingly have a hard time justifying their budgets. This will become more true as new applications that are designed for a cloud environment are written. The only applications I have seen that are really cheaper are 100% CPU bound workloads and there aren't very many of those in real life.

I have a hard time seeing the military and intelligence communities ever putting their data and computing up on Amazon. DMV, Social Security, and police records? Presidential archives? What about concerns such as regulatory compliance? Patient records from hospitals that must comply with HIPAA, and compliance with Sarbanes-Oxley regulation for financial records? Strategy and design documents from multinational corporations? I don't deny that data and compute will continue to move to the cloud over the next decade, but there are some things which may never make that transition.

AWS is HIPAA compliant (http://aws.amazon.com/compliance/). The CIA selected AWS for a large ($600MM) buildout. Strategy and design documents are already in the cloud organically via OneDrive, DropBox and Google Drive. NASDAQ does sell Sarbox compliance as a key feature of its cloud product but I have a hard time seeing how in the long run that will really differentiate.

The reality is that AWS is likely not any less secure than legacy IT infrastructure. It also provides great primitives for writing more secure applications.

Not just the government, probably most of the sites you visit in one way or another link to Amazon. Look at how many sites go down when there's an AWS outage.

What does this mean for personal privacy when most of the services you use are backed by one platform?

You make it sound like there is one shared database for everything. AWS does a great job of internal firewalling and separation.

"AWS outage" can mean various things, but there are and have been AZ-wide outages, and given the difficulties of replication and redundancy, there are services which are located (or reliant on -- even if unintentionally) a single AZ.

So yes, much online infrastructure _does_ have AWS dependencies. Sometimes (by way of secondary services) in ways that the operators of the service itself may not be directly aware.

This line of thought works with CDNs too.

>> Every day, AWS adds enough new server capacity to support all of Amazon’s global infrastructure when it was a $7B annual revenue enterprise (in 2004)

Curious as to what that means or how it is measured.

What did Amazon run on in 2004? A warehouse full of E1xk's or had they started moving to x86 by then?

Are they adding a single cabinet of dense x86 servers each day (which I'm sure is as powerful as a datacenter full of Sun gear from 10 years ago)?

EDIT: I have been downvoted. This is not really an article, it is an advertisement for AWS. Perhaps people don't like me downplaying their commercial.

The future of the cloud is not AWS. It's not in Amazon's datacenter or some other company's data center. It's not even necessarily in a server.

The servers are going to mainly go away as we transition slowly from server-based networking to content-based networking.

That means that the fundamental protocols are completely unconcerned with what server they are running on or where.

The future is things like Named-Data Networking, Ethereum, distributed apps.

As a stepping stone we might see public clouds that allow you to deploy to ANY city anywhere in the world, enabled by distributed secure data storage and other technologies like Docker and OpenStack.

There is absolutely no reason everyone should run their applications on AWS.

We will also eventually move away from vendor-specific REST APIs to systems built on open semantic interface/data definitions.

I've been waiting for that to happen for the last 15 years, ever since Gnutella came out in 2000.

If you're going to claim that's the future, you ought to understand why distributed content-addressable P2P networks like Chord (created by YC's own Robert T. Morris), Kademlia, Alpine, and JavaSpaces all failed, and P2P sharing networks like Napster, Gnutella, Audiogalaxy, and Kazaa were unable to break out of their illegal-music-sharing niche. And then explain why it's different this time. If anything, the forces that made distributed hash tables unworkable in 2001 are stronger now, as Ethernet bandwidth, file size, and disk space have increased much faster than consumer Internet bandwidth.
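For context on what those content-addressable networks actually did: the core routing trick in Kademlia is tiny. Node and content IDs live in the same space, and "closeness" is the XOR of the two IDs interpreted as an integer. This sketch shows only the distance metric and nearest-node lookup, not a working DHT:

```python
# Kademlia's XOR distance metric: content is stored on (and looked up
# from) the nodes whose IDs are XOR-closest to the content's ID.
def xor_distance(a, b):
    return a ^ b

def closest_node(content_id, node_ids):
    return min(node_ids, key=lambda n: xor_distance(content_id, n))

nodes = [0b0001, 0b0110, 0b1100]
print(closest_node(0b0111, nodes))  # prints 6 (0b0110, distance 1)
```

The metric itself was never the problem; as the parent comment argues, the economics of consumer bandwidth and churn are what kept these networks from breaking out of their niche.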

"UCLA, Cisco & more join forces to replace TCP/IP" http://www.networkworld.com/article/2602109/lan-wan/ucla-cis...

I guess your prophecies seem groundless to some people here.

For example, I have no idea why the future you are describing is going to happen. There are many other alternatives.

What if the future is all about cloud computing provided by gargantuan companies? What if only a few hosting companies remain, and personally owning a computing device becomes economically inefficient?

If you replace "the cloud" with "the mainframe" we are heading at full tilt back to the 70s, and that's a good thing.

I might be misunderstanding you, but I don't think any extreme is a good thing simply because being limited to an extreme means having less choice, and that is a bad thing. There are advantages to centralism, there are advantages to distributed systems and there are advantages to doing the processing locally. The best state of affairs is having all these options available to you when deciding how to architect the best system for your requirements. If a mainframe type solution is the best, do that. If running it on your PC can solve the problem, then do that. If you need a combination then do that. Personally, I think that returning to the "mainframe does it all" concept is a huge step back in terms of freedom, and therefore in terms of potential.

The papers for CMU's cloudlets project cover some topological assets of code that runs centrally or is slaved to central logic, http://elijah.cs.cmu.edu

The next step is formalizing interactions for code that runs on the same node (central or edge), but originates from competitive businesses.

In uptime, security and manageability, nothing can touch the mainframe. If you build a mainframe out of thousands of CPUs and call it a "cloud" that's fine :-)

Cloud-to-butt is like a gift that keeps on giving.

I can't protect my data from some random server it might run on. I happen to trust AWS with my data. AWS also scales vastly quicker than anything I could afford to do on my own.

There are a huge number of use cases where AWS / data center / servers are the best fit. That will remain true for the foreseeable future.

I happen to trust AWS with my data.

Yeah, so let's host everything in the world on the servers of a single company. What could possibly go wrong?

The magic homomorphic system doesn't exist, and I don't see any reason to believe it will exist on a time frame I care about. Short of that we can either deploy our own infrastructure (capital) or lease it from someone else (opex/"cloud"). The cost/benefit there is a business decision, and I see no reason to believe the latter option is truly "a single company."

tldr: AWS meets the security, privacy, and regulatory demands of the financial services industry.

There was an AWS re:Invent 2014 presentation about NASDAQ OMX. OMX is the holding company of NASDAQ that develops and grows the technology that runs stock exchanges in many countries.

OMX is using Redshift to build a cloud solution (FinQloud - Regulatory Records Retention) for their 20+ exchange customers (worldwide stock exchanges). To protect their data, they use HSMs (actually, a cluster of HSMs) to encrypt the data. NASDAQ OMX has a direct, leased connection to the AWS data centers. The data is stored on S3 in encrypted form and only decrypted when Redshift builds the reports, by fetching the decryption key from the HSM (over the leased connection). They have multiple alarms and monitoring around Redshift access in their offshore ops center (e.g. the Postgres audit table).

True data privacy and protection is near impossible but Amazon makes it easier to achieve high-levels of data privacy and security.

The idea of encryption is that you can store it anywhere and it'd be secure.

If you can find an example of NASDAQ OMX using EC2 machines to host the data, maybe I'll believe you. But for now, your post is pretty bland hyperbole.

I'm sure the cloud is secure enough for a lot of businesses. But I think the links you have are reeking of "marketing", as opposed to a practical example.

They're doing the processing in Redshift, which is a hosted Amazon service, running on Amazon hardware.

I get downvoted all the time. Don't sweat bandwagon bias. It is what it is.

I'm working on this: http://utter.io/

Wow.. that is so similar to what I was thinking of. I knew I wasn't crazy. Well, I was pretty sure. I am very broke at the moment but I contributed a little to your Indiegogo.

I'm all for the decentralization things like Bitcoin can bring, but Ethereum is a scam (like so many others that are using the word "blockchain" to attract fools' money). That might be one of the reasons you are being downvoted.

With a cloud provider I at least have some confidence what I'm running on vs. some rando who's trying to skim personal info from one of your magical distributed nodes.

You weren't downvoted because you're downplaying a commercial; you were downvoted because you forgot the ponies.

This is from 2003. It explains why the Semantic web won't happen anytime soon. You're welcome to keep trying, though. Sounds cool.


"...they built to their own networking designs. Special routers were built."

What an audacious call by Amazon

This is an awesome video about why they had to, and how it works: https://www.youtube.com/watch?v=Zd5hsL-JNY4

It's becoming easier to customize servers and routers, http://www.opencompute.org/projects/networking/
