

Amazon CTO: 'You should be able to walk away' from cloud providers - alphadoggs
http://www.networkworld.com/news/2012/042012-aws-lockin-258519.html

======
bermanoid
"Koffler says it's not necessarily hard to move data out of S3, it just may be
more expensive than putting it in."

This is a ridiculous statement. PUTs are $0.01 for 1,000 requests, and GETs
are $0.01 for 10,000. Deletion is free. Data transfer out per GB is comparable
to the monthly cost per GB.

Sure, putting data in is cheap because you don't get charged for bandwidth,
but storing it costs just as much per month as moving it out would. If you've
got somewhere better to store your data than S3, you'll come out ahead price-
wise after a single month, so you should do it. That's not lock-in, not even
close.
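
To put numbers on that, here's a quick back-of-the-envelope comparison in Python. The per-GB prices are illustrative 2012-era figures (my assumptions; check the current pricing page), but the conclusion holds at any scale:

    # Cost to exit S3 once vs. cost to keep the data there one more month.
    # Prices below are illustrative 2012-era figures, not current rates.
    storage_per_gb_month = 0.125         # standard storage, first tier
    transfer_out_per_gb  = 0.12          # data transfer out
    get_per_request      = 0.01 / 10000  # GETs are $0.01 per 10,000

    data_gb = 1000       # say you store 1 TB...
    objects = 1000000    # ...across a million objects

    exit_once  = data_gb * transfer_out_per_gb + objects * get_per_request
    stay_month = data_gb * storage_per_gb_month

    print("exit once:      $%.2f" % exit_once)    # $121.00
    print("one more month: $%.2f" % stay_month)   # $125.00

One month of storage already costs about as much as the one-time exit, which is the opposite of lock-in.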

 _Convenience-wise_, yes, there's a lot of lock-in with AWS, no question. But
that's a different problem; it's not fair to imply that you're locked in
because of charges to actually get your data out.

~~~
jacquesm
You'd be pretty silly if you had all your data _only_ stored on S3 without a
backup under your own control. So moving out should come at a cost of $0. You
should be able to cancel your account at a moment's notice and not lose a
single bit, if only because the other side can do the same.

~~~
jshen
Where do you put this backup? Another cloud provider? Servers you manage? Etc.

I think you're understating the cost of keeping such a backup, and the
complexity that may come with it.

~~~
jacquesm
I think you're underestimating the cost of not keeping such a backup. That's a
terminal mistake if there ever was one.

~~~
blake8086
I would love to hear both of your estimates of the costs.

~~~
jrockway
Not having a backup: your company goes out of business.

Having a backup: $1000 a year for a dedicated server you physically own.

~~~
dangrossman
Not all businesses would go out of business if some data was not backed up. If
Chartbeat lost its data, it would be a minor inconvenience: the feature of
their real-time dashboard that lets you rewind and look back at an earlier
point in the day wouldn't work until the data repopulated the next day. They
probably wouldn't even lose a customer over it. When they were a young company
without much funding, it might've made business sense to back up only their
accounts and not the historical data for the sites being tracked.

Not all data can be backed up for only $1000 a year either; it's not just a
matter of storage costs and a server. A write-heavy service like Foursquare
can't just turn off and do a dump at night; to have backups they need enough
server capacity to replicate data as it comes in. For those servers to keep
up, they have to be as beefy as the ones they're replicating -- a year or so
ago, that was 64GB RAM per database for their Mongo instances. That's
definitely more than $1000 a year's worth.

~~~
beagle3
> a year or so ago, that was 64GB RAM per database for their Mongo instances.
> That's definitely more than $1000 a year's worth.

You are mixing up data, memory, and processing power. In the case that Amazon
crashes and burns, yes, they need that CPU+RAM+bandwidth etc. near the data.

But to just keep a copy of the data (not available online), they don't need a
really beefy server -- just store all the updates from a node, and replay them
later at your leisure. A 5400rpm disk will often be fast enough for this
purpose (of course, you would need days to recover in this setup... but that
might be OK if past data is not needed online).

Take Twitter or Foursquare, for example: they need everyone's most recent
max(4, tweets/checkins in the last day) available online at any given time.
But if last year's tweets/checkins are not available online while the system
is degraded during a backup restoration, that may be acceptable.
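
A minimal sketch of that journal-and-replay approach in Python (the file name
and record format are invented for illustration):

    import json
    import time

    JOURNAL = "updates.log"  # append-only journal on a cheap, slow disk

    def record(update):
        # Capture every incoming write as one JSON line. Sequential
        # appends are the one access pattern a 5400rpm disk handles well.
        with open(JOURNAL, "a") as f:
            f.write(json.dumps({"ts": time.time(), "op": update}) + "\n")

    def replay(apply):
        # Days later, at your leisure: rebuild state by re-applying
        # every update in the order it originally arrived.
        with open(JOURNAL) as f:
            for line in f:
                apply(json.loads(line)["op"])

The backup box needs disk and a trickle of bandwidth, not the RAM and CPU of
the production fleet; those only become necessary if you ever have to serve
from it.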

------
aristus
My friend Tom and I gave a talk about this at various conferences, wow, 2 years
ago. The interesting thing is that we're following the same evolution as the
old snail-mail network and the SMS network. Eventually we'll realize that
"cloud peering" is the best solution. But there will be lots of Sturm und
Drang before we get there.

http://assets.en.oreilly.com/1/event/31/The%20Cloud_s%20Hidden%20Lockin_%20Network%20Latency%20Presentation.pdf

http://www.slideshare.net/sh1mmer/the-clouds-hidden-lockin-network-latency

There were four separate things needed to bring about what we think of as the
modern international mail system. First, you needed better infrastructure
inside and between countries. You needed literally more portable objects:
envelopes with stamps on them instead of loose sheets, scrolls, wax seals,
etc. You also needed standardization of rates and address formats, so one
letter could travel anywhere. Last was a uniform-rate, no-questions-asked
promise to deliver via optimized routes, what we would now call a "peering
agreement". The two ends of the agreement would treat each other as peers, and
honor each other's communications as they would their own.

It's that kind of system we don't have, but should, in the cloud. We need
better infrastructure in the form of optimized routes between clouds. We need
to be able to move our virtual machines and configurations around without
special help. We need to make sure we don't get locked in by screwball APIs or
data formats. Most of all, we need the various cloud and web services vendors
to commit to honoring each other's traffic without clobbering us, their
customers, with metered billing.

~~~
ianso
Hi there,

This is super-interesting - may I ask where you learned about this part of the
history of the postal service?

~~~
aristus
I started researching peering, and came across references to the SMS study and
the Treaty of Berne. From those two clues I found everything else via
Wikipedia and the library. IIRC there is a book about the history of the Berne
treaty, but the name escapes me.

------
MartinCron
I just tried to find the original source of the quote "the best SLA is
choice", which I originally heard from Matt Mullenweg of WordPress. It seems
relevant now.

------
christkv
EC2 is one thing, and is fairly simple to move away from; more problematic are
S3, DynamoDB, Google App Engine, or anything like that. They touch only
lightly on the subject, to be honest. ORMs, lol, will not solve your problem
in this case.

This article is way too fluffy to offer any proper advice.

------
nlake44
While there are a lot of AWS offerings/cloud services which have vendor
lock-in, the core EC2 API is supported by Eucalyptus (<http://eucalyptus.com>)
and OpenStack (<http://openstack.org>).
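
For what it's worth, here's roughly what that portability looks like with
boto, the Python AWS library of this era: the same EC2 client code, pointed at
a private cloud endpoint instead of Amazon's. The endpoint, port, path, and
credentials below are placeholders for whatever your Eucalyptus/OpenStack
install exposes:

    import boto
    from boto.ec2.regioninfo import RegionInfo

    # Point the stock EC2 client at a private cloud that speaks the EC2 API.
    region = RegionInfo(name="private", endpoint="ec2.internal.example.com")
    conn = boto.connect_ec2(
        aws_access_key_id="YOUR_KEY",
        aws_secret_access_key="YOUR_SECRET",
        region=region,
        is_secure=False,
        port=8773,                    # placeholder; Eucalyptus commonly used 8773
        path="/services/Eucalyptus",  # placeholder service path
    )

    # The same calls you'd make against AWS proper.
    for reservation in conn.get_all_instances():
        for instance in reservation.instances:
            print("%s %s" % (instance.id, instance.state))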

~~~
jdunck
I don't think it's fair to call richer services lock-in. GAE has lock-in, in
my opinion, because basic services use GAE-specific APIs and common needs
(like scheduled jobs) are accomplished in uncommon ways.

If you're using auto-scaling or beanstalk or whatever, that's an example of
AWS offering something where there isn't a norm, isn't a "normal way" to do
it, and if you need that thing, your other alternative is to build it yourself
- which is still an option should you decide to move away from AWS.

Lock-in is an avoidable switching cost that benefits the party imposing it.
Are there examples of AWS doing that?

~~~
bermanoid
One I can think of is related to RDS: if you run your own MySQL instances, you
can pretty easily migrate to another set of (MySQL) machines by replicating
off the master onto a new slave, then promoting the slave to master and
shutting down the original master. RDS doesn't allow (external) replication,
so your only option to get away from RDS altogether is to do a mysqldump,
which can get nasty if you're trying to migrate away without downtime (a full
export/import via mysqldump can take a long time, and you'll miss any updates
made during the process). At the very least, it involves either a chunk of
downtime or a decent bit of manual migration, which is unfortunate.
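
For what it's worth, a minimal sketch of that mysqldump escape hatch
(hostnames and credentials are placeholders, and it assumes the stock
mysqldump/mysql command-line tools):

    import subprocess

    # Placeholder endpoints: the RDS instance you're leaving and the
    # self-managed MySQL box you're moving to.
    rds = ["-h", "myapp.abc123.us-east-1.rds.amazonaws.com",
           "-u", "admin", "-pSECRET"]
    new = ["-h", "new-master.example.com", "-u", "admin", "-pSECRET"]

    # --single-transaction dumps a consistent InnoDB snapshot without
    # locking, but writes that arrive after the snapshot begins are lost,
    # which is exactly the no-downtime problem described above.
    with open("dump.sql", "wb") as out:
        subprocess.run(["mysqldump"] + rds +
                       ["--all-databases", "--single-transaction", "--routines"],
                       stdout=out, check=True)

    # Import into the new master; on a large database this is the long
    # window where you either take downtime or lose updates.
    with open("dump.sql", "rb") as dump:
        subprocess.run(["mysql"] + new, stdin=dump, check=True)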

That said, I don't think they set it up this way specifically to cause
lock-in; it's just technically difficult to let users manually manage
replication alongside RDS's automated system.

~~~
foolinator
Their messaging uses a non-standard protocol. Supporting the AMQP standard
would be kick-ass.

Same with their cloud search - sooo obviously solr/lucene-based but zero
exposure to the Lucene syntax.

Not a big deal though; it's not like using these totally locks you in.

~~~
ceejayoz
> Same with their cloud search - sooo obviously solr/lucene-based

Except it isn't. <http://aws.amazon.com/cloudsearch/>

"Amazon CloudSearch was created from the same A9 technology that powers search
on Amazon.com."

------
dredmorbius
Another project, forwarded to me by a fellow KPS engineer, is the DeltaCloud
project from the Apache Software Foundation: an API abstracting away the
differences between clouds.

Not using it (yet), but it's on our radar.

<http://deltacloud.apache.org/>

------
jedberg
For reference, here is the original blog post on Data Gravity:

http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/

~~~
mccrory
It's ironic that two vendors are so desperately trying to convince people that
there is no issue with storing large amounts of data with them because it is
so trivial to get it out, never taking into account the dependencies that are
created (and then broken if you try to move the data). Dependencies include:
latency and bandwidth, data that you don't own, the sheer volume of the data
(try moving 10PB around), proprietary or specialized APIs, regulatory
constraints, and cost...

~~~
ceejayoz
None of those dependencies are unique to the two vendors. With large amounts
of data, most of those apply, even if you're on a fully open stack.

~~~
mccrory
Absolutely agree: there is no magic fix for the effects that Data Gravity has
(there are ways to lessen them). The only exception is in the API; a fully
open stack gives you greater portability of your app, and export of your data
is easier because transformation isn't required (just export/import or
replication/copy).

------
fingerprinter
Something like this is why I'm really, really excited for Ubuntu 12.10, Juju
and Awsome. A huge step in the right direction for sure.

------
__alexs
If they really meant this they would AGPL their stack. It's the only way to
provide your customers with the sort of ecosystem that allows them to freely
choose their suppliers.

~~~
bitops
Clearly, they do not mean this in the way that you're implying. Amazon is not
about to change their business model to become a professional services
company. They are still retail first, cloud second (though that may change, of
course).

~~~
__alexs
I wasn't implying that Amazon should change their business model to focus on
services any more than they are currently.

------
Gring
Apple, as always, was not available for comment...

~~~
Gring
Downvoters should turn on their brains before voting. This was very much on
topic. gggrrrr

