
AWS Snowmobile – Move Exabytes of Data to the Cloud in Weeks - chang2301
https://aws.amazon.com/blogs/aws/aws-snowmobile-move-exabytes-of-data-to-the-cloud-in-weeks/
======
wazoox
Previous discussion:
[https://news.ycombinator.com/item?id=13073176](https://news.ycombinator.com/item?id=13073176)

------
andrelaszlo

        Never underestimate the bandwidth of a station wagon full of
        tapes hurtling down the highway.
    

—Tanenbaum, Andrew S. (1989). Computer Networks. p. 57.

~~~
mojoe
Never underestimate the street value either. I wonder if they have an armed
escort for this truck -- the hardware must cost on the order of $10-20 million,
and the data itself could be worth many multiples of that. Could make a great
heist movie.

~~~
tyingq
I wonder if Amazon pays for the re-do if the truck ends up in a spectacular
accident.

~~~
mountaineer22
Can't one simply mirror a Snowmobile to a second Snowmobile prior to
transport?

~~~
superuser2
Redundant Array of Independent Trucks?

------
jasode
As other commenters noted, it's fascinating that no matter how advanced the
networking technology progresses, we'll always have a variation of
"sneakernet"[1] to bypass the limitations of the network. The sneakernet just
evolves from floppies to 45-foot shipping containers.

If humans later colonize Mars and want the full 50-terabyte copy of
Wikipedia in the biosphere, it's faster to send some hard drives as rocket
payload on a 6-month journey than to transfer it via the 32 kbps
uplink[2], which would take roughly 400 years.

[1]
[https://en.wikipedia.org/wiki/Sneakernet](https://en.wikipedia.org/wiki/Sneakernet)

[2]
[http://mars.nasa.gov/msl/mission/communicationwithearth/data...](http://mars.nasa.gov/msl/mission/communicationwithearth/data/)
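The back-of-the-envelope math above can be sketched in a few lines of Python (the 50 TB and 32 kbps figures are the comment's assumptions, not measured values):

```python
# Shipping a 50 TB Wikipedia copy to Mars by rocket vs. streaming it
# over a ~32 kbps uplink (both figures taken from the comment above).
ARCHIVE_BYTES = 50e12       # 50 TB, decimal units
UPLINK_BPS = 32_000         # ~32 kbps Mars uplink
SECONDS_PER_YEAR = 365.25 * 24 * 3600

uplink_years = ARCHIVE_BYTES * 8 / UPLINK_BPS / SECONDS_PER_YEAR
rocket_years = 0.5          # ~6-month transit

print(f"uplink: ~{uplink_years:.0f} years, rocket: ~{rocket_years} years")
```

This works out to roughly 400 years over the uplink, so the rocket wins by about three orders of magnitude.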

~~~
tyingq
A screenshot of Tanenbaum's _"Never underestimate the bandwidth of a station
wagon full of tapes hurtling down the highway"_ argument, with the surrounding
context from his book.

[http://imgur.com/a/qi4BP](http://imgur.com/a/qi4BP)

~~~
teddyh
One thing bothers me about that calculation: shouldn't the time it takes to
_write_ the data to all those tapes be taken into account? I.e., if you're
using tapes to transport data, your real bottleneck is the tape drive's write
speed.

~~~
tyingq
Perhaps it's an artifact of the time period. Back then you likely already did
daily backups to tape (DLT in the 90s, LTO/Ultrium later), so the write time
was considered a sunk cost.

------
FryHigh
I feel odd when I see articles like this. I deploy "workloads" that
require instances, auto scaling, Multi-AZ, etc. It makes my projects feel
minuscule next to the scale of companies that actually use something like
this! I wonder how many companies will actually use it in any given year.

~~~
Beltiras
I imagine surprisingly many. I run operations that are nowhere near that
scale: 8 employees with total data on the order of tens of terabytes. I found
that to be a surprisingly high density of data per employee. A 1,000-employee
company with the same density is at petabyte scale.

------
dx034
I wonder why this makes sense. Wouldn't it be more practical to get a few
hundred Snowballs and ship them via FedEx? You can transfer in parallel and
should reach the same speed as a Snowmobile. They're at the DC the next day,
and the data gets into S3 faster than by truck. Also, the economies of scale
will never pay off for Snowmobile; they're much more likely to for Snowball.

At the same time, logistics (including insurance and security) is handled by
companies that are very good at it. FedEx, DHL and the like offer physical
security services for goods, in addition to encryption, if you need them.

I think it's a PR move only. They will probably find a few clients to somehow
utilize one truck, but I don't think it's more efficient than Snowballs.

~~~
kabdib
Installing, powering and cabling "a few hundred" of anything in a datacenter
is a big deal. You probably don't have room. You may not have power. You have
to deal with hundreds of boxes, cardboard isn't allowed on the datacenter
floor (ideally), and just mucking around on the loading dock wrangling stupid
stuff like shipping labels is going to suck up a ton of time.

[I'm a C++ dev who likes to help design and build datacenters. It's fun.]

~~~
dx034
But isn't that the same problem with the truck? You also need to keep it
connected to the datacenter for a week or more, and since that connection has
to leave the building, isn't it even harder?

~~~
ghshephard
Just run a conduit of fiber out to the parking lot temporarily.

------
brianwawok
Unless my math is wrong, storing 100PB in S3 is something like $2,750,000 per
month.
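A quick sanity check of that figure, assuming 2016-era volume pricing of roughly $0.0275/GB-month for S3 standard storage (the exact tier price is an assumption):

```python
# 100 PB held in S3 standard storage at an assumed $0.0275/GB-month.
PRICE_PER_GB_MONTH = 0.0275
capacity_gb = 100e6          # 100 PB expressed in GB (decimal units)

monthly_cost = capacity_gb * PRICE_PER_GB_MONTH
print(f"${monthly_cost:,.0f} per month")
```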

Then what happens in 5 years if local storage cost dropped by a factor of 10,
but S3 cost did not drop?

Big risk, no?

~~~
gamegoblin
If somehow a technology is developed to allow local storage cost to drop by a
factor of 10, don't you think S3 would make use of the same technology to stay
competitive?

Cloud storage is a commodity these days. The market saves you in this case --
if one cloud provider didn't use the 10x technology and pass along the 10x
savings to the customer, another company would do it and steal all of their
customers.

~~~
adrianN
You'd still need to get your data out of Amazon's cloud and into the
competitor's. That's neither cheap nor easy.

~~~
gamegoblin
My point is that _all_ cloud competitors would necessarily switch for economic
reasons. If not to keep you from leaving, then to keep a competitor from
capturing all _new_ growth.

~~~
brianwawok
That's not how it works. People still use IBM mainframes, despite Linux being
10x cheaper or more. Once you're locked in, sometimes you stay locked in.

Many AWS services have no easy migration path off them.

~~~
gamegoblin
That's absolutely how it works.

If a 10x cost reduction storage technology comes along, cloud providers _will_
necessarily adopt it and _will_ reduce their prices by approximately 10x.

Here's why:

- If they don't, it will become more cost-effective for potential customers
to run their own datacenters rather than put data into the cloud, so the
provider's growth will basically stop.

- Even if potential customers don't want to run their own datacenters, both
potential and _existing_ customers will put new data with a competitor who
_did_ pass along the 10x savings. So again, growth will basically stop.

This is the nature of a commodity product in a free market. Basically all
cloud providers use an S3-compatible API, and costs and performance are in the
same ballpark. There are tons of open source compatibility layers that
abstract which provider you are putting to. If one of them starts costing 10x
less, you just flip the switch and all new data goes there.

The ability for customers to completely cut off growth of their service if
prices don't fall is a supreme motivator. The only case this wouldn't be true
is if all of the cloud storage services formed a cartel to fix prices. But
given that every time Google or Amazon lowers prices, the other one follows to
maintain parity, we have evidence that that isn't the case.

Addendum:

I don't understand your analogy at all. IBM mainframes are a specific type of
hardware that excel at highly available batch and transactional processing.
Linux is an operating system. Linux is free, so it's infinitely cheaper. Also,
IBM mainframes run Linux.

~~~
brianwawok
> If a 10x cost reduction storage technology comes along, cloud providers will
> necessarily adopt it and will reduce their prices by approximately 10x.

Here are my two counterpoints.

1) Bandwidth prices HAVE fallen 10x in the past N years. Some cloud providers
(OVH, etc.) DO offer that price drop. Yet how many people have really left
AWS or GCE for OVH? I would guess not that many.

2) As I said with mainframes: a 10x-cheaper alternative to the mainframe has
been around for, oh, 15 years, but people are still on them BECAUSE they are
locked in. That is my point. Don't get locked into a single anything. Handing
100 PB to a commercial entity to hold for you, with no guarantees on future
pricing, is a bad move. Locking yourself in is one of the worst things you
can do as a company.

~~~
gamegoblin
@1: That doesn't directly address either of my points. You're just providing
an orthogonal example of something cloud providers didn't move on.

I agree that the bandwidth pricing is almost certainly designed to create
lock-in. What I am contending is that that is unrelated to the storage
pricing. You'll notice that my arguments didn't include bandwidth pricing at
all, because it is irrelevant to those arguments.

I'll give a more concrete example. Say I want to transfer 10 PB out of S3.
At that scale the pricing sheet actually says to contact them for a quote,
but below it the per-GB price drops quickly with volume: e.g. the first 10 TB
is 9 cents per GB, but past 300 TB you're paying 5 cents per GB.

Let's be pessimistic and assume 5 cents per GB, even though you could probably
get it much cheaper by contacting them.

So 10 PB will cost me (10 PB × $0.05/GB) = $500K to export.

Standard storage is about 2 cents per GB, so your 10 PB sitting in S3 is
costing you $200K/month just to sit there and do nothing.

Do you see the problem? Moving to the $20K/month provider becomes
cost-positive after about 3 months.
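The break-even argument above can be made explicit; the egress and storage prices are the comment's assumptions, and the 10x-cheaper competitor is hypothetical:

```python
# Break-even time for paying S3's exit fee to reach a 10x-cheaper provider.
data_gb = 10e6                    # 10 PB in GB
egress = data_gb * 0.05           # one-time exit fee (~$500K at $0.05/GB)
old_monthly = data_gb * 0.02      # ~$200K/month at $0.02/GB-month
new_monthly = old_monthly / 10    # ~$20K/month at the hypothetical competitor

months = egress / (old_monthly - new_monthly)
print(f"break-even after ~{months:.1f} months")
```

The exit fee is recouped out of the monthly *savings*, not the new monthly bill, which is why the answer is just under 3 months rather than 25.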

Even if they were to _increase_ bandwidth costs 10x, it'd still become cost-
positive in a relatively short amount of time (couple of years).

Furthermore, if you are paying S3 millions of dollars per year to store data,
you're almost certainly in a position to get them to contractually agree to
that cheaper-than-public bandwidth cost I mentioned earlier, so you don't even
have to worry about the situation in which they hold you hostage by increasing
bandwidth costs 1000x.

 _Yet furthermore_, to reiterate my previous point: even if they were to
increase bandwidth costs 1000x on normal customers without the clout to get
contractual guarantees, that would kill their business. Nobody would put any
new data with them. Sure, they could hold the existing data hostage, but
absolutely nobody is going to put any new data there.

Similarly, if they were to not decrease storage costs 10x compared to a
virtually-identical competitor, nobody would put _new_ data with them. This is
the part that is totally unrelated to bandwidth costs. Everyone would start
putting all of their _new_ data into the competitor, regardless of their old
data being locked in.

The fact that Amazon and Google both immediately reduce storage prices after
the other one does is evidence that this is the case.

@2: I think you underestimate mainframes. People who use them aren't totally
stupid. They do a specific thing very well. Right tool for the job and all
that.

------
robbiemitchell
A surprising number of comments are questions that are answered explicitly in
the article.

PSA: consume the full content before you comment on the content.

------
bluedino
>> We needed a solution that could move our 100 PB archive but could not find
one until now with AWS Snowmobile.

Around this time last year, Backblaze had 200 PB of customer backups, which
they described as stored on _54,675 hard drives across 1,215 Storage Pods_.

So imagine roughly 600 Storage Pods, half of Backblaze's entire operation,
for just one customer. Insane.

------
chucknelson
I love how this is a literal trailer pulled by a truck - see the pictures at
[https://aws.amazon.com/snowmobile/](https://aws.amazon.com/snowmobile/).

I wish they gave more details as to what hardware was in there - are there any
pictures of what the trailer looks like on the inside?

~~~
zeristor
So sneakernet was once someone walking around with a burned CD-ROM, then it
was a box of drives, and now it's a truck.

Quite a dramatic illustration of the growth in data volume.

If one extrapolates, the next stop will be a train, and then a container ship
full of hard drives.

~~~
masklinn
> then it was a box of drives, and now it's a truck.

In between came FedEx-ing a NAS. That's pretty much the standard data-
exchange format in astronomy: you get a fast link and don't need to bother
plugging in a bunch of individual drives; just connect the NAS to power and
network and off you go.

~~~
zeristor
Hmmm...

So when will the universe be too small to hold the data, given that data use
grows exponentially?

I imagine there are lower upper limits, but that would be the upper upper
limit.

------
neals
This really helps me, personally, make the concept of 'data' more concrete.
It's not about files or records; data has a 'volume'. A hundred PB fills up a
shipping container.

Next time my client asks me how "much" 100 PB is, I can just say "about a
shipping container's worth".

~~~
rockostrich
But bytes per unit volume isn't a constant. You may be able to say 100 PB is
about a shipping container now, but in 10 years it will be much smaller
(though storage probably isn't keeping up with a Moore's-Law pace).

~~~
masklinn
Even then it depends on the medium: an LTO-7 tape or an HDD stores
~26 GB/cm³, while an SDXC card tops out at around 1 TB/cm³.

------
OJFord
Every blog post should have a Lego visualisation like this.

~~~
hackcrafter
Agreed! But for the love of LEGO, after all that effort to make great custom
scenes, they could have used a better camera!

Those photos look like they were taken with a poorly focusing cheap
smartphone.

------
dkresge
350 kW seems a bit high for something that should effectively be an
append-only file system. I would have expected 95% of the trailer to be in
standby at any point in time.

~~~
bArray
Yeah, I thought that number was a bit high too. Maybe there's something else to it?

~~~
daveloyall
Encryption.

~~~
mountaineer22
Air Conditioning.

------
patrickg_zill
While I admit there may be customer demand for this, even if there isn't,
Amazon has certainly crafted a great marketing message.

~~~
chinathrow
There is: DigitalGlobe is/was using it.

------
notacoward
Now _this_ is real persistent container storage. ;) And persistent it will be,
because even at 1 Tb/s it takes over a week to load the data onto it. The
bandwidth while in transit is phenomenal, but before that it will be sitting
at your DC's loading dock for quite a while.
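The "over a week" figure checks out; a quick sketch (1 Tb/s is the aggregate link speed from the AWS announcement, quoted elsewhere in this thread):

```python
# Time to fill 100 PB through a 1 Tb/s aggregate link.
capacity_bits = 100e15 * 8   # 100 PB in bits (decimal units)
link_bps = 1e12              # 1 Tb/s

days = capacity_bits / link_bps / 86_400
print(f"~{days:.1f} days")
```

That is about 9.3 days of sustained transfer, matching AWS's "about 10 days" once real-world overhead is included.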

------
jjagylstor
So, suppose we could move that same 100 PB in a standard 24" cube FedEx box,
collecting the data in less than a week and using only two 110 V power
connections. Would that be interesting? Oh, and it takes less than a single
rack of gear.

------
onde2rock
Any idea what type of link they use to connect to the container?

If they fill the thing in 30 days, that averages out to 40 GB/s, way faster
than 10GbE or Fibre Channel.

~~~
ceejayoz
> Each Snowmobile includes a network cable connected to a high-speed switch
> capable of supporting 1 Tb/second of data transfer spread across multiple 40
> Gb/second connections. Assuming that your existing network can transfer data
> at that rate, you can fill a Snowmobile in about 10 days.

------
jasoncchild
[https://tools.ietf.org/html/rfc1149](https://tools.ietf.org/html/rfc1149)

------
jklein11
Maybe someone at Amazon was reading xkcd[1]

1\. [https://xkcd.com/949/](https://xkcd.com/949/)

~~~
Twirrim
Even before Snowball launched, the import/export team joked about UPS trucks
full of Snowballs, both within the team and then with management. I guess
management found out there was actual customer demand for it.

------
aeharding
Since it's a shipping container, maybe eventually they'll do worldwide data
transfer?

~~~
dx034
Probably by plane. Shipping would otherwise probably be slower than using the
network.

~~~
masklinn
Worst case, sea shipping takes ~30 days (e.g. China to Hamburg), which works
out to 38.6 GB/s for 100 PB in a container.

I doubt you have a permanent dedicated 40+ GB/s link between you and the
nearest AWS data center.

Shipping by boat would still be way faster than a network upload.
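The effective-bandwidth arithmetic above, spelled out (the 30-day transit and 100 PB payload are the comment's assumptions):

```python
# Sustained throughput of a 100 PB container on a 30-day sea voyage.
capacity_bytes = 100e15
transit_seconds = 30 * 86_400

gb_per_s = capacity_bytes / transit_seconds / 1e9
print(f"~{gb_per_s:.1f} GB/s sustained")
```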

~~~
dx034
You're right, I had my math wrong. But I don't know if they're certified for
sea freight; I'm not sure how the sea transport would affect the drives.

~~~
masklinn
Well, if the container is completely and securely sealed it shouldn't affect
the drives; after all, drives and other parts are shipped from factories by
sea.

But yeah, I'd expect air shipping as well, if only so the container can be
kept secure. I doubt Amazon is going to send security personnel on week-long
cargo-ship trips.

------
orf
Whoever said "The Internet is not a truck"? Seems like this blurs the lines a
bit :)

------
joelthelion
I'd love to see actual pictures of that truck :)

~~~
tyingq
[https://techcrunch.com/2016/11/30/amazon-will-truck-your-
mas...](https://techcrunch.com/2016/11/30/amazon-will-truck-your-massive-
piles-of-data-to-the-cloud-with-an-18-wheeler/)

------
cyphreak
So this is absolutely a honeypot then, right?

There's no way to verify that this truck full of my corp's valuable data
isn't stopped somewhere along the way, cloned, and then driven to the NSA or
something.

~~~
tyingq
You're putting it in the truck with the intention of shipping it to Amazon to
load it onto their infrastructure. If you don't trust Amazon with your data,
the truck isn't unique in any way...they can clone your data at their data
center too.

~~~
cyphreak
Well, that's what I'm saying. I guess that rabbit hole goes all the way down:
even if Amazon brought their machines to me, there's still no guarantee the
data won't be stolen somehow.

------
raverbashing
I wonder how long it takes to load all the data in the truck (over 10GbE? or
are there better ways?)

~~~
ceejayoz
Right there in the article.

> Each Snowmobile includes a network cable connected to a high-speed switch
> capable of supporting 1 Tb/second of data transfer spread across multiple 40
> Gb/second connections. Assuming that your existing network can transfer data
> at that rate, you can fill a Snowmobile in about 10 days.

