
Using Amazon Snowmobile to transmit 100PB of satellite images to AWS - artsandsci
https://www.wired.com/2017/05/best-way-transmit-satellite-data-trucks-really/
======
Piskvorrr
Everything old is new again:

"Never underestimate the bandwidth of a station wagon full of tapes hurtling
down the highway." —Tanenbaum, Andrew S. (1989). Computer Networks. New
Jersey: Prentice-Hall. p. 57. ISBN 0-13-166836-6.

~~~
Posibyte
It really is a marvel when you think about it. The latency is horrible,
several hours to several days, but each packet of data can contain several PB
of information inside of it. It comes with a built-in ACK method (phone call
or site update), and the packet almost never needs to be verified externally
because the data inside has all of its own integrity-check methods.

~~~
JackFr
I assume they're sending the checksums with Bandit (lest a truck hijacking
lead to a MITM attack.)

~~~
ajford
This sounds like the lead in to a fantastic spy movie.

The rag-tag team of ex-spies have to band together with the younger techy team
to hijack a Snowmobile and modify data on the fly, barreling down the highway.

~~~
opportune
"Man in the Middle"

~~~
ge96
_Blows on gum wrapper_

I just got you unlimited international calling time

------
wging
Make sure to read the linked blog post from Digital Globe. It is more
interesting, in my opinion (it includes some interesting details and seems to
be written towards a _slightly_ more technical audience):
[http://blog.digitalglobe.com/industry/digitalglobe-moves-to-the-cloud-with-aws-snowmobile/](http://blog.digitalglobe.com/industry/digitalglobe-moves-to-the-cloud-with-aws-snowmobile/)

------
ortusdux
Wow. It looks like a 16-hour drive to the Oregon data center. With an 8-hour
break between shifts, that works out to a transfer rate of almost 10 Tb/s.
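That figure checks out as a back-of-envelope sketch (assuming the full 100 PB payload and a 24-hour door-to-door trip; the numbers are this comment's estimates, not AWS figures):

```python
# Effective bandwidth of one Snowmobile trip: 100 PB moved in 24 hours
# (16 h driving + 8 h rest, per the estimate above).
def effective_tbps(payload_pb: float, trip_hours: float) -> float:
    """Payload in petabytes over trip time, as terabits per second."""
    bits = payload_pb * 1e15 * 8           # PB -> bits
    return bits / (trip_hours * 3600) / 1e12

print(round(effective_tbps(100, 24), 1))   # ~9.3 Tb/s
```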

~~~
deno
Unless they’re shipping those disks as a ready-to-go container DC[1], you need
to account for the time to unload and install them. Actually, you need to
account for it on both ends.

[1]
[https://en.wikipedia.org/wiki/Modular_data_center](https://en.wikipedia.org/wiki/Modular_data_center)

~~~
kristjansson
They are though!

> The Snowmobile comes with a removable connector rack that needs to be
> mounted on one of your data center racks where it can be connected directly
> to your high-speed network backbone. The connector racks provides multiple
> 40Gb/s interfaces that can transfer up to 1 Tb/s in aggregate.[1]

so there is a bit of setup and teardown, but they're not plugging disks in
one-by-one

[1]:
[https://aws.amazon.com/snowmobile/faqs/](https://aws.amazon.com/snowmobile/faqs/)

~~~
Twirrim
That's talking about customer premises though, not AWS side.

~~~
jsmthrowaway
What's the difference? They'd probably do the same thing at both ends.
Snowmobile is essentially a giant external drive hurtling down the highway,
and the removable rack (which I'd assume is just switches) is its "USB"
interface. Actually, it'd probably be _more_ efficient on the AWS side,
because they likely already have said interconnects in their facility and
don't need to unload it.

If I were designing it I'd expose a private S3 interface, or maybe NFS, to the
customer from the truck, store the incoming bits on-disk in the form that S3
expects, then just copy raw blocks from disk and merge metadata databases on
the AWS side so you're not doing two passes through the S3 API. I can't
imagine they're doing something much different than that. Not every customer
would be in a position to handle iSCSI or FC, for example. You could unload
the disks at the destination, too, but that seems less efficient and more
disruptive to their lifetime.

~~~
Twirrim
I can't envision S3 ever wanting their secret sauce to be out there in the
world available for everyone to potentially compromise and figure out. Merging
the metadata together would be a complex task that would require significant
engineering effort on all sides, especially given none of the components were
ever designed with such a task in mind, and everything is highly tuned towards
the actual types of workload they typically see. Pushing things through the
front end API would be the quickest and easiest route for all parties
concerned. That's what it's there for, and scaling will be a known quantity.
Regardless of how impressive the Snowmobile is, and the quantity of data it
stores, the bandwidth it would consume putting it into S3 would be a drop in
the ocean compared to their usual workload.

~~~
deno
Any evidence for any of those claims? Putting anti-tampering failsafes and GPS
tracking on the Snowmobile doesn’t seem like such a stretch, when you consider
they don’t rent this out to just anyone on the street.

And 100 PB is still a lot of data. Facebook in its IPO filing said their
entire media library was at the time 100 PB. Backblaze’s total backup size was
150 PB in 2015, just 2 years ago. I’m sure S3 is at exabyte scale now, but
even so, 100 PB is not a drop in anyone’s bucket.

~~~
Twirrim
> Any evidence for any of those claims? Putting anti-tampering failsafes and
> GPS tracking on the Snowmobile doesn’t seem like such a stretch, when you
> consider they don’t rent this out to just anyone on the street.

> And 100 PB is still a lot of data. Facebook in its IPO filing said their
> entire media library was at the time 100 PB. Backblaze’s total backup size
> was 150 PB in 2015, just 2 years ago. I’m sure S3 is at exabyte scale now,
> but even so, 100 PB is not a drop in anyone’s bucket.

I'm ex-AWS, and used to work on a team closely involved with both S3 and
Import/Export (the team that produced Snowmobile, though that expansion
launched after I left). I'm well aware of the kinds of capacity scale S3
operates at, as well as some other storage teams. Remember that Backblaze, for
all its scale, operates out of a small number of datacentres, whereas AWS has
operations right across the world.

Being ex-AWS is also why I'm familiar with the way AWS Security thinks, and I
can easily picture how they'd hit the roof if S3 was "ported" onto the device.

That's before we even begin to tackle just how hard it would be to port S3
onto a small-scale platform like the Snowmobile. Don't forget, you're not just
talking about all the things that make up S3, but all the things that make up
Amazon infrastructure as a whole. One of the reasons Amazon is able to push
out so many services is that they've got a mature and well-established
ecosystem behind the scenes that is designed to operate at scale.

You'd be talking year(s) of effort to port S3 into a snowmobile, at best, vs.
a matter of a few months to port and fully test a layer on top of the device.
When it all boils down to it, the S3 API is pretty simple. Why on earth would
you choose the most technically complicated way to approach the problem?

~~~
deno
That’s much better for a source, thanks.

Do notice the context, however. We’re approaching this problem from the
perspective of trying to determine the network bandwidth of a portable
datacenter driving down the highway.

Assuming that container is really just a giant USB hard drive, then it's 10
days for the copy at the client's site, 8 hours for the drive, and then
another 10 days to assimilate the data back at the AWS DC, or more if pushing
it to 2 other DCs is slower.
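Those 10-day figures line up with the 1 Tb/s aggregate interface quoted upthread. A quick sketch (the 10 Gb/s WAN link is just an assumed comparison point, not anything from the article):

```python
# Days to move 100 PB at a given line rate.
def days_to_copy(petabytes: float, rate_bps: float) -> float:
    bits = petabytes * 1e15 * 8        # PB -> bits
    return bits / rate_bps / 86400

load = days_to_copy(100, 1e12)         # fill/drain at 1 Tb/s, each end
wan  = days_to_copy(100, 10e9)         # same data over a 10 Gb/s link
print(round(load, 1), round(wan))      # ~9.3 days per end vs ~926 days
```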

Perhaps there’s some way to alleviate some of those concerns by shipping a
minimal S3-API server in the Snowmobile, then deploying the actual production
code when it’s assimilated back home. Perhaps the Snowmobile could temporarily
become its own region.

Still, you’re right, that is a lot of moving parts for something you’d want to
avoid… Except in this extremely important hypothetical, obviously.

Also, while it is probably true that S3 has way more storage distributed
across the globe, I think the Snowmobile still follows the Sat Nav not DNS &
BGP ;) I mean it delivers to a single DC, so that’s what needs to be
considered.

Actually, 100 PB is a lot on disk, but it’s even more over the network.
They’ll probably want to push it during off-peak hours, but should probably
also consider just driving the truck on to the next AZ.

------
acidburnNSA
As we approach Enemy of the State technology, I feel slightly better about how
cloudy it is in Seattle.

~~~
zardo
The clouds don't really help with the cameras on street posts.

[http://www.seattletimes.com/seattle-news/crime/judge-blocks-seattle-city-light-from-disclosing-locations-of-fbi-surveillance-cameras/](http://www.seattletimes.com/seattle-news/crime/judge-blocks-seattle-city-light-from-disclosing-locations-of-fbi-surveillance-cameras/)

------
comboy
I wonder when we'll achieve communication faster than moving physical medium
for huge amounts of data. It's not so obvious since information density also
keeps increasing.

~~~
mjb
The historical trend is that storage density grows faster than throughput, and
throughput improves faster than latency.

~~~
gm-conspiracy
Something video-streaming services and media companies should consider.

~~~
ryandamm
Perhaps we can encode high quality video onto physical media... using
microscale structures to _optically_ encode the video data? We could then use
snail mail to move these optical storage devices to end viewers.

Oh, wait. Darnit.

~~~
gm-conspiracy
Think bigger.

Instead of a movie or few episodes, consider physical media that could store a
year or decade of TV shows and movies.

What is more likely to occur in the next decade: increasing density and
decreasing costs of storage media, OR, increased speeds at the local ISP level
with decreasing costs?

~~~
cle
There's a lot of incidental value in on-demand streaming. The streaming
company gets immediate and much more intimate feedback about how you're
interacting with their product.

~~~
peteretep
I'm sure the "NetFlix CHILLNAS™" device which you plug into your TV via HDMI
can be made to be Wifi-enabled only and absolutely lousy with instrumentation
to send analytics back to homebase.

------
dghughes
I have some security video on four 1 TB USB 2 external drives; I added the
files to the drives a few at a time over a year or two. I wanted to move all
the files onto a single large drive, but I gave up because the drives would
have to spin for so long (days!), and they're powered over the USB cable
rather than an external brick.

I guess that's the point of the article: the data we humans can now create is
so enormous that it's impossible to move other than in bulk, physically.

~~~
dsp1234
Take the drives out of their USB 2 enclosures (max speed ~35 MB/s) and move
them into USB 3 enclosures (max speed ~400 MB/s). You can also attach the
drives directly to a PC via an IDE or SATA interface. USB 3, IDE, or SATA will
let the drive run at whatever its native speed is, rather than being
bottlenecked by USB 2.
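As a rough sanity check on those interface ceilings (the 120 MB/s figure is an assumed sustained rate for an older drive freed from USB 2; actual drives vary):

```python
# Hours to copy one drive at a given sustained transfer rate.
def copy_hours(size_tb: float, mb_per_s: float) -> float:
    return size_tb * 1e12 / (mb_per_s * 1e6) / 3600

print(round(copy_hours(1, 35), 1))    # at the USB 2 ceiling: ~7.9 h per 1 TB
print(round(copy_hours(1, 120), 1))   # same drive at 120 MB/s: ~2.3 h
```

So four 1 TB drives at the USB 2 ceiling is roughly 32 hours of sequential copying before any seek or filesystem overhead, which is how "days" happens.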

~~~
brianwawok
Older spinning rust may not be much faster than 35 MB a sec sustained

------
thatwebdude
(Regarding a featured photo) Man, I really gotta find a way to invest in
shipping containers. It's foolproof as of late!

------
YCode
The Sneakernet has its advantages.

