New physical AWS Data Transfer Terminals let you upload to the cloud faster (amazon.com)
49 points by vinni2 43 days ago | 47 comments



Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.

- Andrew S. Tanenbaum


I just learned the AWS truck has been retired for months https://www.datacenterdynamics.com/en/news/aws-retires-snowm...


I wonder how this process worked in practice. Do you simply send your only set of hard disks and hope for the best? Do you keep your original data on one set of disks, copy it to another set, and send that off? Do you make multiple copies of the same data and send them all together on the same truck? Multiple copies on multiple trucks? How would you do reconciliation on the other end once the disks arrive at the destination?

Everything feels so simple and straightforward from afar, but once I try to actually reason about it, even the simplest of tasks feels complicated.


> I wonder how this process worked in practice. Do you simply send your only set of hard disks and hope for the best?

No, you don't put your own disks in the Snowmobile/Snowball/Snowcone. It contains disks, so when it arrives you connect it to your network and copy data onto it, and then it is driven to an Amazon datacentre.

See, e.g. https://docs.aws.amazon.com/snowball/latest/developer-guide/...

https://aws.amazon.com/blogs/aws/aws-importexport-snowball-t...
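
As for reconciliation: the Snow family tooling does its own integrity checking, but the generic approach is a checksum manifest built on both ends and compared. A rough sketch in Python (hypothetical mount paths; this illustrates the idea, it is not the actual Snowball client):

    import hashlib, pathlib

    def sha256_of(path, bufsize=1 << 20):
        # Hash in chunks so large files don't need to fit in memory.
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while chunk := f.read(bufsize):
                h.update(chunk)
        return h.hexdigest()

    def manifest(root):
        # Map each file's relative path to its digest.
        root = pathlib.Path(root)
        return {str(p.relative_to(root)): sha256_of(p)
                for p in sorted(root.rglob("*")) if p.is_file()}

    # Build a manifest before shipping, rebuild it after import, then diff.
    before = manifest("/mnt/source")    # hypothetical source mount
    after = manifest("/mnt/imported")   # hypothetical post-import copy
    assert before == after, "mismatch: re-copy the differing files"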


Thank you. That makes much more sense. So once the data is on the AWS device it's Amazon's responsibility, and the customer doesn't need to worry about the truck or whatever.


> so when it arrives you connect it to your network and copy data onto it, and then it is driven to an Amazon datacentre.

And if there is an accident on the road, bad luck. /s


Judging by the locations (New York and LA) I wonder if this is to cater to folks from production houses who want to upload large video files for processing or backup.


Or law firms with large discovery data sets


How big are they? I thought they were on the order of 100 GB, the same order of magnitude as (dead tree) libraries.


Hmm. A couple of years ago, one large firm I knew had about 400 TB in their AWS instance, I think, constantly growing with new cases.

They had instances around the world; this was just one.


It adds up quick. I know of a law firm (under 100 employees) with over 20 TB.


That's still peanuts; you can get consumer-grade HDDs with that capacity in a single drive. A business-grade line would have no trouble uploading all of that data in less than a week, even with a bunch of extenuating circumstances.


Some smaller businesses may have a huge data store, but not the money to pay for a business-grade internet connection to upload it in a reasonable amount of time. I've worked for clients who paid over $1,000 a month for a 10 Mbps full-duplex fiber connection (probably because of low ISP competition and because they were in a newly built, low-density area). If they were working on migrating to the cloud, they would certainly consider taking a few hard drives to AWS one time rather than maxing out their 10 Mbps full-duplex connection for weeks or months.
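
For scale, a rough back-of-the-envelope (assuming the link is fully saturated and ignoring protocol overhead):

    # Best-case days to push 20 TB through a 10 Mbps link.
    tb, mbps = 20, 10
    seconds = tb * 1e12 * 8 / (mbps * 1e6)
    print(seconds / 86400)   # ~185 days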


> no trouble uploading all of that data in less than a week

When you're doing e-discovery, deadlines are often measured in days - not just for the upload time, but for the analysis and finding the needle in the haystack.


Also gotta think of what else is using the corporate internet pipe; you can't drown it in one AWS upload for days.


I'd imagine with LLMs today, discovery work is probably done on the cloud by bots.


It'd be a great way to get sued for negligence. You can't even assume the counterparty has correctly put everything into discovery for you. What you don't know is what gets you into trouble.

An example from the Karen Read case: the police, somehow, uploaded a video that had been put through a "mirror filter" and thus showed a vehicle in the opposite orientation from reality. Is your LLM going to notice that?


Do you know of a single attorney who has been held liable for negligence for using an LLM to help accelerate their document research work?


It's already common grounds for a malpractice suit. In a lot of cases these will be handled by the attorney's insurance and will probably be settled.

You really shouldn't take the absence of evidence as any sort of evidence itself.


Using an LLM is “common grounds” for a malpractice suit? Come on, the technology hasn’t even been around that long. Without corroborating evidence, why should anyone believe you?


Failure to properly perform discovery is already common grounds for a malpractice suit. I don't care if you believe me. You seem to have your mind made up anyways.


Yes, you can be held liable for failing to properly perform discovery. But the general case isn’t what we’re talking about here. It’s the specific case of using an LLM to assist with it.

> You seem to have your mind made up anyways.

I haven't made up my mind about anything; it's you who claimed that using an LLM "is a great way to get sued for negligence." It's a fundamental rule of debate that the person who makes the argument bears the burden of supporting it.

You seem to be making the implicit assumption that using an LLM to assist with the process will probably be found to constitute negligence. Again, why should anyone believe you, especially if it hasn’t happened yet? Your argument is just FUD, pure and simple.

As an attorney I can tell you these questions just aren't that simple. You can get sued for anything. But that's not really all that important. What matters is whether LLMs would do a worse job of performing document review than human review would. The answer to that question will depend on the specific facts of the case and the current state of the art.

We simply don't know yet what the error rate is of using an LLM; and the tech is improving rapidly. One should expect enterprising attorneys to test them out experimentally to build trust. For example, they can easily be tested vs. human review on small document corpuses.


A few years ago there was definitely document-processing automation and query-based filtering, but still a lot of human work.

I assume you’re right and AI now does some of the work but I doubt all of it. Also how reliable would the AI be… you’d hate to not have critical evidence at trial because you trusted the AI fully and it missed something.

Discovery data includes audio, video, social site data, as well as the usual documents and emails.


Yeah that’s the quickest way to go bankrupt. Imagine trusting the current LLMs to do that and the prompts involved. No one is going to trust that.


I would have thought download would be more interesting: dodge the egress charges on cloud migrations.


Egress charges for migrations haven't been a problem since March 2024.

from https://aws.amazon.com/blogs/aws/free-data-transfer-out-to-i...

    We believe this choice must include the one to migrate your data to another cloud provider or on-premises. That’s why, starting today, we’re waiving data transfer out to the internet (DTO) charges when you want to move outside of AWS.

    If you need more than 100 gigabytes of data transfer out per month while transitioning, you can contact AWS Support to ask for free DTO rates for the additional data. It’s necessary to go through support because you make hundreds of millions of data transfers each day, and we generally do not know if the data transferred out to the internet is a normal part of your business or a one-time transfer as part of a switch to another cloud provider or on premises.

    We will review requests at the AWS account level. Once approved, we will provide credits for the data being migrated. We don’t require you to close your account or change your relationship with AWS in any way. You’re welcome to come back at any time. We will, of course, apply additional scrutiny if the same AWS account applies multiple times for free DTO.


If you request this, AWS requires:

"After your move away from AWS services, within the 60-day period, you must delete all remaining data and workloads from your AWS account, or you can close your AWS account."

So egress charges are still a significant problem.

https://aws.amazon.com/ec2/faqs/#Data_transfer_fees_when_mov...


I wonder if this will cause devices like ATM skimmers to pop up at these secret locations, skimming the traffic via MITM attacks on the network or tampered laptops.


They appear to be rooms inside data centers, so it's pretty doubtful.


Just for reference on price: it will be $300 per port-hour for US-to-US transfers and $500 for US-to-EU.

That looks quite expensive in my opinion, even if it targets big professionals.


I don't think I've ever handled so much data I'd need to increase data throughput.

How much are we talking? Like petabytes? (Do you just stroll in with a huge disk array?)


They mention it as a way to bring in your Snowball, upload it, and walk out and keep using it, without shipping it back and forth. Those look to go up to the 210 TB range of raw storage.

In the past, at some companies I was at, I could see using something like this once a quarter to upload full quarterly backups, depending on the price per hour.


In a previous role we used their (first party) physical transfer appliances to upload ~600PB of video into S3. It was a complex logistical exercise to take it from the physical SANs, but the AWS specialists we worked with were great and it went without a hitch.


I think it depends on your network connection.

We used a Snowball a couple of times to move data either to or from S3.

In some cases it's because we didn't have enough local storage to shuffle the data, or because we only had a 100 Mbps link and a couple of TBs to move.


I don't think it's really so much about throughput as it is about avoiding any ingress charges at all.


AWS doesn't charge for ingress.


FTA: "There will be no per GB charge for the data transfer if you upload data into AWS Regions in the same continent of your location."

N.B. I'm not contradicting the parent, but reframing the concept - it seems that the Data Transfer Terminal is ALREADY in AWS so ingress isn't a thing since your data is ALREADY in AWS as soon as you connect the optic fiber cable to whatever storage you've brought onsite. But since you're only renting the connection, your data can't stay forever in AWS unless you copy to S3 or somewhere else in AWS.


Does AWS charge ingress?

I've got a good connection for the UK but still only about 100 meg upload. I have external drives that are 4 TB, so even if everything goes perfectly that could take nearly four full days to upload.
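
Quick sanity check of that, best case with no overhead:

    # 4 TB over a 100 Mbps uplink, in days.
    print(4e12 * 8 / 100e6 / 86400)   # ~3.7 days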


It looks like there is no local buffer, so one needs to stick around till the upload is done? Did anyone see a mention of upload bandwidth?


from the FAQ, https://aws.amazon.com/data-transfer-terminal/faqs/

    - What is the connection type?

    - Each Data Transfer Terminal facility will have at least two (2) 100G optical fiber cables that are connected to the AWS network.


    - What are the key requirements for preparing my device to use the Data Transfer Terminal facility?

    - To prepare for using the Data Transfer Terminal facility and connecting to the network, you need to ensure your uploading device is prepared to connect to the network. You should have the following for an optimal data upload experience:
        • A transceiver type 100G LR4 QSFP
        • An active IP auto configuration (DHCP)
        • Up-to-date software/transceiver drivers
Also, from https://aws.amazon.com/data-transfer-terminal/pricing/:

    Per port charge for data transfer

    US to US @ $300
    US to EU @ $500
    US to Other @ Contact us
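
Back-of-the-envelope for a full 210 TB Snowball over one of those ports (assuming you can actually saturate 100 Gbps, which disk and protocol overhead will make hard, and assuming the $300 is billed per port-hour as the earlier comment suggests):

    import math

    # Best-case hours to push 210 TB through one 100 Gbps port.
    tb, gbps = 210, 100
    hours = tb * 1e12 * 8 / (gbps * 1e9) / 3600
    print(hours)                   # ~4.7 hours
    print(math.ceil(hours) * 300)  # ~$1,500 at $300/hour (US to US)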


Are the facilities available to rent late at night for LAN parties?

And do they serve good refreshments?


I presume, given they mention it only supports public endpoints, that it's just a directly peered connection on the public internet and there's no special security stuff at play here?


They probably mentioned public endpoints to highlight that they allow access to AWS services, but they're not renting out access to a massive bandwidth pipe for non-customers or for uploads to non-AWS destinations. The "public endpoints" part is probably there to warn customers that existing technical limitations still apply, e.g. you can't dump data directly to the S3 data plane; you still have to go through the public API.

Considering the fiber connection is “part of the AWS network”, it may not have access to the outside internet, but also contextually probably doesn’t have any privileged access to AWS servers.


My statement that it's on the public internet is more that it's directly connected to someone like level3 and not a direct fiber drop to an AWS datacenter.

I would be surprised if the building in NYC has direct fiber to IAD or if the building in LA has direct fiber to SFO.


Kind of. I'm pretty sure "public endpoint" is to disambiguate from virtual network functionality like VPC Endpoints, VPC Peering, etc., which are specific to your account/VPC.

To your other comment, not many companies own the "layer 0" physical infra long haul. But yes, this is almost certainly going over "dark fiber" that AWS is leasing from someone like Zayo or L3. Since 2016ish effectively all AWS regions and DX POPs use this model to interconnect. The LAX & NYC sites are almost definitely the same DCs & TCs used for "local zones." So yes, a few hundred gigabits to terabits of leased lines back to PDX and IAD. Search for "aws backbone" content. And for security, I'm not sure what's public these days, but fiber taps are a real and known threat.

Source: ex AWS PE who spent some time in and adjacent to edge networking.


> Don’t be surprised if there are no AWS signs in the building or room. This is for security reasons to keep your work location as secret as possible.

Huh?


From time to time around midnight Bezos can be seen peeking through the "A" of random AWS signs. This prevents that



