I wonder how this process worked in practice. Do you simply send your only one set of hard disks and hope for the best? Do you keep your original data on one set of disks, copy it to another set of disks and send it off? Do you make multiple copies of the same data and send them all together on the same truck? Multiple copies on multiple trucks? How would you do reconciliation on the other end once the disks arrive at the destination?
Like everything feels so simple and straightforward from afar but once I try to actually reason about something even the simplest of tasks feels complicated.
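On the reconciliation question, one common approach (a minimal sketch, not necessarily what AWS or any particular shop does) is to hash every file before the disks leave and compare manifests at the destination. The function names `build_manifest` and `reconcile` are made up for illustration.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as fh:
                # Read in 1 MiB chunks so large files never sit fully in memory.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def reconcile(source: dict[str, str], received: dict[str, str]) -> dict[str, list[str]]:
    """Compare two manifests and report missing, unexpected, and corrupted files."""
    return {
        "missing": sorted(source.keys() - received.keys()),
        "unexpected": sorted(received.keys() - source.keys()),
        "corrupted": sorted(
            name
            for name in source.keys() & received.keys()
            if source[name] != received[name]
        ),
    }
```

You would run `build_manifest` on the source before shipping, send the manifest over the network (it is tiny), run it again on the received copy, and only wipe the originals once `reconcile` comes back empty in all three lists.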
> I wonder how this process worked in practice. Do you simply send your only one set of hard disks and hope for the best?
No, you don't put your own disks in the Snowmobile/Snowball/Snowclone. It contains disks, so when it arrives you connect it to your network and copy data onto it, and then it is driven to an Amazon datacentre.
Thank you. That makes much more sense. So this way once the data is on the AWS device it is Amazon.com's responsibility so the customer doesn't need to worry about the truck or whatever.
Judging by the locations (New York and LA) I wonder if this is to cater to folks from production houses who want to upload large video files for processing or backup.
That's still peanuts, you can get consumer grade HDDs with that capacity in a single drive. A business grade line would have no trouble uploading all of that data in less than a week, even with a bunch of extenuating circumstances.
Some smaller businesses may have a huge data store, but not the money to pay for a business grade internet connection to upload it in a reasonable amount of time. I've worked for clients who have a 10 megabit full duplex fiber connection for over $1,000 a month (probably because of low ISP competition and because they were in a newly built, low density area). If they were working on migrating to the cloud, they would certainly consider taking a few hard drives one time to AWS rather than maxing their 10 Mbps full duplex connection for weeks or months.
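To put a rough number on "weeks or months", here is a back-of-the-envelope sketch. The 10 TB store size is an assumption picked for illustration, and `efficiency` is a hypothetical fudge factor for protocol overhead and a link that is shared with normal business traffic.

```python
def transfer_days(size_tb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Days needed to move size_tb terabytes (decimal, 10**12 bytes) over a
    link rated at link_mbps megabits per second.

    efficiency is the fraction of the rated speed actually achieved (0 < e <= 1).
    """
    if size_tb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, link > 0, and 0 < efficiency <= 1")
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 86_400  # seconds per day


# Assumed example: a 10 TB store over the 10 Mbps link mentioned above.
print(f"{transfer_days(10, 10):.1f} days at full line rate")
print(f"{transfer_days(10, 10, efficiency=0.5):.1f} days at 50% utilisation")
```

Even at a perfect, uninterrupted line rate that works out to roughly three months, which is why driving a few disks somewhere starts to look attractive.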
> no trouble uploading all of that data in less than a week
When you're doing e-discovery, deadlines are often measured in days - not just for the upload time, but for the analysis and finding the needle in the haystack.
It'd be a great way to get sued for negligence. You can't even assume the counterparty has correctly put everything into discovery for you. What you don't know is what gets you into trouble.
An example from the Karen Read case: the police somehow uploaded a video that had been put through a "mirror filter" and thus showed a vehicle in the opposite orientation from reality. Is your LLM going to notice that?
Using an LLM is “common grounds” for a malpractice suit? Come on, the technology hasn’t even been around that long. Without corroborating evidence, why should anyone believe you?
Failure to properly perform discovery is already common grounds for a malpractice suit. I don't care if you believe me. You seem to have your mind made up anyways.
Yes, you can be held liable for failing to properly perform discovery. But the general case isn’t what we’re talking about here. It’s the specific case of using an LLM to assist with it.
> You seem to have your mind made up anyways.
I haven't made up my mind about anything; it's you who claimed that using an LLM "is a great way to get sued for negligence." It's a fundamental rule of debate that the person who makes the argument bears the burden of supporting it.
You seem to be making the implicit assumption that using an LLM to assist with the process will probably be found to constitute negligence. Again, why should anyone believe you, especially if it hasn’t happened yet? Your argument is just FUD, pure and simple.
As an attorney I can tell you these questions just aren't that simple. You can get sued for anything. But that's not really all that important. What matters is whether LLMs would do a worse job of performing document review than human review would. The answer to that question will depend on the specific facts of the case and the current state of the art.
We simply don't know yet what the error rate is of using an LLM; and the tech is improving rapidly. One should expect enterprising attorneys to test them out experimentally to build trust. For example, they can easily be tested vs. human review on small document corpuses.
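A minimal sketch of what such a test could look like: take a small corpus that humans have already coded as relevant or not, run the LLM over the same documents, and score the agreement. The names `ReviewScore` and `score_review` are hypothetical, and treating the human calls as ground truth is itself an assumption, since human reviewers miss things too.

```python
from typing import NamedTuple


class ReviewScore(NamedTuple):
    recall: float       # share of human-flagged documents the model also flagged
    precision: float    # share of model-flagged documents the humans also flagged
    missed: list[str]   # document IDs the humans flagged but the model did not


def score_review(human: dict[str, bool], model: dict[str, bool]) -> ReviewScore:
    """Score model relevance calls against human calls on the shared documents."""
    shared = human.keys() & model.keys()
    true_pos = sum(1 for doc in shared if human[doc] and model[doc])
    false_pos = sum(1 for doc in shared if not human[doc] and model[doc])
    missed = sorted(doc for doc in shared if human[doc] and not model[doc])
    flagged_by_human = true_pos + len(missed)
    flagged_by_model = true_pos + false_pos
    return ReviewScore(
        recall=true_pos / flagged_by_human if flagged_by_human else 0.0,
        precision=true_pos / flagged_by_model if flagged_by_model else 0.0,
        missed=missed,
    )
```

For discovery the number to watch is recall, because the `missed` list is exactly the "critical evidence you didn't have at trial" scenario.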
A few years ago there was definitely document processing automation and query based filtering but still a lot of human work.
I assume you’re right and AI now does some of the work but I doubt all of it. Also how reliable would the AI be… you’d hate to not have critical evidence at trial because you trusted the AI fully and it missed something.
Discovery data includes audio, video, social site data, as well as the usual documents and emails.
We believe this choice must include the one to migrate your data to another cloud provider or on-premises. That’s why, starting today, we’re waiving data transfer out to the internet (DTO) charges when you want to move outside of AWS.
If you need more than 100 gigabytes of data transfer out per month while transitioning, you can contact AWS Support to ask for free DTO rates for the additional data. It’s necessary to go through support because you make hundreds of millions of data transfers each day, and we generally do not know if the data transferred out to the internet is a normal part of your business or a one-time transfer as part of a switch to another cloud provider or on premises.
We will review requests at the AWS account level. Once approved, we will provide credits for the data being migrated. We don’t require you to close your account or change your relationship with AWS in any way. You’re welcome to come back at any time. We will, of course, apply additional scrutiny if the same AWS account applies multiple times for free DTO.
"After your move away from AWS services, within the 60-day period, you must delete all remaining data and workloads from your AWS account, or you can close your AWS account."
So egress charges are still a significant problem.
I wonder if this will cause devices like ATM skimmers to pop up at these secret locations, skimming the traffic via MITM attacks on the network or tampered laptops.
They mention it as a way to take your snowball and upload it and walk out and continue using it without shipping it back and forth. Those look to go into the 210 TB range of raw storage.
In the past, at some companies I was at, I could see using something like this once a quarter to upload full quarterly backups, depending on the price per hour.
In a previous role we used their (first party) physical transfer appliances to upload ~600PB of video into S3. It was a complex logistical exercise to take it from the physical SANs, but the AWS specialists we worked with were great and it went without a hitch.
We used a Snowball a couple of times to move data either from S3 or to it.
In some cases it's because we didn't have enough local storage to shuffle the data, or because we only had a 100 meg net link and a couple of TBs to move.
FTA: "There will be no per GB charge for the data transfer if you upload data into AWS Regions in the same continent of your location."
N.B. I'm not contradicting the parent, but reframing the concept - it seems that the Data Transfer Terminal is ALREADY in AWS so ingress isn't a thing since your data is ALREADY in AWS as soon as you connect the optic fiber cable to whatever storage you've brought onsite. But since you're only renting the connection, your data can't stay forever in AWS unless you copy to S3 or somewhere else in AWS.
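I've got a good connection for the UK but still only about a 100 Mbps upload. I have external drives that are 4TB, so if everything goes perfectly, filling the link the whole time, one drive could take nearly four full days to upload (4 TB is 32 Tbit, which is 320,000 seconds at 100 Mbps).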
- What is the connection type?
- Each Data Transfer Terminal facility will have at least two (2) 100G optical fiber cables that are connected to the AWS network.
- What are the key requirements for preparing my device to use the Data Transfer Terminal facility?
- To prepare for using the Data Transfer Terminal facility and connecting to the network, you need to ensure your uploading device is prepared to connect to the network. You should have the following for an optimal data upload experience:
• A transceiver type 100G LR4 QSFP
• An active IP auto configuration (DHCP)
• Up-to-date software/transceiver drivers
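A rough sanity check on what it takes to actually use one of those 100G links, as a sketch only. The 250 MB/s per-drive sequential read speed is an assumption (typical of a spinning disk, not a figure from AWS), and the 210 TB size is borrowed from the Snowball capacity mentioned earlier in the thread.

```python
import math


def drives_to_saturate(link_gbps: float, drive_mb_per_s: float) -> int:
    """Number of drives read in parallel needed to fill a link.

    link_gbps is in gigabits per second; drive_mb_per_s is each drive's
    sustained read speed in megabytes per second (decimal units).
    """
    link_mb_per_s = link_gbps * 1000 / 8  # gigabits/s -> megabytes/s
    return math.ceil(link_mb_per_s / drive_mb_per_s)


def hours_at_line_rate(size_tb: float, link_gbps: float) -> float:
    """Hours to move size_tb terabytes if the link is kept completely full."""
    return size_tb * 1e12 * 8 / (link_gbps * 1e9) / 3600


print(drives_to_saturate(100, 250))            # assumed 250 MB/s per drive
print(f"{hours_at_line_rate(210, 100):.1f} h")  # 210 TB over one 100G link
```

In other words the fiber is unlikely to be the bottleneck: a single external drive would use a small fraction of the link, so whatever you bring needs to read from many disks (or fast flash) in parallel to benefit.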
I presume that given they mention that it only supports public endpoints, that it's just a directly peered connection on the public internet and there's no special security stuff at play here?
They probably mentioned public endpoints to highlight that they allow access to AWS services but they’re not renting out access to a massive bandwidth pipe for non-customers or to upload to non-AWS. The “public endpoints” part probably is used to warn customers that existing technical limitations apply with respect to capabilities, e.g. you can’t dump data directly to the S3 data plane; you still have to go through the public API.
Considering the fiber connection is “part of the AWS network”, it may not have access to the outside internet, but also contextually probably doesn’t have any privileged access to AWS servers.
My statement that it's on the public internet is more that it's directly connected to someone like Level 3 and not a direct fiber drop to an AWS datacenter.
I would be surprised if the building in NYC has direct fiber to IAD or if the building in LA has direct fiber to SFO.
Kind of. I’m pretty sure “public endpoint” is to disambiguate from virtual network functionality like VPC Endpoints, VPC Peering, etc., which are specific to your account/VPC.
To your other comment, not many companies own the “layer 0” physical long-haul infra. But yes, this is almost certainly going over “dark fiber” that AWS is leasing from someone like Zayo or L3. Since 2016ish, effectively all AWS regions and DX POPs use this model to interconnect. The LAX & NYC sites are almost definitely the same DCs & TCs used for “local zones.” So yes, a few hundred Gbps to Tbps of leased lines back to PDX and IAD. Search for “aws backbone” content. And for security, I’m not sure what’s public these days, but fiber taps are a real and known threat.
Source: ex AWS PE who spent some time in and adjacent to edge networking.
- Andrew S. Tanenbaum