Now it may be that even if more expensive, it is a better use of your capital (renting instead of owning so that you can invest your capital in higher return investments like hiring better people). But is it cheaper?
Moving into the cloud at this stage doesn't make any sense otherwise.
I, too, have wondered about the technical details when I see stories like this. If the data was originally stored in S3, they probably used the Cloud Storage Transfer Service (https://cloud.google.com/storage/transfer/). The secret sauce appears to be parallelization of the transfers.
When I copied hundreds of GB of data from one AWS bucket to another, it took me days! Then I realized it was because I was running a single-threaded process. The slowest part was getting the list of objects to copy. If I had had a separate, faster index of the objects, I could have copied in parallel much more quickly and maxed out the bandwidth available to me. (Maybe that is part of the rationale for Netflix's S3mper?)
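To make the parallel idea concrete, here is a minimal sketch, assuming the list of keys is already available from some faster index. The names `copy_all` and `copy_one` are hypothetical; `copy_one` stands in for whatever actually copies a single object (for example a server-side copy call in an S3 client), so this is not any particular tool's method.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def copy_all(keys, copy_one, workers=32):
    """Run copy_one(key) for every key on a thread pool.

    copy_one is a caller-supplied (hypothetical) function that copies
    a single object. Returns (copied_keys, failed), where failed is a
    list of (key, exception) pairs so failures can be retried later.
    """
    copied, failed = [], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each future back to its key so errors can be attributed.
        futures = {pool.submit(copy_one, key): key for key in keys}
        for future in as_completed(futures):
            key = futures[future]
            try:
                future.result()
                copied.append(key)
            except Exception as exc:  # keep going; record the failure
                failed.append((key, exc))
    return copied, failed
```

Threads are a reasonable fit here because each copy mostly waits on the network; the right worker count depends on request rate limits and available bandwidth, so treat 32 as a placeholder.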
I'm curious to hear other thoughts on accomplishing fast transfers of S3 data.
Edit: theoretically speaking, if you had 1 TB of data to transfer, and a paltry 100 Mbps (12.5 MB/s) connection, you could do this in under 24 hours if the entire network was fully utilized. Not a network engineer so go easy on me if I am wrong!
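The arithmetic can be checked quickly. This is only a back-of-the-envelope sketch: it assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits/s) and ignores protocol overhead and per-request latency; `transfer_hours` is a made-up helper name.

```python
def transfer_hours(size_bytes, link_mbps, utilization=1.0):
    """Hours needed to move size_bytes over a link of link_mbps megabits/s.

    utilization is the fraction of the link actually used (1.0 = ideal).
    """
    bytes_per_second = link_mbps * 1_000_000 / 8 * utilization
    return size_bytes / bytes_per_second / 3600


TB = 10**12   # decimal terabyte
TiB = 2**40   # binary tebibyte

print(round(transfer_hours(TB, 100), 1))   # 22.2
print(round(transfer_hours(TiB, 100), 1))  # 24.4
```

So the "under 24 hours" figure holds for a decimal terabyte on a fully utilized link, with little headroom: a binary tebibyte, or any real-world overhead, pushes it past a day.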
I think the only other option left is to use Spot Fleet EMR. I believe the S3 API takes a marker for pagination, so you can probably keep paginating to list the next N keys (assuming, of course, that your current bucket is no longer accepting new objects; otherwise create a Lambda function that copies each new file to the new bucket on the create event).
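Marker-based pagination as described might look roughly like this. It is a sketch against a hypothetical `list_page(marker, limit)` callable that returns `(keys, next_marker)`, not the real S3 client API; `make_fake_lister` is an in-memory stand-in so the loop can be run without any cloud account.

```python
import bisect


def iter_keys(list_page, page_size=1000):
    """Yield every key by following markers until the listing ends.

    list_page(marker, limit) is a hypothetical callable returning
    (keys, next_marker); next_marker is None on the last page.
    """
    marker = None
    while True:
        keys, next_marker = list_page(marker, page_size)
        yield from keys
        if next_marker is None:
            break
        marker = next_marker


def make_fake_lister(all_keys):
    """In-memory stand-in for a bucket listing, sorted by key."""
    ordered = sorted(all_keys)

    def list_page(marker, limit):
        # Resume strictly after the marker, like a "start after" listing.
        start = 0 if marker is None else bisect.bisect_right(ordered, marker)
        page = ordered[start:start + limit]
        has_more = start + limit < len(ordered)
        return page, (page[-1] if has_more else None)

    return list_page
```

Because each page depends on the previous marker, a single listing is inherently sequential, which matches the observation that listing is the slow part. Splitting the key space by prefix and running one such loop per prefix is one way around that, assuming the keys are spread across prefixes.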
I don't remember whether aws s3 cp --recursive bucket1 bucket2 requires local copies first, or whether it is a purely remote copy operation.
I have always wanted AWS to allow an S3 bucket to be transferred from one account to another by simply transferring ownership.
I suspect Evernote, if using S3, would have a similar data profile of many small objects.
See https://www.eofs.eu/_media/events/lad16/07_thiell_cheap_n_de... (slides) and https://www.youtube.com/watch?v=WbE0nl5V8WE
Evernote’s engagement with Google engineers was a pleasant surprise to
McCormack. The team was available 24/7 to handle Evernote’s concerns
remotely, and Google also sent a team of its engineers over to Evernote’s
facilities to help with the migration.
Those Google employees were around to help troubleshoot any technical
challenges Evernote was having with the move. That sort of
engineer-to-engineer engagement is something Google says is a big part of
its approach to service.
I wonder if Evernote users can tell if they are on GCP or the previous infrastructure. I suspect not.
> The decision to go with Google over another provider like AWS or Azure was driven by the technology team at Evernote, according to Greg Chiemingo, the company’s senior director of communications. He said in an email that CEO Chris O’Neill, who was at Google for roughly a decade before joining Evernote, came in to help with negotiations after the decision was made.