How Evernote moved 3 petabytes of data to Google's cloud (pcworld.com)
40 points by ShanaM 2 hours ago | 14 comments





I can understand that a company wants to do away with the distraction of running a datacentre, but is using one of the major cloud services actually cheaper than running your own? Amazon and Microsoft seem to be making healthy profits off their cloud services, and that margin is part of your cost.

Now it may be that even if more expensive, it is a better use of your capital (renting instead of owning so that you can invest your capital in higher return investments like hiring better people). But is it cheaper?



Guess Evernote are prepping for a sale/IPO?

Moving into the cloud at this stage doesn't make any sense otherwise.



Sadly the article never describes HOW they transfer the data. Simply sending 3PB on the wire will cost a fortune. Maybe Google does have some non-public service like AWS Snowball?



Nothing against the author, but it sounds like another story that Google PR pitched and influenced. It is normal these days. I am sure AWS does the same. That is why the technical details may be light. The submitter is likely a Social Media Manager for Google (nothing wrong with that - kudos for not obscuring it!)

I, too, have wondered about the technical details when I see stories like this. If the data was originally stored in S3, they probably used the Cloud Storage Transfer Service (https://cloud.google.com/storage/transfer/). The secret sauce appears to be parallelization of the transfers.

When I copied hundreds of GB of data from one AWS bucket to another, it took me days! Then I realized it was because I was running a single-threaded process. The slowest part was getting the list of objects to copy. If I had had a separate, faster index of the objects, I could have copied in parallel much more quickly and maxed out the bandwidth available to me. (Maybe that is part of the rationale for Netflix's S3mper?)
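
The idea in the paragraph above (get the key list once, then fan the copies out over many workers) can be sketched roughly as follows. This is a minimal illustration, not what anyone in this thread actually ran: `parallel_copy` and `copy_one` are hypothetical names, and the in-memory dicts stand in for real buckets. In practice `copy_one` would wrap whatever per-object copy call your storage client provides.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, List


def parallel_copy(keys: Iterable[str],
                  copy_one: Callable[[str], None],
                  workers: int = 32) -> List[str]:
    """Copy every key concurrently; return the keys whose copy raised."""
    failed = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each future back to its key so failures can be reported.
        futures = {pool.submit(copy_one, key): key for key in keys}
        for future in as_completed(futures):
            if future.exception() is not None:
                failed.append(futures[future])
    return failed


# Stand-in "buckets" (assumption: real code would call a storage API here).
source = {f"note-{i}": b"x" for i in range(100)}
dest = {}


def copy_one(key: str) -> None:
    dest[key] = source[key]


failed = parallel_copy(list(source), copy_one)
```

Returning the failed keys rather than raising means one bad object does not force the whole run to start over; the caller can retry just that list.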

I'm curious to hear other thoughts on accomplishing fast transfers of S3 data.

Edit: theoretically speaking, if you had 1 TB of data to transfer, and a paltry 100 Mbps (12.5 MB/s) connection, you could do this in under 24 hours if the entire network was fully utilized. Not a network engineer so go easy on me if I am wrong!
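
For anyone who wants to check the arithmetic, here is a small sketch. It assumes decimal units (1 TB = 10^12 bytes) and a fully utilized link with no protocol overhead; the 10 Gbps figure in the second call is a made-up example link, not something from the article.

```python
def transfer_hours(size_bytes: float,
                   link_bits_per_sec: float,
                   utilization: float = 1.0) -> float:
    """Hours needed to push size_bytes over a link at the given utilization."""
    bytes_per_sec = link_bits_per_sec / 8 * utilization
    return size_bytes / bytes_per_sec / 3600


TB = 10**12
PB = 10**15

# 1 TB over 100 Mbps (12.5 MB/s): 80,000 seconds.
print(round(transfer_hours(1 * TB, 100e6), 1))        # 22.2 (hours)

# 3 PB over an assumed 10 Gbps link, expressed in days.
print(round(transfer_hours(3 * PB, 10e9) / 24, 1))    # 27.8 (days)
```

So the "under 24 hours" estimate for 1 TB holds, with about two hours to spare, as long as the link really is saturated the whole time.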



The most straightforward approach is running the aws s3 command, but as you noted it's single-threaded and slow as hell. Imagine you have 100 TB of data to migrate from one bucket to another. If the command encounters an error (which is very likely to happen), you have to start again.

I think the only other option left is to use Spot Fleet EMR. I believe the S3 API takes a marker for pagination, so you can probably keep paginating to list the next N keys (assuming, of course, that your current bucket is no longer accepting new objects; otherwise create a Lambda function that copies each new file to the new bucket on its create event).
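
A rough sketch of that marker-based loop, with a checkpoint so an error does not mean starting from zero. Everything here is hypothetical scaffolding: `list_page`, `copy_one`, `save_marker`, and `make_lister` are illustrative names, and the fake lister only imitates the "give me keys after this marker" behaviour I believe the S3 list call has. It is not real S3 client code.

```python
from typing import Callable, List, Optional, Tuple

# Assumed shape: given a marker, return (keys, next_marker);
# next_marker is None on the last page.
ListPage = Callable[[Optional[str]], Tuple[List[str], Optional[str]]]


def copy_all(list_page: ListPage,
             copy_one: Callable[[str], None],
             save_marker: Callable[[str], None],
             start_marker: Optional[str] = None) -> int:
    """Walk the listing page by page, checkpointing after each finished page."""
    marker = start_marker
    copied = 0
    while True:
        keys, next_marker = list_page(marker)
        for key in keys:
            copy_one(key)
            copied += 1
        if next_marker is None:
            return copied
        # Persist progress only once the whole page has been copied.
        save_marker(next_marker)
        marker = next_marker


def make_lister(all_keys: List[str], page_size: int) -> ListPage:
    """In-memory stand-in for a paginated, key-ordered bucket listing."""
    ordered = sorted(all_keys)

    def list_page(marker: Optional[str]):
        rest = [k for k in ordered if marker is None or k > marker]
        page = rest[:page_size]
        next_marker = page[-1] if len(rest) > page_size else None
        return page, next_marker

    return list_page


lister = make_lister(list("abcdefg"), page_size=3)
copied_keys: List[str] = []
checkpoints: List[str] = []
total = copy_all(lister, copied_keys.append, checkpoints.append)
```

After a crash, passing the last saved marker as `start_marker` resumes from the first unfinished page. That page may be partly re-copied, which is harmless as long as copying a key twice is idempotent.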

I don't remember if aws s3 cp --recursive bucket1 bucket2 requires local copies first, or it is a remote copy-only operation.

I have always wanted AWS to allow an S3 bucket to be moved from one account to another by simply transferring ownership.



I did try EMR + s3distcp a while ago. AWS Support recommended it and provided instructions in their docs. It was not very fast for me. The data I was trying to copy was about 400 GB of small 5-20 KB files, which is not ideal for fast copying. I gave up and used a different method.

I suspect Evernote, if using S3, would have a similar data profile of many small objects.



"Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway."



Yeah, this is actually AWS Snowball. But Google Cloud does not have a competing offering, at least not publicly. It would be of real interest to know whether they offer one only to big customers handing over their firstborn son, or to everyone on request without advertising it the way AWS does.



The Stanford HPC team reported last year that it had sent more than 2 PB to Google Drive using Lustre/HSM. This relied on the public API and the fact that Google Drive is free for *.edu sites.

See https://www.eofs.eu/_media/events/lad16/07_thiell_cheap_n_de... (slides) and https://www.youtube.com/watch?v=WbE0nl5V8WE



    Evernote’s engagement with Google engineers was a pleasant surprise to 
    McCormack. The team was available 24/7 to handle Evernote’s concerns 
    remotely, and Google also sent a team of its engineers over to Evernote’s 
    facilities to help with the migration.

    Those Google employees were around to help troubleshoot any technical 
    challenges Evernote was having with the move. That sort of 
    engineer-to-engineer engagement is something Google says is a big part of 
    its approach to service.
It's interesting to read that, given that one of the perennial complaints about GCE on Hacker News is the relative lack of support compared to AWS or Azure. Is it just that when you're a customer that big, even Google is willing to give you personalized service? Or is this a sign of a change in the GCE support model?



I think having purchased the highest support configuration and being a big customer will always improve your support situation.



>Right now, the company is still in the process of migrating the last of its users’ attachments to GCP.

I wonder if Evernote users can tell if they are on GCP or the previous infrastructure. I suspect not.



So a large part of choosing GCE is possibly because the CEO was a Googler who was very familiar with how Google works, and his contact at Google helped him score a better deal and possibly even better support. Although I won't deny the GCE team would have snatched up Evernote regardless, because it's an important client to show off in GCE's client portfolio.

> The decision to go with Google over another provider like AWS or Azure was driven by the technology team at Evernote, according to Greg Chiemingo, the company’s senior director of communications. He said in an email that CEO Chris O’Neill, who was at Google for roughly a decade before joining Evernote, came in to help with negotiations after the decision was made.



I'm sure that had nothing to do with it. Free donuts. That's how you influence and win enterprise accounts.




