Hacker News
How Dropbox is printing money (marcgayle.com)
9 points by marcamillion 2457 days ago | 18 comments

The reference to rsync in Dropbox's Y Combinator application is a bit of a tip-off: rsync uses exactly this technique to avoid recopying files that already exist at the destination.
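For reference, rsync's delta-transfer algorithm pairs a cheap rolling checksum with a stronger hash, so it can spot blocks the destination already has even when they have shifted position. Below is a simplified sketch of the matching step only, written under my own assumptions rather than taken from rsync: SHA-256 stands in for rsync's strong checksum, the function names are hypothetical, and there is no network protocol.

```python
import hashlib

M = 1 << 16  # modulus for the weak (Adler-style) checksum

def weak_checksum(block: bytes) -> tuple[int, int]:
    """Return the (a, b) pair of an rsync-style weak checksum."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b

def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide the window one byte in O(1) instead of rehashing the whole block."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b

def matching_blocks(old: bytes, new: bytes, block_size: int) -> list[tuple[int, int]]:
    """Return (offset_in_new, block_index_in_old) for blocks the receiver already has."""
    # Index the receiver's existing file: weak checksum -> [(block index, strong hash)].
    index: dict[tuple[int, int], list[tuple[int, bytes]]] = {}
    for idx in range(len(old) // block_size):
        blk = old[idx * block_size:(idx + 1) * block_size]
        index.setdefault(weak_checksum(blk), []).append((idx, hashlib.sha256(blk).digest()))

    matches: list[tuple[int, int]] = []
    if len(new) < block_size:
        return matches
    pos = 0
    a, b = weak_checksum(new[:block_size])
    while pos + block_size <= len(new):
        window = new[pos:pos + block_size]
        hit = None
        # Weak checksums can collide, so confirm with the strong hash.
        for idx, strong in index.get((a, b), []):
            if hashlib.sha256(window).digest() == strong:
                hit = idx
                break
        if hit is not None:
            matches.append((pos, hit))
            pos += block_size  # skip past the matched block
            if pos + block_size <= len(new):
                a, b = weak_checksum(new[pos:pos + block_size])
        else:
            if pos + block_size < len(new):
                a, b = roll(a, b, new[pos], new[pos + block_size], block_size)
            pos += 1
    return matches
```

With `old = b"hello world, this is a test!"` and two junk bytes prepended to form `new`, all seven 4-byte blocks of `old` are still found at their shifted offsets, so only the two new bytes would need to be sent.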

Yep. I never dug into that reason much, but that's why I quoted it along with the definition of 'Diff'. I guess I didn't bring it home as coherently as I wanted to.

I thought this was common knowledge. It wouldn't make sense to store Pirated.Movie.DVDRiP.avi thousands of times if it's the same file for thousands of users. Files that produce the same hash get served from a single file on Dropbox's servers.
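What's being described is content-addressed storage: key each file body by its hash and keep one copy per unique hash, while each user's path is just a pointer. A toy in-memory sketch, assuming whole-file SHA-256 keys (a real service might hash fixed-size chunks instead); `DedupStore` and its methods are hypothetical names, not anything Dropbox has published.

```python
import hashlib

class DedupStore:
    """Toy content-addressed store: each unique file body is kept once, keyed by SHA-256."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}            # digest -> bytes, stored once
        self.files: dict[tuple[str, str], str] = {}  # (user, path) -> digest

    def put(self, user: str, path: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # no-op if this content already exists
        self.files[(user, path)] = digest    # each user just gets a pointer
        return digest

    def get(self, user: str, path: str) -> bytes:
        return self.blobs[self.files[(user, path)]]

    def physical_bytes(self) -> int:
        """Bytes actually stored on disk."""
        return sum(len(b) for b in self.blobs.values())

    def logical_bytes(self) -> int:
        """Bytes the users collectively think they are storing."""
        return sum(len(self.blobs[d]) for d in self.files.values())
```

If 1,000 users each "store" the same 1,000-byte file, `logical_bytes()` reports 1,000,000 while `physical_bytes()` stays at 1,000.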

Well, perhaps the technology was common knowledge, but I don't think the business model behind it was ever widely discussed.

It must have been overlooked.

If it is true that this is what they are doing, then Dropbox has literally discovered one of the most profitable business models ever!

It's a pretty common optimization. They had a good plan and executed well, but let's not get carried away; other people were working on similar ideas. I'd bet that data stored in S3 is deduplicated in a similar manner.

Another plus to their setup is that the hash for each file is calculated on your machine. So you pay them for the privilege of hashing your own files, and the file only gets uploaded if they haven't seen it before.
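A sketch of that client-side flow, assuming a server that can answer "have you already seen this hash?". `FakeServer`, `has_blob`, `upload` and `sync_file` are hypothetical names for illustration, not Dropbox's actual API.

```python
import hashlib

class FakeServer:
    """In-memory stand-in for a sync service (hypothetical API)."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}
        self.bytes_received = 0  # inbound bandwidth the service actually paid for

    def has_blob(self, digest: str) -> bool:
        return digest in self.blobs

    def upload(self, digest: str, data: bytes) -> None:
        # Only reached for content the server hasn't seen; recheck the client's hash.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("digest does not match uploaded data")
        self.bytes_received += len(data)
        self.blobs[digest] = data

def sync_file(server: FakeServer, data: bytes) -> bool:
    """Hash on the client; send the file body only if the server lacks it.
    Returns True if the body was actually uploaded."""
    digest = hashlib.sha256(data).hexdigest()  # CPU cost lands on the user's machine
    if server.has_blob(digest):
        return False  # only the digest crossed the wire
    server.upload(digest, data)
    return True
```

Only the first uploader of a popular file costs the service any inbound bytes; everyone after that sends just the 64-character digest, which matches the "pirated MP3s upload in seconds" observation later in the thread.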

I am sure other people were working on similar things, but how many of them were able to execute on them as well as Dropbox has?

Google has supposedly implemented something like that for Gmail attachments, but Dropbox seems to have the complete package.

Is this what Dropbox is doing? Storing just one copy of each unique file? It would make sense for large files.

Well, I am not 100% sure, but it's the only logical explanation, given that if you try the example I gave with the 241MB file, it uploads in a minute or two.

This is just me putting 2 and 2 together.

If they aren't, they should be :)

I am 100% sure. Pirated MP3s "upload" in seconds and my self-ripped music takes the regular amount of time.

Agreed! But I didn't want to include that in the post, lest the RIAA and MPAA start breathing down their necks.

To me the post seems to state the facts with such certitude that you might want to put that disclaimer a bit higher.

Very true. Done!

Well, not a single copy. They would want some sort of redundancy, even if they are just reselling AWS services.
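One common way to keep a single logical copy while still storing it redundantly is to place each blob on several nodes chosen from its hash. Rendezvous (highest-random-weight) hashing is used below purely as an illustration; nothing in this thread says how Dropbox or S3 actually place replicas, and the node names and the replica count of 3 are assumptions.

```python
import hashlib

def replica_nodes(content_hash: str, nodes: list[str], replicas: int = 3) -> list[str]:
    """Pick `replicas` distinct nodes for a blob using rendezvous hashing."""
    def score(node: str) -> int:
        # Deterministic pseudo-random weight for this (node, blob) pair.
        h = hashlib.sha256(f"{node}:{content_hash}".encode()).digest()
        return int.from_bytes(h[:8], "big")
    # The highest-scoring nodes hold the copies.
    return sorted(nodes, key=score, reverse=True)[:replicas]
```

A nice property of this scheme: removing a node that wasn't chosen for a given blob leaves that blob's replica set unchanged, so only data on the departed node has to move.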

I seem to recall that this is how Gmail handles attachments.

I know. The main difference, though, is that Gmail doesn't charge per attachment, whereas Dropbox does.

What does that sort of entitled whining have to do with the storage methodology?

Also, neither Dropbox nor Gmail have ever charged me a cent.

Unfortunately, they also waste money on bandwidth. Here's the scenario:

1) Copy a file into Dropbox on Server1.

2) Client1 and Client2, on the same LAN, both start downloading the file from the Dropbox servers at the same time, wasting their bandwidth and mine.

I wish the LAN sync feature would be expanded for downloads.
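A rough sketch of what LAN-assisted downloads might look like for the two-client scenario above: one client on the LAN fetches the file from the cloud and the others copy it locally. The `Client`/`Cloud` types and the "lowest name downloads" election rule are hypothetical; this is not how Dropbox's LAN sync actually works, and peer discovery, authentication and partial transfers are ignored.

```python
from dataclasses import dataclass, field

@dataclass
class Client:
    name: str
    blobs: dict[str, bytes] = field(default_factory=dict)  # content hash -> bytes

@dataclass
class Cloud:
    blobs: dict[str, bytes]
    bytes_served: int = 0  # egress the provider pays for

    def fetch(self, digest: str) -> bytes:
        data = self.blobs[digest]
        self.bytes_served += len(data)
        return data

def lan_sync(digest: str, clients: list[Client], cloud: Cloud) -> None:
    """Deliver one blob to every client on a LAN using at most one cloud download."""
    holders = [c for c in clients if digest in c.blobs]
    if not holders:
        # Hypothetical election rule: the client with the lowest name downloads.
        seeder = min(clients, key=lambda c: c.name)
        seeder.blobs[digest] = cloud.fetch(digest)
        holders = [seeder]
    source = holders[0]
    for c in clients:
        if digest not in c.blobs:
            c.blobs[digest] = source.blobs[digest]  # LAN copy, no cloud egress
```

With Client1 and Client2 both needing a 500-byte file, the cloud serves it once (500 bytes) instead of twice.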

Well, considering the alternative, that's a minor cost. The alternative is storing every file X times, where X is the number of users who upload it.

The money wasted on bandwidth from duplicate transfers is minimal by comparison.
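To put rough numbers on that, here is the back-of-the-envelope arithmetic using the 241MB file mentioned earlier. The 1,000 uploaders and 3 replicas are made-up figures for illustration. Storage is also a recurring cost while a duplicate download is a one-off, so treat this as a loose comparison only.

```python
MB = 10**6

def stored_bytes(file_size: int, uploaders: int, replicas: int, dedup: bool) -> int:
    """Physical bytes kept for one file uploaded by `uploaders` users."""
    copies = 1 if dedup else uploaders
    return file_size * copies * replicas

def redundant_download_bytes(file_size: int, clients_on_lan: int) -> int:
    """Extra cloud egress when every client on one LAN downloads the file separately."""
    return file_size * (clients_on_lan - 1)

size = 241 * MB
print(stored_bytes(size, 1000, 3, dedup=True) // MB)   # 723
print(stored_bytes(size, 1000, 3, dedup=False) // MB)  # 723000
print(redundant_download_bytes(size, 2) // MB)         # 241
```

Under these assumptions, deduplication saves roughly 722 GB of storage for this one file, against a one-time 241 MB of redundant download for the two-client LAN case.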
