
How Dropbox is printing money - marcamillion
http://marcgayle.com/how-dropbox-is-printing-money
======
rst
The reference to rsync in Dropbox's YCombinator application is a bit of a
tipoff --- rsync uses exactly this technique to avoid recopying files that
already exist at the destination.
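
A rough sketch of the general idea, assuming fixed-size blocks hashed with SHA-256. Real rsync uses a rolling checksum plus a stronger hash so it can match blocks at any offset, so treat this as a simplified illustration rather than rsync's actual algorithm. The names `block_hashes` and `blocks_to_send` are made up for the example.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, purely for illustration


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Hash each fixed-size block of the data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def blocks_to_send(source: bytes, dest: bytes,
                   block_size: int = BLOCK_SIZE) -> list[int]:
    """Return indices of source blocks the destination does not already hold."""
    known = set(block_hashes(dest, block_size))
    return [
        index
        for index, digest in enumerate(block_hashes(source, block_size))
        if digest not in known
    ]
```

With `source = b"aaaabbbbcccc"` and `dest = b"aaaaXXXXcccc"`, only the middle block differs, so only that block would need to be transferred.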

~~~
marcamillion
Yep... I never dug into that point much, but that's why I quoted it along with
the definition of 'Diff'. I guess I never brought it home as coherently
as I wanted to.

------
jrnkntl
I thought this was common knowledge. It wouldn't make sense to store
Pirated.Movie.DVDRiP.avi thousands of times if it's the same file for
thousands of users. Files that produce the same hash get served from a
single copy on Dropbox's servers.
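
To make that concrete, here is a minimal sketch of a content-addressed store, assuming SHA-256 as the hash and an in-memory dict standing in for the storage backend. `DedupStore` and its methods are hypothetical names; nothing here is known to match Dropbox's actual implementation.

```python
import hashlib


class DedupStore:
    """Toy store that keeps one blob per unique content hash."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}             # hash -> file contents
        self.files: dict[tuple[str, str], str] = {}   # (user, filename) -> hash

    def put(self, user: str, name: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only the first time this content is seen.
        self.blobs.setdefault(digest, data)
        self.files[(user, name)] = digest
        return digest

    def get(self, user: str, name: str) -> bytes:
        # Every user's entry resolves to the single shared blob.
        return self.blobs[self.files[(user, name)]]
```

If a thousand users `put` the same bytes under different names, `files` grows to a thousand entries while `blobs` still holds one copy.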

~~~
marcamillion
Well perhaps the technology was common knowledge, but the business model
behind it - I don't think was ever popularly discussed.

It must have been overlooked.

If it is true that this is what they are doing, then Dropbox has literally
discovered one of the most profitable business models ever!

~~~
jwhitlark
It's a pretty common optimization. They had a good plan, and executed well,
but let's not get carried away; other people were working on similar ideas.
I'd bet that data stored in S3 is deduplicated in a similar manner.

Another plus to their setup is the hash for the file is calculated on your
machine, so you pay them so you can calculate the hash on your own files and
only upload them if they haven't seen them before.
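
A minimal sketch of that client-side check, assuming the client hashes with SHA-256 and the server exposes some "have you seen this hash?" query. `Server`, `has`, `upload` and `sync_file` are hypothetical names for illustration; the real protocol isn't described anywhere in this thread.

```python
import hashlib


class Server:
    """Stand-in for the remote side: remembers content by hash."""

    def __init__(self) -> None:
        self.known: dict[str, bytes] = {}

    def has(self, digest: str) -> bool:
        return digest in self.known

    def upload(self, digest: str, data: bytes) -> None:
        self.known[digest] = data


def sync_file(server: Server, data: bytes) -> int:
    """Hash locally, upload only unseen content; return bytes actually sent."""
    digest = hashlib.sha256(data).hexdigest()  # computed on the client's machine
    if server.has(digest):
        return 0  # server already holds this content, nothing to transfer
    server.upload(digest, data)
    return len(data)
```

The first client to sync a given file pays the full upload; anyone syncing identical content afterwards sends nothing beyond the hash.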

~~~
marcamillion
I am sure other people were working on similar things, but how many of them
were able to execute on them as well as Dropbox has?

Google has supposedly implemented something like that for Gmail Attachments,
but Dropbox seems to have the complete package.

------
jolan
Unfortunately, they also waste money on bandwidth. Here's the scenario:

1) Copy file into Dropbox on Server1

2) Client1 & Client2 on the same LAN will each start downloading the file from
the Dropbox servers at the same time, wasting their bandwidth and mine.

I wish the LAN sync feature would be expanded for downloads.

~~~
marcamillion
Well... considering the alternative, that's the least of their worries. The
alternative being that they store every file X number of times, where X is the
number of users that upload that file.

Wasting money on bandwidth by doing multiple transfers is minimal in
comparison.

------
dave1619
Is this what Dropbox is doing? Storing just one copy of each unique file? It
would make sense for large files.

~~~
marcamillion
Well... I am not 100% sure, but that's the only logical explanation, given that
if you try the example I gave with the 241MB file, it uploads in a minute or
two.

This is just me putting 2 and 2 together.

If they aren't, they should be :)

~~~
jolan
I am 100% sure. Pirated MP3s "upload" in seconds and my self-ripped music
takes the regular amount of time.

~~~
marcamillion
Agreed! But I didn't want to include that in the post, lest the RIAA and MPAA
start breathing down their necks.

