It's always important to remember the difference between a syncing service and a backup service. A syncing service sometimes feels like a backup, because you can use it to recover files if a local device is destroyed or lost. HOWEVER, any service capable of syncing files is equally capable of destroying them.
It's important to have an automated one-way backup system that you can manually restore from. Something like Tarsnap looks like a really good possibility (I haven't used it myself, but it seems solid).
This distinction is really without a difference. Dropbox consistently advertises itself as both a syncing service and a backup service. Their tag line is even "Dropbox - Secure backup, sync and sharing made easy," with secure backup being ahead of sync and sharing.
"Even if your computer has a meltdown, your stuff is always safe in Dropbox and can be restored in a snap.
In fact, if you're using the Dropbox desktop application, your files are backed up several times. The primary copy on your computer's hard drive is synced online and that copy is then backed up again for safety (emphasis mine). If you are using Dropbox to sync files between multiple computers, your files are backed up on those computers as well. If that isn't enough, Dropbox also keeps backups of all of your deleted and changed files too.
It's hard to imagine a scenario where Dropbox could lose your files. Hypothetically, let's say a nuclear bomb blows up the data centers where your files are saved. Even then, your files are still safe and sound on your computer and any other computers linked to your Dropbox account."
Clearly, though, Dropbox appears to have lost files (at least if we take it on faith that "these are bugs in Dropbox's syncing logic"), despite the fact that I see no mushroom cloud nearby.
>Even then, your files are still safe and sound on your computer and any other computers linked to your Dropbox account."
Unless, as happened here, Dropbox erased the file and synced a blank version across all your computers. Then they're safe and sound only on a computer linked to your Dropbox account that hasn't been connected to the internet since the file got corrupted.
If you catch it in the 30-day window when they keep old versions you're fine, but there are files in my Dropbox that I don't use every 30 days.
As you said, "Dropbox erased the file and synced a blank version across all your computers." I agree. As such, I'm not sure why individuals are so quick to blame the end user just yet.
On the other hand, I don't want to immediately blame Dropbox just yet either. If you backup garbage (say, because you have disk corruption), then you can't blame Dropbox for backing up exactly what you told it to.
And Dropbox does offer a premium Packrat service if you want file history indefinitely. Perhaps the user can be blamed for assuming that he/she would only need 30 days of history, but this is really contingent on who caused the corruption to happen in the first place -- and that's unknown at the moment. 
Agreed on not being sure that it's Dropbox's fault. But even if the user killed the file, you've got the same problem.
A person running Time Machine (or similar incremental backup system) is a lot safer from this sort of problem than a free tier Dropbox user. Free Dropbox is better than nothing, but people can't keep assuming their files are safe because "they're in the cloud" and get synced to a few places.
"This distinction is really without a difference."
"In fact, if you're using the Dropbox desktop application, your files are backed up several times."
"It's hard to imagine a scenario where Dropbox could lose your files..."
Please stop. Right there. Stop believing some vendor marketing blindly (even if you do come to another conclusion later), stop reassuring other people who do so and stop calling a sync a backup. There is a very important distinction: sync has mechanisms in place that are capable of touching the files in your backup. At least in Dropbox's case these mechanisms are not completely separated from the initial backup mechanism of each version. I had two almost catastrophic data losses with Dropbox before I was able to make that distinction. I've come to the conclusion that IT professionals should never ever treat a sync system as a backup system, and if you still think so please don't spread that advice to others.
Agreed, and it helps to have 2 backups of all your data using different techniques/products. I run a local backup to cheap disk, network backup to remote location, and I suppose my remotely hosted svn is my third backup of most assets I really care about.
Most of us here are developers. How many of our products are perfect? Always assume something will fail in a new and interesting way in the future. I'm not letting Dropbox off the hook - their product shouldn't do this - but you'll be happier if you treat backups with the same level of redundancy and planning as the rest of your infrastructure.
Yes - sync is not a backup, but services such as Dropbox are designed to be a backup too (with their revision feature).
So I think the underlying problem here is that any backup/syncing system might have a bug (like this one) or there might be operator or user error (deleting your revision history is just a couple of clicks away). The Recovery Oriented Computing website has a lot of good papers on this topic.
This is very similar to problems with outages on Amazon EC2 - yes, the Amazon cloud is great, but in order to make your service highly available you do need to have a standby system on some other cloud (for example, we run on Rackspace but our standbys are on Amazon).
One approach to protect yourself against problems like this is to replicate/sync all your files from one cloud storage service (your primary one) to some other cloud service (GDrive, SugarSync, Box, etc.). So should Dropbox have a bug, you still have everything in the other cloud service, including all revisions.
Services like cloudHQ (that is my baby) can replicate and sync all your files from Dropbox to, for example, GDrive. And of course cloudHQ has options like "two-way" sync, "don't replicate deletion", "backup" (weekly incrementals are in folders, so you will be fine even if the "revisions" feature fails), etc.
This is one of the reasons that SpiderOak (an encrypted sync AND backup service) keeps historical versions of everything forever, until customers explicitly choose to remove them.
For services that purge old versions and deleted files at 30 days, you lose if you don't notice a problem promptly. You can't be expected to be watchful over gigabytes of data; that's the whole point of a backup service.
Dropbox gives you the choice: you can pay and have the versions stored forever, or accept the lower level of service with 30 days for free. I am sure you have calculated the economics of keeping the extra copies for all non-paying customers; it is possible it would be less sustainable.
As an aside, it seems Dropbox is bidding on the [SpiderOak] keyword on Adwords. You should probably at least outbid them on your brand terms to reduce confusion/misdirection for potential customers.
Depending on the country there are different rules about this. There are different rules for trademarks in Ad copy and in the keyword.
In terms of a trademark in your keyword, this is only unacceptable in: Australia, Brazil, China, Hong Kong, Macau, New Zealand, North Korea, South Korea, or Taiwan, and only after the trademark holder files a complaint.
So in general, Google's position is that (for the US, UK and EU at least) if the ad text doesn't contain the trademark text in question, it's ok if the keyword used to match on does, except for the countries you listed.
To be fair, a good sync service with history is supposed to be a backup at the same time. I'm using git rather than dropbox but if I push something, it's there forever. So, I'd have to think that Dropbox being able to completely destroy files is a bug, not a feature. Maybe a more experienced dropbox user can correct me?
The thing is, Dropbox treats their version as the primary copy. For instance, I added several directories to Dropbox, and decided that I didn't want it syncing a few of them. I de-selected the directories under "Selective Sync" and it removed them from my computer - even though my computer was the original source.
I do agree with you, but I can tell you that selling a backup service is harder than you think. Also, as pointed out by the paper, human error accounts for ~50% of all system failures. And the worst thing is that the majority of users who accidentally delete data don't even notice the data loss until the lost data is needed, and they don't recollect doing something wrong.
What I found interesting is that people (i.e., small business owners) are scared of losing a credit card (even though you can call the bank, cancel your lost credit card and get a new one - an inconvenience but not a big deal), but they will not back up critical company documents and data (even though if they lose them the company will be pretty much closed - there is no "bank" to go to and get data back).
Why would a good sync service with history be a backup? They don't store a history indefinitely. Of course destroying files and history is a bug, but something that instantly replicates changes to those files from machine-to-machine shouldn't be considered a reliable way to recover them weeks or months later.
I use CrashPlan both on Mac and on Windows. On the Mac it works very well. On Windows it is okay, though it needs some hand holding -- sometimes the service gets stuck and needs to be restarted manually. Overall I find that it is better than Mozy (a competing service).
I love Crashplan -- especially because with essentially the same client, you can do backups to a public service, a hosted business service (multiple machines, centrally managed keys, etc.) or an enterprise hosted-by-yourself service.
Thanks for the feedback -- yes, this is something I plan on doing (amidst all the other tasks I'm juggling...).
I very commonly hear why people are using Tarsnap, and from time to time I hear why people are no longer using Tarsnap, but I very rarely hear why people never started using Tarsnap, so I really appreciate you taking the time to comment.
You can say that about any service--what if Crashplan's data center caught fire tomorrow? Tarsnap might be run by a single guy (I think?) but saying S3 is "more" or "less" reliable than any other private company isn't a great comparison. In any case a massive company like Amazon is the most likely to be reliable in these cases, I imagine.
My earlier point was that I think data is stored on Colin Percival's S3 account (he is the creator of Tarsnap) and therefore you might lose access to the data (if he couldn't pay the bills or got hit by a bus) even though S3 itself is fine.
I use Arq to backup files on my MacBook - it saves them to my S3 bucket, so even if everyone working on Arq dies (I sincerely hope they don't!) and their servers all explode, my backups are still intact on S3 - which is less likely to go down permanently and lose all my files than Tarsnap (not that that is particularly likely either!).
There are definitely tiers of reliability. An external drive is easily lost or damaged. A backup service can take individual drive/server losses but that might be the limit. S3 can lose an entire data center - the only real risk comes from software bugs.
This isn't really a concern for me, but I'd probably email Colin and ask for it. The tarsnap program is open-source, presumably if he stopped hosting it the server-side component could be replaced. The client does all the work.
If Backblaze backs up my Dropbox folder and this Dropbox bug affects my files that are getting backed up, then if I don't catch the 0 byte files within the 4 week Backblaze revision window, won't I still lose my files?
Crashplan hangs on to deleted files for quite some time. I just used it last weekend to recover some stuff I'd deleted about a year ago and realized I wanted back. I was pleasantly surprised to find them still in the system.
There really is no difference between syncing and backup. Think of the backup/restore cycle as a sync and it will be clear. Backup is one way sync, restore completes the bidirectionality of sync.
Sync doesn't necessarily have the capability to destroy files: rsync, for example, only removes files that were deleted locally if you pass it the switch that asks for that. However, Dropbox is supposed to be rsync + rcs, so this kind of problem is supposedly easy to fix by simply reverting to a good previous version.
No, it's only for UNIX like OSs and is command line only, making it impractical or putting it out of reach of many users, even those on HN. Putting it as the drop in replacement for Dropbox for backup is simply inaccurate. The services I listed are more applicable. And Percival's stature is irrelevant here.
You're the one who brought up that it's "just by a HN user", as if that was the most salient aspect of his life. And I don't think the top-level put Tarsnap forward as a drop-in replacement for Dropbox anyway.
(Hi all, I'm the PM for the Dropbox desktop client team.)
I just wanted to let you all know that we take any claims like this really seriously. There aren't any known bugs on the Dropbox side that would cause this, and unfortunately there are potential causes such as hardware errors, filesystem corruption, and other OS issues (including those like http://www.phoronix.com/scan.php?page=news_item&px=MTIxN... which another poster pointed out) that can corrupt data or create zero byte files.
Nevertheless we will continue to look into this just to be sure, and we also work hard to find ways for Dropbox to shield users even when the OS, disk or other components fail (our undelete/file revisions and Packrat are among these).
Thanks for posting here. The reason I'm confident that this bug is not due to filesystem errors on my machine's part is because Dropbox's version history for these files has become corrupted. If you read over the details and correspondence, you'll see that Dropbox engineers were able to recover two of my files that Dropbox's version history had reported as only having a 0-byte version. These files were edited recently, within the 30-day window that all Dropbox users, Packrat or not, have version control for.
The files had clearly been whole when first synced to Dropbox, but that version was not listed in Dropbox's history, and so I had no power to restore them. Even if my own disk had spontaneously 0-byte'd those files, this should not have caused Dropbox to lose the ability to restore it to its original version.
More circumstantially, others are reporting similar issues:
It's tough to tell from these small updates whether these users had their files 0-byte'd only locally, possibly by the ext4 bug, or whether they've also verified that it's not recoverable from Dropbox.
In addressing this bug report, please specifically address Dropbox's loss of version history. The two files that Dropbox engineers recovered had their version history wiped within the 30 day window.
Text of originalgeek's hellbanned(?) reply to this subthread:
I ran the find command suggested by the OP,
and it came up with a long list of files -- all
names that I had intentionally and manually deleted.
It seems when one of my "other" machines booted up,
it put 0 byte files back in their place. A review of
the file's history, by clicking Dropbox -> Browse on
Dropbox Website shows the original file, the day I deleted
it, then a few minutes later, a 0 byte file added back.
Edit: posted because it seemed like the sort of information I'd want if I worked at Dropbox and were looking for clues to the nature of the bug. If this isn't kosher let me know (rather than simply downvoting) and I'll delete the reposted comment, if possible.
I have all my data, media, and documents in Dropbox (80GB) and while I have 11255 zero-byte files, none of them are likely Dropbox's fault. Most of them are empty logfiles, .svn and .git noise in old projects, and the like.
How many of you is this affecting? I'd be curious to know how many keep the only copy of their files on Dropbox, or any cloud service for that matter. I've never trusted any service enough to be solely responsible for my important files.
I use Dropbox primarily as a tool to sync content between my desktop/laptop/phone, but any significant change I make to those files gets saved locally 100% of the time.
This affects me, though through backups and using Dropbox primarily as a sync service I don't think I've lost anything (I've got a decent sized list of zero byte files to go through).
I've confirmed files that should contain something (nav images for a website) and that do contain something in the original source folders stored elsewhere are empty in Dropbox. Interestingly, Ubuntu is one of the clients syncing to my Dropbox folders.
I don't keep anything of any serious importance in Dropbox, but it'd be a major drag and surely a large cause of versioning mistakes to maintain two copies on your local filesystem. I mean, it's built/advertised as being something that you can just treat like a regular magic folder.
Don't do this. I had some critical files which this stupid Dropbox sync system deleted without asking because I had removed the local copy of the folder on my machine. I couldn't even find a simple setting there, which says "don't delete files while syncing", or even something like "warn before deleting files". It totally sucks.
I'm afraid you have simply failed to grasp the meaning of syncing. The whole point of Dropbox is to have a folder which always has exactly the same contents on all your Dropbox installations, and the copy on their servers.
Well, if you know a bit more about sync, there are different types of it (one-way, two-way, 'echo only', etc.). Sure, I failed to understand how Dropbox worked (and that's why I lost my files), but probably previous sync software like SyncToy from MS had pampered me. Have you ever done contact syncing on phones? If you look properly, it will give you all sorts of different options for syncing. Many sync programs are extremely cautious about deleting multiple files on the server, and that is what I was expecting from Dropbox too.
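To illustrate the "cautious about deleting multiple files" behaviour described above, here is a minimal Python sketch. It is not how Dropbox or SyncToy work internally; the function names `planned_deletions` and `deletions_look_safe` and the 25% threshold are assumptions made up for this example.

```python
from pathlib import Path


def planned_deletions(source, dest):
    """List files (relative paths) that a strict mirror would delete from dest."""
    source, dest = Path(source), Path(dest)
    src_files = {p.relative_to(source) for p in source.rglob("*") if p.is_file()}
    dst_files = {p.relative_to(dest) for p in dest.rglob("*") if p.is_file()}
    # Anything present at the destination but missing at the source would go.
    return sorted(dst_files - src_files)


def deletions_look_safe(to_delete, dest_total, max_fraction=0.25):
    """True when the deletions are a small enough share of dest to apply silently."""
    if dest_total == 0:
        return True
    return len(to_delete) / dest_total <= max_fraction
```

A sync tool built this way would compute the deletion list first and ask the user before going ahead whenever `deletions_look_safe` returns False, for instance when a whole top-level folder has disappeared locally.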
PSA: with CrashPlan, you can simultaneously backup to your own server(s) or external hard drives right alongside their service. You can also backup to a friend's computer (this was actually their original business model--P2P backup).
What's the big deal? Once it's done - it's done. Unless you're on a metered data-plan, I don't see that initial upload as a huge problem assuming you can keep your system alive for the month it takes to perform the initial upload.
I have a backup on a USB, but updating that daily is a hassle, which is why I used Dropbox. Their added 2-step auth was also a big bonus, but if I lose that file I'll have a hard time getting my passwords back.
As jpadvo said, I might look into making some one way backups to S3 or something.
It surprises me that someone would rely entirely on Dropbox as their backup tool AND primary file location, especially when it happily manipulates the files locally.
I backup to an external HDD and to the cloud and still have the originals (as well as having extra copies again of my music and photos synced across my computers) - the more redundancy you have the better.
It sucks that so many people need bad stuff to happen to them to do something about it - I'm so thankful that storage became so cheap before anything really catastrophic happened to me. I've lost data in the past but it was back when so many things were offline, nowadays it's CRITICAL to have a good backup plan.
I didn't go into detail on my backup strategy in the post, but it's (slightly) better than that. I only use Dropbox for media - music, images, photos. Some of it is irreplaceable, but none of it is life-crucial. That stuff is backed up separately, and copied to private space on my web server.
I think I consider photos to be some of the most important stuff to be backed up... obviously you can live without them, but as you said, some photos are irreplaceable, and can hold so much meaning. At least with Facebook / Picasa Web Albums many of the most important photos have been backed up (potentially at a lower quality).
To be fair, I never would have considered such a bug when using Dropbox - I would probably have considered it safe given that you have a local copy of your data, especially since, as others pointed out, they present themselves as a backup service.
I almost got hit by this locally actually, I synchronize folders against multiple PCs, and the exact same thing happened, and a number of files had their bytes zeroed out and then this was propagated through the network. Thankfully I spotted it before all copies were overwritten and fixed it, but that's where you also want something like Time Machine.
Ugh, so many ways to lose data, even when you're doing the 'right' thing!
You're right, photos can be irreplaceable and I'm probably going to change how I back them up now. My choice of what is and isn't in Dropbox isn't actually based on importance -- but on what I both want on my laptop, but would want to restore if the laptop were stolen. Kind of weird criteria, but it (at one time) made sense to me.
Wouldn't this seem like something that can be easily solved? Dropbox or any rsync+rcs scheme will never destroy a file unless either one endpoint of the sync has truncated the file or there is file system corruption. In the case of the former, the rcs portion will take care of things: restore the pre-truncated file and have that synced around. In the case of file system corruption, Dropbox should be doing file integrity checks. For example, one simple thing that springs to mind: most file types have built-in signatures or header blocks that validate the file's extension type. If the file fails to "look like" the claimed extension type, then don't sync it, or flag it for user attention.
I do Dropbox manually: I rsync a list of folders to a linode instance running mercurial. It's simpler, scriptable, more flexible and as fail safe as Dropbox. If something does get corrupted it's really simple to go back in the version history. I can't remember the last time I had a file system corruption with ext3. I suppose they still happen, but not to me in solid use for years. Obviously my mercurial repository is also backed up regularly.
I don't agree that a backup system has to be restore tested periodically; instead I believe that the restore process has to be an integral part of the workflow. In my scenario, I rsync to hg, commit and push, then from other machines in my workflow (or more likely vm instances) I pull+rsync back. This way the backup and restore cycles are just part of the workflow and everything is version controlled at the same time.
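The signature check suggested above could look roughly like this in Python. It is only a sketch of the idea, not anything Dropbox is known to do; the `SIGNATURES` table covers just a handful of formats and `looks_like_its_type` is a name invented here.

```python
from pathlib import Path

# Leading "magic" bytes for a few common formats (a small illustrative subset).
SIGNATURES = {
    ".png": [b"\x89PNG\r\n\x1a\n"],
    ".jpg": [b"\xff\xd8\xff"],
    ".jpeg": [b"\xff\xd8\xff"],
    ".gif": [b"GIF87a", b"GIF89a"],
    ".pdf": [b"%PDF-"],
}


def looks_like_its_type(path):
    """Return True/False for known extensions, None when the type is unknown."""
    path = Path(path)
    expected = SIGNATURES.get(path.suffix.lower())
    if expected is None:
        return None  # no signature on record, so no opinion either way
    with path.open("rb") as f:
        head = f.read(max(len(sig) for sig in expected))
    return any(head.startswith(sig) for sig in expected)
```

A zero-byte `nav.png` fails this check immediately, which is exactly the case worth holding back from a sync and flagging for the user. Plain text and other formats without a fixed header return None and would need a different test.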
Are you sure those files were actually truncated? Because the command will simply show files of 0 bytes, which is not necessarily an indication of a bug, only if the file was supposed to contain something.
Although Dropbox isn't a backup service, it offers file versioning, so you would think you could recover from such a situation. Google Drive also offers file versioning, but the devil is in the detail - they do delete older versions:
For RAID like protection with Dropbox and other providers, you can roll your own BRIC (Redundant Bunch of Independent Clouds). I did this using Tahoe-LAFS to stripe data across storage providers. Requires a bit of set up, has some caveats, but does work. If you use with duplicity you have versioning on top of a distributed, encrypted, redundant store.
Part of what makes this such a bad bug is because Dropbox's file versioning is suffering from it. I have file versioning enabled, but not all of my files were able to use it, because their pre-0-byte history had been wiped.
Never trust anyone but yourself with valuable files like family photos. I don't care if they promise 110% uptime and reliability. The files you really care about, in the very practical end, are your own responsibility regardless of how much money you pay to other people. You can sue them till you're blue but it won't get your files back.
To that extent I keep my important files backed up not just in Dropbox, but also to Crashplan, and to a spinning-rust hard drive especially kept for backups that I protect in my home. That's three points of failure I can recover from if something goes wrong; and if all three fail at once, then I probably have worse things to worry about, like the zombie apocalypse.
Well, that's why I trust two other companies, plus myself.
I don't know, doing basic backups isn't super-hard for someone who's reading HN. My setup is really just a regular Linux box with a 1TB HDD sitting in a closet of mine, with dynamic dns pointing to my own domain (not even a strict necessity), and I rsync to it whenever I have new data like photos. That box in turn syncs to Crashplan and Dropbox, which is automatic.
Yeah rsync and the concept of a backup PC is beyond mere mortals, but for those of us here--if Dropbox (or any service) is your only backup, I think you can only blame yourself.
I've also had files zeroed out on a few occasions. I run Dropbox over a wide variety of machines and while this is completely anecdotal, the zero update has always come from one of my Ubuntu machines, which leads me to believe that this is specific to the Linux version of Dropbox.
Also, I had several occurrences throughout 2011 but so far haven't lost anything during 2012, so either something has been fixed or I've just been lucky. :)
So, if you've noticed zero length files lately, can you check the timestamp on the last update? Is it recent or over a year ago? You can also go to that date in your event log in the Dropbox web UI and see what was happening around that time.
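For anyone who wants to gather those timestamps in one go, a small Python sketch follows. It works the same on Linux, Mac and Windows; `zero_byte_files` is just a name picked for this example, and the path you pass in is whatever your Dropbox folder happens to be.

```python
import os
import stat
from datetime import datetime
from pathlib import Path


def zero_byte_files(root):
    """Return (modified_time, path) pairs for empty regular files, oldest first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(info.st_mode) and info.st_size == 0:
                found.append((datetime.fromtimestamp(info.st_mtime), path))
    return sorted(found)
```

Looping over `zero_byte_files(Path.home() / "Dropbox")` and printing each pair gives a list you can line up against the event log in the web UI. Keep in mind the modification time is only a hint, since the sync client may or may not preserve it.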
As an end user myself on Linux, I was worried to read this. I keep teaching materials in the Dropbox folder so only a small file size, but they are important. As mentioned below at http://news.ycombinator.com/item?id=4704176 I have developed the bad habit of editing files directly in Dropbox. I may need to revisit this and use Dropbox as a sync only, edit elsewhere and copy over.
find /home/keith/Dropbox -size 0
shows only cache files for deleted files, and some backup files that I saved while empty (I know those should be zero bytes).
A personal workaround is a simple bash script that copies the Dropbox directory to another directory with the date as its name; I'm running this once a day or so. Then my normal old-school backup onto an external drive will catch each day's Dropbox.
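The same dated-copy idea, sketched in Python rather than bash. This is not the script mentioned above, just one way it might look; `snapshot` and its `today` parameter are made up for the example.

```python
import shutil
from datetime import date
from pathlib import Path


def snapshot(source, backup_root, today=None):
    """Copy source into backup_root/YYYY-MM-DD and return the new directory."""
    stamp = (today or date.today()).isoformat()
    target = Path(backup_root) / stamp
    # copytree raises FileExistsError if the target already exists, so an
    # earlier snapshot from the same day is never overwritten by a later,
    # possibly damaged, copy.
    shutil.copytree(source, target)
    return target
```

Refusing to overwrite is a deliberate choice here: if files get zeroed during the day, a second run should not quietly replace the good morning copy. Disk use grows by a full copy per day, so old snapshots need pruning by hand or by another script.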
Surprising how convenient I found automatic file sync, and how quickly I came to trust the dropbox daemon running in the background on 3 computers!
If you're on Windows and want to quickly check if you have 0-byte files in your Dropbox, open the Dropbox folder in Explorer and type "size:empty" in the search box, and hit enter. It will show any 0-byte files.
If you'd like a batch file to check this, that you can run periodically (perhaps programmatically), try this:
for /r C:\Users\username\Dropbox %%F in (*.*) do (if %%~zF LSS 1 echo %%F)
If you'd like to echo that list to a file, change the middle line to:
(for /r C:\Users\username\Dropbox %%F in (*.*) do (if %%~zF LSS 1 echo %%F)) > C:\Temp\emptyfiles.txt
Dropbox has lost my files as well. I had kept some very important private files there which I just wanted to keep there without the need of updating them/syncing them. After uploading, I removed the local copy of that private top level folder, as I didn't want to keep a local copy in my computer. With the next sync, this STUPID SH*T removed the folder from their server as well, without any warning of any sort! Any system which deletes the files without asking, should just not be used or depended on.
It sounds like you don't quite understand how Dropbox is supposed to work - it replicates whatever you have locally to the cloud and back down to your other machines. Obviously with a design like that, deleting files from your local dropbox would delete them from other machines.
This isn't a case of dropbox losing your files, it's a case of you not understanding how the tool was designed and how it works.
Many sync programs offer multiple types of sync (e.g. MS SyncToy), and many are extremely cautious before deleting multiple files. Dropbox didn't do that. Yes, I failed to totally understand how it worked, as I deleted one of the root level folders and expected it would not sync a folder which is not there on the local machine, but one of the duties of software is to prevent or try to reduce common human errors.
Just added the following to the crontab on my laptop so when it boots up, it emails me about any zero byte files.
# Look for zero byte files in Dropbox
@reboot /usr/bin/find ~/Dropbox/ -type f -size 0
I also have full system backups going back three months, with two hourly incremental backups for the last two. So if I get an email about any zero byte files, I should be fine restoring them from my own backups.
Can someone clarify the pricing for Dropbox's Packrat? Is it free for pro users while free users have the option to pay $3.99/month to get it, or do you have to be a pro user and then pay $3.99/month to get it? I think it's the latter, but I'm finding the Dropbox help page for it really unclear (it sounds like it is free for all pro users).
How viable would it be to keep all of your data in a git repository? Let's say I'm backing up 25gb of music, could I have Dropbox sync everything but the .git folder, and just do a git revert if shit goes down? Will it end up using twice as much space?
Look at git-annex, it works on top of git and is good at handling large files, and will not require twice the space.
If you use plain git, you'll probably need twice the space. Files stored in git are compressed, but music is already compressed, so there will be little savings. You'll also need a lot of memory if you want to copy files via git.
Plain regular git does keep a copy of its data in your .git folder, and every checkout/clone/copy of that repository stores the data, and it stores all old copies of all files. That's how git works. It also makes it a bit unwieldy for large files like that.
What's cool (about git) is that the revision hashes (i.e. what git uses instead of version numbers) are basically a checksum of every file and every old version of every file. So if an old version of a file changed, the checksum would change and you'd be on a different branch!
0 byte files are not necessarily a sign of a bug. There are many cases where you, or the tools you use, may intentionally create 0 byte files which would appear in Dropbox. There is a problem only if you see empty files which you expected to contain data.
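To make the checksum idea concrete: git names a file's contents by hashing a short header followed by the bytes. The Python sketch below assumes the default SHA-1 object format; `git_blob_id` is a name made up here, not part of git.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID git (SHA-1 object format) gives to a file's bytes."""
    # git hashes the header "blob <size>\0" followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# An empty file always gets the same well-known ID.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The result should match what `git hash-object <file>` prints for the same bytes. Tree and commit IDs are built from these blob IDs in turn, which is why silently changing an old file's content would change every hash above it.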
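One way to separate "empty on purpose" from "used to contain data" is to keep a record of file sizes from an earlier scan and compare. A rough Python sketch, with `record_sizes` and `newly_emptied` as hypothetical helper names:

```python
import os
from pathlib import Path


def record_sizes(root):
    """Map each file's path (relative to root, '/'-separated) to its size in bytes."""
    root = Path(root)
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                sizes[path.relative_to(root).as_posix()] = path.stat().st_size
            except OSError:
                continue  # unreadable or vanished file; leave it out
    return sizes


def newly_emptied(previous, current):
    """Return files that held data in the previous scan but are 0 bytes now."""
    return sorted(
        name for name, size in current.items()
        if size == 0 and previous.get(name, 0) > 0
    )
```

Saving the output of `record_sizes` somewhere outside the synced folder (as JSON, say) and comparing it with a fresh scan each day would flag only the suspicious files, while empty logfiles and the like that were always 0 bytes stay out of the report.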