Dropbox Bug Can Permanently Lose Your Files (konklone.com)
193 points by joshuacc 852 days ago | comments



It's always important to remember the difference between a syncing service and a backup service. A syncing service sometimes feels like a backup, because you can use it to recover files if a local device is destroyed or lost. HOWEVER, any service capable of syncing files is equally capable of destroying them.

It's important to have an automated one-way backup system that you can manually restore from. Something like Tarsnap [1] looks like a really good possibility (I haven't used it myself, but it seems solid)

[1] http://www.tarsnap.com/gettingstarted.html

-----


This distinction is really without a difference. Dropbox consistently advertises themselves as both a syncing service as well as a backup service. Their tag line is even "Dropbox - Secure backup, sync and sharing made easy," with secure backup being ahead of sync and sharing. [1]

"Even if your computer has a meltdown, your stuff is always safe in Dropbox and can be restored in a snap.

In fact, if you're using the Dropbox desktop application, your files are backed up several times. The primary copy on your computer's hard drive is synced online and that copy is then backed up again for safety (emphasis mine). If you are using Dropbox to sync files between multiple computers, your files are backed up on those computers as well. If that isn't enough, Dropbox also keeps backups of all of your deleted and changed files too.

...

It's hard to imagine a scenario where Dropbox could lose your files. Hypothetically, let's say a nuclear bomb blows up the data centers where your files are saved. Even then, your files are still safe and sound on your computer and any other computers linked to your Dropbox account."

Clearly, though, Dropbox appears to have lost files (at least if we take it on faith that "these are bugs in Dropbox's syncing logic"), despite the fact that I see no mushroom cloud nearby.

[1] https://www.dropbox.com/help/122/en

-----


>Even then, your files are still safe and sound on your computer and any other computers linked to your Dropbox account."

Unless, as happened here, Dropbox erased the file and synced a blank version across all your computers. Then they're safe and sound only on a computer linked to your Dropbox account that hasn't been connected to the internet since the file got corrupted.

If you catch it in the 30-day window when they keep old versions you're fine, but there are files in my Dropbox that I don't use every 30 days.

-----


As you said, "Dropbox erased the file and synced a blank version across all your computers." I agree. As such, I'm not sure why individuals are so quick to blame the end user just yet.

On the other hand, I don't want to immediately blame Dropbox either. If you back up garbage (say, because you have disk corruption), then you can't blame Dropbox for backing up exactly what you told it to.

And Dropbox does offer a premium Packrat service if you want file history indefinitely. Perhaps the user can be blamed for assuming that he/she would only need 30 days of history, but this is really contingent on who caused the corruption to happen in the first place -- and that's unknown at the moment. [1]

[1] https://www.dropbox.com/help/113/en

-----


Agreed on not being sure that it's Dropbox's fault. But even if the user killed the file, you've got the same problem.

A person running Time Machine (or similar incremental backup system) is a lot safer from this sort of problem than a free tier Dropbox user. Free Dropbox is better than nothing, but people can't keep assuming their files are safe because "they're in the cloud" and get synced to a few places.

-----


"This distinction is really without a difference." "In fact, if you're using the Dropbox desktop application, your files are backed up several times." "It's hard to imagine a scenario where Dropbox could lose your files..."

Please stop. Right there. Stop believing some vendor marketing blindly (even if you do come to another conclusion later), stop reassuring other people who do so and stop calling a sync a backup. There is a very important distinction: sync has mechanisms in place that are capable of touching the files in your backup. At least in Dropbox's case these mechanisms are not completely separated from the initial backup mechanism of each version. I had two nearly catastrophic data losses with Dropbox before I was able to make that distinction. I've come to the conclusion that IT professionals should never ever treat a sync system as a backup system, and if you still think otherwise please don't spread that advice to others.

-----


Also, never ever treat any system as a backup if you haven't tested that restore actually works. Do that, and the likelihood of losing your files gets a lot smaller.
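
A minimal sketch of such a restore test, assuming you have restored your backup into a scratch directory and still have the originals to compare against. The function names `tree_digests` and `verify_restore` are made up for illustration and are not part of any backup product:

```python
import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for path in root.rglob("*"):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def verify_restore(original: Path, restored: Path) -> list[str]:
    """Return relative paths that are missing from, or differ in, the restored tree."""
    want = tree_digests(original)
    got = tree_digests(restored)
    return sorted(name for name, digest in want.items() if got.get(name) != digest)
```

An empty list means every original file came back byte-for-byte. Reading whole files into memory keeps the sketch short; very large files would want chunked hashing.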

-----


Agreed, and it helps to have 2 backups of all your data using different techniques/products. I run a local backup to cheap disk, network backup to remote location, and I suppose my remotely hosted svn is my third backup of most assets I really care about.

Most of us here are developers. How many of our products are perfect? Always assume something will fail in a new and interesting way in the future. I'm not letting Dropbox off the hook - their product shouldn't do this - but you'll be happier if you treat backups with the same level of redundancy and planning as the rest of your infrastructure.

-----


Remote backup is indeed very important. Imagine a fire in the server room. Where was the backup tape? Lying on top of the server? Oops...

-----


Yes - sync is not a backup, but services such as Dropbox are designed to be a backup too (with their revision feature).

So I think the underlying problem here is that any backup/syncing system might have a bug (like this one) or there might be operator or user error (deleting your revision history is just a couple of clicks away). The Recovery-Oriented Computing website has a lot of good papers on this topic [1].

This is very similar to problems with outages on Amazon EC2 - yes, the Amazon cloud is great, but in order to make your service highly available you do need to have a standby system on some other cloud (for example, we run on Rackspace but our standbys are on Amazon).

One approach to protect yourself against problems like this is to replicate/sync all your files from one cloud storage service (your primary one) to some other cloud service (GDrive, SugarSync, Box, etc.). Then, should Dropbox have a bug, you still have everything in the other cloud service, including all revisions.

Services like cloudHQ [2] (that is my baby) can replicate and sync all your files from Dropbox to, for example, GDrive. And of course cloudHQ has options like "two-way" sync, "don't replicate deletion", "backup" (weekly incrementals are kept in folders, so you will be fine even if the "revisions" feature fails), etc.

[1] http://roc.cs.berkeley.edu/

[2] http://cloudHQ.net

-----


This is one of the reasons that SpiderOak (an encrypted sync AND backup service) keeps historical versions of everything forever, until customers explicitly choose to remove them.

For services that purge old versions and deleted files at 30 days, you lose if you don't notice a problem promptly. You can't be expected to be watchful over gigabytes of data; that's the whole point of a backup service.

-----


Just for the record (no big issue but I am accustomed to founders/employees disclosing their affiliation)

  "I'm Alan Fairless, a co-founder at SpiderOak" [1]
[1] http://news.ycombinator.org/user?id=rarrrrrr

-----


Dropbox gives you the choice: you can pay and have the versions stored forever, or accept the lower level of service with 30 days for free. I am sure you have calculated the economics of keeping the extra copies for all non-paying customers; it is possible it would be less sustainable.

As an aside, it seems Dropbox is bidding on the [SpiderOak] keyword on AdWords; you should probably at least outbid them on your brand terms to reduce confusion/misdirection for potential customers.

-----


> As an aside, it seems Dropbox is bidding on the [SpiderOak] keyword on AdWords

So they are. Is it legally acceptable to bid on competitor trademarks? I thought that was regarded as being over the line these days - anyone know for sure?

If you look at the adwords link, it's clear that Dropbox is bidding on "competitor keywords" as a class.

-----


Depending on the country there are different rules about this[1]. There are different rules for trademarks in Ad copy and in the keyword.

In terms of a trademark in your keyword, this is only unacceptable in: Australia, Brazil, China, Hong Kong, Macau, New Zealand, North Korea, South Korea, or Taiwan, and only after the trademark holder files a complaint.

[1] http://support.google.com/adwordspolicy/bin/answer.py?hl=en&...

-----


So in general, Google's position is that (for the US, UK and EU at least) if the ad text doesn't contain the trademark text in question, it's OK if the keyword used to match on does, except for the countries you listed.

Thanks for the link.

-----


To be fair, a good sync service with history is supposed to be a backup at the same time. I'm using git rather than dropbox but if I push something, it's there forever. So, I'd have to think that Dropbox being able to completely destroy files is a bug, not a feature. Maybe a more experienced dropbox user can correct me?

-----


The difference is that a backup service shouldn't ever be able to corrupt your primary copy.

Bugs happen. If a bug happens on the syncing service and it trashes your 'backup'/history, then syncs and trashes your primary copy, you're toast.

Sync'ing != Backup. They are for different problems and have different restrictions/pitfalls.

-----


The thing is, Dropbox treats their version as the primary copy. For instance, I added several directories to Dropbox, and decided that I didn't want it syncing a few of them. I de-selected the directories under "Selective Sync" and it removed them from my computer - even though my computer was the original source.

-----


Yes, sync != backup [1].

I do agree with you, but I can tell you that selling a backup service is harder than you think. Also, as pointed out by the paper [2], human error accounts for ~50% of all system failures. And the worst thing is that the majority of users who accidentally delete data don't even notice the data loss until the lost data is needed, and they don't recollect doing anything wrong.

What I found interesting is that people (i.e., small business owners) are scared of losing a credit card (even though you can call the bank, cancel your lost credit card and get a new one: an inconvenience, but not a big deal), but they will not back up critical company documents and data (even though, if they lose them, the company will be pretty much closed; there is no "bank" to go to and get the data back).

[1] http://blog.cloudhq.net/post/33844549768/the-difference-in-d...

[2] http://roc.cs.berkeley.edu/talks/pdf/HP.pdf

-----


Not sure it is harder than I think. Other than that, I have nothing else to disagree with. :)

-----


Why would a good sync service with history be a backup? They don't store a history indefinitely. Of course destroying files and history is a bug, but something that instantly replicates changes to those files from machine-to-machine shouldn't be considered a reliable way to recover them weeks or months later.

-----


Dropbox does have "Packrat unlimited undo history" for $4/month. I wonder if he could have recovered these files if he had this option.

-----


Does git push+pull not count as a "good sync service"?

-----


Tarsnap is amazing for Unix systems. Flat out, one of the coolest services I've ever used, and one of the cheapest. I'm backing up my VPS with it, and I can't recommend it enough.

I have a friend that strongly recommends CrashPlan, but I haven't tried it out yet on my Mac. I'm curious to though.

-----


I use CrashPlan both on Mac and on Windows. On the Mac it works very well. On Windows it is okay, though it needs some hand holding -- sometimes the service gets stuck and needs to be restarted manually. Overall I find that it is better than Mozy (a competing service).

-----


If the service tends to hang, you might need to assign it more than the 512MB RAM it defaults to.

The setting is controlled thru the CrashPlanService.ini file.

-----


Seconded. I switched from Mozy and would not go back.

-----


I love Crashplan -- especially because with essentially the same client, you can do backups to a public service, a hosted business service (multiple machines, centrally managed keys, etc.) or an enterprise hosted-by-yourself service.

-----


Doesn't Tarsnap store your data on their S3 account, rather than your own? If so, how do you get your data back from Amazon if Tarsnap vanished tomorrow?

-----


Tarsnap isn't going to vanish tomorrow. It's steadily profitable so I don't need to worry about "runway".

Even if I get hit by lightning tomorrow, the service runs perfectly fine on its own for months at a time, so you'd have plenty of time to get your data back.

-----


How would people know if anything had happened to you and that they should begin to recover data - is there a dead man switch or notification procedure in place?

p.s. Please avoid golf courses this weekend!

-----


  >> is there a dead man switch or notification procedure in place?
The absence of weekly HN posts.

-----


> is there a dead man switch or notification procedure in place?

There are people who should send out that notification if needed, yes.

> p.s. Please avoid golf courses this weekend!

Don't worry, I don't play golf. :-)

-----


It would be worth formalizing your bus plan. I specifically chose a mainstream backup service over Tarsnap because you're running a one man shop.

-----


Thanks for the feedback -- yes, this is something I plan on doing (amidst all the other tasks I'm juggling...).

I very commonly hear why people are using Tarsnap, and from time to time I hear why people are no longer using Tarsnap, but I very rarely hear why people never started using Tarsnap, so I really appreciate you taking the time to comment.

-----


Please do! When I explained your service to a friend, this was the only "except for that ...", the only tar-snag, if you will.

-----


You can say that about any service--what if Crashplan's data center caught fire tomorrow? Tarsnap might be run by a single guy (I think?) but saying S3 is "more" or "less" reliable than any other private company isn't a great comparison. In any case a massive company like Amazon is the most likely to be reliable in these cases, I imagine.

-----


That's why I cobbled together a poor man's cloud RAID :-) http://news.ycombinator.com/item?id=4689238

My earlier point was that I think data is stored on Colin Percival's S3 account (he is the creator of Tarsnap) and therefore you might lose access to the data (if he couldn't pay the bills or got hit by a bus) even though S3 itself is fine.

-----


I use Arq to backup files on my MacBook - it saves them to my S3 bucket, so even if everyone working on Arq dies (I sincerely hope they don't!) and their servers all explode, my backups are still intact on S3 - which is less likely to go down permanently and lose all my files than Tarsnap (not that that is particularly likely either!).

-----


There are definitely tiers of reliability. An external drive is easily lost or damaged. A backup service can take individual drive/server losses but that might be the limit. S3 can lose an entire data center - the only real risk comes from software bugs.

-----


This isn't really a concern for me, but I'd probably email Colin and ask for it. The tarsnap program is open-source, presumably if he stopped hosting it the server-side component could be replaced. The client does all the work.

-----


What if Colin is dead? It's a backup system for the truly paranoid so this is a legit question.

-----


The truly paranoid use multiple backups. The encryption ensures that without the keys, nobody else could make use of the data anyway.

-----


Tarsnap is really good, I use it from some of my servers to do backup.

Maybe I could plug my new app here as well, tidy.io[1] lets you archive or backup your files directly to and from your Dropbox. Feedback is always appreciated!

[1]: https://www.tidy.io/

-----


I've been a very happy Backblaze customer for over a year now.

http://www.backblaze.com/

-----


If Backblaze backs up my Dropbox folder and this Dropbox bug affects files that are getting backed up, then if I don't catch the 0-byte files within the 4-week Backblaze revision window, won't I still lose my files?

-----


Crashplan hangs on to deleted files for quite some time. I just used it last weekend to recover some stuff I'd deleted about a year ago and realized I wanted back. I was pleasantly surprised to find them still in the system.

It has a lot of options for retaining old versions, too: http://support.crashplan.com/doku.php/reference/version_rete...

-----


Yep. And your wife will divorce you.

-----


Agreed. Early in my Dropbox days I lost some files due to my account going over the 2GB limit. Since then, I've treated Dropbox as a syncing service, not a backup service.

My current scheme is to make rolling snapshots of my Dropbox folder backed up to a local RAID array which backs up to a separate RAID array nightly. More info: http://aaronparecki.com/2010/190/article/1/how-to-back-up-dr...
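
For anyone who wants the general shape without reading the linked post, here is a rough sketch of a rolling snapshot. It is not the script from that article: it simply copies the whole folder into a timestamped directory and prunes the oldest copies. The function name, the `keep` default and the `label` parameter are assumptions made for illustration:

```python
import shutil
import time
from pathlib import Path


def take_snapshot(source: Path, snapshot_root: Path, keep: int = 14, label: str = "") -> Path:
    """Copy source into a new timestamped directory, then prune the oldest snapshots.

    snapshot_root must live outside source, otherwise the copy would recurse into itself.
    """
    snapshot_root.mkdir(parents=True, exist_ok=True)
    # Timestamp names sort lexically in chronological order.
    target = snapshot_root / (label or time.strftime("%Y%m%d-%H%M%S"))
    shutil.copytree(source, target)  # raises FileExistsError if target already exists
    snapshots = sorted(p for p in snapshot_root.iterdir() if p.is_dir())
    excess = max(len(snapshots) - keep, 0)
    for old in snapshots[:excess]:
        shutil.rmtree(old)
    return target
```

Run nightly from cron or a scheduled task, this keeps `keep` full copies. Full copies are wasteful for large folders; tools that hard-link unchanged files between snapshots save a lot of space.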

Hacker News link for that post if you're into that sort of thing: http://news.ycombinator.com/item?id=4704667

-----


Good idea.

You could also add a bit to the script explicitly looking for zero byte files and alert yourself to their formation.
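
Something along these lines would do as a starting point. It is only a sketch: the function name is made up, and how you alert yourself (email, cron output, a non-zero exit code) is left open:

```python
from pathlib import Path


def find_zero_byte_files(root: Path) -> list[Path]:
    """Return every regular file under root whose size is 0 bytes."""
    return sorted(p for p in root.rglob("*") if p.is_file() and p.stat().st_size == 0)
```

Calling `find_zero_byte_files(Path.home() / "Dropbox")` assumes the default folder location. Expect false positives: plenty of files are legitimately empty.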

-----


> HOWEVER, any service capable of syncing files is equally capable of destroying them.

That's simply not true (I've built sync systems that are incapable of destroying files).

IMO, syncing designs that do have irrevocable overwrites are inherently brittle. (I don't know if Dropbox is built that way, but AFAIK, iCloud (and MobileMe before it) is -- and it sucks.)

-----


Dropbox is revocable, the issue is for how long. In addition to online versions, the client holds on to old versions for three days. It used to be longer but the cache could get unreasonably large.

-----


"Permanently Lose Your Files" -> "I am able to restore each file from Dropbox's version history"

It's also important to remember the meaning of the word "permanently". While this is a UX disaster, the title is misleading.

-----


...except the author permanently lost files that were 0-byted before some sort of internal Dropbox process wiped the version history.

-----


Tarsnap isn't the best choice in most cases, it's just by a HN user. Backblaze, Carbonite, CrashPlan and SpiderOak are more practical options that are also available on Windows.

-----


Tarsnap is great and secure. And it's not by just any HN user: Colin Percival is a security officer for FreeBSD and a Putnam fellow. It's a very practical and well-priced option.

-----


No, it's only for Unix-like OSs and is command-line only, making it impractical or putting it out of reach of many users, even those on HN. Putting it forward as the drop-in replacement for Dropbox for backup is simply inaccurate. The services I listed are more applicable. And Percival's stature is irrelevant here.

-----


You're the one who brought up that it's "just by a HN user", as if that was the most salient aspect of his life. And I don't think the top-level comment put Tarsnap forward as a drop-in replacement for Dropbox anyway.

-----


There really is no difference between syncing and backup. Think of the backup/restore cycle as a sync and it will be clear. Backup is one way sync, restore completes the bidirectionality of sync.

Sync doesn't necessarily have the capability to destroy files; rsync, for example, only removes files that were deleted locally if you pass it a switch. However, Dropbox is supposed to be rsync + rcs, so this kind of problem is supposedly easy to fix by simply reverting to a good previous version.
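
To make the "one-way sync with an optional delete switch" idea concrete, here is a toy sketch. It is not how rsync or Dropbox work internally; it only mirrors the behaviour of copying one way and removing extras when explicitly asked (rsync's switch for that is `--delete`). The function name and the `delete` parameter are made up for illustration:

```python
import shutil
from pathlib import Path


def one_way_sync(source: Path, dest: Path, delete: bool = False) -> None:
    """Copy every file from source into dest; remove extras from dest only if delete=True."""
    for src in source.rglob("*"):
        if src.is_file():
            target = dest / src.relative_to(source)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, target)  # overwrites whatever is already at target
    if delete:
        # Materialize the listing first so files are not removed mid-iteration.
        for target in list(dest.rglob("*")):
            if target.is_file() and not (source / target.relative_to(dest)).exists():
                target.unlink()
```

Note that even with `delete=False`, a file that has been truncated to 0 bytes at the source still overwrites the good copy at the destination. That is why the version history matters more than the delete switch.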

-----


(Hi all, I'm the PM for the Dropbox desktop client team.)

I just wanted to let you all know that we take any claims like this really seriously. There aren't any known bugs on the Dropbox side that would cause this, and unfortunately there are potential causes such as hardware errors, filesystem corruption, and other OS issues (including those like http://www.phoronix.com/scan.php?page=news_item&px=MTIxN... which another poster pointed out) that can corrupt data or create zero byte files.

Nevertheless we will continue to look into this just to be sure, and we also work hard to find ways for Dropbox to shield users even when the OS, disk or other components fail (our undelete/file revisions and Packrat are among these).

-----


Matt,

Thanks for posting here. The reason I'm confident that this bug is not due to filesystem errors on my machine's part is because Dropbox's version history for these files has become corrupted. If you read over the details and correspondence, you'll see that Dropbox engineers were able to recover two of my files that Dropbox's version history had reported as only having a 0-byte version. These files were edited recently, within the 30-day window that all Dropbox users, Packrat or not, have version control for.

The files had clearly been whole when first synced to Dropbox, but that version was not listed in Dropbox's history, and so I had no power to restore them. Even if my own disk had spontaneously 0-byte'd those files, this should not have caused Dropbox to lose the ability to restore it to its original version.

More circumstantially, others are reporting similar issues:

https://twitter.com/dangillmor/status/261921738441515009 - https://twitter.com/pc1oad1etter/status/261957001234505728 - https://twitter.com/frr149/status/261957708746469378 - http://news.ycombinator.com/item?id=4704236 - http://news.ycombinator.com/item?id=4704178 - http://news.ycombinator.com/item?id=4704063 - http://news.ycombinator.com/item?id=4704485

It's tough to tell from these small updates whether these users had their files 0-byte'd only locally, possibly by the ext4 bug, or whether they've also verified that it's not recoverable from Dropbox.

In addressing this bug report, please specifically address Dropbox's loss of version history. The two files that Dropbox engineers recovered had their version history wiped within the 30 day window.

-----


Text of originalgeek's hellbanned(?) reply to this subthread:

  I ran the find command suggested by the OP, 
  and it came up with a long list of files -- all 
  names that I had intentionally and manually deleted. 
  It seems when one of my "other" machines booted up, 
  it put 0 byte files back in their place. A review of 
  the file's history, by clicking Dropbox -> Browse on 
  Dropbox Website shows the original file, the day I deleted 
  it, then a few minutes later, a 0 byte file added back.
Edit: posted because it seemed like the sort of information I'd want if I worked at Dropbox and were looking for clues to the nature of the bug. If this isn't kosher let me know (rather than simply downvoting) and I'll delete the reposted comment, if possible.

-----


Why was this reply hell banned?

-----


The user who posted it presumably ran afoul of a rule (possibly an unwritten one) in an earlier comment. You'd need to dig through his/her comment history to try to figure out what happened.

-----


Sometimes. There is also a flawed system that can auto-hellban people unlucky enough to get a bunch of downvotes on an early comment, before they have a karma buffer.

-----


I ran the find command suggested by the OP, and it came up with a long list of files -- all names that I had intentionally and manually deleted. It seems when one of my "other" machines booted up, it put 0 byte files back in their place. A review of the file's history, by clicking Dropbox -> Browse on Dropbox Website shows the original file, the day I deleted it, then a few minutes later, a 0 byte file added back.

-----


I have all my data, media, and documents in Dropbox (80GB) and while I have 11255 zero-byte files, none of them are likely Dropbox's fault. Most of them are empty logfiles, .svn and .git noise in old projects, and the like.

-----


How many of you is this affecting? I'd be curious to know how many keep the only copy of their file on Dropbox, or any cloud service for that matter. I've never trusted any service enough to be solely responsible for my important files.

I use Dropbox primarily as a tool to sync content between my desktop/laptop/phone, but any significant change I make to those files gets saved locally 100% of the time.

I am not a very trusting man.

-----


This affects me, though through backups and using Dropbox primarily as a sync service I don't think I've lost anything (I've got a decent sized list of zero byte files to go through).

I've confirmed files that should contain something (nav images for a website) and that do contain something in the original source folders stored elsewhere are empty in Dropbox. Interestingly, Ubuntu is one of the clients syncing to my Dropbox folders.

-----


Somewhere between 2000-2892 Files...

I'm turning Dropbox off now and disabling it on startup.

I'm thankful for this post, now to figure out what's going on.

-----


My photos folder was zeroed... =/

-----


2 files here. it did delete a boilerplate site i was working on...thank goodness i backed it up locally.

-----


I think I have a couple files affected, but I can't tell for sure--thankfully, none of them are critical, and may just be generated as temp files.

That said, I'm definitely writing a script to do nightly backups of the contents of my dropbox folder going forward.

-----


Just use something like rsnapshot http://www.rsnapshot.org/

-----


Two files were affected, but I have local copies fortunately.

-----


I don't keep anything of any serious importance in Dropbox, but it'd be a major drag and surely a large cause of versioning mistakes to maintain two copies on your local filesystem. I mean, it's built/advertised as being something that you can just treat like a regular magic folder.

-----


Don't do this. I had some critical files which this stupid Dropbox sync system deleted without asking because I had removed the local copy of the folder on my machine. I couldn't even find a simple setting there, which says "don't delete files while syncing", or even something like "warn before deleting files". It totally sucks.

-----


I'm afraid you have simply failed to grasp the meaning of syncing. The whole point of Dropbox is to have a folder which always has exactly the same contents on all your Dropbox installations, and the copy on their servers.

-----


Well, if you know a bit more about sync, there are different types of it (one-way, two-way, 'echo only', etc.). Sure, I failed to understand all of how Dropbox worked (and that's why I lost my files), but probably previous sync software like SyncToy from MS had pampered me. Have you ever done contacts syncing on phones? If you look properly, it will give you all the different options for syncing. Many sync programs are extremely cautious about deleting multiple files on the server, and that is what I was expecting from Dropbox too.

-----


Dropbox for syncing.

Crashplan for backing up.

That combo hasn't failed me yet.

-----


I quite like Crashplan, but they have lost data for customers before.

http://jeffreydonenfeld.com/blog/2011/12/crashplan-online-ba...

-----


PSA: with CrashPlan, you can simultaneously backup to your own server(s) or external hard drives right alongside their service. You can also backup to a friend's computer (this was actually their original business model--P2P backup).

-----


Clarification: you can back up to your own servers iff said servers are also running Crashplan. What you can't do, unfortunately, is back up to an ordinary shared folder.

-----


Further clarification: this is correct, but (if someone is reading this and thinking about doing it), you can install the free version of CrashPlan and it works just fine as a backup server.

-----


It's kind of a ram hog, though.

-----


I'd love to use CrashPlan for the bulk of my older projects but I have 350GB data and as I'm from the UK there is no way to seed it. It would take me something like a month to upload all that data.

-----


What's the big deal? Once it's done - it's done. Unless you're on a metered data-plan, I don't see that initial upload as a huge problem assuming you can keep your system alive for the month it takes to perform the initial upload.

-----


Yep, I'm affected by this bug too.. Thankfully nothing important, but makes me wonder if I should leave my truecrypt folder on DropBox.

-----


You should both distribute the files you want to keep to several devices and use a service whose job it is to back things up.

Dropbox is only the former.

-----


I have a backup on a USB, but updating that daily is a hassle hence why I used DropBox- their added 2 step auth was also a big bonus, but if I lose that file I'll have a hard time getting my passwords back..

As jpadvo said, I might look into making some one way backups to S3 or something.

-----


Buy a Time Capsule and back it up locally, or get a subscription to something like CrashPlan Pro.

-----


It surprises me that someone would rely entirely on Dropbox as their backup tool AND primary file location, especially when it happily manipulates the files locally.

I backup to an external HDD and to the cloud and still have the originals (as well as having extra copies again of my music and photos synced across my computers) - the more redundancy you have the better.

It sucks that so many people need bad stuff to happen to them to do something about it - I'm so thankful that storage became so cheap before anything really catastrophic happened to me. I've lost data in the past but it was back when so many things were offline, nowadays it's CRITICAL to have a good backup plan.

-----


I didn't go into detail on my backup strategy in the post, but it's (slightly) better than that. I only use Dropbox for media - music, images, photos. Some of it is irreplaceable, but none of it is life-crucial. That stuff is backed up separately, and copied to private space on my web server.

-----


I think I consider photos to be some of the most important stuff to be backed up... obviously you can live without them, but as you said, some photos are irreplaceable, and can hold so much meaning. At least with Facebook / Picasa Web Albums many of the most important photos have been backed up (potentially at a lower quality).

To be fair, I never would have considered such a bug when using Dropbox - I would probably have considered it safe given that you have a local copy of your data, especially since, as others pointed out, they present themselves as a backup service.

I almost got hit by this locally actually, I synchronize folders against multiple PCs, and the exact same thing happened, and a number of files had their bytes zeroed out and then this was propagated through the network. Thankfully I spotted it before all copies were overwritten and fixed it, but that's where you also want something like Time Machine.

Ugh, so many ways to lose data, even when you're doing the 'right' thing!

-----


You're right, photos can be irreplaceable and I'm probably going to change how I back them up now. My choice of what is and isn't in Dropbox isn't actually based on importance, but on what I both want on my laptop and would want to restore if the laptop were stolen. Kind of weird criteria, but it (at one time) made sense to me.

-----


Isn't this related to Ext4 bug that was found recently? [1]

[1] http://www.phoronix.com/scan.php?page=news_item&px=MTIxN...

-----


If you're on Windows, the following PowerShell may help diagnose whether you've been affected:

    gci $env:USERPROFILE\Dropbox -r | where { $_.Length -eq 0 }

-----


Wow.

15 Files affected here. I haven't checked to see if any of them are unrecoverable (none of them are vital), but this does seem like a very bad bug.

-----


Are you sure those files were actually truncated? Because the command will simply show files of 0 bytes, which is not necessarily an indication of a bug, only if the file was supposed to contain something.

-----


Although Dropbox isn't a backup service, it offers file versioning, so you would think you could recover from such a situation. Google Drive also offers file versioning, but the devil is in the detail - they do delete older versions:

https://support.google.com/drive/bin/answer.py?hl=en&ans...

For RAID-like protection with Dropbox and other providers, you can roll your own BRIC (Redundant Bunch of Independent Clouds). I did this using Tahoe-LAFS to stripe data across storage providers. It requires a bit of setup and has some caveats, but it does work. If you use it with duplicity, you have versioning on top of a distributed, encrypted, redundant store.

http://news.ycombinator.com/item?id=4689238

-----


Part of what makes this such a bad bug is that Dropbox's file versioning suffers from it too. I have file versioning enabled, but not all of my files could be restored with it, because their pre-0-byte history had been wiped.

-----


Never trust anyone but yourself with valuable files like family photos. I don't care if they promise 110% uptime and reliability. The files you really care about, in the very practical end, are your own responsibility regardless of how much money you pay to other people. You can sue them till you're blue but it won't get your files back.

To that end I keep my important files backed up not just in Dropbox, but also to Crashplan, and to a spinning-rust hard drive kept especially for backups that I protect in my home. That's three points of failure I can recover from if something goes wrong; and if all three fail at once, then I probably have worse things to worry about, like the zombie apocalypse.

-----


The last person I trust with valuable files like family photos is myself.

I am a bad sysadmin, bad at back-ups, bad at security, bad at redundancy. And I would guess that describes 99.99% of people who care about family photos.

-----


Well, that's why I trust two other companies, plus myself.

I don't know, doing basic backups isn't super-hard for someone who's reading HN. My setup is really just a regular Linux box with a 1TB HDD sitting in a closet of mine, with dynamic DNS pointing to my own domain (not even a strict necessity), and I rsync to it whenever I have new data like photos. That box in turn syncs to Crashplan and Dropbox, which is automatic.

Yeah rsync and the concept of a backup PC is beyond mere mortals, but for those of us here--if Dropbox (or any service) is your only backup, I think you can only blame yourself.

-----


If you really care about photos, you will print them out, and put them in a photo album, and print them again, and put them in a safe deposit box or mail them to some relatives.

-----


I've also had files zeroed out on a few occasions. I run Dropbox over a wide variety of machines and while this is completely anecdotal, the zero update has always come from one of my Ubuntu machines, which leads me to believe that this is specific to the Linux version of Dropbox.

Also, I had several occurrences throughout 2011 but so far hadn't lost anything during 2012, so either something has been fixed or I've just been lucky. :)

So, if you've noticed zero length files lately, can you check the timestamp on the last update? Is it recent or over a year ago? You can also go to that date in your event log in the Dropbox web UI and see what was happening around that time.
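For anyone who wants to answer the timestamp question in bulk, here is a minimal Python sketch that lists every empty file together with its last-modified time. The `~/Dropbox` location and the function name `zero_byte_files` are my assumptions, not anything Dropbox provides, and an empty file in the list is only a suspect, not proof of the bug.

```python
import os
import stat
from datetime import datetime, timezone

def zero_byte_files(root):
    """Yield (path, last-modified time in UTC) for every empty regular file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(info.st_mode) and info.st_size == 0:
                yield path, datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)

if __name__ == "__main__":
    # Assumption: the Dropbox folder lives at ~/Dropbox.
    root = os.path.expanduser("~/Dropbox")
    # Oldest first, so it is easy to see when the zeroing started.
    for path, mtime in sorted(zero_byte_files(root), key=lambda item: item[1]):
        print(mtime.isoformat(), path)
```

The dates it prints can then be compared against the event log in the Dropbox web UI.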

-----


Happened to my mother and her friend a week ago. Only they lost all their files. Thankfully the files weren't permanently lost, only deleted.

The trust my mother had in dropbox is now gone, and probably will remain so for the next couple years.

-----


As an end user myself on Linux, I was worried to read this. I keep teaching materials in the Dropbox folder so only a small file size, but they are important. As mentioned below at http://news.ycombinator.com/item?id=4704176 I have developed the bad habit of editing files directly in Dropbox. I may need to revisit this and use Dropbox as a sync only, edit elsewhere and copy over.

The command

   find /home/keith/Dropbox -size 0

shows only cache files for deleted files, and some backup files that I saved while empty (I know those should be zero bytes).

A personal workaround is a simple bash script that copies the Dropbox directory to another directory named after the date; I'm running this once a day or so. Then my normal old-school backup onto an external drive will catch each day's Dropbox.
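The original script isn't shown, so this is only a rough Python equivalent of the same idea: copy the whole folder into a directory named after today's date. The function name `snapshot` and the example paths are hypothetical.

```python
import shutil
from datetime import date
from pathlib import Path

def snapshot(source, backup_root, today=None):
    """Copy source into backup_root/YYYY-MM-DD and return the new directory."""
    today = today or date.today()
    target = Path(backup_root) / today.isoformat()
    # dirs_exist_ok lets a second run on the same day refresh the snapshot
    # instead of failing (requires Python 3.8+).
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target

# Example call (hypothetical paths):
# snapshot(Path.home() / "Dropbox", Path.home() / "dropbox-snapshots")
```

Each day is a full copy, so disk use grows quickly with a large folder. A second run on the same day would also overwrite that day's copy with whatever is in the folder now, zeroed files included; only the earlier days stay untouched.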

Surprising how convenient I found automatic file sync, and how quickly I came to trust the dropbox daemon running in the background on 3 computers!

-----


If you're on Windows and want to quickly check if you have 0-byte files in your Dropbox, open the Dropbox folder in Explorer and type "size:empty" in the search box, and hit enter. It will show any 0-byte files.

If you'd like a batch file to check this, that you can run periodically (perhaps programmatically), try this:

  @echo off
  for /r C:\Users\username\Dropbox %%F in (*.*) do (if %%~zF LSS 1 echo %%F)
  pause

If you'd like to echo that list to a file, change the middle line to:

  (for /r C:\Users\username\Dropbox %%F in (*.*) do (if %%~zF LSS 1 echo %%F)) > C:\Temp\emptyfiles.txt

-----


Thinking about how dropbox might be implemented I wonder if the following series of events is possible:

1) Process which syncs files to the server fails to get access to a local file, sees it as length 0

2) Process proceeds to tell the server file is length 0 and the server updates the file to be 0 length.

3) Access to the local file is restored and client notices that the server file is 'newer' than the local client version and it was length zero so it truncates the client copy to length 0.

The thing is, I can imagine a number of scenarios where the local file might seem to be zero length (oplocks on NTFS volumes being one)
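To make the three steps concrete, here is a toy Python model of that sequence. It is purely a sketch of the guess above, not how Dropbox actually works: the `Copy` class, the version counter standing in for "newer", and both function names are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Copy:
    data: bytes
    version: int  # stand-in for "which side is newer"

def client_upload(local, server, readable):
    """Steps 1-2: the client reports whatever it can read of the local file."""
    seen = local.data if readable else b""  # an unreadable file looks empty
    if seen != server.data:
        server.data = seen
        server.version += 1

def client_download(local, server):
    """Step 3: the client accepts the server copy if it is newer."""
    if server.version > local.version:
        local.data = server.data
        local.version = server.version

local = Copy(b"family photo", version=1)
server = Copy(b"family photo", version=1)

client_upload(local, server, readable=False)  # file is locked during this pass
client_download(local, server)                # access restored, server looks newer

print(local.data)  # b'' : the real contents are gone on both sides
```

In this model the damage comes entirely from treating "could not read" as "length 0". A client that reported a read failure as an error and skipped the upload would never reach step 2.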

-----


Dropbox has lost my files as well. I had kept some very important private files there which I just wanted to keep there without the need of updating them/syncing them. After uploading, I removed the local copy of that private top level folder, as I didn't want to keep a local copy on my computer. With the next sync, this STUPID SH*T removed the folder from their server as well, without any warning of any sort! Any system which deletes files without asking should just not be used or depended on.

-----


It sounds like you don't quite understand how Dropbox is supposed to work - it replicates whatever you have locally to the cloud and back down to your other machines. Obviously with a design like that, deleting files from your local dropbox would delete them from other machines.

This isn't a case of dropbox losing your files, it's a case of you not understanding how the tool was designed and how it works.

-----


Many sync tools offer multiple types of sync (e.g. MS SyncToy), and many are extremely cautious before deleting multiple files. Dropbox didn't do that. Yes, I failed to fully understand how it worked, as I deleted one of the root level folders and expected it would not sync a folder which is not there on the local machine, but one of the duties of software is to prevent or try to reduce common human errors.

-----


Dropbox is way simpler than that. It syncs precisely one folder.

There is really no way for it to not sync file deletions without making a massive clutter.

-----


You deleted the folder, and dropbox synced the deletion. This is expected behaviour. I don't know why you entrusted crucial private files to a service you didn't fully understand.

-----


You can always restore files using their web interface. But hurry because you have only 30 days!

-----


I would say that Dropbox's actions were almost Darwinian in nature.

-----


Just added the following to the crontab on my laptop so when it boots up, it emails me about any zero byte files.

  # Look for zero byte files in Dropbox
  MAILTO=my.email.address@example.com
  @reboot /usr/bin/find ~/Dropbox/ -type f -size 0

I also have full system backups going back three months, with two hourly incremental backups for the last two. So if I get an email about any zero byte files, I should be fine restoring them from my own backups.

-----


No problems on my system, but I make a point of backing up everything sensitive to an external hard drive occasionally, and using Arq to do hourly backups to S3.

-----


For all the people saying you need a local backup: in this case you not only need to keep a local backup, it also has to be a versioned one.

-----


Backup isn't backup unless it's versioned. Otherwise, how do you restore that file you accidentally deleted a few days ago?

-----


FYI: I'm pretty sure Windows users can navigate to their Dropbox folder and type 0 bytes in the search bar to see if they were affected.

-----


I had to type `size: 0 bytes`

-----


Can someone clarify the pricing for Dropbox's Packrat? Is it free for pro users, with free users having the option to pay $3.99/month to get it, or do you have to be a pro user and then pay $3.99/month to get it? I think it's the latter, but I'm finding the Dropbox help page for it really unclear (it sounds like it is free for all pro users).

-----


How viable would it be to keep all of your data in a git repository? Let's say I'm backing up 25 GB of music, could I have Dropbox sync everything but the .git folder, and just do a git revert if shit goes down? Will it end up using twice as much space?

-----


Look at git-annex, it works on top of git and is good at handling large files, and will not require twice the space.

If you use plain git, you'll probably need twice the space. Files stored in git are compressed, but music is already compressed, so there will be little savings. You'll also need a lot of memory if you want to copy files via git.

-----


Thanks for the suggestion - it also led me to some other cool projects like git-media.

If git doesn't keep its own copy of the data, a Dropbox "failure" would still be unrecoverable, right?

-----


> If git doesn't keep its own copy of the data

Plain regular git does keep a copy of its data in your .git folder, and every checkout/clone/copy of that repository stores the data, and it stores all old copies of all files. That's how git works. It also makes it a bit unwieldy for large files like that.

What's cool (about git) is that the revision hashes (i.e. what git uses instead of version numbers) are basically a checksum of every file and every old version of every file. So if an old version of a file changed, the checksum would change and you'd be on a different branch!
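As a small illustration of the checksum idea: in a standard SHA-1 repository, git names each file's contents by hashing a short header plus the bytes, and commit hashes are built on top of those. The helper name `git_blob_id` below is mine; the hashing rule is git's.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Return the object ID git gives a file with these contents (SHA-1 repos)."""
    # A blob is hashed as: "blob <size in bytes>" + NUL byte + the raw contents.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_id(b"family photo") == git_blob_id(b""))  # False
```

So a file that gets silently truncated to zero bytes ends up with the well-known empty-blob ID, which no longer matches what the repository recorded for it, and `git status` would show it as modified.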

-----


That makes a lot of sense, thanks!

-----


> 2 other files (precious family photos) were also affected, but it happened recently enough to be recovered manually by Dropbox engineers.

It's awful that it had to come to that, but it's reassuring that they will be willing to work with you on that level.

-----


Huh. Is there an easy way to confirm files affected versus files that were intentionally set to zero bytes?

-----


My content is fine, I have about 5.5 GB of content, and usually push content using the Windows client.

-----


Remember, cloud storage doesn't count as a backup if it's the only place you've stored your files.

-----


Well, ... it is DROPbox...?

-----


This guy didn't back up his computer before upgrading the OS? That was his first mistake.

-----


I backed up everything else. :) I usually back up my Dropbox just in case, too, but I just didn't this time. I won't make that mistake again.

-----


I do this all the time. It is the strength of dropbox that I can lean on it to do a fresh install of the OS and simply get my files back via DB.

-----


I've had a similar thing happen to me, but I was able to use a python script to pull the (hidden, old) files from a cache on my machine. http://www.dropboxwiki.com/TipsAndTricks/RestoreFilesAndDire...

-----


Everything can.

-----


Wouldn't this seem like something that can be easily solved? Dropbox or any rsync+rcs scheme will never destroy a file unless either one endpoint of the sync has truncated the file or there is a file system corruption. In the case of the former, the rcs portion will take care of things: restore the pre-truncated file and have that sync'd around. In the case of file system corruption, Dropbox should be doing file integrity checks. For example, some simple things that spring to mind: most file types have built-in signatures (header blocks) that validate the file's extension type; if the file fails to "look like" the claimed extension type, then don't sync it, flag it for user attention.
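A rough Python sketch of that "does it look like its extension" check, assuming a small hand-picked table of well-known leading byte signatures. The table and the function name are illustrative only; a real check would need many more formats, and this is not something Dropbox is known to do.

```python
from pathlib import Path

# A few well-known leading byte signatures; far from a complete list.
SIGNATURES = {
    ".jpg": (b"\xff\xd8\xff",),
    ".jpeg": (b"\xff\xd8\xff",),
    ".png": (b"\x89PNG\r\n\x1a\n",),
    ".gif": (b"GIF87a", b"GIF89a"),
    ".pdf": (b"%PDF-",),
}

def looks_like_its_extension(path):
    """Return True or False for known extensions, None when no signature is listed."""
    path = Path(path)
    expected = SIGNATURES.get(path.suffix.lower())
    if expected is None:
        return None  # unknown type: no opinion either way
    longest = max(len(sig) for sig in expected)
    with path.open("rb") as handle:
        head = handle.read(longest)
    # bytes.startswith accepts a tuple and matches any of the signatures.
    return head.startswith(expected)
```

A zero-byte or zero-filled `.jpg` returns False here and could be held back for user attention instead of being synced, while a legitimately empty text file returns None and passes through.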

I do Dropbox manually: I rsync a list of folders to a linode instance running mercurial. It's simpler, scriptable, more flexible and as fail safe as Dropbox. If something does get corrupted it's really simple to go back in the version history. I can't remember the last time I had a file system corruption with ext3; I suppose they still happen, but not to me in solid use for years. Obviously my mercurial repository is also backed up regularly.

I don't agree that a backup system has to be restore tested periodically; instead I believe that the restore process has to be an integral part of the workflow. In my scenario, I rsync to hg, commit and push, then from other machines in my workflow (or more likely vm instances) I pull+rsync back. This way the backup and restore cycles are just part of the workflow and everything is version controlled at the same time.

-----


At least one of my files was affected and not recoverable (hopefully the zero-size test is reliable).

-----


0 byte files are not necessarily a sign of a bug. There are many cases where you, or the tools you use, may intentionally create 0 byte files which would appear in Dropbox. There is a problem only if you see empty files which you expected to contain data.

-----


I mean, I know there are idiots on the Internet, but at some point your assumption that someone is an idiot is idiotic. I know where this file came from and I haven't touched it. Dropbox hammered it.

-----


Take it easy, I wasn't trying to insult you or suggest that I believe you don't know what you are doing.

I read "hopefully the zero-size test is reliable" as "is a reliable way to detect if there is a problem with my files" and that is what I was trying to comment on. Apparently I misunderstood.

-----


Obviously...

-----



