I don't care how smart your technology is, representing your product as "infinite storage" is inaccurate and misleading. There is obviously no such thing as infinite storage on a finite device, and saying you don't have to use a "clunky web service" implies it's not a web service, when it obviously is.
I respect their need to find a sexy way to sell a cloud-storage-and-sync product, especially since it's an increasingly crowded space, but this seems a bit sleazy.
Well, the "You are at the back of the queue, but post this link into your twitter or facebook stream and get your beta invite sooner" message after signing up for the beta didn't help with the sleazy/scammy vibe, that's for sure.
They never mentioned it was free, so it's probably pay-for-what-you-use, Amazon-backed storage. I haven't seen anyone complain about Jungledisk using the "Unlimited cloud drive storage" tagline.
What would you call a service that allows you to upload an unlimited number of files, then? It honestly sounded to me like most "unlimited" plans: Backblaze, Yahoo's inbox, Amazon Prime, etc.
My problem is not the "unlimited" part so much as the way it obfuscates the fact that it achieves this by storing your files online. To a technologist it's obvious what's going on; to the average user watching this video, I don't think it's clear that this product requires Internet connectivity to function -- in a couple places, it even appears to suggest that it doesn't, e.g. the "clunky web service" line.
Understood, but I think that everyone save the most novice of users understands that any subscription service they are paying for is not a piece of hardware. And therefore, since you cannot increase the storage on a hard drive, the data must go somewhere other than your computer, which implies a connected network. Am I being too optimistic?
When reading this, so many red-flag warnings popped up that I could not decide if they were deluded or frauds. I will give them the benefit of the doubt and simply guess that they are deluded regarding the storage savings they will eventually realize via de-dupe. Given that the founding team includes a bizdev guy, a marketing guy, and a sysadmin/ops guy, I am guessing that they can put together a nice pitch and PowerPoint, have some easy answers to the operational problems of competing in this space, and have a lot of hand-waving answers to the harder technical problems. The latter will probably end up killing them, unless we are just seeing a PR dump in preparation for a pivot to being just another online backup company.
One major fly in this whole "de-dupe" claim is that it will probably not work out for them even if they did have some magic sauce to dance around the de-dupe/crypto conflict others have noted. The problem is that the files which users actually care about and want to back up are not the common files but the ones that make their data unique; it is not the mp3s or Hollywood videos that matter, it is the data/content that each person has created. If my disks crashed, my online and offline backups were corrupted, and I needed to rebuild my system, I could get the common OS files in an hour and the mp3s and videos in a few weeks of passive torrenting, but the pictures, home videos, and personal documents would be gone forever. It is these files that matter for a backup service, and they are not going to be something you can de-dupe even if they were not encrypted.
Back when the term "de-dupe" did not exist and convergent encryption was something we were inventing before it had a name, the thought was that a backup service employing these techniques would enjoy a massive savings in storage costs. It turned out that people cared less about backing up the data that was easy to de-dupe, and original data was a much larger portion of what users uploaded than we expected. That was back when pics were a meg or two and personal video was low bitrate; now that even a mobile phone is dumping multi-meg pictures and you can get an HD video camera for a hundred bucks, I cannot imagine how anyone would convince themselves that de-dupe is going to make any significant difference to the operational costs of such a service.
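For what it's worth, the core trick behind convergent encryption fits in a few lines: derive the key from a hash of the content itself, so two users holding the same file produce byte-identical ciphertext that the server can deduplicate without being able to read it. This is only a toy sketch (a real implementation would use AES rather than a hash-based keystream, and it says nothing about whatever Bitcasa is actually doing):

    import hashlib

    def convergent_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
        """Toy convergent encryption: the key is derived from the content,
        so identical plaintexts always yield identical ciphertexts."""
        key = hashlib.sha256(plaintext).digest()  # content-derived key
        # Toy keystream cipher, for illustration only (use AES in practice).
        keystream = b""
        counter = 0
        while len(keystream) < len(plaintext):
            keystream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
            counter += 1
        ciphertext = bytes(p ^ k for p, k in zip(plaintext, keystream))
        return key, ciphertext

    # Two users encrypting the same file get the same ciphertext, so the
    # server can store it once; only someone who had the plaintext has the key.
    k1, c1 = convergent_encrypt(b"the same Beatles track")
    k2, c2 = convergent_encrypt(b"the same Beatles track")
    assert k1 == k2 and c1 == c2

The obvious downside is that anyone who already has a copy of a file can tell whether you stored it, which is part of why the savings matter less than we hoped.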
Maybe it's like the yahoo mail infinite mailbox. They just severely limit the bandwidth in/out, but storage is infinite. Given an infinite amount of time...
I have to say that I find it a 'little' sad that the entire post reads like a press release, touting "client-side encryption" as something new and exciting, and then two-thirds of the way down it's revealed that CrunchFund is an investor.
Poor author didn't even know when she interviewed them.
I'm confused: they mentioned at TechCrunch Disrupt that they are able to offer infinite storage because they are deduplicating data. Fine. But they also said that they will be doing client-side encryption. Contradiction - encrypted data cannot be deduplicated.
But only one of the two users could decrypt the block - useless.
downvoters: If I encrypt a plaintext, and you encrypt the same plaintext, we'll have different ciphertexts. If we detect that we encrypted the same plaintext, how do we deduplicate? (Also, I would consider that a leak.)
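To make the tradeoff concrete, here is a toy sketch (stdlib only, not Bitcasa's actual protocol): if clients only ever report keyed fingerprints, the server cannot match duplicates at all; if they report plain content hashes, dedupe works, but the match itself is exactly the leak mentioned above.

    import hashlib, os

    song = b"the same Beatles track"

    # Keyed fingerprints, one key per user: the server cannot tell these match.
    alice_fp = hashlib.blake2b(song, key=os.urandom(32)).hexdigest()
    bob_fp = hashlib.blake2b(song, key=os.urandom(32)).hexdigest()
    print(alice_fp == bob_fp)  # False, so no dedupe is possible

    # Unkeyed fingerprints: dedupe works, but the server (or anyone who can
    # query it) now learns exactly who holds a given well-known file.
    alice_fp = hashlib.blake2b(song).hexdigest()
    bob_fp = hashlib.blake2b(song).hexdigest()
    print(alice_fp == bob_fp)  # True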
From their FAQ:
Your Data Is Secure
Bitcasa encrypts your data before it is sent to the cloud. It is actually impossible for Bitcasa to access any of your data for any reason.
"Client-side encryption" generally means that only the client can decrypt the data. Unfortunately, if you forget your key, you can't reset your password to recover your data.
Wouldn't deduplicating encrypted data still work, just with a much lower hit rate? You would have to work at a block level instead of a file level, but in theory you could dedupe file blocks that had been encrypted, because either way, in the end, it's just bits.
Or is the assumption that with encrypted data you are so incredibly unlikely to see hits that it's not worth doing...
I understand that, but couldn't two different files encrypted with two different keys theoretically share some identical blocks? Obviously this is way less likely than two unencrypted files sharing blocks, but it could still happen.
While a different password/key would make the same data encrypt to different ciphertext, you don't necessarily have to do de-dupe on blocks of the encryption cipher's block size, so the stored blocks can still overlap.
Blocks of data can be treated completely independently of the encryption. The block size is not necessarily the same as the encryption cipher block size (if you're using a block cipher).
Say, a chunk of 4000 different files all have a few disk blocks that contain the same pattern of data. You can store those few blocks once, instead of 4000 times. With encryption, this is still possible, but it's going to be slower, as the data segments will be more randomly distributed. It is still, however, possible and will result in savings.
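A toy sketch of what that looks like on unencrypted data (fixed 4 KB blocks keyed by their hash; real systems often use variable-size, content-defined chunks). Whether encrypted blocks ever actually collide is what the replies below take issue with:

    import hashlib

    BLOCK_SIZE = 4096  # dedupe granularity, independent of the cipher's block size

    def store(data: bytes, block_store: dict) -> list:
        """Store data as 4 KB blocks, keeping only one copy of each unique block."""
        refs = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            h = hashlib.sha256(block).hexdigest()
            block_store.setdefault(h, block)  # stored only if not already present
            refs.append(h)
        return refs

    blocks = {}
    store(b"\x00" * 8192 + b"unique tail A", blocks)
    store(b"\x00" * 8192 + b"unique tail B", blocks)
    print(len(blocks))  # 3 unique blocks stored instead of 6: the zero blocks are shared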
The problem with this theory is that deduplication works so well for file backup because the largest files we have are most often music and video, content that is highly replicated among file-sharing users.
I.e., millions of users have the same Beatles album.
Yet when each person's album is encrypted, it will be entirely different from every other person's album. If they were in any way similar, it was not correctly encrypted. Thus the hit rate will be almost negligible.
So then you say OK, but maybe the block-level encryption of my Beatles album is similar to the block-level encryption of my Eminem album, and then we can save storage there. The problem with this is that encryption algorithms aim to make their output as close to a uniformly random distribution as possible. Meaning not only are those two copies of the Beatles album going to be as different from each other as possible, they will be different from any other file.
To sum it up, a well-encrypted file should be uniformly random -> meaning it is incompressible -> two different uniformly random files are still random relative to each other and, when put together, still cannot be compressed.
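This is easy to check with any compressor, using os.urandom as a stand-in for the output of a decent cipher:

    import os, zlib

    repetitive = b"yesterday " * 100_000       # ~1 MB of highly redundant plaintext
    random_like = os.urandom(len(repetitive))  # stand-in for well-encrypted data

    print(len(zlib.compress(repetitive)))      # a few KB: huge savings
    print(len(zlib.compress(random_like)))     # slightly larger than the input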
At the 'infinite' scale of storage that they're discussing, you should still see savings. It's not going to be 30% or 80%, but there will be savings. De-duplication CAN still take place.
Actually, it should be close to nothing: encrypted data cannot be compressed, as it is supposed to be uniformly random. If it can be compressed, the encryption algorithm is weak.
Also, on stage at TechCrunch Disrupt the founders were talking about the majority of the data being deduplicated.
The easy solution is to segment easily deduplicated data from 'private' data. Perhaps sufficiently deduplicated data doesn't count against your quota (as Amazon does with music).
The problem with this is that the data you want to de-dupe are the big files everyone has a copy of: mp3s, videos, etc. Unfortunately, these are the same files that users are most likely to want to keep private and not want the RIAA/MPAA to know about.
The short version of all of this is that Bitcasa can't actually pull off what they are claiming and still maintain any user security/privacy.
They really pushed the "infinite storage" point in their presentation. But didn't they mention something about predicting which files to cache before they are requested? I would have liked to see more focus on that. Otherwise, it doesn't seem much different from Dropbox, which is more established.
I find that I don't run out of storage on my computers any more, so perhaps I'm not their target audience. Best of luck competing in that space, though--it seems pretty crowded already. :)
Just a reality check on your landing page: your #1 asset is a vimeo embed, which causes several problems:
1) I'm on roaming 3G, I'm not paying that much to know what you're about
2) Flash is blocked by default so all I see is a black square
3) You're royally pissing on blind/disabled people.
Besides, the color scheme and general theming are sub-par and difficult to read.
The "Learn More" link is more informative, but still, I think you really badly need a designer.
From what I understand the concept is something similar to Dropbox, except instead of simply syncing files between the folders on different computers where all the files are physically present on each, it focuses on remote access (possibly with an AFS-like caching layer).
And by infinite, they probably mean "as much as you are willing to pay for," or, during the beta, "as much as you want until it gets ridiculous."
I watched them present on the TC Disrupt feed. I wish one of the panelists had asked them how streaming the file system impacts the data caps imposed by many ISPs.