I don't care how smart your technology is, representing your product as "infinite storage" is inaccurate and misleading. There is obviously no such thing as infinite storage on a finite device, and saying you don't have to use a "clunky web service" implies it's not a web service, when it obviously is.
I respect their need to find a sexy way to sell a cloud-storage-and-sync product, especially since it's an increasingly crowded space, but this seems a bit sleazy.
Well, the "You are at the back of the queue, but post this link into your twitter or facebook stream and get your beta invite sooner" message after signing up for the beta didn't help with the sleazy/scammy vibe, that's for sure.
They never mentioned it was free, so it's probably pay-for-what-you-use, Amazon-backed storage. I haven't seen anyone complain about Jungledisk using the "Unlimited cloud drive storage" tagline.
What would you call a service that allows you to upload an unlimited number of files, then? It honestly sounded to me like most "unlimited" plans: Backblaze, Yahoo's inbox, Amazon Prime, etc.
My problem is not the "unlimited" part so much as the way it obfuscates the fact that it achieves this by storing your files online. To a technologist it's obvious what's going on; to the average user watching this video, I don't think it's clear that this product requires Internet connectivity to function -- in a couple places, it even appears to suggest that it doesn't, e.g. the "clunky web service" line.
Understood, but I think that everyone save the most novice of users understands that any subscription service they are paying for is not a piece of hardware. And therefore, since you cannot increase the storage on a hard drive, the data must go somewhere other than your computer, which implies a connected network. Am I being too optimistic?
When reading this, so many red-flag warnings popped up that I could not decide if they were deluded or frauds. I will give them the benefit of the doubt and simply guess that they are deluded regarding the storage savings they will eventually realize via de-dupe. Given that the founding team includes a bizdev guy, a marketing guy, and a sysadmin/ops guy, I am guessing that they can put together a nice pitch and PowerPoint, have some easy answers to the operational problems of competing in this space, and have a lot of hand-waving answers to the harder technical problems. The latter will probably end up killing them, unless we are just seeing a PR dump in preparation for a pivot to being just another online backup company.
One major fly in this whole "de-dupe" claim is that it will probably not work out for them even if they did have some magic sauce to dance around the de-dupe/crypto conflict others have noted. The problem is that the files which users actually care about and want to back up are not the common files but the ones that make their data unique; it is not the mp3s or Hollywood videos that matter, it is the data/content that each person has created. If my disks crashed, my online and offline backups were corrupted, and I needed to rebuild my system, I could get the common OS files in an hour and the mp3s and videos in a few weeks of passive torrenting, but the pictures, home videos, and personal documents would be gone forever. It is these files that matter for a backup service, and they are not going to be something you can de-dupe even if they were not encrypted.
Back when the term "de-dupe" did not exist and convergent encryption was something we were inventing before it had a name, the thought was that a backup service employing these techniques would enjoy a massive savings in storage costs. It turned out that people cared less about backing up the data that was easy to de-dupe, and original data was a much larger portion of what users uploaded than we expected. That was back when pics were a meg or two and personal video was low bitrate; now that even a mobile phone is dumping multi-meg pictures and you can get an HD video camera for a hundred bucks, I cannot imagine how anyone would convince themselves that de-dupe is going to make any significant difference to the operational costs of such a service.
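For what it's worth, the core trick behind convergent encryption fits in a few lines: derive the key from a hash of the content itself, so two users holding the same file produce byte-identical ciphertext that the server can deduplicate without being able to read it. This is only a toy sketch (a real implementation would use AES rather than a hash-based keystream, and it says nothing about whatever Bitcasa is actually doing):

    import hashlib

    def convergent_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
        """Toy convergent encryption: the key is derived from the content,
        so identical plaintexts always yield identical ciphertexts."""
        key = hashlib.sha256(plaintext).digest()  # content-derived key
        # Toy keystream cipher, for illustration only (use AES in practice).
        keystream = b""
        counter = 0
        while len(keystream) < len(plaintext):
            keystream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
            counter += 1
        ciphertext = bytes(p ^ k for p, k in zip(plaintext, keystream))
        return key, ciphertext

    # Two users encrypting the same file get the same ciphertext, so the
    # server can store it once; only someone who had the plaintext has the key.
    k1, c1 = convergent_encrypt(b"the same Beatles track")
    k2, c2 = convergent_encrypt(b"the same Beatles track")
    assert k1 == k2 and c1 == c2

The obvious downside is that anyone who already has a copy of a file can tell whether you stored it, which is part of why the savings matter less than we hoped.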
Maybe it's like the yahoo mail infinite mailbox. They just severely limit the bandwidth in/out, but storage is infinite. Given an infinite amount of time...
I have to say that I find it a 'little' sad that the entire post reads like a press release, touting "client-side encryption" as something new and exciting, and then two-thirds of the way down it's revealed that CrunchFund is an investor.
Poor author didn't even know when she interviewed them.
I'm confused: they mentioned at TechCrunch Disrupt that they are able to offer infinite storage because they are deduplicating data. Fine. But they also said that they will be doing client-side encryption. Contradiction - encrypted data cannot be deduplicated.
But only one of the two users could decrypt the block - useless.
downvoters: If I encrypt a plaintext, and you encrypt the same plaintext, we'll have different ciphertexts. If we detect that we encrypted the same plaintext, how do we deduplicate? (Also, I would consider that a leak.)
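To make the tradeoff concrete, here is a toy sketch (stdlib only, not Bitcasa's actual protocol): if clients only ever report keyed fingerprints, the server cannot match duplicates at all; if they report plain content hashes, dedupe works, but the match itself is exactly the leak mentioned above.

    import hashlib, os

    song = b"the same Beatles track"

    # Keyed fingerprints, one key per user: the server cannot tell these match.
    alice_fp = hashlib.blake2b(song, key=os.urandom(32)).hexdigest()
    bob_fp = hashlib.blake2b(song, key=os.urandom(32)).hexdigest()
    print(alice_fp == bob_fp)  # False, so no dedupe is possible

    # Unkeyed fingerprints: dedupe works, but the server (or anyone who can
    # query it) now learns exactly who holds a given well-known file.
    alice_fp = hashlib.blake2b(song).hexdigest()
    bob_fp = hashlib.blake2b(song).hexdigest()
    print(alice_fp == bob_fp)  # True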
From their FAQ:
Your Data Is Secure
Bitcasa encrypts your data before it is sent to the cloud. It is actually impossible for Bitcasa to access any of your data for any reason.
"Client-side encryption" generally means that only the client can decrypt the data. Unfortunately, if you forget your key, you can't reset your password to recover your data.
Wouldn't deduplicating encrypted data still work, just with a much lower hit rate? You would have to work at a block level instead of a file level, but in theory you could dedupe file blocks that had been encrypted, because either way, in the end, it's just bits.
Or is the assumption that with encrypted data you are so incredibly unlikely to see hits that it's not worth doing...
I understand that, but couldn't two different files encrypted with two different keys theoretically share some identical blocks? Obviously this is way less likely than two unencrypted files sharing blocks, but it could still happen.
While a different password/key would make the same data encrypt to different ciphertext, you don't necessarily have to do de-dupe on blocks of the encryption cipher's block size, so the stored blocks can still overlap.
Blocks of data can be treated completely independently of the encryption. The block size is not necessarily the same as the encryption cipher block size (if you're using a block cipher).
Say, a chunk of 4000 different files all have a few disk blocks that contain the same pattern of data. You can store those few blocks once, instead of 4000 times. With encryption, this is still possible, but it's going to be slower, as the data segments will be more randomly distributed. It is still, however, possible and will result in savings.
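A toy sketch of what that looks like on unencrypted data (fixed 4 KB blocks keyed by their hash; real systems often use variable-size, content-defined chunks). Whether encrypted blocks ever actually collide is what the replies below take issue with:

    import hashlib

    BLOCK_SIZE = 4096  # dedupe granularity, independent of the cipher's block size

    def store(data: bytes, block_store: dict) -> list:
        """Store data as 4 KB blocks, keeping only one copy of each unique block."""
        refs = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            h = hashlib.sha256(block).hexdigest()
            block_store.setdefault(h, block)  # stored only if not already present
            refs.append(h)
        return refs

    blocks = {}
    store(b"\x00" * 8192 + b"unique tail A", blocks)
    store(b"\x00" * 8192 + b"unique tail B", blocks)
    print(len(blocks))  # 3 unique blocks stored instead of 6: the zero blocks are shared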
The problem with this theory is that deduplication works so well for file backup because the largest files we have are most often music and video, content that is highly replicated among file-sharing users.
I.e., millions of users have the same Beatles album.
Yet when each person's album is encrypted, it will be entirely different from every other person's album. If they were in any way similar, it was not correctly encrypted. Thus the hit rate will be almost negligible.
So then you say OK, but maybe the block-level encryption of my Beatles album is similar to the block-level encryption of my Eminem album, and then we can save storage there. The problem with this is that encryption algorithms aim to make their output as close to a uniformly random distribution as possible. Meaning not only are those two copies of the Beatles album going to be as different from each other as possible, they will be different from any other file.
To sum it up, a well-encrypted file should be uniformly random -> meaning it is incompressible -> two different uniformly random files are still random relative to each other and, when put together, still cannot be compressed.
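This is easy to check with any compressor, using os.urandom as a stand-in for the output of a decent cipher:

    import os, zlib

    repetitive = b"yesterday " * 100_000       # ~1 MB of highly redundant plaintext
    random_like = os.urandom(len(repetitive))  # stand-in for well-encrypted data

    print(len(zlib.compress(repetitive)))      # a few KB: huge savings
    print(len(zlib.compress(random_like)))     # slightly larger than the input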
At the 'infinite' scale of storage that they're discussing, you should still see savings. It's not going to be 30% or 80%, but there will be savings. De-duplication CAN still take place.
Actually, it should be close to nothing: encrypted data cannot be compressed, as it is supposed to be uniformly random. If it can be compressed, the encryption algorithm is weak.
Also, on stage at TechCrunch Disrupt the founders were talking about the majority of the data being deduplicated.
The easy solution is to segment easily deduplicated data from 'private' data. Perhaps sufficiently deduplicated data doesn't count against your quota (as Amazon does with music).
The problem with this is that the data you want to de-dupe are the big files everyone has a copy of: mp3s, videos, etc. Unfortunately, these are the same files that users are most likely to want to keep private and not want the RIAA/MPAA to know about.
The short version of all of this is that Bitcasa can't actually pull off what they are claiming and still maintain any user security/privacy.
They really pushed the "infinite storage" point in their presentation. But didn't they mention something about predicting which files to cache before they are requested? I would have liked to see more focus on that. Otherwise, it doesn't seem much different from Dropbox, which is more established.
I find that I don't run out of storage on my computers any more, so perhaps I'm not their target audience. Best of luck competing in that space, though--it seems pretty crowded already. :)
Just a reality check on your landing page: your #1 asset is a vimeo embed, which causes several problems:
1) I'm on roaming 3G, I'm not paying that much to know what you're about
2) Flash is blocked by default so all I see is a black square
3) You're royally pissing on blind/disabled people.
Besides, the color scheme and general theming are sub-par and difficult to read.
The "Learn More" link is more informative, but still, I think you really badly need a designer.
From what I understand the concept is something similar to Dropbox, except instead of simply syncing files between the folders on different computers where all the files are physically present on each, it focuses on remote access (possibly with an AFS-like caching layer).
And by infinite, they probably mean "as much as you are willing to pay for," or, during the beta, "as much as you want until it gets ridiculous."
I watched them present on the TC Disrupt feed. I wish one of the panelists had asked them how streaming the file system impacts the data caps imposed by many ISPs.