WHAAAAT? Unlimited storage for $4.99 a month not a viable business model? (spideroak.com)
52 points by DanLar75 on Feb 3, 2011 | 45 comments



The argument seems to be that average online storage needs are simply growing beyond what can be provided by a flat-fee unlimited plan.

I don't know if that's true, but there's something important the post doesn't address: the potential declining costs of providing online storage. Might the two not balance each other out for the foreseeable future?

I refer to this most excellent post by BackBlaze, which outlines how they do storage: http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-h...

While we might not see many more 90%-plus leaps in cloud storage cost reduction, I wonder (a) how much headroom such innovation bought Backblaze, and (b) whether their main cost, hard drives, will fall fast enough to keep pace with user demands.


I think there is a difference between 'unlimited' and 'absurdly high.' Many hosting providers offer 'unlimited' storage but limit how fast you can upload -- this effectively caps their storage but makes it high enough for most users to never care. Instead of dubious 'unlimited' marketing I would prefer up-front pricing. GMail tells you exactly how much storage it offers for free with the idea that this is much too high for a normal user to touch. Much easier to deal with than an 'unlimited' service that breaks down under load.


I've backed up a few hundred GB with Backblaze. I think the implication being made here is unfair. I'm much more likely to believe any issues with my upload were caused by shady AT&T QoS.

You know the kind. Go to any speed-test site and somehow you're getting exactly 12MB down and 3MB up, but it never really seems to add up that way anywhere else. Even to my own office, which I know has an extra 100MB of burst capacity, my downloads from the data center across town end up in the 3MB range 99% of the time.


The likely reason Speedtest doesn't add up is that the file you download for the test is hosted by your ISP.

Speedtest was originally made to test the 'last mile', on the assumption that your ISP's connections to the rest of the world are always faster than that. In most cases that is true, but it's possible it isn't in your area.

As a particularly clear example of that, here's my little write-up on Speedtest.net usage in Rwanda: http://blog.nyaruka.com/stuff-0


The model I had in my head was web hosting providers with "unlimited" data storage vs. ones with 1TB of storage. Even though a normal user would probably get nowhere near 1TB, it gives the company a fallback on what 'acceptable' usage is. If the price ever does go down, they can always adjust the limit.


Explicit pricing is better for moderate users and worse for heavy users. "Unlimited" effectively subsidizes heavy users by overcharging moderate users. As a heavy user, I'm a fan of "unlimited".


It depends. When "Unlimited" service is sold with a Terms of Service that includes an "Excessive Use Policy" like Mozy's [1], the heavy user is operating in an ambiguous zone where continued use of the service at the advertised price is at the whim of the provider.

We've seen this with unlimited data plans with the cable ISPs; they're not really unlimited.

To me, it's similar to having a traditional credit card versus one with no pre-set spending limit. Which would you rather have? The latter has a limit; you just don't know what it is until you hit it.

---

[1] http://mozy.com/terms/ "....excessive use of the Service, which means usage over a given period far exceeds the average level of usage by users of the Service generally..."


> the heavy user is operating in an ambiguous zone where continued use of the service at the advertised price is at the whim of the provider

That suggests an interesting business model: sell everyone on 'unlimited', then wait to see who the most egregious customers are and drop the ones at the top of the pyramid (keeping the revenue-positive customers).

Reminds me of how the cellular operators used to be towards roaming: sure, you can roam, just don't do it so much that we cancel your contract.


That's a separate issue - discounting the value of the service based on uncertainty over its continuance. But even within the limits of the unstated limit ("excessive use"), the heavier users are still subsidized by the light users.

You can still freeload right up until you hit the limit.


FYI -- The Backblaze storage hardware is similar (at least conceptually; specifics differ greatly) to what all the big self-hosted backup companies (Mozy, Carbonite, SpiderOak, etc.) use. Custom hardware and data center storage software are the minimum price of entry for the industry these days.

The issue here, I think, is that "unlimited" really isn't unmetered and never really was. The industry has often seen that marketing gimmick accompanied by abundant exclusions, fine print, and far less ethical tricks like upload rate limiters, which effectively impose limits behind the scenes.


Costs of storage are of course declining rapidly; however, customers' available bandwidth and storage needs are so far simply growing MUCH faster.

Part of the catch-22 here is also that the falling storage costs for online storage providers (who increasingly build their clusters from consumer hard drives) only keep an even pace with the falling cost of home storage, and thus with the amount of data users want to upload.


This is why I'm really liking Crashplan as a storage service; they offer the ability to use their software to back up to your friends' drives (and vice versa). It perhaps doesn't work out well for everyone, but I have enough friends who keep up with the latest hardware that I'm assured of finding someone with whom I can trade backup storage. (My alternate plan was dropping off a disk / picking up the old disk whenever I visited my relatives every couple of weeks!)


Please don't shout in uppercase in the titles. The guidelines are here:

http://ycombinator.com/newsguidelines.html

I realise the original article uses this title, but toning it down a little when posting to HN would be nice.


I had about 50GB of backed-up data with Mozy over the last 3 years, growing from ~20GB when I started. I've been paying $5 per month for almost 3 years now, which means I've paid roughly $50 a year (very roughly, after taking out transaction fees, bandwidth fees, etc.). That should buy them at least 320GB of average SATA disk per year (assuming all the money goes into storage).

This means that for "unlimited" they can recoup 6x the storage I was using per year through my fees. So the question becomes: why isn't it sustainable? Too large a company, with salaries eating the margin? Storage that isn't economical enough (i.e. enterprise SAS disks rather than cheap SATA)? I'm guessing that since they buy disks in large quantities, they can get drives even cheaper than what you'd pay on Newegg.
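
To spell out the arithmetic (all of these are my rough estimates, not Mozy's actual costs), a quick sketch in Python:

    monthly_fee = 5.00                              # $/month
    yearly_gross = 12 * monthly_fee                 # ~$60/year
    yearly_after_fees = 50.00                       # rough, after transaction and bandwidth fees
    cost_per_gb = yearly_after_fees / 320           # "320GB per year" works out to ~$0.16/GB
    stored_gb = 50                                  # my actual usage
    yearly_cost_to_store_me = stored_gb * cost_per_gb        # ~$7.80/year
    headroom = yearly_after_fees / yearly_cost_to_store_me   # ~6.4x -- the "6x" above
    print(round(headroom, 1))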

This is why I believe this model _is_ sustainable, assuming it's done right. It's also why I switched from Mozy to Backblaze: I felt Mozy was gouging me by taking away their unlimited plan and replacing it with a tiered plan.


Just playing devil's advocate, since I largely agree with you. I do think you are missing 3 key costs: redundancy, power (including cooling) and people.

I actually don't think bandwidth is in the same ballpark as these three items.


I'd stick with mozy, but 50GB is too small. I have about 89GB in Mozy right now and if I could get that for the $5 per month, I wouldn't go anywhere. Mozy is pretty slow, so backing up even more data isn't very practical. Also, their system doesn't handle large files very well. My 89GB is just pictures, personal movies, documents, and source code.


Have you ever tried a restore from Mozy? I backed away from them when I did. Apparently, there is no 'resume' functionality when restoring. That means setting a laptop down to restore 50GB, having a connection failure or needing to move the computer from office to home after 30GB, and then having to restart from scratch. That made me go %( - and I still haven't been able to cancel my Mozy account after 2 months, because despite what the documentation says there is no 'unsubscribe' link in your profile, and customer support seems to be an email black hole.

Just venting I guess, but beware if you are trusting your data to Mozy...


I have fewer pictures and less data. I am using Google Storage. It costs very little ($5 per 20GB per year).


It is a good thing that hard drives connect to the Internet by themselves and do not require rack space, servers, power or bandwidth.


Given the change Mozy just instituted, my backup costs are going to go from $4.99/month to $23.99/month. This is an intolerable jump, regardless of the reasons they have for it. I don't expect something for nothing, but bait-and-switch is BS.

Looks like I'm in the market for a new backup solution.


If you are a Dropbox Pro user I think you get the Packrat add-on, which allows for unlimited history / undeletes. So, while the amount of data you can keep in your Dropbox is limited, the amount of data Dropbox has to keep track of for you could get very large.

I don't see anything in their model to counter this, and it kind of worries me that if people abuse it, they'll remove the feature for all users - and I like my unlimited revisions.


I would say that Dropbox (being a really super company with a great product) has issues that are actually even larger than the storage cost.

Dropbox is built on Amazon S3, which means that not only do they have storage costs to a third party that are, so to speak, 'out of their control', but they are also dealing with bandwidth and transaction costs.

I wish them all the best, however I can imagine this being quite expensive for them considering the amount of free users they have.


As the Dropbox clients talk to Dropbox servers instead of directly to S3, Dropbox can just transparently migrate their data to a storage facility of their own when they decide that that has become cheaper than S3.


This left me wondering: wouldn't using a deduplicating storage backend (like http://en.wikipedia.org/wiki/Venti) lessen the incremental cost of servicing each new customer?

Couldn't we reasonably expect that, encrypted data aside, files with the same content are often held by more than one person?
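
(For concreteness, here's a minimal sketch of the content-addressed idea behind something like Venti - a toy in-memory block store, nothing like the real log-structured implementation:)

    import hashlib

    class BlockStore:
        """Toy content-addressed store: identical content is stored only once."""
        def __init__(self):
            self.blocks = {}                     # hash -> block bytes

        def put(self, data):
            key = hashlib.sha256(data).hexdigest()
            self.blocks.setdefault(key, data)    # no-op if someone already stored this content
            return key

        def get(self, key):
            return self.blocks[key]

    store = BlockStore()
    # Two customers uploading the same file cost the provider one copy plus two small keys.
    k1 = store.put(b"the same popular mp3, byte for byte")
    k2 = store.put(b"the same popular mp3, byte for byte")
    assert k1 == k2 and len(store.blocks) == 1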


They are already doing some deduplication. From https://spideroak.com/whyspideroak :

> Greatly reduce backup & sync time through comprehensive compression and advanced de-duplication (saving you time)

> You are only charged for the compressed de-duplicated data amount (saving you money)

Still, it's not clear whether they do cross-user deduplication, but I think it's very unlikely, because all the content is encrypted with a user-specific key that I believe they don't have access to.


Right - they don't do cross-user deduplication. They only deduplicate data that belongs to you.


I'd expect most people use these services for backups of business documents, which are almost guaranteed to be unique.

I seem to recall that Dropbox (or another well known online storage startup) implements this strategy. Maybe it works.


Which will be tiny.

Videos, audio, photos, and game media will take up the bulk of the space. Of those, only photos and a proportion of the videos are likely to be totally unique to a user.


I doubt that's true for Dropbox, at least.

Consider that Dropbox gives you 50GB for the basic plan. I'm guessing most people don't back up videos, games or their OS using that space, but rather back up their documents, projects they're working on in whatever field, photos, and music.

Of those, only with music is there a chance to use deduplication, and that's assuming you can figure out that two music files with different ID tags are the same.

(Come to think of it, in my Dropbox, music easily takes up 70% of my quota, so maybe it is worthwhile after all.)


You need a decent hash of every file anyway (to check for changes etc) so it's pretty trivial to deduplicate. I don't think you'd need to do stuff like check the ID-tag.


But then my music files, whose ID tags I edit, will show up as different from other people's when hashed.

It would be interesting for Dropbox to release numbers on how many music files are identical between different people.


I believe they could (and probably do?) de-dupe at a lower than file level to handle this issue.


Good point. The ID3 tag is probably only in the header anyway. They'll just do it at a block level.
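
Roughly like this, assuming fixed-size blocks for simplicity (real systems tend to use content-defined chunking so a resized tag doesn't shift every block boundary):

    import hashlib

    BLOCK_SIZE = 4096

    def block_hashes(data):
        # Hash fixed-size blocks; files differing only in a small tag region
        # still share the hashes of all their untouched blocks.
        return [hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
                for i in range(0, len(data), BLOCK_SIZE)]

    audio = bytes(range(256)) * 1024                                  # stand-in for identical audio frames
    file_a = b"ID3 tags for user A".ljust(BLOCK_SIZE, b"\0") + audio
    file_b = b"ID3 tags for user B".ljust(BLOCK_SIZE, b"\0") + audio

    a, b = block_hashes(file_a), block_hashes(file_b)
    shared = sum(1 for x, y in zip(a, b) if x == y)
    print("%d of %d blocks shared" % (shared, len(a)))                # everything but the tag block dedupes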


I didn't read the article but wanted to comment on SpiderOak:

Really love their "realistic" pricing model, even cheaper with a .edu email address.

Had a lot of problems with CPU usage, may have been the thousands of files in my .git directories...

This leads me to support: they have been overwhelmed, and it has been difficult getting them to review my logs. They gave me multiple months free due to my non-usage, but I decided to cancel when I found Arq for Mac.

I asked them to cancel my account and give me a year's credit so I can give it a try in the future, and they credited my account for a year... pretty cool.

Wish them the best of luck!

Written from my mobi...


I get horrible CPU usage as well. It's to be expected, I guess, since they have to encrypt data, but what is the encryption code written in, Python?

Also, it's slow to update things. You wouldn't expect this, given that Linux has inotify, but it is.

Their support is rather bad; I've emailed them about legitimate bugs, high CPU usage, SpiderOak not syncing, and a whole lot of other things, but they never credited me anything. I decided to buy it for a year to back my photos up because my disk is making weird noises, and it took them three days to reply to my "PayPal won't let me pay from my balance and I don't want to add a credit card" email - to tell me to add a credit card.

I replied "yes, I don't want to add a credit card", and they haven't replied since I sent it three days ago. With that sales support, I wonder how they sell any copies.


Apologies for the support delay. As mentioned above, we have seen overwhelming growth lately and are training support staff to scale up right now. Feel free to mail me directly if I can help.

For inotify, the biggest limitation we run into on Linux is that the default system configuration limits a user to watching a relatively small number of folders (6,000 I think, and that includes all subfolders recursively). You can change this in sysctl if you like, and we may add this change to future packages. In case you're curious, the SpiderOak directory watchers are tiny C programs for each platform, and are open source.
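
For reference, the knob in question is the standard per-user inotify watch limit; the key name is the same across distributions, though the default value varies. Something along these lines:

    # check the current limit
    sysctl fs.inotify.max_user_watches

    # raise it for the running system, and persist it across reboots
    sudo sysctl fs.inotify.max_user_watches=524288
    echo "fs.inotify.max_user_watches=524288" | sudo tee -a /etc/sysctl.conf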

FYI -- We've definitely seen high CPU use when syncing hundreds of thousands of small files (source code etc), but this has been greatly improved in the latest beta, which just went out yesterday.

Thanks very much for the feedback.


Thanks for replying. In the end I sent the money to three PayPal accounts consecutively before relenting and adding a card. However, someone from your team did contact me after I wrote the comment.

I don't know about hundreds of thousands of files, but I copied a Django app and CPU usage has been at 100% for a few minutes now. In the end, I closed SpiderOak and I'll turn it on when I'm done.

By the way, do you keep an entire backlog when syncing? I mean, if I use the software for a year and then need to add another computer, will I need to download a year's worth of changesets to get it up to speed?


Pardon the somewhat OT question, but do any of these services offer standard protocol support so that any OS can store data? I know DropBox supports Linux, and some services support OS X, but I haven't seen anything generic enough for my FreeBSD workstation, though the concept of cloud storage sounds great.

Give me NFS or SMB access if you must, but I'd love to get in on this cheap consumer cloud storage thing without resorting to lame hacks like using VMs or emulation due to lack of native access for my platform.


FYI, the reason the big backup providers don't do "standard protocol access" is because it's actually far more expensive to provide.

Take for example, the case of backing up a folder full of files using rsync over ssh, vs. using the SpiderOak client.

Every time you run a backup job, rsync must examine the local folder _and_ ask the server to examine the remote folder, so it can work out what needs to be transferred. In short, to do a new backup (a write operation) many reads are also required. Furthermore, those reads tend to be non-sequential (seeking to a bunch of different inodes to stat files, etc.)

If you compare that to the SpiderOak client, it already has a near real-time accurate database of exactly what exists on the server. There's no need to burden the server with a bunch of disk seeks (or any actually) to assess what needs to be done. In short, the backup operation can be accomplished by the server using mostly sequential IO, writing only, because of this added intelligence in the client.

Aggregated across a large population of users, this difference in usage patterns greatly influences the hardware requirements and therefore the cost per GB.

...and by the way SpiderOak will run on just about any platform that Python will, with or without a GUI.
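
A toy illustration of that client-side-state idea (just a sketch of the general pattern, not our actual client code; the manifest path and upload callback here are made up):

    import hashlib, json, os

    MANIFEST = os.path.expanduser("~/.backup_manifest.json")    # hypothetical local state file

    def file_hash(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def backup(folder, upload):
        """Compare against the local manifest, not the server; the server only sees writes."""
        try:
            manifest = json.load(open(MANIFEST))
        except FileNotFoundError:
            manifest = {}
        for root, _dirs, names in os.walk(folder):
            for name in names:
                path = os.path.join(root, name)
                digest = file_hash(path)
                if manifest.get(path) != digest:      # new or changed: decided without any server reads
                    upload(path)                       # server side stays (mostly) sequential, write-only IO
                    manifest[path] = digest
        with open(MANIFEST, "w") as f:
            json.dump(manifest, f)

rsync, by contrast, has to stat and read on both ends before it can make the same decision.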


Strato claims to offer WebDAV and CIFS support on HiDrive:

http://www.strato-hosting.co.uk/online-storage-hidrive/index...

(not a Strato customer myself)


CrashPlan runs anywhere there's a Java VM. Even headless.


rsync.net might still be around...


SpiderOak has a CLI Linux client.


What is your experience with online backup solutions in general?

I tried Mozy and a few others, but I always struggled with rather slow upload speeds, and ultimately found it much easier to buy a USB hard disk, which I hide at my workplace and just bring home once a month to make backups. (I store online only the few files I'm actually working on, using Dropbox.)


I think this (and SpiderOak in general) is pretty instructive for a company starting out and looking to find a way to generate revenue. I plan to roll out a service (unrelated) later this year and will be looking at SpiderOak for a compelling delivery/pricing model.



