We will no longer use the phrase “zero knowledge” to describe our software (spideroak.com)
228 points by remx on March 12, 2017 | 127 comments

I still won't trust SpiderOak with my data. Their service is unreliable and slow, their client is horrible to work with and their support is disgraceful.

I'll post my usual story:

In February 2016, SpiderOak dropped its pricing to $12/month for 1TB of data. Having several hundred gigabytes of photos to back up, I took advantage and bought a year-long subscription ($129). I had access to a symmetric gigabit fibre connection, so I connected, set up the SpiderOak client and started uploading.

However I noticed something odd. According to my Mac's activity monitor, SpiderOak was only uploading in short bursts [0] of ~2MB/s. I did some test uploads to other services (Google Drive, Amazon) to verify that things were fine with my connection (they were) and then contacted support (Feb 10).

What followed was nearly 6 months of "support". First they claimed that it might be a server-side issue and moved me "to a new host" (Feb 17). When that didn't resolve my issue, they ignored me for a couple of months, then handed me over to an engineer (Apr 28) who told me: "we may have your uploads running at the maximum speed we can offer you at the moment. Additional changes to storage network configuration will not improve the situation much. There is an overhead limitation when the client encrypts, deduplicates, and compresses the files you are uploading"

At this point I ran a basic test (cat /dev/urandom | gzip -c | openssl enc -aes-256-cbc -pass pass:spideroak | pv | shasum -a 256 > /dev/zero) that showed my laptop was easily capable of hashing and encrypting the data much faster than SpiderOak was handling it (Apr 30) after which I was simply ignored for a full month until I opened another ticket asking for a refund (Jul 9).
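For anyone who wants to repeat that kind of sanity check without the shell pipeline, here is a minimal Python sketch of the same idea: push random (incompressible) data through compression and SHA-256 and report the throughput. It is an illustration only, not SpiderOak's actual pipeline. The function name and default sizes are made up for this example, and the AES stage from the one-liner is left out because the Python standard library has no AES implementation.

```python
import hashlib
import os
import time
import zlib


def measure_pipeline_throughput(total_mib: int = 16, chunk_mib: int = 1) -> float:
    """Return MiB/s for compressing and hashing random data in memory.

    Hypothetical helper for illustration; encryption is NOT included.
    """
    # Random bytes are effectively incompressible, like /dev/urandom output.
    chunk = os.urandom(chunk_mib * 1024 * 1024)
    compressor = zlib.compressobj(6)
    digest = hashlib.sha256()
    processed = 0

    start = time.perf_counter()
    while processed < total_mib:
        digest.update(compressor.compress(chunk))
        processed += chunk_mib
    digest.update(compressor.flush())
    elapsed = time.perf_counter() - start

    return processed / elapsed


if __name__ == "__main__":
    print(f"{measure_pipeline_throughput():.1f} MiB/s")
```

If the number this prints is far above the upload rate the client achieves, client-side compression and hashing alone are unlikely to be the bottleneck on that machine.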

I really love the idea of secure, private storage but SpiderOak's client is barely functional and their customer support is rather bad.

If you want a service like theirs, I'd suggest rolling your own. Rclone [1] and Syncany [2] are both open source and support end-to-end encryption and a variety of storage backends.

[0]: http://i.imgur.com/XEvhIop.png

[1]: http://rclone.org/

[2]: https://www.syncany.org/

After looking through our records, I think perhaps you might mean 2015 instead of 2016. I took over the company that year and some things have changed. At the time customer satisfaction ratings were around 84%, and we're in the high 90s now. If I've found the correct case, we did at least suspend billing when the issue started and eventually issued a full refund.

I'm sorry we weren't able to determine the cause of the slowness for you. Troubleshooting an end-to-end encrypted product is hard because you can't just see everything that's happening by looking at the server. We have seen ISPs deny or aggressively throttle connections to our destination networks. Palo Alto firewalls classify traffic to SpiderOak as an online backup service and often block it outright or put it at the lowest priority. I'm not saying that these were necessarily the causes in your situation.

SpiderOak keeps improving and the 2017 road map is action packed. If for some reason you would ever like to try SpiderOak again on me, you're welcome to contact me directly, or write to support@spideroak.com where these days we do a pretty good job of taking care of everyone. Otherwise I'm glad you've found backup solutions you're happy with. Cheers!

Thanks a lot for the response, it's nice to hear from you and it sounds like you've done a lot to improve, just from the numbers.

> I think perhaps you might mean 2015 instead of 2016.

Looking back, yeah, you're right. Sorry about that.

> If I've found the correct case, we did at least suspend billing when the issue started and eventually issued a full refund.

Yep, that sounds like me.

> If for some reason you would ever like to try SpiderOak again on me, you're welcome to contact me directly, or write to support@spideroak.com where these days we do a pretty good job of taking care of everyone.

I've downloaded the latest client and I'm running it on OS X right now. There doesn't appear to be any change. The client just sits for long stretches of time doing absolutely nothing, just like it used to (no disk activity, no CPU, no network activity). The UI shows a bunch of pending "actions" which the logs show the server has already confirmed. I'll gather some more details and email you the results tomorrow at some point.

EDIT: I've tested with a dedicated server with a gigabit Hurricane Electric pipe (notably, HE peers with WANSecurity, the AS that announces SpiderOak's IPs) and I'm able to get 80Mb/s. On my 30Mbit Comcast connection at home, I'm able to get around half speed (though I'm able to saturate it to other services). I've emailed you more details but this is more for the benefit of others.

While I'd prefer the ability to saturate my pipe completely, this is good enough that it's no longer a huge problem for me.

Have you tried connecting through a VPN? That could potentially rule out ISP throttling.

I've connected on a colocated server that's on a switch with a gigabit pipe from Hurricane Electric, which peers with SpiderOak's ISP. This should rule out ISP throttling completely, as the only ISPs involved are transit providers. I've only ever heard of throttling from consumer ISPs. I might try from a server with premium bandwidth from GCP or something later though.

Hey, just wanna say I appreciate the time you've taken to reply, even though you're putting yourself out there. It would be really great if more CEOs and personnel could come here to have honest discussions and respond to information about things they're particularly knowledgeable about.

> Hey, just wanna say I appreciate the time you've taken to reply, even though you're putting yourself out there.

As a customer / user, it's always good to have additional feedback channels.

> It would be really great if more CEO's and personnel could come here to have honest discussions and respond to information about things they're particularly knowledgeable about.

If it's at the expense of having an actual support channel then that'd be terrible. More and more, the only way to get a response from a company is to make a stink on some form of social media (including HN).

I'm not saying SpiderOak is in that category (haven't used them so can't say). But responding to Q&A about your product on HN is not a substitute for responding to email, chat, phone, or having a feedback forum on your own site.

Hey rarrrrrr, looks like it might be time to update the about box on your hacker news profile. :-)

> a co-founder at SpiderOak: A "zero knowledge" encrypted, space efficient, multi-computer, perpetual offsite backup, sync, and sharing solution

Done, thank you! We'll probably be playing whack-a-mole on this for the next couple months...

The "Welcome to SpiderOak" email that users receive when they first sign up for the service mentions the "Zero Knowledge Technology".

Seems like you're doing hard and honest work. One thing remains odd to me: the slow communication. Do you need more people to handle customer relationships? Or was it difficulty admitting technical issues to them that caused such delays?

I don't know how I would react, but I think I'd like being told soon by a company I paid, even if it was a failure (in that case it's not even clear if you were failing or if it was some system in the middle). I'd be less angry after 3 days with disappointing news than after a month.

Then it's just me; I don't know how others feel about this. I'd be curious to know though.

Any plans to support upload from mobile devices?

That's my main issue with SpiderOak.

Let me see if I have this straight. In response to a story from a customer, you went through all of your records and then shared information about the case in public.

I'm glad your customer satisfaction scores are higher, but I'd rather not share any of my information with you. I'm not the op, but if I was, I would feel rather violated.

> I'm not the op, but if I was, I would feel rather violated.

Well then don't choose a public forum to vent a complaint, and since you're not the OP it doesn't matter anyway.

Turnabout is fair play: if you go into a thread about a product and make a strong play that their customer service sucks and then an officer of that company steps in and does what they can to see if there is a way to re-engage you on their dime that's about as good as you could possibly expect. On top of that he did not volunteer any info that wasn't already in the OP's post except for a possible correction of the date.

So you're wrong, twice.

FWIW absolutely no affiliation whatsoever with Spideroak.

Thank you for the feedback. I'm sensitive to this issue too.

For what it's worth, we do have a very strict policy about what information regular customer service staff can access or share publicly (i.e. none in most cases) and how someone who calls in must establish beyond a reasonable doubt that they really are the original customer before we will communicate with them. We're very careful about allowing any customer data to exist in 3rd party systems. We don't even use Google Analytics. https://medium.com/@mccamon/yeah-we-ditched-google-2fa644578...

That said, many people also expect customer support delivered over Twitter so some flexibility is required. In this case I made a judgement call and decided to answer.

All he said was that they refunded the guy, seems like information that is pretty OK to share publicly especially in response to the claim that their customer support was terrible. In addition to that the person who commented shared a large amount of information about the support ticket themselves so it seems like they are fine with sharing at least that much information about this case.

All-in-all this hardly seems like a reason to feel "violated"...

I'm the customer in question and I'm totally fine with what he said. Nothing personal was disclosed, only that SpiderOak wasn't able to find the issue (which I said myself) and that billing was suspended (I neglected to mention) and I got a refund (did mention).

If he'd named my ISP, location, address or something like that I'd say it's inappropriate, but I think this is totally fine and I actually appreciate that he went to the trouble.

Erm, they continued a public conversation about the customer that the customer publicly started themselves regarding customer service, and they kept the details to the delivery of the customer service and discussed the experience all customers were probably having at the time. Context matters.

What private information about the user was shared?

I've been a SpiderOak customer since 2011. Long story short, I agree that performance is slow to the point of abysmal. What I will say though is I have yet to find any alternative that is a complete package solution that comes even close to matching what SpiderOak currently offers. On servers I tend to use etckeeper + tarsnap, but on the go traveling around the world I need a client-focused cloud backup solution with serious security, and frankly nothing else exists that I've found.

If you have a realistic option (e.g. a fully supported solution) from another provider I'm all ears, but frankly SpiderOak has been good in that they have lost 0 data for me and have been able to successfully restore several times. If they could just double or triple the speed for uploading it'd be a huge boon to me, but realistically most of the places I've gone in the world being able to upload 2MB/s is faster than my local Internet is capable of.

Have you looked at Arq? (Blatantly assuming you use Mac or Windows.)


I've looked into Borg/Attic as well. My main hang-up is where to put the repository and what my long-term storage costs would be. Right now I have somewhere in the neighborhood of a terabyte of stuff backed up (mostly my Lightroom catalog). I need to be able to generate and access these backups from anywhere in the world reliably.

I think Borg would be a great solution for my use at home where I have a NAS on my LAN to provide local storage for the repository. But while traveling the world with my MacBook it doesn't cut it.

Arq looks good from that angle but I think would quickly cause my costs to overrun me. Having good backups is what gives me the confidence as a photographer to overcome my inner pack rat. Without good backups I'd never be able to ruthlessly delete images from my collection that don't exceed "okay" into "great". But because I know my images are backed up effectively forever I can really be honest in my art.

Arq can write to Amazon Cloud Drive, which has unlimited storage for $5/mo.

I personally use Arq against Google Cloud Storage. I have around 1TB, and I use it to back up several external drives in addition to my MacBook. (Arq is nice in that it backs up the drive if it's connected, but won't prune old backups if it's not, unlike services such as Backblaze.)

I think I pay around $6-7/mo. I use a coldline bucket, which is the cheapest kind of bucket. Restoring is super fast (unlike Amazon Glacier, which I used previously) and not too expensive.

Correction: "Unlimited" storage. Chances are high that you'll find yourself getting nasty emails (or worse) if you actually try to put a significant quantity of data up there.

I wouldn't risk my amazon account on it, personally.

Amazon does aggressively monitor and shut down accounts that store pirated content, but I've yet to hear any reports of Amazon cracking down on the amount of storage used.

There are plenty of people who are storing absurd amounts of data [1]. If Amazon were cracking down on this, you'd hear it in /r/DataHoarder first.

The ToS [2] don't mention any size limitations. In fact, it says you can store pretty much any file as long as the data doesn't violate any laws, including copyright.

The only really worrying part is this:

    3.2 Usage Restrictions and Limits. The Services are offered
    in the United States. We may restrict access from
    other locations. There may be limits on the types of
    content you can store and share using the Services,
    such as file types we do not support, and on the 
    number or type of devices you can use to access the
    Services. We may impose other restrictions on use
    of the Services.

[1] E.g. https://www.reddit.com/r/DataHoarder/comments/54bci8/anyone_..., https://www.reddit.com/r/DataHoarder/comments/3zhowv/amazon_...

[2] https://www.amazon.com/gp/help/customer/display.html/?nodeId...

I think Amazon Drive is Amazon's way to utilize unused storage and old storage capacity from S3.

I use that product as well, and I think it is severely underrated for the professional and power user. Off the top of my head:

- Client side encryption, with my key (and trust that Arq doesn't phone home). If that is an issue just pre-encrypt the data before Arq backs it up.

- Use a storage provider of my choice, i.e. I'm confident that it is secure with Amazon S3 - no unknown shortcuts.

- Multiple destinations possible.

- Open source tool capable of decrypting the data.

I use Arq too, however, there are some limitations:

- Scanning of new / changed files could be faster and less demanding for the file system (on my Macs, Finder sometimes temporarily freezes when Arq is scanning for new / changed files)

- Mail reports for '0 errors' (all the time, so the mail reports for errors only are rather useless)

- Running in user context only, i.e., if you log out while your Mac is still running, the backup will not continue

- Loading of existing backups (reading / caching index) tends to be slow (and sometimes hangs)

- GUI does not scale up for many existing backups

With that being said, you listed the major reasons to use Arq and all in all, I am a happy Arq user.

I learned my lessons with Dropbox, but couldn't find anything as "simple", secure with the bonus of being Canadian. So I eventually found and switched to SpiderOak but the speeds and Mac client are truly terrible, even at the end of 2016 and start of 2017. I couldn't even complete half a backup. Small files are fine but large files were impossible.

Switched to Sync.com, a Toronto-based company with only Canadian servers. That means a lot to me as I've tried to remove myself from storing on American servers (just out of good practice on my part), and the service and support have been spectacular. I tried other options as well and this won me over.

> I still won't trust SpiderOak with my data. Their service is unreliable and slow, their client is horrible to work with and their support is disgraceful.

Unreliable and slow completely describes my experience with both Backblaze (lost the data for one of my drives which unfortunately I learned after the drive failed) and Crashplan (the Java client brings my rMBP to a crawl when it runs). There has got to be a decent online backup service that's reliable, reasonably priced, and respectful of resources out there somewhere.

Two that have been recommended to me recently that I haven't tried yet but would like to are Arq Backup [1] and Tarsnap [2].

[1]: https://www.arqbackup.com/

[2]: http://www.tarsnap.com/

As a counterpoint, my experience with Backblaze has been perfect. Been using it since it first came out I think, backs data up relatively fast, and when my drive failed they shipped me a USB drive with all of my data in a day or two (and I believe it came from US > UK so that's pretty impressive). At $5 a month for unlimited I haven't even thought of switching. I believe - although I could be wrong - they also encrypt the data and give you the option of managing the keys so they don't have access.

Just clarifying on that encryption bit - they do indeed encrypt the data, but when restoring (e.g. logging into the web view & selecting files to restore) you're handing over your passphrase in plain text, as the decryption is done on their side.

Also if you need to do an "extreme" restore beyond what's available in the web app, Backblaze requires you to give them your password in plaintext via email and if you use a private key to additionally either share that in plain text via email or disable it. Of course, I changed to a temporary password but nonetheless this was an extremely concerning experience to me as a security-conscious person. Backblaze is great until you actually have to do a restore. They are a huge company and could use secure channels if they wanted to.

Most of the issues with Backblaze revolve around having multiple drives / external drives. Also, they give a very incomplete backup (examine the excluded files closely), for example, overall across my drives Backblaze only actually attempts to back up a little over half of my data.

That you can change in preferences, right? By default it usually excludes system files and the like, but that's changeable.

(Used Backblaze happily for one month during the trial period, but decided not to switch because of their, imho ridiculous, file/version retention/deletion policy. I'm an unhappy Crashplan customer of ~5 years, actively looking for a similar feature set with a sanely usable client to switch to. Arq doesn't fit the bill. I wish Backblaze wouldn't delete files/versions/disconnected drives that soon, if ever. Also, website-only restore is actually funny.)

Most of the excluded defaults cannot be changed / are read-only in the preferences. System files are some of it, but it includes others as well, such as all Applications. By default the exclusions also cover all ISOs and DMGs (this one can be changed but IMO is an insane default); I store quite a bit of important info in encrypted DMGs.

Using a private key and 2FA should IMHO be the default setting.

(I of course understand that Backblaze wants to make its service as easy as possible, so they have other defaults.)

I use arq, and I can highly recommend it. It's definitely fast, but more importantly, it has an agent that works in the background, has a great UI, is highly configurable, unintrusive and light on the CPU load. I use arq with Amazon S3/Glacier, and the monthly bill is in the cents.

Since you're comfortable with looking into Tarsnap, I'll recommend Borg:


A GUI, even a dead simple one - in fact preferably a dead simple one, that comes with some cron job or other ways to make it run all the time (file watching?) or as an optional feature would help a lot of people (including me) to switch to such excellent open source options.

I chose to replace duplicity with borg backup because I needed to back up a headless server and I found borg pruning to be much simpler to script than duplicity.

`borg prune -v --list $REPOSITORY --prefix '{hostname}-' --keep-daily=7 --keep-weekly=4 --keep-monthly=6`
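To make the retention flags in that command a bit more concrete, here is a rough Python model of how "keep 7 daily, 4 weekly, 6 monthly" can be read: walk the archives from newest to oldest, and for each rule keep the newest archive of each period until the rule's quota is filled, without counting archives an earlier rule already kept. This is a simplified sketch of the idea, not borg's code. The function name is invented, and `--prefix` matching and the other rule types (hourly, yearly, and so on) are ignored.

```python
from datetime import datetime, timedelta

# (rule name, strftime pattern identifying the period), finest rule first.
RULES = (("daily", "%Y-%m-%d"), ("weekly", "%G-%V"), ("monthly", "%Y-%m"))


def select_kept(timestamps, daily=7, weekly=4, monthly=6):
    """Return the archive timestamps a daily/weekly/monthly policy would keep.

    Simplified illustration only; not borg's actual implementation.
    """
    limits = {"daily": daily, "weekly": weekly, "monthly": monthly}
    newest_first = sorted(timestamps, reverse=True)
    kept = set()

    for rule, pattern in RULES:
        count = 0
        last_period = None
        for ts in newest_first:
            if count >= limits[rule]:
                break
            period = ts.strftime(pattern)
            if period == last_period:
                continue  # only the newest archive of a period is a candidate
            last_period = period
            if ts not in kept:  # already kept by a finer rule: does not count
                kept.add(ts)
                count += 1

    return sorted(kept)


if __name__ == "__main__":
    newest = datetime(2017, 3, 12)
    archives = [newest - timedelta(days=i) for i in range(400)]
    for ts in select_kept(archives):
        print(ts.date())
```

With one archive per day and enough history, this model keeps 7 + 4 + 6 = 17 archives; everything else would be a candidate for pruning.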

Now, if I just had laptops to back up I would have chosen deja-dup over borg because I could have configured it in seconds.

A nice project would be to code a TUI front-end to borg init/create/prune/check commands that is as quick to set up as deja-dup and automatically adds systemd units or cron jobs to the system, with an alert triggered when something fails.

Yes I tried backupninja with borg but encountered some problems with it.

as far as free alternatives, i saw no mention yet of borg[0]/borgmatic[1], which i quite like -- think 'rsync with encryption and pruning'.

[0] https://borgbackup.readthedocs.io/en/stable/

[1] https://github.com/witten/borgmatic

The problem with Borg is that it requires an actual mounted FS or SSH (with a live mounted FS on the other end). You can't just point it at offline storage.

Totally a great option if you do have live storage to back up to, but personally I don't have a full server worth of disk spare at any given time.

Ignore. Deleted. I thought the linked-to article was from a third party. Please enjoy HN frontpage without our butting in.

I totally agree. Just a couple of months ago I had to dump them. Support was awful. This was my experience:

When I purchased my SpiderOak account everything seemed to be going well. The service is twice as expensive as Backblaze and CrashPlan but has better security. The software UI is also quite intuitive. The only problem was the support. A couple of weeks into my subscription my client died. I’d try to restart it and it would die again. I looked into the problem and then I emailed support at SpiderOak. Then I waited. Then I got an email from someone at SpiderOak saying that an ad campaign really paid off and they were signing up a lot of clients. They’d get back to me.

They never did.

I sent other emails. Sent my log files in. I was very, very polite. Over a week went by. Finally, I told them I’d really rather not work with them any more.

More time went by.

Finally, someone got back to me. They gave me instructions on how to cancel my account. They said they were sorry but they understood. Then they also told me how to fix my original problem. I would have stayed with SpiderOak if they had done something, anything more than that. I realize margins are tight but surely they could have thought a little bit about how to give me some confidence in their support. Instead, I was left with the distinct impression that their support staff is either chronically understaffed, underqualified or just doesn’t give a damn. Either way, it made me realize that this was not a company I could count on if something really went wrong.

Ignore. Deleted. I thought the linked-to article was from a third party. Please enjoy HN frontpage without our butting in.

Is info@ the best email address to reach you? I didn't get a reply when I tried a few months ago.

Yes, info@. Check your spam folder - I can't imagine we'd miss an email there ...

Either way, send an email today - we'll be watching.

For rolling your own, I would also highly recommend https://github.com/restic/restic

Note: no support for compression. https://github.com/restic/restic/issues/21

I don't buy remote hosted storage, so when I saw $12/month for 1TB I was like "whoa, that's way too cheap to be reliable". Then I looked up other services, and some are as low as $4-$5/month for "unlimited" data, with some others being significantly more expensive.

It seems to me these providers are basically a web hoster that specializes in storage, which means 24/7 support, very high uptime, capital investment in infrastructure and ongoing maintenance and planned upgrades, and technical specialization for writing and supporting custom software. Many web hosters get away with low prices by not guaranteeing data integrity and putting a premium on large amounts of storage (or simply over-selling capacity). It's very hard for me to imagine how a company can have a large number of customers, each with 1TB of highly redundant data with high service uptime and immediate support, for $12/month.

For just photos, I would buy the second or third cheapest option and be done with it, but it still gives me the willies that these prices are so low. The prices of different providers seem almost spastic, running from $1.50 to $150 for a terabyte of backup. Unless there are very good reasons for the variance in price, something seems fishy.

> I don't buy remote hosted storage, so when I saw $12/month for 1TB I was like "whoa, that's way too cheap to be reliable".

The standard market rate is actually closer to $10/TB and that's from the big providers like Google.

> It's very hard for me to imagine how a company can have a large number of customers, each with 1TB of highly redundant data with high service uptime and immediate support, for $12/month.

It's actually not as hard as you think. I built and colocate my own 56TB rackmount and over 4 or so years (the warranty on the drives) it works out to around $2.5 per raw TB, inclusive of all hardware and bandwidth. Optimize that for storage (a lot of my server costs are compute), scale it up, assume the server will live 8 years or so until you replace it and you should be able to get that below a dollar per raw TB. Replicate a few times and you're done.

> The prices of different providers seems almost spastic, running from $1.50 to $150 for a terabyte of backup. Unless there's very good reasons for the variance in price, something seems fishy.

The reasons are variations in replication (for example 3 copies vs 2 copies vs no copies vs 10+3 erasure coding), support, location and bandwidth prices I imagine.
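The back-of-the-envelope reasoning in this comment can be written down as a few lines of Python. This is only a sketch under loud assumptions: the $2.5 figure is read as a per-month cost, the $6,720 total is a hypothetical number chosen purely because it reproduces that figure for 56TB over 4 years (the real hardware and bandwidth bill is not stated), and all function names are invented for the example.

```python
def raw_cost_per_tb_month(total_cost_usd: float, raw_tb: float, years: float) -> float:
    """Amortize a total hardware + bandwidth bill over capacity and lifetime."""
    return total_cost_usd / (raw_tb * years * 12)


def replication_overhead(copies: int) -> float:
    """Raw TB needed per usable TB when storing full copies."""
    return float(copies)


def erasure_overhead(data_shards: int, parity_shards: int) -> float:
    """Raw TB needed per usable TB for a data+parity erasure code."""
    return (data_shards + parity_shards) / data_shards


if __name__ == "__main__":
    # Hypothetical total bill; NOT a figure from the thread.
    raw = raw_cost_per_tb_month(6720, raw_tb=56, years=4)
    print(f"raw: ${raw:.2f}/TB/month")
    print(f"3 copies: ${raw * replication_overhead(3):.2f}/TB/month")
    print(f"10+3 erasure coding: ${raw * erasure_overhead(10, 3):.2f}/TB/month")
```

The point of the comparison is that three full copies triple the raw cost per usable terabyte, while a 10+3 erasure code adds only 30%, which is one way two providers with similar hardware can end up with very different prices.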

> It's very hard for me to imagine how a company can have a large number of customers, each with 1TB of highly redundant data with high service uptime and immediate support, for $12/month.

Some of them (I remember Tarsnap's page about this) outsource the storage itself to AWS.


$12 per TB per month would give a slight profit margin on the S3 Standard - Infrequent Access class (currently, a profit of about $2.20 per TB per month) and a significant profit margin on Glacier (about $10.90), with an appreciable loss (about $6.50) if the average customer block exceeded the Infrequent Access restrictions. It might be possible to make the client optimize accesses in some way that makes it less likely that many blocks will exceed Infrequent Access rules, and maybe even to try to keep the typical block in Glacier instead?


It seems like an interesting challenge. (Yes, support and administrative costs need to come out of that profit margin.)

Also, Amazon apparently offers "lifecycle policies" to try to do this automatically instead of explicitly. It seems like those policies try to migrate blocks into a cheaper tier that assumes less frequent access if the blocks have not, in fact, been accessed on a certain schedule. A cloud storage vendor using AWS as its backend could then try to optimize manually at the per-user level, or just let AWS do it itself empirically, which wouldn't save quite as much money as correct guesses about what particular users will do with particular data, but would require minimal engineering effort on the vendor's part.
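For illustration, a lifecycle policy of this kind might look roughly like the configuration built below. Treat it as a sketch under assumptions: the rule ID, prefix and day thresholds are invented, and the dict only mirrors the general shape of an S3 lifecycle configuration (rules with transitions to the STANDARD_IA and GLACIER storage classes). As far as I know, these transitions are driven by object age in days rather than by observed access, so check the current AWS documentation before relying on any field.

```python
import json


def build_lifecycle_config(prefix: str = "", ia_after_days: int = 30,
                           glacier_after_days: int = 90) -> dict:
    """Build a lifecycle configuration that tiers objects down as they age.

    Hypothetical helper; the thresholds are example values, not recommendations.
    """
    if glacier_after_days <= ia_after_days:
        raise ValueError("Glacier transition must come after the IA transition")
    return {
        "Rules": [
            {
                "ID": "tier-down-cold-backup-blocks",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(build_lifecycle_config(), indent=2))
```

A vendor taking the "let AWS do it" route would attach one such policy per bucket, while per-user optimization would mean choosing the storage class explicitly at upload time.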

Edit: It seems like it will be hard to compete with the AWS/S3 backend for this kind of service! Now I wonder if there are providers who are known not to use S3.

That reminds me of CrashPlan. They throttle uploads and deny it – and they have been promising a native Mac client for years. Scanning of new / changed data is rather slow too. I was testing SpiderOak two or three years ago. The client was awful but slightly less awful than CrashPlan's Java-based client.

These days, I use a mix of Backblaze and Arq (with Amazon Cloud Drive).

Backblaze is great but versioning is limited to 30 days. More versioning is probably not feasible, i.e., other cloud backup providers with unlimited versioning have to find other ways to keep their data usage under control.

My experience with spideroak has overall been very poor, as I discussed in a previous thread on HN [1]. I have since moved away from spideroak and am using duplicati [2]. Honestly, things have been much better since then. I have not had any issues, and what's more, duplicati is free and open source.

[1] https://news.ycombinator.com/item?id=13306745

[2] https://www.duplicati.com/

I have been using spideroak for a couple of years.

I agree it's slow compared to other providers, but it seems reliable to me, both in what it uploads and in test restores.

Right now I have a small backup selection (around 90 gigs), but most of the time I had been running with an approximately 800 gigabyte backup set on my Mac, and except for a slow initial upload (I am located in Europe, so that might also factor in) everything has been running fine.

Shameless plug. Not because I'm affiliated, but because I really like the product. Arq Backup. It's not cheap, and it doesn't include any storage, but you get a good client, and you can choose the storage you want. Most of the popular cloud providers are supported.

Would you say those two last tools are the best option as of today as far as using open source to conveniently store zero knowledge bits on S3? Haven't looked into the space much before but I'd gladly get rid of spideroak if I could self host it.

Check Duplicati, it's based on duplicity.

https://www.duplicati.com http://duplicity.nongnu.org

Will second that. I've been using the OS X build of Duplicati for a while now to back up via SCP and it works flawlessly. And, yes, I did test restoring some files from time to time with it. Note that the new "Duplicati 2.0" is no longer based on duplicity as far as I understand but a complete rewrite. It also uses a completely different backup architecture which should avoid / lessen the need for regular "full backups".

I back up to a 1 TB storage VPS from time4vps.eu, which at 72 EUR for two years is also much cheaper than Crashplan which I used before (and which seemed to be far more resource hungry on my Mac).

Only downside of Duplicati in my opinion is that it performs daily backups not "real-time" backups as some others do.

On the other hand, the post above asks for a "full service solution", in that case I would probably still recommend CrashPlan - it also supports local encryption with your own encryption keys and has been very reliable for me as well.

Wow, time4vps.eu is even cheaper than hetzner for storage. Definitely going to be testing them out for my cloud storage needs.

They are not, check out Hubic - 10TB for 49€/year

Hubic may be interesting for personal use but it's unsuitable for commercial use, which is my actual goal. I need something which I can buy in bulk and scales into PB range.

For example the hubic contract [1] states that bandwidth is limited to 10 Mbit/s upstream and downstream. Compare this to 400 Mbps dedicated port speed for each Storage server at time4vps.eu. Indeed, time4vps.eu offers 32 TB of bandwidth per month with the 4 TB storage plan, while hubic's 10 Mbit/s means that even if I transfer 24/7 for the whole month I'll only be able to transfer 3.3 TB of the 10 TB they offer.
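The 3.3 TB figure is easy to check. The sketch below (hypothetical helper names, decimal units, and an assumed perfectly saturated link with no protocol overhead) converts a link speed into the most data it can move in a given number of days.

```python
SECONDS_PER_DAY = 86_400


def max_transfer_tb(link_mbit_per_s: float, days: float = 30) -> float:
    """Upper bound on TB moved over a fully saturated link in `days` days."""
    bytes_per_second = link_mbit_per_s * 1_000_000 / 8
    return bytes_per_second * SECONDS_PER_DAY * days / 1e12


def days_to_transfer(size_tb: float, link_mbit_per_s: float) -> float:
    """Days needed to move `size_tb` TB over a fully saturated link."""
    return size_tb / max_transfer_tb(link_mbit_per_s, days=1)


if __name__ == "__main__":
    print(f"10 Mbit/s, 30 days: {max_transfer_tb(10, 30):.2f} TB")
    print(f"10 Mbit/s, 31 days: {max_transfer_tb(10, 31):.2f} TB")
    print(f"Days to move 10 TB at 10 Mbit/s: {days_to_transfer(10, 10):.1f}")
    print(f"400 Mbit/s, 30 days: {max_transfer_tb(400, 30):.1f} TB")
```

At 10 Mbit/s this gives about 3.24 TB for a 30-day month and about 3.35 TB for a 31-day month, in line with the figure above, and filling the full 10 TB would take roughly three months. At 400 Mbit/s the link could move far more than 32 TB in a month, so on that plan the monthly traffic allowance is the limit rather than the port speed.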

For heavy personal use something like Amazon Drive [2] is probably a better choice, because it offers unlimited storage for $60/year. People on reddit are saying they have hundreds of TBs stored with no complaints.


[1] https://hubic.com/en/contracts/Contrat_hubiC_2014.pdf

[2] https://www.amazon.com/clouddrive/

Syncany development appears to be on indefinite hiatus, so it's not something I would want to trust my backups with.

I think this is missing the larger problem with the claim. Yes, using the term clashed with academic cryptography to some extent. But the larger issue was even as they intended to use it, it's not a completely accurate description of their product. They learn quite a bit of information compared to you storing the data locally.

Why the collision with academic cryptography doesn't matter: Anyone who had even a basic understanding of their product + some academic crypto background would get what they were going for: they have no knowledge of what's going on. Also, strictly speaking, the academic term is zero-knowledge proof or zero-knowledge proof of knowledge. I.e., zero-knowledge is an adjective used to describe a proof (and indeed, if you look at the history of how these evolved, that is exactly what happened). You could reasonably use zero-knowledge as a modifier for something else and it could be acceptable. Indeed, it's a fairly good shorthand for a particular class of definitions of privacy/confidentiality that require that any transcript of the protocol can be produced by a simulator who has no knowledge of what transpired.

The problem is Spider Oak's cloud backup cannot be zero-knowledge or no knowledge. It almost certainly leaks when you update files and when you delete them. Perhaps they don't log this or delete the logs, but they could. And this metadata could matter to businesses.

Yes, of course there's some traffic analysis that would be possible, as there would be with any such service. But for the record: we keep logs for a limited time, and we don't just encrypt each file individually.

Instead there's an encrypted journal and encrypted data blocks. (Having the additional layer of data blocks allows for better deduplicating one version of a file to the next.) So for each transaction that's uploaded to the servers, we know that the journal gets longer, and that data blocks are added or removed (or both.)

All the database work for keeping track of the data blocks (reference accounting, garbage collection) is done client side. More details in this post from 2009: https://spideroak.com/articles/why--how-spideroak-architectu...

This is pretty impressive. I thought that you only encrypt content and filename. But this goes way beyond what I expected from such a service.

Right, some leakage is inherent and what you provide may be good enough or even the best you can reasonably do. However, there's a long history (even in academic crypto) of what seems like insignificant leakage being important. So it's good to be overt about it.

So it looks like you are doing blockwise encryption? Which means, at least conceptually, that not only do you leak when a file is updated, you leak which chunk? At least I'm assuming the journal isn't append only.

For what it's worth, I think of a journal as append only by definition and that's what SpiderOak does. Unless you have millions of very small files, the journal is going to be tiny relative to the backup content so this is fine.

So the server doesn't have a concept of "an existing file was updated" vs "a new file was uploaded" etc. The server only knows "new blocks have arrived." All the "smarts" are on the client.

In general operation, only new journal entries and new blocks are added. The only time blocks are removed is when the user intentionally chooses to remove data (we call that operation "purge"). Intentional purges can also reduce the total size of the journal, and this is the only operation that does so.
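As a rough illustration of what that looks like from the server's side, here is a toy model. To be clear, this is a sketch under my own assumptions, not SpiderOak's actual code or protocol: `ToyServer`, `upload`, `purge` and `fake_ciphertext` are hypothetical names, and random bytes stand in for real client-side encryption.

```python
import os


class ToyServer:
    """Hypothetical server that stores only opaque bytes it cannot interpret."""

    def __init__(self) -> None:
        self.journal: list[bytes] = []      # append-only encrypted entries
        self.blocks: dict[str, bytes] = {}  # opaque block id -> ciphertext
        self.observed: list[str] = []       # what an operator could log

    def upload(self, journal_entry: bytes, new_blocks: dict[str, bytes]) -> None:
        # A normal transaction only appends; nothing is edited in place.
        self.journal.append(journal_entry)
        self.blocks.update(new_blocks)
        self.observed.append(
            f"journal +{len(journal_entry)} B, {len(new_blocks)} block(s) added"
        )

    def purge(self, block_ids: list[str], compacted_journal: list[bytes]) -> None:
        # Only an explicit, user-initiated purge removes blocks or shrinks the journal.
        for block_id in block_ids:
            self.blocks.pop(block_id, None)
        self.journal = list(compacted_journal)
        self.observed.append(f"purge: {len(block_ids)} block(s) removed")


def fake_ciphertext(size: int) -> bytes:
    """Stand-in for real client-side encryption: random bytes of a given size."""
    return os.urandom(size)


if __name__ == "__main__":
    server = ToyServer()
    # A brand-new file and an edit to an existing file look identical here.
    server.upload(fake_ciphertext(64), {"a1": fake_ciphertext(4096)})
    server.upload(fake_ciphertext(64), {"b2": fake_ciphertext(4096)})
    server.purge(["a1"], server.journal[-1:])
    for line in server.observed:
        print(line)
```

Even in this toy version the `observed` log still contains sizes, counts and (implicitly) timing, which is the kind of metadata discussed upthread.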

Most backup software removes previous versions and deleted files after 30 days, but SpiderOak keeps these indefinitely by default, to allow for point in time recovery, restore from ransomware infections, mistakes you don't catch right away, etc. You can set a different retention policy if you prefer.

> Why the collision with academic cryptography doesn't matter[...]

Good description and I agree that the collision is not confusing to someone who already knows what a zero-knowledge proof is. But I still appreciate the change because people who first hear the term from the website could be pretty confused if they hear about the academic term later.

PS. I see your point about leaking of some metadata but it seems very difficult to expect any cloud service to avoid this. The only solution I see is to continually re-upload a re-encrypted version of all data whether it's been updated or not, and maybe pad the uploads so that they are all some maximum size regardless of how much data there actually is.
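For the padding half of that idea, a minimal sketch might look like the following. The assumptions are mine: data is padded before it is encrypted (so the ciphertext length only reveals how many buckets were used), the encryption step itself is left out, and sizes are rounded up to a multiple of a bucket size rather than to one global maximum, which is a simplification of what is described above. `pad_to_bucket` and `unpad` are hypothetical helper names.

```python
def pad_to_bucket(plaintext: bytes, bucket_size: int = 4 * 1024 * 1024) -> bytes:
    """Pad data up to the next multiple of bucket_size before encrypting it.

    An 8-byte big-endian length prefix records the real size so the
    client can strip the padding again after download and decryption.
    """
    if bucket_size < 1:
        raise ValueError("bucket_size must be positive")
    body = len(plaintext).to_bytes(8, "big") + plaintext
    remainder = len(body) % bucket_size
    padding = b"\x00" * ((bucket_size - remainder) % bucket_size)
    return body + padding


def unpad(padded: bytes) -> bytes:
    """Recover the original data from a padded blob."""
    real_length = int.from_bytes(padded[:8], "big")
    return padded[8 : 8 + real_length]


if __name__ == "__main__":
    small = pad_to_bucket(b"a", bucket_size=16)
    larger = pad_to_bucket(b"hello", bucket_size=16)
    # Both inputs end up the same size, so size alone no longer tells them apart.
    print(len(small), len(larger))
    print(unpad(larger))
```

Padding only hides sizes within a bucket; it does nothing about the timing of uploads, which is why the re-upload idea (or something like ORAM) would still be needed on top.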

Oh, there is no good solution. You need to do something using ORAM. There are some clever tricks you can use that might be efficient (e.g., this paper does some very interesting things: https://www.internetsociety.org/doc/oblivisync-practical-obl...), but I wouldn't realistically expect Spider Oak to cover it. What I would expect them to do is clearly state what they don't cover. Metadata is a thing people understand.

I'm always a little tweaked when somebody pitching a client product says "we can't access your data".

Am I running your software in a process that has network access? Then you can access my data.

I understand the point you're trying to make, and I totally get that architecting a system so that unencrypted data doesn't leave my device is superior to an architecture where it does.

But I still must, ultimately, trust you, your competence, and your motivations. If I trust that you don't want to access my data, and have tried to architect your systems so that is hard to do, and are competent to do so, then I can trust my data is probably safe.

But that's not the same as it being physically impossible for you to access my data.

In Tahoe-LAFS, it is actually a proven truth that, up to the level of cryptographic unguessability, server operators cannot read stored files without knowing the client-side key. If you can prove otherwise, then you can earn a spot in their hall of fame: https://tahoe-lafs.org/hacktahoelafs/

Tahoe-LAFS providers like Least Authority [1] and Matador Cloud [2] pride themselves on not being able to access your data.

[1] https://leastauthority.com/ [2] https://matador.cloud/

Like OP said, having that design is valuable, but you're still running someone's software on your computer.

With Tahoe LAFS, you're either downloading the pre-built binary or compiling from source. Either way, you're trusting the person who signed the binary/source or the person who hosts them.

This is similar to when Apple says they can't read your messages. Sure, they may not be able to decrypt the data on their server, but they're in total control of the client software. They can get access to your messages.

I do think it's in Tahoe/Apple/etc's interest to NOT be able to see their data. For one, they said they can't and reputations are important. It may also be beneficial when dealing with law enforcement requests. So trusting them isn't totally unwarranted, but there's a lot there that isn't mathematics.

(You're never going to get to 100% math, but there are ways to get further. For example, someone might eventually write a Tahoe LAFS client that comes with a machine-checkable proof that it doesn't leak plaintext. You now need to trust the proof checker, but it's progress.)

The client could be open source; indeed the network protocol might even be simple. The client distribution or even implementation might be handled by a third party. If for commercial reasons the provider wants a fancy shell and doesn't trust the open source or third party for that, they might consider a locally-untrusted (and possibly even remotely run) GUI that is a frontend for your local (trusted, open-source) backup engine.

I believe you can architect a solution such that it's easy enough to get a third party (or yourself) to verify it is safe and private, even in the face of a hostile backup provider (or more realistically, a backup provider that's met an opportunistic law enforcement agency waving a sternly worded letter).

If enough parties collude, they can still gain access to your data, but at that point I'm pretty sure the backups won't be the weakest link anymore.

I mean, you're still running an OS, and compiling with a compiler, installing and running software from a package distributor, and using a CPU with a management engine, and all those things might have backdoors too.

What do you want then? Only use software you've written yourself, on a non-networked computer?

Like I said, I don't think it's unreasonable to trust Apple or the Tahoe LAFS download server. I think they're improving the state of security and I would use those products. I just want people to be clear about the actual security properties of the whole system.

For example, people should know that blanket statements like "Apple cannot read your messages" are false. And your client/server protocol can be state of the art, but you still rely on the much-maligned HTTPS certificate infrastructure (among other things) to get the client bits.

Tahoe-LAFS is a really awesome open project by great people with a rich history of solving complex problems.

If this space is interesting to you, spending a few days reading through their Trac is educational and rewarding! Highly recommended.

Trivia: One of the founders of LAFS went on to found Zcash, the (actually) zero knowledge crypto currency.

Yeah in that case Tahoe-LAFS and the storage providers are different entities. So if you trust the Tahoe-LAFS developers, then you don't need to trust the storage providers.

You wouldn't have to trust them if they made the client open source. It still baffles me that they aren't doing that given that their entire sales pitch is about security and privacy and "no knowledge".

I'm a paying customer and I use SpiderOak for some non-critical backup/synchronization tasks but I refrain from using it for anything sensitive because of this.

I'd also like to be able to hack the client to remove all the cruft I don't need, it's not exactly pleasant to use currently. On linux I'd actually prefer a good CLI over the weird GUI we have now.

Thank you for the feedback and we'll be working on it!

FYI, many people do use SpiderOak exclusively from the command line. It's fairly scriptable.

Here are the command line options: https://spideroak.com/faq/how-can-i-use-spideroak-from-the-c...

Also, the source code for our other products (Encryptr, Semaphor) is published. SpiderOakONE was first developed in 2006 and open source business models were not so popular at the time. It's been much harder than I thought it would be to make SpiderOakONE open source but one glorious day we will get there. https://spideroak.com/solutions/semaphor/source

> I'm always a little tweaked when somebody pitching a client product says "we can't access your data".

If their privacy policy prevents them from accessing your data unless authorized, then this isn't an unreasonable statement.

Often a company needs data for a certain purpose, but there is no good technical solution for enforcing that. This is where privacy policies can benefit users, by providing a legal agreement to protect users beyond the abilities of the underlying technology.

If the service doesn't enforce end-to-end encryption of customer data, then it's a misleading statement.

How would you recommend phrasing it?

Your typical user is much more at risk with end-to-end encryption than with a well-architected SaaS product, since they are much more likely to be a victim of some sort of easily preventable fraud than to have their data misappropriated. (Think of how Gmail automatically handles con artists, phishing scams, viruses, XSS attacks, SPF/DKIM/DMARC, etc.)

Because of this, users tend to be better off having their data protected through a combination of technology and legal agreements rather than protected solely through technology, although this can be difficult to communicate.

E.g. how would you communicate to your (archetypal) grandmother that she's better off using Gmail than the same sort of setup as Edward Snowden, and make her feel safe doing so?

> Your typical user is much more at risk with end-to-end encryption

Not all e2ee products are created equal. I think your description is accurate for the PGP ecosystem, because for example it's hard to be sure you've got the right key from the key servers, and anyone who can use email can contact you. In fact most crypto folks I know believe email is an unsecurable platform.

However, good e2ee systems provide strong authentication of the data's origin, which is often more valuable than the encryption. In SpiderOak's Semaphor for example, a team admin approves new team members before they can communicate with other people on the team. It's explicitly a tool intended for safe internal communication with a specific in-group, such as an enterprise team.

Whenever a non-technical salesman (or CS rep) tells me "we can't access your data", I usually smile and take it to mean "I personally don't have access to a GUI for sifting through your data". Not that such an interface is likely to exist at all, but that I don't trust them to know anything more detailed than that.

Right, that's why I think SpiderOak (and preferably the whole industry; Signal, WhatsApp, etc. are in the same boat) having a marketing term for this is important: "our product, if properly implemented, is designed in a way where nobody can build me that GUI" is meaningfully different.

Yeah, I wish there was a way to communicate this property.

Well it depends on if you can run the software in an isolated network. I can trust client software a lot more that can run on a machine isolated from Internet access. If someone pitches a product like that to me, then I can proceed without really having to trust them.

I said that -- "process with network access".

If I'm running your software, and there exists any channel by which that software can talk to you, then ultimately you can access whatever that software has access to.

"network access" is not the same as "Internet access". It seems pedantic, but there are very large networks with no connectivity to the Internet.

> But I still must, ultimately, trust you, your competence, and your motivations.

This is true about any entity that has access to your data, including your bank, health insurance, or any other organisation which you have an account with. It is true that they don't say they can't access your data though...

Regarding their motivation - having developed a system with end to end encryption for sensitive information, I can tell you that the worst thing from both a business and professional perspective would be a data security breach - that gives quite a lot of motivation to get things right...

In case anyone is curious about why this was problematic:


Zero Knowledge is something you're most likely going to find in an authentication protocol, not an encryption protocol.

While I have mixed feelings about "No Knowledge", it's at least not a collision with a different concept.

Good on SpiderOak for the effort here. It shows they do listen, at the very least.

Thank you. It would be easier to create friendly terms that describe end-to-end encryption if non-encrypted cloud providers weren't actively trying to mislead people into a false sense of security. My previous rant about this is here: https://news.ycombinator.com/item?id=13303599

I was surprised to read that their mobile solution delegates the decryption to software running on their server: https://spideroak.com/manual/spideroak-on-mobile

This is clearly not "no Knowledge"....

Looking forward to standing corrected if I am wrong.

AFAIK, they make this very clear if you ever try to use the mobile content that you're giving them access to all of your data. I think there's a sufficient amount of big scary warnings and from the way SpiderOak is designed it makes sense to me why a true mobile client wouldn't work out so well.

This applies to the entire industry: we often inflate our products with terms that do not describe the reality. Technical accuracy is important because it can drive a purchase decision when comparing features with a competitor. Companies and individuals invest time and money in this software and these services; misleading them with inaccuracies can directly harm them or their business.

At least it's an honest statement from SpiderOak. It's better to fix a misuse of a term and admit an error than to throw around a misleading term describing a product used by thousands of people and then delete it as if nothing happened.

When I see this post, I cannot help but think of how Docker described "Swarm mode" orchestration features during DockerCon 2016 using the terms "self-healing" and "self-organizing". Obviously, "Swarm mode" was neither "self-healing" nor "self-organizing" and a possibility is that they had no idea what those terms meant, but it looked good on paper and from a marketing point of view. While they fixed it in the documentation after this was pointed out internally, these terms have leaked into many blog posts and are still in plenty of talk recordings on YouTube. It became hopeless to stop the spread of misinformation.

Despite this change, a lot of SpiderOak customers are still going to use the term Zero-Knowledge to describe the software to their friends, co-workers or business partners. The term will stick with them for a while.

Previously [0], HN and experts criticized SpiderOak for using the term improperly in their marketing. (E2E is not the same as zero-knowledge storage). And now they admit that they knew at the time that it was used improperly.

So, why did they use it?

[0] https://news.ycombinator.com/item?id=13301936

This is misleading.

They admitted at the time they knew it was being used improperly, they were just attached to the usage.

They are now following up to say they have detached themselves from this usage. This seems like very responsible behaviour that should be applauded.

I'm pointing out that they knowingly used a word improperly... and for that I am called misleading.

You suggested that they were only copping to the improper use now, rather than being upfront about it at the time. That's misleading.

Probably because it's difficult to explain E2E to users in an easily-digestible way, and most users don't know the cryptographic definition of zero knowledge. I'm not excusing their use of the term, but I can see why they think new terminology is needed.

> So, why did they use it?

I do not know why, but good marketing and technical accuracy can easily clash. Even just trying to communicate generally can be tripped up by different uses of the same term. You tend to find that out by stepping in it, not by somehow knowing ahead of time that X term will be drama for some reason.

"zero" is one of those cool words. You'd have to beat the promotional types off with a stick.

"Your data is completely safe from ... any threat."

How is that possible?

Because they don't have your data, they have your encrypted data and no key.

Last week I emailed the Information Commissioner's Office in the United Kingdom (https://ico.org.uk/) to ask whether, in their view, storing encrypted backups in another country, when the key never leaves the UK, counts as moving personal data outside the country. Sadly they said yes. I don't really have faith they understood the maths, though.

Maybe they understand the theory but have no means of verifying the implementation, while it's easier to verify the location of data.

They don't really verify things anyway, they wait for people to inform them of issues.

Encryption can be broken, but, one assumes: a) that your backups aren't likely to be stolen, b) that no one cares enough and c) by the time they are, the people whose information you stored are dead

If you logged in you provided the key: NOTE: Logging in via the SpiderOak website does temporarily allow SpiderOak employees access to your password. Due to this exposure, we discourage users from entering your password online if they wish to fully retain our Zero-Knowledge privacy.

Encryption can be broken, especially in the long run.

If they have no key how come I can login and download files from the web UI without providing my key?

It's not.

It's probably relatively safe at rest from compromise on the server(s). Endpoint attacks (users' own systems), or some form of targeted client update (e.g., client code that's dropped to individual users), either by SpiderOak or through other means (MITM / cert hijacking) strike me as plausible routes.

From context I assume they mean threat levied against the company or data entrusted to it. They should probably change it to "any server-side threat", or just nix that portion entirely. (Nixing it is probably most reasonable, given that they could theoretically push an update that steals secret keys, AFAIK.)

> For our secure group chat, file sharing and collaboration tool Semaphor, it means you can even review the source code.

Has anyone ever tried to "review the source code"?

"review the source code" links to https://spideroak.com/solutions/semaphor/source which leads to https://spideroak.com/releases/semaphor/source which is a 404.

It's worked for me in the past, and still continues to do so.

"As we launch a new website today, we changed every mention of Zero Knowledge to No Knowledge."

Dear god FINALLY. This has annoyed me constantly about spideroak.

You can see the full HN furore here: https://news.ycombinator.com/item?id=13303436

The change was definitely overdue.

For me the problem with Zero Knowledge or No Knowledge is that the websites usually don't say whether they encrypt the content, directory structure, filename or file size. Often it is just the content. It would be great if the services would explain this in more detail. What does SpiderOak actually encrypt?

I guess a real "No knowledge" storage would be just a container and you read and write blocks, i.e. the filesystem format is implemented on the client side. Of course this makes features such as versioning difficult to implement and probably everything would be a little bit slower.

Edit: The post from rarrrrrr explains the technique of Spider Oak and he links to a Blog entry. This is pretty impressive.

Your second paragraph describes SpiderOak quite accurately: it is exactly a logical file system implemented client side, with all the database work to support that done locally. It works because Sqlite is awesome.
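To give a feel for what "database work done locally" can mean, here is a small sketch of client-side reference accounting with Python's built-in `sqlite3` module. This is not SpiderOak's schema: the table names, `open_index`, `add_file` and `remove_file` are all invented for illustration, and the block ids are assumed to be opaque identifiers the server cannot interpret.

```python
import sqlite3


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local index that maps files to opaque block ids."""
    db = sqlite3.connect(path)
    db.executescript(
        """
        CREATE TABLE IF NOT EXISTS blocks (
            block_id TEXT PRIMARY KEY,
            refcount INTEGER NOT NULL
        );
        CREATE TABLE IF NOT EXISTS file_blocks (
            path     TEXT NOT NULL,
            position INTEGER NOT NULL,
            block_id TEXT NOT NULL,
            PRIMARY KEY (path, position)
        );
        """
    )
    return db


def add_file(db: sqlite3.Connection, path: str, block_ids: list[str]) -> None:
    """Record a file as an ordered list of block ids, bumping refcounts."""
    with db:  # commit on success, roll back on error
        for position, block_id in enumerate(block_ids):
            db.execute(
                "INSERT OR IGNORE INTO blocks (block_id, refcount) VALUES (?, 0)",
                (block_id,),
            )
            db.execute(
                "UPDATE blocks SET refcount = refcount + 1 WHERE block_id = ?",
                (block_id,),
            )
            db.execute(
                "INSERT INTO file_blocks (path, position, block_id) VALUES (?, ?, ?)",
                (path, position, block_id),
            )


def remove_file(db: sqlite3.Connection, path: str) -> list[str]:
    """Forget a file and return the block ids nothing references any more."""
    with db:
        rows = db.execute(
            "SELECT block_id FROM file_blocks WHERE path = ?", (path,)
        ).fetchall()
        for (block_id,) in rows:
            db.execute(
                "UPDATE blocks SET refcount = refcount - 1 WHERE block_id = ?",
                (block_id,),
            )
        db.execute("DELETE FROM file_blocks WHERE path = ?", (path,))
        garbage = [
            row[0]
            for row in db.execute(
                "SELECT block_id FROM blocks WHERE refcount <= 0 ORDER BY block_id"
            )
        ]
        db.execute("DELETE FROM blocks WHERE refcount <= 0")
    return garbage


if __name__ == "__main__":
    db = open_index()
    add_file(db, "a.txt", ["b1", "b2"])
    add_file(db, "b.txt", ["b2", "b3"])  # b2 is shared (deduplicated)
    print(remove_file(db, "a.txt"))      # only blocks unique to a.txt
    print(remove_file(db, "b.txt"))
```

The point of the design is that the server never needs to know which blocks belong to which file: the client works out which blocks are garbage and only then asks for them to be removed.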

I have been using sync.com for about a year now and am very happy with it. It says it's zero knowledge (I'm not qualified to judge the veracity of that) and costs under €60 pa for 500GB. I have also found them exceptionally friendly to deal with. It's ridiculous but I was chuffed to receive a postcard signed by about a dozen people after signing up. I used to use crashplan but after doing a successful restore from a supposedly good archive I found that thousands of files were missing. The other thing I use as was mentioned elsewhere in this thread is duplicati 2. Has worked perfectly for me so far.

Just got my postcard as well! Sadly no signatures but lots of stickers. I love me some stickers. Really enjoying their service so far, and with them I can actually visually see what's happening (encryption and uploading) whereas SpiderOak makes it less clear / obvious.

Maybe they were a little quieter back then or maybe they just liked me more? What can I say? :)

I think in terms of catchy marketing phrases, it's a good switch, considering they wanted one that sounds as good as the previous one yet does not clash with existing professional terminology. I remember thinking for a few seconds about what I would change it to back when reading criticism about them using the term "Zero Knowledge" and couldn't off the top of my head think of something different but still catchy. Seems obvious once it's thought of...

Great news.

I tried spideroak and liked the product, but the inability to pay using anonymous payment methods has led me to use sync.com instead.

I had trouble with a prepaid debit card, and they unfortunately don't take bitcoin either.

Odd, but they used "zero knowledge" in their title to describe themselves.

Anyone else pick up on this hilarious irony?

Hacker News is becoming a comedy site similar to The Onion.
