Spider Oak - Please stop describing your service as "Zero Knowledge" unless and ...

jvehent · on Jan 2, 2017

In cryptography, "zero knowledge" means something very different than "service providers cannot access cleartext data".

> In cryptography, a zero-knowledge proof or zero-knowledge protocol is a method by which one party (the prover) can prove to another party (the verifier) that a given statement is true, without conveying any information apart from the fact that the statement is indeed true.

source: https://en.wikipedia.org/wiki/Zero-knowledge_proof

z.cash is a zero knowledge system and has a good definition of it on its FAQ:

> Zero knowledge proofs are a scientific breakthrough in the field of cryptography: they allow you to prove knowledge of some facts about hidden information without revealing that information. The property of allowing both verifiability and privacy of data makes for a strong use case in all kinds of transactions, and we’re integrating this concept into a block chain for encrypting the sender address, the recipient address, and the amount. A block chain that encrypts transaction data (making it private) and lacks zero-knowledge proofs also lacks the assurance that all the transactions are valid. This is because the nodes in the network can’t determine whether the sender really had that money or whether they previously sent it to someone else, or never had it in the first place. The encrypted data becomes unverifiable by network nodes.

source: https://z.cash/support/faq.html?page=0
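
To make the quoted definition concrete, here is a minimal Python sketch of one classic zero-knowledge protocol, Schnorr identification. It is not taken from either quoted source. The group parameters (p = 23, q = 11, g = 2) are toy values picked for readability, and all function names are made up for illustration. The prover convinces the verifier that it knows x such that y = g^x mod p. The `simulate` function hints at why an honest verifier learns nothing beyond that fact: transcripts that pass verification can be produced without x at all.

```python
import secrets

# Toy parameters, far too small for real use: G has prime order Q modulo P.
P, Q, G = 23, 11, 2


def keygen():
    """Return (secret x, public y = G^x mod P)."""
    x = secrets.randbelow(Q - 1) + 1  # secret exponent in [1, Q-1]
    return x, pow(G, x, P)


def commit():
    """Prover step 1: pick a random nonce r and send t = G^r mod P."""
    r = secrets.randbelow(Q)
    return r, pow(G, r, P)


def respond(x, r, c):
    """Prover step 2: answer the verifier's challenge c."""
    return (r + c * x) % Q


def verify(y, t, c, s):
    """Verifier: accept if G^s == t * y^c (mod P)."""
    return pow(G, s, P) == (t * pow(y, c, P)) % P


def simulate(y):
    """Build an accepting transcript WITHOUT the secret x.

    Choose c and s first, then solve for t = G^s * y^(-c) mod P.
    Since y has order Q, y^(-c) equals y^((Q - c) mod Q).
    """
    c = secrets.randbelow(Q)
    s = secrets.randbelow(Q)
    t = (pow(G, s, P) * pow(y, (Q - c) % Q, P)) % P
    return t, c, s


if __name__ == "__main__":
    x, y = keygen()
    r, t = commit()
    c = secrets.randbelow(Q)  # verifier's random challenge
    s = respond(x, r, c)
    print("honest proof accepted:", verify(y, t, c, s))
    print("simulated transcript accepted:", verify(y, *simulate(y)))
```

Note that nothing here involves storing or encrypting files; the property concerns what a verifier can learn from an interactive proof, which is the distinction being drawn above.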

trendia · on Jan 2, 2017

A lot of customers are going to assume that zero knowledge means no cleartext data is ever stored. I assumed it, and I'm no newbie.

This seems to be an abuse of the motte and bailey kind: they use a word which everyone believes means one thing, but when questioned they resort to a less common definition because they 'didn't mean it that way.'

rarrrrrr · on Jan 2, 2017

SpiderOak founder here...

A few cryptographers have noticed SpiderOak's marketing term Zero Knowledge is inconsistent with the academic definition. Maybe it doesn't mean what we think it means[1]? SpiderOak was one of the first companies to use this phrase commercially and the need has only grown stronger.

At the heart of the issue is the difficulty for end users to decipher the terms cloud vendors use to describe their security. Doing so would require discrimination between transport encryption, data encryption, meta data encryption, encryption at rest vs. in motion, and then most importantly evaluate key management and access. This vocabulary is foreign to most folks. Vendors often exploit the inaccessibility of these topics to make a series of statements that, while often factually correct individually, together create a false sense of privacy.

SpiderOak launched an online backup product for Linux, Mac, and Windows in 2007. The competitors were companies like Xdrive, Mozy, Carbonite and SugarSync. Each claimed that customer data was fully encrypted. Even the most credible journalists writing for well funded publications with fact checking budgets were fooled and repeated these misleading claims to end users. [2]

In 2009 when Dropbox launched, they made misleading claims about the encryption of customer files and their internal ability to access customers' data or provide that data to 3rd parties, leading to a well publicized FTC deceptive trade practices complaint. [3] The deception had been so effective that leading software engineers were shocked to discover Dropbox had full access to the data they had stored online. [4]

In response to customer requests on one of their forums, Mozy explained why it would be "impossible" for a storage service to protect users' privacy by encrypting the file and folder names customers store in a way Mozy could not read. SpiderOak customers had benefited from the impossible for years.

Recently Slack made the unbelievable claim on Twitter that their service includes end to end encryption (it doesn't.) Perhaps they mean from your end to their end?

Lately there's a new phrase "customer managed keys" used by cloud providers, which sounds really great, but is typically just elaborate hand waving that ultimately allows the vendor and their staff the same level of data access as if it were not encrypted.

In 2007 we found ourselves frequently explaining "we don't know the names of your files, the names of your folders, the date they were created or last modified or accessed, their size, their checksums or hashes... in short we know nothing about your data except how much you store." We started using the phrase Zero Knowledge as a headline to this long explanation.

It's important to recognize that cryptographers already understand encryption and the terminology is intended for everyday folks. When I'm speaking with a technologist about how SpiderOak products work, I would typically use the phrase end to end encryption.

If we want to end mass surveillance, the only way this can happen is through viral adoption of end to end encrypted products and services. Great UX, education, and terminology are powerful tools, and unlike phrases involving the word "encryption", to my knowledge no company has yet been shameless enough to deceptively use the term Zero Knowledge.

[1] https://www.youtube.com/watch?v=G2y8Sx4B2Sk

[2] http://allthingsd.com/20080403/sugarsync-offers-the-best-met...

[3] https://www.wired.com/2011/05/dropbox-ftc/

[4] http://tirania.org/blog/archive/2011/Apr-19.html

rsync · on Jan 2, 2017

"Doing so would require discrimination between transport encryption, data encryption, meta data encryption, encryption at rest vs. in motion"

...

"This vocabulary is foreign to most folks."

Please, please keep taking these customers. Can we send you leads directly from our pre-sales inbox ?

"If we want to end mass surveillance, the only way this can happen is through viral adoption of end to end encrypted products and services."

Actually, what we need to do is throw some money at the guy writing borg[1][2]. Or maybe sponsor a code audit. I think I am going to put that on our to-do list for this spring ...

[1] https://borgbackup.readthedocs.io/en/stable/

[2] https://www.stavros.io/posts/holy-grail-backups/

Ar-Curunir · on Jan 2, 2017

The issue is not you vs. other companies; it's you vs 25+ years of cryptographic literature.

> no company has yet been shameless enough to deceptively use the term Zero Knowledge.

Except you guys? Why use the phrase "zero knowledge" when you fully know that it has a predefined meaning? Call it no information, no leakage, zero leakage, whatever, but why the one term that is already used to refer to a different concept?

I get that it's a sexy name, but that's why cryptographers use it to refer to a much cooler concept than mere encryption.

danbruc · on Jan 2, 2017

A lot of words are overloaded, across domains as well as within domains. That is not ideal, but it is also not an insurmountable problem: you can always clarify your usage by providing definitions. There is certainly not much point in explaining things in precise and correct terminology if this prevents the intended audience from understanding you. On the other hand, people aware of the technical details will have no great difficulty understanding something despite simplifications or inaccurate terminology.

I am actually not even sure whether zero-knowledge is not technically correct here. Terms like zero-knowledge proof or zero-knowledge protocol have very specific meanings and certainly do not apply here, but is zero-knowledge on its own really used for something more specific or other than not leaking knowledge? I also immediately thought of zero-knowledge proofs and protocols but nothing like that is mentioned anywhere, at least as far as I can tell, so it was kind of my mistake to read something into it that was not actually there.

EDIT: Zero-knowledge seems to indeed have a very specific technical meaning on its own [1], at least in the context of zero-knowledge proofs.

[1] https://en.wikipedia.org/wiki/Zero-knowledge_proof#Definitio...

tptacek · on Jan 2, 2017

This particular term is not overloaded. People familiar with encryption know it to mean something specific.

rarrrrrr · on Jan 2, 2017

Thanks for the feedback. For what it's worth, we did try a bunch of alternative wordings, and Zero Knowledge was the phrase that non technologists found most accessible.

We prioritized making the explanation clear to non-experts vs. to the community of cryptographers.

Esau · on Jan 2, 2017

I believe your explanation but it still strikes me as the wrong thing to do.

stavros · on Jan 2, 2017

I'd like to propose "Zero Access", as in zero access to the plaintext.

nickpsecurity · on Jan 3, 2017

That's actually not bad. I was fine with SpiderOak going for a term that is simple, catchy, and easy to market. Zero Access is the kind of alternative that might work. That specific one might have a problem: it could give the impression that users have zero access to their own data, when most clouds constantly reinforce "access from anywhere any time."

You're thinking along the right lines. I think variations of the words safe and vault have worked for other companies, too, given people understand what they do. "Your data is in a locked vault that we hold for you while you keep the keys or combination." That sort of thing.

B1FF_PSUVM · on Jan 4, 2017

They could go with "Know nothing".

(Or "Jon Snow", for short, if they can get away with it without being sued - "Jon S. crypto, we dont know nuthing" ;-)

jameskegel · on Jan 2, 2017

This doesn't feel like the correct solution.

syshum · on Jan 3, 2017

This happens all the time with marketing, surely this is not the first time you have seen a company co opt a term for marketing purposes.

This seems like a case of perfect being the enemy of good.

tptacek · on Jan 2, 2017

Since you're speaking the language of product marketing, which is one I sort of speak too, can I gingerly offer you some advice?

Until you come up with some other cool-sounding term for end-to-end encrypted storage, every time your product is discussed in a forum that includes people familiar with cryptography, the discussion is going to be dominated with threads about how your product doesn't do what its name claims it does.

If it were me, I would think of this as a very suboptimal situation; sort of the worst case for what a product name can do.

iaml · on Jan 3, 2017

Unless your target audience is not people who read such forums.

nickpsecurity · on Jan 3, 2017

Exactly. The obvious counterpoint. Those people aren't buying shit from them and represent basically no market share. It's privacy-conscious users of services like Dropbox they're going after. Most of them don't know crypto or a lot of the terms people suggested here. Marketing wisdom dictates you call it whatever gets them to see its value & buy it. Then ignore the haters as you roll around in cash.

Business 101 if goal is max adoption & profit.

tedunangst · on Jan 3, 2017

Being the storage service that no cryptographer will recommend seems like a gambit that may backfire. Any day now the mainstream press will start asking experts for advice, and you'll be left out of the recommended list.

nickpsecurity · on Jan 3, 2017

We've had some time for that to happen. So, who are the top players in online storage, who's on top in secure storage, and did that match your prediction? I'm betting against it.

Ar-Curunir · on Jan 3, 2017

Except when people look for third party reviews of said products, and find threads upon threads of cryptographers calling out SpiderOak for misleading advertisement. Surely that's just bad PR?

iaml · on Jan 3, 2017

It really isn't. For a third party reading it, this thread only shows that some cryptographers are calling bullshit on the semantics of their advertisement, which is more of a nitpick to most people, and some replies from people who say they use it and it's pretty good.

tptacek · on Jan 3, 2017

I get that, I do, but "zero knowledge" isn't all that compelling to begin with; this strategy just seems like it's almost all downside.

hedora · on Jan 2, 2017

I really want to give you guys money, but can't trust you without having client and server side source. I need client side source so third parties can freely audit your work. I need server side source so I can store my data at some random colo and wrap the rack in tinfoil (more realistically, so I know I can just switch providers if you are out of business in ten years).

Have you considered licensing your stuff using the BSL: http://monty-says.blogspot.com/2016/08/applying-business-sou...

This would let me pay you to continue your (very important!) work, and let me recommend your service to others.

[edit: For people not familiar with the BSL: It makes it easy to say things like:

"This release of the software is free for the first 100MB, then $10/TB after that. Licensing the software gives you non-transferrable rights similar to a BSD license. On Jan 2, 2027, the above usage restriction will expire, leaving the software with a BSD-style license"

You bump the expiry date on each release, so cheapskates have to wait 10 years for new features, and the developers have to continuously improve the software to maintain a revenue stream.]

rarrrrrr · on Jan 2, 2017

Thank you for your interest in SpiderOak and valuing work to improve the choices available that preserve privacy.

For what it's worth, everything we've built since 2008 has published source code. Most recently that's Semaphor[1], which is written in Go and React.

I think it's very important that products have what Zooko calls an "economic feedback loop" to be successful. As just one example, volunteer projects rarely have staff that do the grinding but necessary work of testing that each release works well on every version of all supported operating systems and platforms, because it isn't fun. I think this is why although some teams publish their client source code, very few service providers publish their server source code. It would make it too easy for competitors to emerge and undercut on price while giving little back (the biggest cost is often the development work itself.)

That said, we've been in business for 10 years and are not going away! Thanks for your feedback.

[1] https://spideroak.com/solutions/semaphor/business/tour

hedora · on Jan 2, 2017

I'm sympathetic to the economic feedback problem you're describing. I think the BSL addresses the concern about undercutting. Sure, people could pirate your software, but short of that, it probably makes more sense to implement from scratch than either wait ten years, or fork code that is ten years old, which is all your competitors could do with the source. Honestly, I would probably just pay for your service after spending a few hours spot checking the source.

Anyway, I'd love to hear your thoughts on the licensing model, even if you're not considering it at spideroak.

sreitshamer · on Jan 3, 2017

Is open source really required? We at Haystack Software make a backup app (Arq) with client-side encryption that doesn't require any server side code (it supports a number of cloud providers' APIs). We don't publish the app's source, but the data format is open/documented, the data go in your cloud account, and you can monitor network traffic from it to ensure it's not connecting to us or anyplace unwanted. Your data don't come to us at all -- they're sent to your cloud account.

nyolfen · on Jan 3, 2017

any plans for a linux client?

pvg · on Jan 2, 2017

I don't think you have to be a cryptographer to notice this and it makes you sound like snake-oil salesmen even if you aren't. The misuse of 'zero knowledge' doesn't seem any clearer to non-technical users but it does a good job of confusing the sort of people you want recommending your product.

rarrrrrr · on Jan 2, 2017

Thank you. I'm all for switching if we can find a phrase that's accessible to non technical people.

Ideally it would be a phrase that's adopted by many sites, the press, etc. (as Zero Knowledge has been, for better or worse.) It should accurately convey the situation that 1) the data is meaningfully encrypted 2) the meta data is meaningfully encrypted and 3) only the customer has access to the encryption keys.

kakarot · on Jan 2, 2017

Thank you for the clarification. I really appreciate all the hard work you guys do in trying to combat unwarranted breaches of privacy.

I've had my reservations about companies that make such bold claims as yours but I will look into your platform more and give the free trial a whirl.

innocentoldguy · on Jan 2, 2017

I think it is worth giving SpiderOak a try. I've been a SpiderOak customer for several years, and have been quite satisfied with it. The UI wasn't the best at first, but it has gotten better recently. I haven't used rsync.net for a while, but their service is great too. It just takes a little more work to set up.

SilasX · on Jan 2, 2017

This came up in a previous thread (can't find atm) and I suggested alternate, more cryptographically correct terms: "provider-obscured" and "homomorphic" (this is like homomorphic encryption, but where the only operation allowed is retrieval).

rarrrrrr · on Jan 2, 2017

Thanks for jumping in with suggestions. It's actually a harder problem than it seems! I have a feeling "provider-obscured" and "homomorphic" (while accurate!) are significantly less accessible to end users. I'll try to break it down:

Do end users talk about the services they purchase using the word "provider"? Is for example, Facebook or Twitter commonly referred to as a social media "provider?" The most common example I can think of is ISP, but I rarely hear non technical people say "ISP". They say something like "I get Internet from Time Warner" instead.

Is "obscured" a commonly used, highly accessible word? Can you think of a few popular movies or books with that word in the title? Is it commonly used in news headlines?

So the proposal is a hyphenated phrase of these two uncommon words. I think it's likely "Zero Knowledge" would crush "Provider-Obscured" in an A/B test. Ditto for "Homomorphic."

Seeing highly technical terms in headings makes non technical people believe that the software is complex and hard to use, and is therefore not for them. IMO, this is one of the classic failings of security products in general. It needs more study.

SilasX · on Jan 2, 2017

Thanks for the explanation of your thinking there. FWIW, I don't think (contra the other posters) that it's that much of a stretch (of standard terminology) to call this "Zero knowledge" -- you are, after all, preventing information from flowing in a certain direction, just like in the ZK proofs.

With that said, what about "opaque" instead of "obscured" and "host" or "cloud" instead of "provider"?

Host-Opaque Cloud Storage

Cloud-Obscured Cloud Storage (okay, bad acronym)

(And I know it's kind of late for a name change anyway.)

jjeaff · on Jan 3, 2017

You do realize that he is talking about terminology for the general public? The names you are suggesting are just terrible. They don't sound good and are not even close to widely understood terms (no offense intended). Perhaps you are not a native English speaker?

Strom · on Jan 2, 2017

Well said and I encourage you to continue promoting your product with this strategy. Being overly precise with lingo will result in a worse world where only domain experts understand what's going on.

mark_l_watson · on Jan 2, 2017

I am a paying customer for storage and use your free Encryptr app and service.

I have no problem with your use of the phrase zero knowledge, but I understand the complaints.

fjrieiekd · on Jan 7, 2017

My main beef with SpiderOak is that I have no way of verifying these claims, so why should I use it over, say, Crashplan, which actually does do a better job of backup. I'll just use a tool like Veracrypt when I need the extra layer of protection, and Crashplan does a better job of incremental backup of these files, too, while SpiderOak uploads the entire file each time.

eternalban · on Jan 3, 2017

> If we want to end mass surveillance, the only way this can happen is through viral adoption of end to end encrypted products and services.

I strongly disagree. The "only" way mass surveillance will end is when it is made illegal and the entities that practice it are treated as pariahs by civilized humanity.

It is purely a political issue.

Jivanyan · on Jan 2, 2017

According to their architectural design, they do not get access to any encryption key, and no key leaves the user's device in unprotected form. Isn't this enough to be advertised as a "zero-knowledge" service provider?

danbruc · on Jan 2, 2017

As a technical term zero-knowledge has a very specific meaning [1] and is not what they are using. Here it is just a marketing term and may confuse people knowing about the technical meaning but that is certainly only a very small fraction of the population and so it is probably not a huge issue.

[1] https://en.wikipedia.org/wiki/Zero-knowledge_proof

carussell · on Jan 2, 2017

Your link is for zero-knowledge proof. They aren't claiming anything in the realm of proofs, zero-knowledge or not.

If "zero-knowledge" implicitly meant "zero-knowledge proof", there would be no reason to ever use the latter phrase. Zero-knowledge is an adjective. It's a modifier. It's the "proof" part in "zero-knowledge proof" that's important in describing what it is. "Zero-knowledge" is a property of the method employed.

The irony is that, wrt the original comment, it's end-to-end encryption that would be a misleading and misapplied label.

I'm not affiliated with this company and I've never even used this service before, and yet it's immediately clear what zero-knowledge means in the context of a cloud storage provider: you never need divulge your keys, so the question of whether you trust your provider or not is moot.

Back when Firefox Sync first launched, I was chasing the idea of referring to it and any similar service as "zero-trust" systems. But building a service and referring to it as "zero-knowledge cloud storage" is totally acceptable.

danbruc · on Jan 3, 2017

Not sure whether I was clear enough, but I understand both sides of the argument. It is a very specific technical term, and the zero-knowledge proof Wikipedia article states that zero-knowledge is the name of one of three properties that zero-knowledge proofs have to satisfy. On the other hand, it is also a nice, catchy and probably understandable marketing term if you want to express that you know (almost) nothing about the users' data.

Developer me would certainly prefer technical accuracy but we all know that users certainly could not care less what is the technically correct name for the thing. So I don't care at all whether they call it zero-knowledge or not, they are not trying to trick anybody into believing they are doing zero-knowledge stuff in the cryptographic sense. I actually like zero-trust but I can see how this could easily be interpreted in the wrong way, should or must not be trusted instead of need not be trusted.

Ar-Curunir · on Jan 3, 2017

Zero knowledge means a very specific thing in cryptography, and is used exclusively to refer to zero knowledge proofs; in all of cryptographic literature over the past 25 years I have not seen any other usage of "zero knowledge".

Either way, this system isn't "zero knowledge", even if that term were well defined for this situation; you leak file sizes and access patterns.
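
A rough Python sketch of the file-size point, under stated assumptions: the 28-byte overhead models a common authenticated-encryption layout (a 12-byte nonce plus a 16-byte tag, as with AES-GCM) and is not SpiderOak's actual format, and the function names and the 4096-byte bucket are hypothetical. Without padding, the stored size tracks the plaintext size exactly; padding to fixed buckets narrows what the provider can infer but does not eliminate it.

```python
AEAD_OVERHEAD = 28  # assumption: 12-byte nonce + 16-byte authentication tag


def stored_size(plaintext_len: int) -> int:
    """Bytes the provider sees for an unpadded encrypted file."""
    return plaintext_len + AEAD_OVERHEAD


def inferred_plaintext_len(observed_size: int) -> int:
    """What the provider can deduce from an unpadded object's size."""
    return observed_size - AEAD_OVERHEAD


def padded_stored_size(plaintext_len: int, bucket: int = 4096) -> int:
    """Bytes the provider sees if the client pads up to a bucket multiple."""
    buckets = max(1, -(-plaintext_len // bucket))  # ceiling division, at least one bucket
    return buckets * bucket + AEAD_OVERHEAD


if __name__ == "__main__":
    for n in (1000, 4096, 4097):
        print(n, stored_size(n), padded_stored_size(n))
```

With padding, every file up to 4096 bytes looks the same to the provider, but a 4097-byte file still lands visibly in the next bucket. Access patterns (which objects are read or written, and when) are a separate leak that this sketch does not address.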

bad_user · on Jan 2, 2017

That's kind of bullshit though, you can't claim 2 common words from the English language in order to only describe a concept many of us don't understand. I'm a software developer, have been for 15 years, I've stayed fairly awake in college during my cryptography classes, have implemented hashing functions (mentioning this because such a history already places somebody in the 0.01%) and I've never heard of "zero knowledge proof".

Not surprisingly, the link you've given is about a phrase with 3 words in it, not 2.

And while I've always been annoyed about overloads of "open source", at least that's a word association that you won't hear from non-technical folks and that wasn't in use before OSI happened. And even so, note that OSI couldn't trademark it.

Ar-Curunir · on Jan 3, 2017

Just because people don't know the term doesn't mean cryptographers don't know the term; any cryptographer with any formal cryptography training has heard of the term, and it's not used to refer to any other concept in the cryptographic literature.

The usage of the term matters when it'll be cryptographers reviewing the work; almost every thread about SpiderOak I've seen calls them out on misleading marketing. Hardly good for PR.

geofft · on Jan 2, 2017

The term "zero knowledge" has a specific technical meaning in cryptography: https://en.wikipedia.org/wiki/Zero-knowledge_proof

Passing encrypted data through a storage device isn't a "zero-knowledge protocol" in a cryptographic sense, it's just normal cryptography.

federicobond · on Jan 2, 2017

No, that's called end-to-end or client-side encryption. Zero-knowledge is a property of a certain class of methods that allow one party to prove to another that a certain statement is true, without revealing anything else about it.

unstatusthequo · on Jan 3, 2017

Have you built something better? It's amusing to hear comments like this when there are really no commercially available alternatives that come close without managing your own. My time is much too valuable to run my own. If yours isn't, build one and share it and charge and market it using your favorite parlance and consider avoiding trolling the one company that's at least trying not to suck.