
ZeroDB (YC S16) Provides Security for Enterprise Big Data in the Cloud - stvnchn
http://themacro.com/articles/2016/08/zerodb/
======
myth17
Microsoft offers "Always Encrypted" for Query Processing over encrypted data
in SQL Server and SQL Azure:
[https://msdn.microsoft.com/en-us/library/mt163865.aspx](https://msdn.microsoft.com/en-us/library/mt163865.aspx)
(Disclaimer: Microsoft employee)

~~~
mankash666
Does Microsoft use deterministic encryption for searchable encryption? I'm
sure OPE, Paillier, etc. schemes are in use for columns that require those
properties.
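
For readers unfamiliar with "those properties": Paillier is additively homomorphic, so a server can add two encrypted values without seeing them. Below is a minimal toy sketch of that property only. The tiny primes and the function names are assumptions for illustration; this is not secure and says nothing about how SQL Server or ZeroDB implement anything.

```python
import math
import secrets

def keygen(p=1789, q=1861):
    # Toy primes chosen for illustration; real keys use primes of 1024+ bits.
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                                          # standard simple generator
    mu = pow(lam, -1, n)                               # valid when g = n + 1
    return (n, g), (lam, mu)

def encrypt(pub, m):
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1               # fresh randomness per ciphertext
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    l_value = (pow(c, lam, n * n) - 1) // n            # L(x) = (x - 1) / n
    return (l_value * mu) % n

def add_encrypted(pub, c1, c2):
    # Multiplying ciphertexts adds the underlying plaintexts (mod n).
    n, _ = pub
    return (c1 * c2) % (n * n)

pub, priv = keygen()
total = add_encrypted(pub, encrypt(pub, 120), encrypt(pub, 45))
print(decrypt(pub, priv, total))
```

The server-side step is `add_encrypted`, which needs only the public key; the private key stays with the client.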

~~~
jahewson
Yes. It's configurable.

------
nickpsecurity
Whitepaper from Arxiv describing their scheme for others who like seeing
underlying details of security tech:

[https://arxiv.org/pdf/1602.07168v3.pdf](https://arxiv.org/pdf/1602.07168v3.pdf)

@ ZeroDB developers

I know the goal is to have the cloud provider untrusted in this model for
storage/processing. However, have you considered them actively malicious in
how they handle the protocol steps with the on-site client? As in, at each
step (eg a query), they might try to attack the protocol or especially the
implementation (a la OpenSSL infamy). Just make sure you have defenses for
that sort of thing.

~~~
michwill
A malicious server, apart from the possibility of removing data, would look
pretty unusual to the client (returning incorrect data, etc.). So it's a good
idea to look for these patterns. An attack would probably look like trying to
use the client as an oracle, and that's pretty detectable.

But as MacLane pointed out in another comment, this is only the case for the
standalone database described in the arXiv paper. The Hadoop scheme works
quite differently.

------
dangerlibrary
> Currently, ZeroDB works with Hadoop and will soon expand to other parts
> of the big data ecosystem, like Spark and Impala, as well as legacy
> databases, like Oracle, DB2, and MySQL.

I guess mysql is a legacy database, now. Someone should let all the committers
know that they should transition to support roles and stop new feature
development. Hadoop is the future of sql.

~~~
michwill
Yeah, I guess it's a little early to retire MySQL and Oracle :-) Thanks for
pointing that out; legacy is more about DB2.

------
mfrager
The biggest technical hurdle for this type of database right now is index
lookup. Since the index nodes are encrypted, the server cannot traverse the
tree itself, so the client needs a round trip to the server for every level of
the binary tree index that is traversed. This turns what is usually one of the
fastest database operations into a slow one.
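
A rough sketch of why the round trips add up, assuming a balanced binary search tree whose nodes are stored as opaque blobs. The `Server` class, `toy_crypt`, and the key are hypothetical names for illustration, and the XOR cipher is a toy, not real encryption and not ZeroDB's actual scheme.

```python
import hashlib
import json

KEY = b"client-only secret"  # hypothetical key that never leaves the client

def toy_crypt(key, node_id, data):
    """Toy XOR stream cipher (NOT secure). XOR makes it its own inverse."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        block = key + node_id.to_bytes(4, "big") + counter.to_bytes(4, "big")
        stream += hashlib.sha256(block).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

class Server:
    """Stores opaque blobs by node id and counts how often it is asked."""
    def __init__(self):
        self.blobs = {}
        self.round_trips = 0

    def put(self, node_id, blob):
        self.blobs[node_id] = blob

    def fetch(self, node_id):
        self.round_trips += 1
        return self.blobs[node_id]

def build_tree(server, keys):
    """Client side: upload a balanced BST over sorted keys, return the root id."""
    next_id = [0]

    def build(lo, hi):
        if lo >= hi:
            return None
        mid = (lo + hi) // 2
        node_id = next_id[0]
        next_id[0] += 1
        node = {"key": keys[mid], "left": build(lo, mid), "right": build(mid + 1, hi)}
        server.put(node_id, toy_crypt(KEY, node_id, json.dumps(node).encode()))
        return node_id

    return build(0, len(keys))

def lookup(server, root_id, target):
    """Client side: one fetch per tree level, since only the client can read nodes."""
    node_id = root_id
    while node_id is not None:
        node = json.loads(toy_crypt(KEY, node_id, server.fetch(node_id)))
        if target == node["key"]:
            return True
        node_id = node["left"] if target < node["key"] else node["right"]
    return False

server = Server()
root = build_tree(server, list(range(1023)))
found = lookup(server, root, 0)
print(found, server.round_trips)
```

With 1023 keys the tree has 10 levels, so a single lookup can cost up to 10 network round trips, where a plaintext index would be walked entirely on the server.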

~~~
mwilkison
This is indeed true for our standalone, open source database
([https://github.com/zerodb/zerodb](https://github.com/zerodb/zerodb)).

However, it is not the case for our Hadoop scheme (nor our future support for
structured databases). In these cases, there is no round-tripping required. In
fact, it's significantly more performant than existing Transparent Data
Encryption in Hadoop, from both a latency and key rotation perspective.

We'll likely release a paper describing this new scheme later in the year as
well as publish at some conferences.

~~~
njkanjsd
Having worked on a similar product and heard a very similar description of the
'proprietary' method, I'm guessing either security, speed or both are actually
compromised.

~~~
michwill
There are many proprietary methods based on deterministic encryption
+ obfuscating the word distribution; that's what most companies do.

We avoid doing that because of the questionable security of such methods. Also,
we tend to publish what we do (stay tuned for the Hadoop paper :-)
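
To illustrate the concern with deterministic schemes in general: equal plaintexts produce equal ciphertexts, so the stored column leaks its frequency distribution. The sketch below uses a keyed hash (HMAC) as a stand-in for a deterministic cipher, which is an assumption made for brevity; the column values and function names are made up for the example.

```python
import hashlib
import hmac
import os
from collections import Counter

KEY = os.urandom(32)  # hypothetical client-side secret

def deterministic_token(key, value):
    # Same input always gives the same output, which is what enables equality search.
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()

def randomized_token(key, value):
    # A fresh nonce makes every output different, even for equal inputs.
    nonce = os.urandom(16)
    tag = hmac.new(key, nonce + value.encode(), hashlib.sha256).hexdigest()
    return nonce.hex() + tag

column = ["flu", "flu", "flu", "cold", "flu", "rare-disease"]

det_counts = Counter(deterministic_token(KEY, v) for v in column)
rand_counts = Counter(randomized_token(KEY, v) for v in column)

# What an observer of the stored column sees: how often each ciphertext repeats.
print(sorted(det_counts.values()))
print(sorted(rand_counts.values()))
```

An observer who knows roughly how common each value is can match the repeated deterministic tokens to plaintexts without the key, which is why vendors try to obfuscate the distribution on top.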

------
mwilkison
Hey HN, cofounder of ZeroDB here. Michael (/u/michwill) and I are excited to
be a part of YC and happy to answer any questions about the company!

~~~
jihoon796
Love the idea behind ZeroDB - kudos to you guys, and cheers from a fellow Tar
Heel!

What's your expansion strategy for Oracle/DB2/MySQL?

~~~
michwill
We have ideas for how to make relational databases secure while running everything
server-side, thanks to recent research publications [notably CipherBase from
Microsoft Research
[http://www.cidrdb.org/cidr2013/Papers/CIDR13_Paper33.pdf](http://www.cidrdb.org/cidr2013/Papers/CIDR13_Paper33.pdf)]
and advances in CPU hardware. Early days, but we'll probably test it first in
the open source ZeroDB database
[[https://opensource.zerodb.com](https://opensource.zerodb.com)] and then
apply the same method to existing relational databases.

------
mankash666
CryptDB offers DB querying without having to load parts of the DB onto your
local machine, whereas loading them locally is how ZeroDB operates in its
current incarnation.

Which is more secure? Does ZeroDB use non-deterministic encryption?

~~~
michwill
ZeroDB doesn't use deterministic encryption (neither in the Hadoop product nor
in the open source database).

------
sobinator
For those who may know, how does ZeroDB stack up against an incumbent like
MarkLogic in the 'Security for Enterprise Big Data in the Cloud' space?

------
Animats
That's a very strong security claim. What's behind it?

------
jhugg
As the founding engineer at VoltDB, I can tell you that no matter how cool
your name is, it can be annoying to be always alphabetized last.

~~~
simonebrunozzi
That's one of the reasons behind the name change from Cadabra to Amazon.com.

p.s. if you change it to 0db perhaps you might be listed before any "a"
company? Not sure if numbers precede letters in those lists.

~~~
michwill
0db - that's what I was thinking of. A way to fight 0days

~~~
doublerebel
0db _definitely_ sounds like an audio brand. I don't recommend it.

------
dmix
Off-topic: whenever I see (YC __) in a title I always assume it's a job
posting and automatically ignore it. It might be detrimental to label a blog
post that way.

~~~
nickpsecurity
Better to read it as advertising YC's investments on YC's forum.

~~~
dmix
You can advertise it without using the exact same format as a job listing.

~~~
nickpsecurity
Looked different to me. Those usually don't have comments and other links
under the title, just the amount of time since the post appeared. Sounds like
two of you are making snap judgments on threads that ignore important info.
Your filters are title only instead of title + peripheral info. Filtering on
both would prevent the problem you're having.

