

ZeroDB – A Peek Under the Hood - mwilkison
http://blog.zerodb.io/a-peek-under-the-hood/

======
geoelectric
So, if I get this right, ZeroDB ends up being a service that persists a flat
dictionary of encrypted buckets.

When a given bucket is requested and decrypted by the client, it will have the
data as well as the references that let the client treat them as a B-tree,
presumably because the clients put them there?

I'm taking that assumption from "The server doesn’t know how individual
objects are organized within a tree structure, or whether they even belong to
a tree structure at all."

What does this mean for adding a node? Does the client have to traverse the
whole B-tree (requesting log(n) entries) down to the point of addition? How
about rebalancing, etc? It seems like the clients would be wholly responsible
for maintaining the tree.

~~~
michwill
You're right - clients will be responsible for inserting data, re-balancing
the tree, etc.
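
To make the division of labor concrete, here is a minimal sketch of what that means - hypothetical names, a toy XOR stand-in for real encryption, and no splitting or rebalancing (that work would also fall on the client). This is not ZeroDB's actual client code:

```python
import json

def xor_cipher(key: bytes, data: bytes) -> bytes:
    # Toy XOR "cipher" standing in for real authenticated encryption.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

class BucketStore:
    """The 'server': a flat id -> encrypted-blob dictionary. It never
    sees keys, plaintext, or which buckets link to which."""
    def __init__(self):
        self.blobs = {}

    def get(self, bucket_id):
        return self.blobs[bucket_id]

    def put(self, bucket_id, blob):
        self.blobs[bucket_id] = blob

class TreeClient:
    """All tree logic lives client-side: buckets are decrypted locally
    and child pointers are followed by the client, one fetch per level."""
    def __init__(self, store, key):
        self.store, self.key = store, key
        self._write("root", {"leaf": True, "keys": [], "vals": []})

    def _read(self, bucket_id):
        return json.loads(xor_cipher(self.key, self.store.get(bucket_id)))

    def _write(self, bucket_id, bucket):
        self.store.put(bucket_id,
                       xor_cipher(self.key, json.dumps(bucket).encode()))

    def _descend(self, key):
        # Walk root -> leaf, fetching one encrypted bucket per level.
        bucket_id, bucket = "root", self._read("root")
        while not bucket["leaf"]:
            i = next((i for i, k in enumerate(bucket["keys"]) if key < k),
                     len(bucket["keys"]))
            bucket_id = bucket["children"][i]
            bucket = self._read(bucket_id)
        return bucket_id, bucket

    def insert(self, key, value):
        bucket_id, leaf = self._descend(key)
        pos = next((i for i, k in enumerate(leaf["keys"]) if key < k),
                   len(leaf["keys"]))
        leaf["keys"].insert(pos, key)
        leaf["vals"].insert(pos, value)
        self._write(bucket_id, leaf)  # re-encrypt and push back

    def lookup(self, key):
        _, leaf = self._descend(key)
        return leaf["vals"][leaf["keys"].index(key)]
```

The server stores only opaque blobs; every insert, split, and rebalance would have to happen on a client.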

~~~
beagle3
Ouch. Now, any memory overrun bug or a hardware memory error in any client can
kill index integrity (and possibly data itself, if data is on pages rather
than individual objects). I wouldn't run an important database on a server
without ECC memory - but ZeroDB doesn't seem to make that an option.

Are you addressing this in any way?

~~~
geoelectric
Well, if the indexes can all be updated with transactional integrity around
the whole thing (which I think you need anyway, per thread below about having
to do client-side rotates to self-balance the tree and keep log n) in theory
this is no worse than a memory overrun/etc. in the original database server.

Most fatal errors just mean you lose an entire atomic update, and the type
where you "successfully" write bad data imply the same sort of repairs as
server-side maintenance would.

In practice, I think the biggest issue is that if you had to restore a backup,
there'd be no way to restore only -your- tree, since the server doesn't know
structure or ownership of the buckets. You'd have to roll back everyone in the
bucket store.

I don't want to be too negative, because I think having something like this is
a good idea. It's plain there are challenges to be solved here though.

Edit: though I do catch your point that N clients all maintaining the same
index tree is potentially like having a non-redundant N-wide storage array--
failure rate is cumulative. Think you could possibly mitigate this with a
single-source-of-truth per table/index client-side design, but that's an
obvious potential bottleneck.

~~~
beagle3
> Most fatal errors just mean you lose an entire atomic update, and the type
> where you "successfully" write bad data imply the same sort of repairs as
> server-side maintenance would.

That's right. Except, if clients maintain the tree, there must be some
"maintenance" client, or each client must do some maintenance (find and fix
things that go wrong).

My issue is, indeed, the cumulative failure rate; and I have enough experience
with faulty hardware, especially memory, to know that important data needs
reliable memory (e.g. ECC), and a "client does maintenance" model basically
makes that impossible, unless you can mandate that all connecting clients have
ECC memory - essentially, server class machines.

The problem with clients maintaining indices runs very deep: uniqueness
constraints are often implemented through indexes. If index integrity is
violated (easy to do - with 1000 clients, anything that goes wrong in any of
them - a power surge, a virus, ... - may corrupt the index), then it is
possible for the entire database's integrity to be violated.

I'm sure there are ways to mitigate this, reducing the probability down to
negligible (which is what ECC does - it reduces memory error probability to
negligible, not to zero which is impossible); for example, you could have
every client maintain their own index, not relying on any other client's index
but only on their immutable rows. That would still let a client violate
database integrity, but any other client would immediately notice and refuse
to work with the database.

However, I have so far not seen any reference to these issues by the ZeroDB
guys. I am waiting patiently.

~~~
geoelectric
Yeah. You mention ECC, and I was previously thinking some sort of integrity-
checking solution modeled on parity bits could work. The problem with that, of
course, is that I think something needs complete knowledge of the data
structure to make it work, plus now you have yet another thing to update over
the network.

I'm not sure that ZeroDB will necessarily have all the answers here
prepackaged. I'm still waiting to see exactly how intelligent their client
modules are. I doubt they're just delivering a bucket store, so I assume
they'll have client-side code that'll encapsulate the BST handling. The
question is whether it's naive or accounts for some of this. If nothing else,
I assume michwill is probably taking notes!

At any rate, my hope is this ends up being an interesting enough solution to
build some degree of pattern or best practice over to handle some of these
aspects. These might be as simple (and limiting) as "one table, one source of
truth, period," or "Use for write-seldom; read-often applications only," or
some other thing like that.

Even if so, this could be quite useful for -some- subset of applications, or
at the very least a useful step on the way to figuring out how to do this sort
of thing.

Certainly there are other challenges here as well. Beyond the integrity
issues, there are still the information-leakage issues. It's already been
mentioned that a binary search leaks order information to begin with, but if
you know what kind of tree is being used, you leak a lot of order information
as the rotates happen.

But again, maybe surmountable. I'm eagerly awaiting the source implementation
so we can dig in.

~~~
michwill
> I assume michwill is probably taking notes!

Yes, you're correct! ;-)

> "one table, one source of truth, period," or "Use for write-seldom; read-
> often applications only,"

Right. Another model - "write and read only _your_ private information". So,
three use cases here.

> information leakage issues

I already think that, for solving this, we should probably just switch to
ORAM. A very valid concern! That said, an observer cannot really deduce the
order of objects referenced in leaf nodes (and there could be a thousand of
them).

In any case. Expect our implementation to be extremely simple (but useful)
first. Probably suitable for "users record their private info" application.
Then we'll be addressing scalability issues, information leakage etc.

------
shanemhansen
I've heard nothing but bad things about ZODB. I question the wisdom of
building a database on top of a Python database.

~~~
the-dude
I agree, and I have worked with ZODB. It isn't a database; it is a mediocre
persistence ... thingy. You store objects, and now your database is tied to
your code.

I have experienced nightmarish scenarios when updating codebases.

~~~
michwill
I consider ZODB to be more like a framework for quickly building databases.
ZODB developers (we've talked to many of them) consider upgrading objects a
non-problem (because they have very well understood patterns for that), etc.

While that's ok for Zope/ZODB developers, it's not for an average user. We
need to wrap it up to make it easily usable, without requiring developers to
read tons of documentation and "design patterns" in advance!

And, obviously, some universal JSON query language is needed.

~~~
the-dude
How are you going to manage querying ad-hoc fields? AFAIK, every field within
your ZODB query needs to be indexed in ZODB first?

~~~
michwill
Indexes are created where you tell them to be. So, if you include an arbitrary
field, it's probably not going to be indexed to start with. And yes, queries
are for indexed fields.

I guess we can have another type of behavior where all new fields are indexed.
It doesn't sound like a "clean" way to me, but some developers might want
that.

------
bkeroack
Interesting that it's built on ZODB. For those that aren't familiar, in
essence ZODB gives your Python application a 'magic dictionary' which
automatically persists any keys written to it. It was probably my ignorance,
but while I thought the concept was interesting I ultimately abandoned using
it since I couldn't find any out-of-band way of inspecting saved data. E.g.,
sometimes you want to run some ad hoc SQL to see what is stored in a
relational database but the only way to access data in ZODB (AFAICT) is within
the application itself.

It looks like ZeroDB abstracts away these issues, luckily. Is it intended to
be a full-scale application data store, or something more specialized like a
secrets store?

~~~
michwill
You're right, good that someone is familiar with ZODB!

We plan to make it a full-scale database. For that, we obviously need a
convenient query language (similar to, or even compatible with, Mongo's).

We've talked to the original developers of ZODB, and they (independently)
suggested that a language-independent query language is something they have
constantly thought about :-) In fact, it seems like many of them dream of
taking ZODB (or a db based on it ;-) to where Mongo is now.

------
pjc50
_" The server doesn’t know how individual objects are organized within a tree
structure, or whether they even belong to a tree structure at all. ... When a
client performs a query, it asks the server to return buckets of the tree as
it traverses it, in a just-in-time fashion"_

The server can, however, infer quite a lot from the order of bucket requests,
especially which buckets constitute a tree.

~~~
michwill
Yes, we are thinking about this. That information is not rock-solid and
changes over time as the tree re-balances. But for ultimate security, I think
we need something like a proper Oblivious RAM protocol.

------
oh_teh_meows
What are the advantages of ZeroDB over a database that uses homomorphic
encryption like CryptDB?

Do they cater to different uses?

~~~
michwill
CryptDB doesn't use homomorphic encryption. It does sortable deterministic
encryption; e.g., after encryption the server can determine whether a > b or
a == b. They have a very good understanding of what data that leaks, and of
when you can and cannot do it.

In our db, a random observer cannot see how the data is ordered or whether
elements are equal to each other.

Homomorphic encryption would be ideal, of course, but it is slow and
impractical for the moment.

~~~
quizotic
Actually, CryptDB does (or at least did originally) use homomorphic encryption
(HE) for evaluating SQL arithmetic expressions. While the earliest suggestions
for HE were terribly slow, CryptDB found one that was practical. If memory
serves, there have since been HE improvements claimed by IBM and/or Google.

But as michwill states (and CryptDB acknowledges), their approach either leaks
information or precludes classes of expressions. For example, their order-
preserving encryption could be susceptible to a brute-force attack via 'a > b'
... '(a+1) > b' ... '(a+2) > b'
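
The probing quizotic sketches can be sharpened into a binary search. Here is a toy model (hypothetical names, not CryptDB's code): since order-preserving encryption makes `Enc(probe) > Enc(secret)` equivalent to `probe > secret`, an attacker who can submit chosen probes recovers the secret in O(log range) comparisons instead of stepping one at a time.

```python
SECRET = 731  # the hidden plaintext behind some OPE ciphertext

def oracle_greater(probe: int) -> bool:
    # Stand-in for the server-side comparison Enc(probe) > Enc(SECRET);
    # order-preserving encryption makes the two comparisons equivalent.
    return probe > SECRET

def recover(lo: int, hi: int) -> int:
    # Find the smallest probe that compares greater than the secret;
    # the secret is one less than that probe.
    while lo < hi:
        mid = (lo + hi) // 2
        if oracle_greater(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo - 1
```

Over a 32-bit value space this needs about 32 comparisons rather than billions of linear probes.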

------
bojo
How does this work without exposing the client keys in server memory? You
still need to send the server your private key to query and decrypt data you
specifically didn't make via other connected clients, right?

Edit: I phrased that wrong. How do you usefully query other user's data with
only your keys?

~~~
michwill
Nope, the query logic happens on the client, so the server doesn't know the
key. The client is the one who writes the encrypted data and indexes, and who
uses those indexes remotely.

~~~
bojo
Which doesn't sound very scalable at all. What is the use case for this?

~~~
michwill
It is scalable enough because you only need to download on the order of
log(index_size) buckets rather than the full index.
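
That claim is easy to sanity-check with assumed numbers (the fanout and record count below are illustrative, not ZeroDB's actual parameters):

```python
import math

fanout = 500        # assumed keys per B-tree bucket
n = 10**9           # assumed number of indexed records
depth = math.ceil(math.log(n, fanout))
print(depth)        # 4 bucket round trips for a billion records
```

So even a billion-record index costs only a handful of bucket fetches per lookup.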

The use case - any private data which you'd like to encrypt on the server
while preserving ability to search. Medical records, financial data, emails
are some examples.

It's easier to say what this database is _not_ for: big-data applications
where you need to use a significant portion of the data in the db (like 10%)
in map-reduce queries are probably not where ZeroDB would be practical.

~~~
bojo
I appreciate the answers. I'm genuinely curious because I've contemplated such
a database on the side but could never see my way to the goal.

The final outstanding question I have though is: How do you deal with data you
aren't the owner of? Is this strictly targeted at scenarios where you are the
sole owner?

~~~
michwill
Good question. We thought about two cases. One is when you are the sole owner
of the data.

The other is when the owner of the data is some group (like JP Chase). In this
case, we can have server-side quotas to handle cases where one or several
clients are compromised. So, in this case, it's more about distributing trust.

------
siscia
I'm definitely missing something, but...

How is it different from a simple MongoDB where I store encrypted documents?

~~~
michwill
With MongoDB, you either pass your key along with the query or lose your
ability to search. In our case, we don't pass the key to the server, yet keep
the ability to query.

~~~
siscia
Can you explain in more detail how?

And what do you mean by "search"?

~~~
michwill
For example, full-text search over your text data, or range queries over your
numerical data.

The idea is that data in databases is organized in B-trees, so we traverse
those from a remote client. It is fast enough because usually only a
logarithmic fraction of the tree needs to be seen by the client.

~~~
mapgrep
OK, but aren't you thus pushing much of the intelligence that is typically
part of the database server out to the database clients?

In this case, the "database server" is basically a dumb remote B-tree server,
right? It's not running queries of any sort, or maintaining indices?

Not putting it down, just clarifying that the structure of this database
system is very different. Or appears so, at least.

~~~
michwill
Yes, you are right. The server stores indexes (which are actually trees), but
the query logic is on clients.

~~~
mapgrep
Very interesting. How much additional data, compared to a conventional fat
server architecture, needs to be sent across the wire with a fat client? The
blog post makes it sound not too bad.

~~~
michwill
Usually it's on the order of 100-200 KB per query, but caching things like the
tree root makes this footprint smaller (e.g. the second query is always faster
than the first one).

We tested it with pretty ad-hoc parameters (just to check whether it's
practical at all!) and will soon write an automatic optimizer for them,
minimizing query time.

~~~
michwill
HN doesn't allow replying too deep in the comment tree, so I'll continue here.

ZODB, on which we are based, is ACID-compliant, so it takes care of
simultaneous writes. You either don't use a cache, or you get invalidation
requests if you do (so that you have an up-to-date tree when you want to
update).

Though it sounds like the ideal situation is when each user has their own
private data, so there are not many simultaneous writes into the same tree.
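
The cache-plus-invalidation behavior can be sketched like this (hypothetical names, not ZODB's actual API): the client keeps fetched buckets locally and drops one whenever the server reports that another client rewrote it.

```python
class CachingClient:
    def __init__(self, fetch):
        self.fetch = fetch    # function: bucket_id -> decrypted bucket
        self.cache = {}
        self.fetches = 0      # network round trips actually made

    def get_bucket(self, bucket_id):
        # Serve from the local cache when possible.
        if bucket_id not in self.cache:
            self.fetches += 1
            self.cache[bucket_id] = self.fetch(bucket_id)
        return self.cache[bucket_id]

    def invalidate(self, bucket_id):
        # Called when the server reports that this bucket changed.
        self.cache.pop(bucket_id, None)
```

This is why the second query is faster than the first: the root and upper levels of the tree stay cached until invalidated.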

~~~
geoelectric
Well, I think what makes it interesting is that the simultaneous-write issues
don't just apply to ZODB, but also to the indexes (and/or balancing data)
inside the encrypted buckets that the clients maintain for their own BSTs,
per my question above. You can't just put a transaction around one bucket,
because the buckets have to form a consistent tree together.

That's especially true when you consider that to maintain log n the tree will
have to be a self-balancing tree, so any given modification can touch a larger
number of nodes during rotation.

I might be missing something obvious, but if the client has to maintain the
tree I have a hard time seeing how you wouldn't have to queue access into the
tree for modifying operations. That seems like it could be a pretty
significant issue for some applications.

~~~
michwill
I think your points are largely correct. You indeed need to lock multiple
buckets when you commit data, which could be slow.

My point is that you can tolerate that if your application is something like
Gmail: client A has his own tree saved on the server, not intersecting with
the tree of client B, and client A is probably not going to write from
multiple places simultaneously too often.

But if you have _groups_ of clients writing to the same tree, I think it's
better to have a dedicated writing client which handles the commits of the
others.
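
That single-writing-client pattern can be sketched as a queue that serializes commits (hypothetical names, not ZeroDB code): other clients submit commits and block until the one writer has applied them, so bucket updates never conflict.

```python
import queue
import threading

class WriterClient:
    """Funnels all commits for a shared tree through one thread,
    so multi-bucket updates are applied strictly one at a time."""
    def __init__(self, apply_commit):
        self.apply_commit = apply_commit   # touches buckets serially
        self.q = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, commit):
        # Called by ordinary clients; blocks until the writer is done.
        done = threading.Event()
        self.q.put((commit, done))
        done.wait()

    def _run(self):
        while True:
            commit, done = self.q.get()
            self.apply_commit(commit)
            done.set()
```

The obvious trade-off, as noted above, is that this single writer becomes a potential bottleneck.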

------
sbt
This is a very exciting project, but since the use cases are security related,
I am hesitant to use a closed source project. Are there any plans to open
source this?

~~~
mwilkison
Yes, we plan on making this open source.

------
bmh100
Do you have any comments on using this technology for (1) health information
and (2) financial information?

~~~
mwilkison
We think it could be very useful in verticals that need to comply with privacy
regulations or store otherwise sensitive data (healthcare/HIPAA,
education/FERPA, finance). We're in the process of exploring specific use
cases, so we'll have more precise examples for you soon.

------
taariqlewis
Nice follow-up ZeroDB. Looking forward to the beta!

------
swswsw
Good to see that you are adding more info about how it works. Please keep
posting more; it sounds like an interesting technology.

------
jdub
Closed source database? Good luck with your acquisition!

