Hacker News
Zbox – Zero-knowledge, privacy-focused embeddable file system written in Rust (github.com)
112 points by gbrown_ 10 months ago | 39 comments

I will be a bit harsh with the project, since I understand it aims to be a technology for production use and not just a pet project (no "Show HN" tag was present).

I only see yet another key-value store with an application programming interface that resembles typical filesystem operations, but that is only accessible from within a programming framework.

I know it is not finished, but the flaws I see at the moment are fundamental: it aims to be general purpose, yet it is not Unix compliant, not mountable, and not language- or truly operating-system-agnostic. As a result, it does not let you leverage existing filesystem tools such as mount/find/grep/sshfs etc., not to speak of support for multiple processes, users, time validations...

I wish you all the luck in the world with it as a pet project (really! especially since filesystems are quite hard to make mature and stable), but I hope it never succeeds as a serious alternative to filesystems, because that could turn out worse than MongoDB did as an alternative to RDBMSs.

Now, trying to be constructive: there is value in the concept of runtime-embeddable file systems. Why not reframe the project as a key-value store that is embeddable within applications and also mountable (for example via FUSE, so you can mount it on macOS, Linux and others)? Even if it were mountable only read-only, that could bring several advantages over existing key-value stores and advanced filesystems, which are not embeddable.

Again, impressive work, good luck.

EDIT: the way you store data is also a great discussion topic... who is interested in so many possible ways to store the information anyway? That mostly depends on the requirements of the consuming application. For example, if you target users who want a filesystem-like embedded runtime, why would they use this project when they have access to advanced Unix-compliant filesystems that support encryption, RAID and advanced tools out of the box?

I'm also surprised that there's no FUSE interface; it's the first thing I expected while reading the title.

For single-process crypto storage I'd sooner use a crypto shim for SQLite; I don't know if there's a good one available out there (besides the one sold by the SQLite devs).

You can use SQLCipher [1] with libsqlfs [2], which has a FUSE interface.

1. https://github.com/sqlcipher/sqlcipher

2. https://github.com/guardianproject/libsqlfs

Don't be surprised: the lack of FUSE support is by design. :-)

Zbox author here. Woo, I wasn't even prepared for an HN debut. :-) Anyway, thanks for your comments.

One of the major goals of this project is to keep app files private, which means the exposure surface must be as small as possible. To this end, it intentionally does not support mounting/FUSE or other mechanisms that can 'share' access across processes. Zbox doesn't allow access from multiple processes, even if those processes run under the same user account.
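
One way to get a "single process only" rule, sketched here purely as an assumption (the thread does not say how Zbox actually enforces it), is an exclusive lock file created with `OpenOptions::create_new`, which fails with `AlreadyExists` when the file is already present. `RepoLock` and the `repo.lock` file name are hypothetical. Such a lock is advisory: it stops a second cooperating opener, but does not by itself keep another same-UID process from reading the underlying files.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical guard: while it is alive, a marker file signals that one
/// process owns the repository. This is NOT Zbox's documented mechanism.
pub struct RepoLock {
    path: PathBuf,
}

impl RepoLock {
    /// Try to take the lock. `create_new(true)` makes the open fail with
    /// `ErrorKind::AlreadyExists` if another holder created the file first.
    pub fn acquire(repo_dir: &Path) -> io::Result<RepoLock> {
        let path = repo_dir.join("repo.lock"); // hypothetical file name
        let mut file = OpenOptions::new().write(true).create_new(true).open(&path)?;
        // Record the owner's PID so a stale lock (left by a crash) can be
        // diagnosed and removed by hand; this sketch does not do it automatically.
        writeln!(file, "{}", std::process::id())?;
        Ok(RepoLock { path })
    }
}

impl Drop for RepoLock {
    fn drop(&mut self) {
        // Release the lock; errors are ignored because Drop cannot return them.
        let _ = fs::remove_file(&self.path);
    }
}
```

A second `RepoLock::acquire` on the same directory returns an error until the first guard is dropped.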

As for your question about why not make it an embeddable key-value store: IMHO, a file store is more generic. We have a VERY long history of using file systems; they are the base of every OS, they have a solid API, every language provides a similar access interface, and everybody is familiar with how to talk to a file system. Zbox also provides an API similar to Rust's file system API, which should make it much easier for most developers to adopt.

I don't really understand what you mean, for me what you're saying is a contradiction:

>We have VERY long history using file system, it is the base of every OS, it has solid API, every language provide similar access interface, everybody familiar with how to talk with file system.

I agree, but then you don't really do that, do you? Merely mimicking the Rust fs API is not enough to call your project a "filesystem" in my opinion. It's actually probably the least of my concerns when it comes to using a filesystem.

On the other hand, not being able to use the familiar "cd", "grep", "cp", "mv" and "cat" on the contents of this so-called filesystem means that it's really not that familiar at all.

I'm not sure I see the point of the "minimal attack surface": you can create private FUSE mounts that can't be accessed by other UIDs. And if you're worried about same-UID compromised programs snooping around, then it's game over anyway. That smells a bit like FUD to me.

IMO, "cd", "grep", "cp", etc. are provided by the shell and the OS, not by the file system. They are just human interfaces that help you interact with the underlying file system. Can you call "cd" on an ext4 file system directly? No. The shell handles "cd" itself (it is a builtin) or spawns a process for commands like "grep"; that code makes the system call to the kernel, and the kernel then calls the ext4 kernel module through the VFS.


Zbox is just like a kernel module, but one that runs inside the application's memory space. To interact with it you have to use a set of 'standard' APIs: in the kernel that is the VFS, in Zbox it is Rust's fs API (not exactly the same, but similar).
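
To make the "same API, different backend" analogy concrete, here is a minimal sketch with a hypothetical `FileStore` trait (not Zbox's real API, whose exact signatures are not given in this thread). `OsStore` forwards to `std::fs`, so the kernel's VFS picks the file system driver; `MemStore` keeps everything inside the application's memory, the way an embedded file system would. Application code written once against the trait runs on either backend.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical, minimal file-store interface (illustration only).
pub trait FileStore {
    fn write_file(&mut self, path: &str, data: &[u8]) -> io::Result<()>;
    fn read_file(&self, path: &str) -> io::Result<Vec<u8>>;
}

/// Backend 1: delegates to the OS; every call becomes a system call.
pub struct OsStore {
    root: PathBuf,
}

impl OsStore {
    pub fn new(root: &Path) -> Self {
        OsStore { root: root.to_path_buf() }
    }
}

impl FileStore for OsStore {
    fn write_file(&mut self, path: &str, data: &[u8]) -> io::Result<()> {
        fs::write(self.root.join(path), data)
    }
    fn read_file(&self, path: &str) -> io::Result<Vec<u8>> {
        fs::read(self.root.join(path))
    }
}

/// Backend 2: lives entirely in this process's memory; no other process
/// or shell tool can see these "files".
#[derive(Default)]
pub struct MemStore {
    files: HashMap<String, Vec<u8>>,
}

impl FileStore for MemStore {
    fn write_file(&mut self, path: &str, data: &[u8]) -> io::Result<()> {
        self.files.insert(path.to_string(), data.to_vec());
        Ok(())
    }
    fn read_file(&self, path: &str) -> io::Result<Vec<u8>> {
        self.files
            .get(path)
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, path.to_string()))
    }
}

/// Application code written once against the trait.
pub fn save_and_reload(store: &mut dyn FileStore) -> io::Result<Vec<u8>> {
    store.write_file("note.txt", b"hello")?;
    store.read_file("note.txt")
}
```

The flip side is that anything stored in `MemStore` is invisible to `grep`, `rsync` and every other tool that only speaks POSIX.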

Do you often interact with an ext4 file system using the Linux VFS? I'm not sure it helps your "familiarity" argument. It's not just about familiarity either, it's also interoperability. Tons of 3rd party applications know how to interact with real filesystems using the standard POSIX "open", "stat", "unlink" etc... You can make incremental backups using rsync, you can use any of many file browsers to explore its contents, you can use logrotate, you can use tail -f to monitor data appended to a file etc...
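
For comparison, here is roughly what talking to a real file system through those standard calls looks like from Rust's `std::fs`. On Unix-like systems, `File::open`/`OpenOptions::open` end up in open(2), `fs::metadata` in a stat-family call, and `fs::remove_file` in unlink(2). The helper names are hypothetical, and `read_new_data` only mimics a single poll of `tail -f` (it does not handle truncated or rotated files). Because the bytes live in an ordinary file, any other tool can read the same file at the same time.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

/// Append one line to a log file, creating it if needed.
pub fn append_line(path: &Path, line: &str) -> io::Result<()> {
    let mut f = OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(f, "{}", line)
}

/// Return the bytes appended after `offset` and the offset to resume from,
/// roughly what `tail -f` does on each poll.
pub fn read_new_data(path: &Path, offset: u64) -> io::Result<(Vec<u8>, u64)> {
    let len = fs::metadata(path)?.len(); // stat-family call
    if len <= offset {
        // Nothing new (a shrunk/truncated file is not handled in this sketch).
        return Ok((Vec::new(), offset));
    }
    let mut f = File::open(path)?; // open(2)
    f.seek(SeekFrom::Start(offset))?;
    let mut buf = Vec::new();
    f.read_to_end(&mut buf)?;
    // Advance by what was actually read; the file may have grown since stat.
    let next = offset + buf.len() as u64;
    Ok((buf, next))
}
```

Calling `fs::remove_file(path)` afterwards removes the file the same way `rm` would.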

IMO if you can't do this you don't have a filesystem, you have a database with filesystem semantics.

Anyway, by now anybody reading this discussion will have made up their mind so let's leave it at that. And good luck with your project nonetheless, semantics aside it's rather impressive.

Before an entire bikeshed discussion arises over the misused term "zero-knowledge", know that the author is sensitive to this issue and will likely correct it soon [1].

[1] https://github.com/zboxfs/zbox/issues/4

Seems to me the bigger issue is calling it a file system. As far as I can tell it is nothing of the sort.

So it's not a file system and not zero-knowledge :-)

Or perhaps it is zero-knowledge, but about file systems. ;)

Well there I was, assuming it wouldn't provide your file but just prove that it could.


>The reason why I am using it is mainly for marketing purpose.

>I am going to change the term to zero leakage, what do you guys think?

The author may as well change it to Uber Super Duper The Best of The Filesystems in the entire Observable and the non observable parallel universes; that would make more sense.

Great project, I applaud the effort! Some curious questions:

- Where exactly does "zero-knowledge" play a role here? Does this provide steganography? Or, what is meant?

- Why do they use libsodium (excellent choice, but C) instead of a native Rust crypto library?

- Does this only work with Rust applications?

Thanks for your praise. To answer your questions:

- Where exactly does "zero-knowledge" play a role here? Does this provide steganography? Or, what is meant?

As I said in https://github.com/zboxfs/zbox/issues/4, this term was originally used for marketing purposes. It means no data or metadata can be leaked to the underlying storage, so an outsider gains no knowledge about its content. But it turns out many people find it misleading because it evokes zero-knowledge proofs in cryptography, so I have already changed it to "zero-details".

- Why do they use libsodium (excellent choice, but C) instead of a native Rust crypto library?

libsodium is a great library: it is fast, sophisticated and mature, and it has some advanced crypto features I like, such as Argon2 hashing and AES nonce extension. I know some Rust crypto libraries are great, but overall the ecosystem is still young and needs improvement.

- Does this only work with Rust applications?

Yes, for now only Rust. But I am considering adding an FFI interface so it can be used from other languages.
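
A common shape for such an FFI layer, sketched here under assumptions (the function names and the toy in-memory `Store` are hypothetical, not Zbox's), is an opaque handle plus `extern "C"` functions that take raw pointers and return integer error codes instead of panicking across the language boundary.

```rust
use std::collections::HashMap;
use std::ffi::CStr;
use std::os::raw::{c_char, c_int};

/// Hypothetical in-memory store standing in for a real repository handle.
/// C code only ever sees a pointer to it (an opaque handle).
pub struct Store {
    files: HashMap<String, Vec<u8>>,
}

// Note: to export these under stable symbol names, a real library would also
// add `#[no_mangle]` (spelled `#[unsafe(no_mangle)]` in Rust 2024) and build
// the crate as a `cdylib` or `staticlib`.

/// Create a store; the caller must release it with `store_free`.
pub extern "C" fn store_new() -> *mut Store {
    Box::into_raw(Box::new(Store { files: HashMap::new() }))
}

/// # Safety
/// `store` must be NULL or a pointer from `store_new` not yet freed.
pub unsafe extern "C" fn store_free(store: *mut Store) {
    if !store.is_null() {
        drop(unsafe { Box::from_raw(store) });
    }
}

/// Store `len` bytes under a NUL-terminated UTF-8 name. Returns 0 on success,
/// -1 on invalid arguments.
///
/// # Safety
/// `store` must come from `store_new`, `name` must be a valid C string and
/// `data` must point to `len` readable bytes.
pub unsafe extern "C" fn store_put(
    store: *mut Store,
    name: *const c_char,
    data: *const u8,
    len: usize,
) -> c_int {
    if store.is_null() || name.is_null() || (data.is_null() && len > 0) {
        return -1;
    }
    let cname = unsafe { CStr::from_ptr(name) };
    let name = match cname.to_str() {
        Ok(s) => s.to_owned(),
        Err(_) => return -1, // reject names that are not valid UTF-8
    };
    let bytes = if len == 0 {
        Vec::new()
    } else {
        let slice = unsafe { std::slice::from_raw_parts(data, len) };
        slice.to_vec()
    };
    let store = unsafe { &mut *store };
    store.files.insert(name, bytes);
    0
}

/// Length of the data stored under `name`, or -1 if absent or invalid.
///
/// # Safety
/// Same pointer requirements as `store_put`.
pub unsafe extern "C" fn store_len(store: *const Store, name: *const c_char) -> i64 {
    if store.is_null() || name.is_null() {
        return -1;
    }
    let cname = unsafe { CStr::from_ptr(name) };
    let name = match cname.to_str() {
        Ok(s) => s,
        Err(_) => return -1,
    };
    let store = unsafe { &*store };
    match store.files.get(name) {
        Some(bytes) => bytes.len() as i64,
        None => -1,
    }
}
```

Tools such as cbindgen can generate the matching C header from functions like these, so that C, Python (via ctypes) or Swift code can call them.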

Does Rust have official implementations of crypto algorithms (in pure Rust)?

I am not talking about some GitHub repo from some dude who contributes to it in his spare time.

I'm not sure what you mean by "official" exactly, but I think for any of them, the answer would be "no".

That said, https://crates.io/crates/ring is what most people suggest you reach for: it's a slow and steady port of BoringSSL's code to Rust + asm. Of course, that only covers a subset of crypto; there are other great projects too, e.g. the Tor people have been putting out crates like https://github.com/isislovecruft/curve25519-dalek

Is there something like "ring" for libsodium instead of BoringSSL?

Also, does it really make sense to create a separate crate for every crypto primitive (or small set of primitives)? This has so many obvious disadvantages compared to the "libsodium approach".

I think it has pros and cons, honestly.

I'm not aware of anyone porting libsodium, though it's also not my area of specialty.

Of course, maintaining some kind of "libsodium for Rust" is more work, but on the other hand a central project for Rust crypto is highly desirable: even if it is initially run by mere crypto enthusiasts, such a central place is needed for the real crypto experts to gather around. Then obsolete crypto can be centrally deprecated and flagged with a warning, and competing implementations can be resolved in one place, (hopefully) bringing all the knowledgeable people together to discuss.

What's the alternative? That each individual developer has to pick and combine their favourite set of crypto libs, making crypto libs spread by popularity (which is more often than not a self-fulfilling prophecy) rather than actual quality.

Tweetnacl (http://tweetnacl.cr.yp.to/) would be a good candidate for porting to Rust, since it's very small. Others have done it successfully in other languages: https://github.com/dchest/tweetnacl-js

Someone has given it an attempt: https://github.com/jmesmon/sodalite But the port doesn't seem to be too mature yet.

(Edit: For those not aware, Libsodium is based on a portable rewrite of NaCl.)

I don't see how these two things are at odds; said "central project" could author individual crates. They could even live in the same project, thanks to Cargo workspaces.

>> some github repo in some dudes repository which contributes to it in his spare time

This is a little too close to how some real official crypto libraries are maintained. OpenSSL for example.

Apparently "Zero Knowledge" is used in the more modern way; it's encrypted with a password.

That is not exactly what "zero knowledge" means, even in modern times...

Why do cryptographers insist that "zero knowledge" on its own, WITHOUT (!!!) the words "proof" or "protocol", can't be used as a generic term to mean whatever?

Nobody suggests that you cannot make statements such as, "The President has zero knowledge on every important issue." However, in the context of cryptography, privacy, or security generally, yes, there is a need to avoid muddying the waters by appropriating terms with a specific technical meaning. We need a common agreement about what words mean before we can discuss anything.

Spideroak disagrees.

To be fair, they use the term "no knowledge", not "zero knowledge": https://spideroak.com/no-knowledge/

That's what they now use, previously they did use zero knowledge.

I'm aware, but they were using it wrong, which I was implying with my comment.

Looks interesting - on my checklist for serious testing.

Currently using Cryptomator[0] for encrypting local directories under Linux. Works very well with Nemo (file manager), browsers (I keep local HTML pages here...) and gedit. Some apps don't like accessing the DAV-based links (Geany), while others generally work well enough that I have long-term confidence (ZIM Desktop Wiki: issues only with moving pages around within ZIM itself).

[0] https://cryptomator.org/

Perhaps there are use cases where this would make sense (mobile?), but for a server-side application, wouldn't encrypting the filesystem be better?

I'm just imagining an operator trying to debug the system only to find out that he can't read any of the files from the terminal and will have to build custom programs using this library to operate with the data.

And it seems like there are API abstraction issues too... Imagine you use a library that writes some state to the file system using the standard lib. None of that data would end up encrypted because it didn't go through this library.

Whereas using the operating system's filesystem allows transparent compression and encryption without having to change your code at all.
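
The API abstraction problem described above can be seen in a small sketch: a library that calls `std::fs::write` directly always produces a plaintext OS file, while one written against `std::io::Write` lets the caller route the bytes through whatever storage layer it likes. The function names are hypothetical, and `CountingWriter` only marks where an encrypting wrapper could sit; it does not encrypt anything.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// A third-party helper that hard-codes `std::fs`: its output always lands on
/// the OS file system, bypassing any embedded/encrypted store the app uses.
pub fn save_state_hardcoded(path: &Path, state: &str) -> io::Result<()> {
    fs::write(path, state)
}

/// The same helper written against `std::io::Write`: the caller decides
/// where the bytes go (a file, a buffer, or a transforming wrapper).
pub fn save_state_to<W: Write>(out: &mut W, state: &str) -> io::Result<()> {
    out.write_all(state.as_bytes())?;
    out.flush()
}

/// Hypothetical wrapper showing where a transforming layer (for example,
/// encryption) could be inserted. Here it only counts bytes passing through.
pub struct CountingWriter<W: Write> {
    pub inner: W,
    pub bytes_seen: usize,
}

impl<W: Write> Write for CountingWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = self.inner.write(buf)?;
        self.bytes_seen += n;
        Ok(n)
    }
    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}
```

With the first style, the only way to get encryption without changing the library is to encrypt at the OS level, which is the point of the comment above.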

I could see it being used alongside Intel SGX for DRM or security reasons, in situations where the OS and system are considered hostile adversaries, but you still need extensive mutable external storage outside the application binary.

I wonder how this can coexist with Docker. Maybe Docker volumes could be encrypted?

Can it run in the browser and store files on IndexedDB?

Not yet, but could be possible if you can compile it into WebAssembly.
