Show HN: Hypergolix, “programmable Dropbox” with client-side encryption (pyhgx.readthedocs.io)
130 points by nbadg on Jan 12, 2017 | 23 comments



Hi HN, basically a one-man show here. Hypergolix just entered its alpha release, and I thought it would be an excellent opportunity for feedback. The fundamental motivation of the project is for individuals to retain autonomy over information they store on untrusted third-party servers; the hope is that Hypergolix can make "IoT" development easier with client-side encryption than without it.

Some links:

[1] Project website: https://www.hypergolix.com/

[2] Hypergolix source (~36k LoC): https://github.com/Muterra/py_hypergolix

[3] Golix docs (the crypto protocol that powers Hypergolix): https://github.com/Muterra/doc-golix

[4] Golix Python implementation (~5k LoC; needs rewrite): https://github.com/Muterra/py_golix

[5] Hypergolix Slack channel: https://hypergolix.signup.team/


As a fellow distributed software developer (http://firestr.com), great work! Quick question: how are you doing conflict resolution if there is a long-term network partition? Does a client have any control over merge conflicts?


Thanks! All mutable objects have an internal monotonic counter. Currently, whichever counter is highest takes precedence, and any decreasing counter is rejected by the server. The "older" (lower counter) copy will receive an error when it tries to push, which will automatically pull in the most recent state. The application can then decide what to do.

That being said, contention issues like this require:

1. the same account

2. to be accessing the same object

3. at the same time

4. from multiple computers

5. one (or more) of which goes offline

6. while both are still producing data

This is the primary reason why support for concurrent instances of the same account is very experimental. All objects are single-author, so if you don't have concurrent sign-ons, you have no contention.

Conflict resolution will always be an application-level concern. I would like to expose some synchronization primitives (distributed locks, semaphores, etc) for use within accounts, but this is a ways down the road.
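
To make the counter rule above concrete, here is a minimal in-memory sketch. It is not Hypergolix code: `ToyServer`, `ToyClient`, `StaleUpdateError`, and the `on_conflict` callback are hypothetical names made up for illustration, and it assumes a push with an equal counter is rejected the same way a lower one is.

```python
class StaleUpdateError(Exception):
    """Raised when a push carries a counter that is not newer than the server's."""


class ToyServer:
    """In-memory stand-in for a persistence server (hypothetical, not Hypergolix)."""

    def __init__(self):
        self._objects = {}  # object id -> (counter, state)

    def push(self, obj_id, counter, state):
        current = self._objects.get(obj_id)
        # Assumption: equal counters are rejected along with lower ones.
        if current is not None and counter <= current[0]:
            raise StaleUpdateError(f"counter {counter} is not above {current[0]}")
        self._objects[obj_id] = (counter, state)

    def pull(self, obj_id):
        return self._objects[obj_id]


class ToyClient:
    """One signed-in instance of an account holding a local copy of an object."""

    def __init__(self, server, obj_id):
        self.server = server
        self.obj_id = obj_id
        self.counter = 0
        self.state = None

    def update(self, new_state, on_conflict=None):
        """Try to push new_state; on rejection, pull and let the app decide."""
        try:
            self.server.push(self.obj_id, self.counter + 1, new_state)
        except StaleUpdateError:
            # The stale copy adopts the server's newer state automatically.
            self.counter, self.state = self.server.pull(self.obj_id)
            if on_conflict is not None:
                on_conflict(rejected=new_state, current=self.state)
            return False
        self.counter += 1
        self.state = new_state
        return True


server = ToyServer()
laptop = ToyClient(server, "settings")
desktop = ToyClient(server, "settings")  # same account, second machine

laptop.update({"interval": 5})
laptop.update({"interval": 10})

conflicts = []
accepted = desktop.update(
    {"interval": 60},
    on_conflict=lambda rejected, current: conflicts.append((rejected, current)),
)
```

The stale `desktop` push fails, its local copy is refreshed from the server, and the application callback sees both the rejected and the current state, so any merge policy stays at the application level.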


Additionally, if you're interested in supporting the project financially, but don't have need for an account, click the red "here" link at the bottom of this page [1] to register with a placeholder account. You can then convert that to a full account at any time.

[1] https://www.hypergolix.com/register.html


How does one go about getting this to work? Clicking "here" brings up a modal asking for my name and address. Entering that populates the fingerprint field in the middle tier, but then it asks for payment info.


Are you trying to run Hypergolix or register it?

Follow this guide to get Hypergolix installed and running:

https://pyhgx.readthedocs.io/en/latest/setup-1-installing.ht...

After you have Hypergolix running (you need to start it once to generate a fingerprint), you can register your fingerprint via

    hypergolix config --register

which will bring up a browser window that will populate the box with your fingerprint. From there, checkout is powered by Stripe.

Does that answer your question? If not, can you clarify a bit what you're asking?


How is locating of the other end done? Do you have a DHT for this? LAN discovery? Or are we basically relying on your server to stay online?

There aren't a lot of notes about internals like this on the site. As an end-user developer it looks very good, so I'm sure someone will use it, but as a small-time operations guy I worry about it.


Because the system is asynchronous, you have to have a persistence server somewhere -- think email, not P2P. Since everything needs that server to work anyway, it's doing double-duty as a relay server. Each endpoint pubs/subs, and the mutual server handles the rest. So for example, when I was monitoring my home server from my flight over the holidays, all traffic was passing through hgx.hypergolix.com.

But it's specifically designed to use as many relay servers as you'd like, at the same time. So if you're worried about uptime, you can run your own servers. You do that like this:

    hypergolix config --addhost HOSTNAME PORT TLS

So, when I'm at home, my laptop will get updates from my home server over the LAN, in addition to hgx.hypergolix.com. Not only is this more reliable, it also reduces the receiver latency (sending latency is unaffected, because it's still pushing upstream to both servers).

LAN discovery (of both services and actual users) is planned but not currently supported; there are a whole host of P2P operations that Hypergolix is very well-suited for, but that haven't yet been implemented due to time constraints.
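
The multi-relay arrangement described above can be sketched roughly as follows. This is a toy model, not the Hypergolix networking code: `ToyRelay`, `ToyEndpoint`, and `send_to_all` are hypothetical names, and deduplicating by SHA-512 digest is an assumption about how a receiver might discard the second copy of an update.

```python
import hashlib


class ToyRelay:
    """In-memory relay that fans published payloads out to its subscribers."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, payload):
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        for callback in self._subscribers:
            callback(self.name, payload)


class ToyEndpoint:
    """Receiver subscribed to several relays; keeps the first copy of each payload."""

    def __init__(self, relays):
        self.received = []  # (relay name, payload) for the first delivery only
        self._seen = set()
        for relay in relays:
            relay.subscribe(self._on_message)

    def _on_message(self, relay_name, payload):
        digest = hashlib.sha512(payload).hexdigest()
        if digest in self._seen:
            return  # a different relay already delivered this payload
        self._seen.add(digest)
        self.received.append((relay_name, payload))


def send_to_all(relays, payload):
    """Push upstream to every relay; return how many accepted the payload."""
    delivered = 0
    for relay in relays:
        try:
            relay.publish(payload)
            delivered += 1
        except ConnectionError:
            continue  # other relays can still carry the update
    return delivered


home = ToyRelay("home-lan")
hosted = ToyRelay("hosted")
laptop = ToyEndpoint([home, hosted])

first = send_to_all([home, hosted], b"temp=21C")
home.online = False  # simulate the home server going down
second = send_to_all([home, hosted], b"temp=22C")
```

With both relays up the laptop keeps only the copy that arrives first; with the home relay down the update still arrives via the hosted one.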


The first application you could provide (maybe as the project sample app) is one that observes a file and syncs it when it changes.

Most people I know who use Dropbox use it with expensive media workflows that are extremely slow to adopt anything.


The idea isn't to replace Dropbox. Hypergolix doesn't sync files, it syncs objects. That might seem like a small distinction, but when you're writing application code, it makes a big difference. For example, the second half of the sample app (which hasn't been written up yet, but has source on github [1]) uses a different object to remotely control the logging frequency on the server.

[1] https://github.com/Muterra/py_hypergolix_demos/blob/master/t...


I got that, but the first thing I can think of is to have a client that syncs files :)

Even applications that sync files already treat them as "objects", since you need to decide which side has the more up-to-date version for conflict resolution and such.


Am I right in reading that everything is free, except if you want the server to store your data? I.e., if I store data on my home server, I can use this for free?

Does your server have access to any plaintext?


Correct. This is yet another "hosted platform for revenue" open source project; if you deploy on your own servers, you don't pay us.

Servers have no access to plaintext. They also have extremely limited metadata: only the "author" [1] of the data and its ciphertext length are known.

[1] Technically not the author but the "binder", which is a specific term used in the protocol, but we're getting a little deep into the weeds. See here for more info about binders: https://github.com/Muterra/doc-golix/blob/master/whitepaper....


That's pretty cool, thanks. So I don't need to worry about deploying MQTT servers and authenticating between them, I can just use this.

Can I suggest this alternative API:

    obj = hgxlink.new_threadsafe(cls=hgx.JsonProxy, state='Hello world!')
    obj.share_threadsafe(bob)

becomes:

    bob.send({"some": "serializable object"})


The API is definitely more cumbersome than I would prefer.

I really like this idea:

> bob.send(obj)

However, I don't think object creation and sending will ever be combined into the same operation, because:

+ objects don't need to be shared (imagine using an object to track application settings; you don't want to send that to Bob but you want it to persist across sessions)

+ objects can be shared with more than just Bob (and we'd like it to be the same object!)

So hopefully, in the future the API will look more like:

> obj = hgx.JsonProxy('hello world')

> bob.share(obj)

Unfortunately, because of the async/await syntax, this gets a little complicated to implement. But it's definitely on the horizon.


Oh, I see, so not everything is 1:1. That makes sense then, thanks.


That looks great! How are you encrypting communications between nodes?


With a purpose-built protocol called Golix [1]. The documentation goes into a lot more detail but it has three main aspects:

1. Encrypt things like PGP, except key encapsulation is separate from ciphertext delivery. Specific primitives used are AES-256, RSA-4096, and X25519, though deprecation of RSA is planned soon

2. Everything is content/hash addressed, which helps substantially with the above. Specific primitive: SHA-512

3. Data retention is governed like a reference-counted programming language; data gets a container, and then you make a signed "binding" to give the container an address. You can then sign binding revocations ("debindings"). When no addresses are left, the server removes the content.

[1] https://github.com/Muterra/doc-golix
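
As a rough illustration of points 2 and 3, the sketch below models content addressing and binding-based retention in memory. It is not the Golix protocol or its Python implementation: `ToyStore` and its methods are hypothetical, signatures are replaced by a plain name check, and the rule that only the binder can revoke its own binding is an assumption.

```python
import hashlib


class ToyStore:
    """In-memory model of hash-addressed containers kept alive by bindings."""

    def __init__(self):
        self._containers = {}  # content address -> ciphertext bytes
        self._bindings = {}    # binding id -> (binder, content address)

    @staticmethod
    def address_of(data):
        # Content address: the SHA-512 digest of the bytes themselves.
        return hashlib.sha512(data).hexdigest()

    def put(self, ciphertext):
        addr = self.address_of(ciphertext)
        self._containers[addr] = ciphertext
        return addr

    def bind(self, binder, addr):
        if addr not in self._containers:
            raise KeyError("unknown container")
        binding_id = self.address_of(f"{binder}:{addr}".encode())
        self._bindings[binding_id] = (binder, addr)
        return binding_id

    def debind(self, binder, binding_id):
        owner, addr = self._bindings[binding_id]
        # Assumption: stands in for verifying a signed revocation.
        if owner != binder:
            raise PermissionError("only the binder may revoke its own binding")
        del self._bindings[binding_id]
        # Reference count hit zero: no binding points at the container any more.
        if not any(a == addr for _, a in self._bindings.values()):
            del self._containers[addr]

    def __contains__(self, addr):
        return addr in self._containers


store = ToyStore()
addr = store.put(b"opaque ciphertext")
alice_binding = store.bind("alice", addr)
bob_binding = store.bind("bob", addr)

store.debind("alice", alice_binding)
still_there = addr in store  # bob's binding keeps the container alive

store.debind("bob", bob_binding)
gone = addr not in store  # last binding revoked, so the content is dropped
```

The store only ever handles opaque bytes, a binder name, and a length it could compute, which lines up with the limited metadata described elsewhere in the thread.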


One thing I didn't get from the docs is some clearer explanation justifying what at first glance seems like a great deal of bespoke, custom crypto. Just browsing through the source I'm immediately hit with talk of running out of entropy, use of deprecated implementations, use of low-level functionality from a library that actually tries to provide safer, higher-level constructs, etc.

Maybe this is all necessary. But it isn't at all obvious why.


Thanks for the feedback. Can I ask which parts of the documentation you were looking at?


Skimmed the security paper and a bit of the protocol lib source.


Sounds interesting...the use-cases are not clicking in my head just right...So I'll dive in and rummage around some more.


Nice work, keep it up and thanks for sharing!



