Gun – Distributed, embedded graph engine (marknadal.com)
108 points by marknadal on Apr 14, 2014 | 51 comments

This actually sounds like a pretty cool and useful project, but I had to force myself to overcome the site, which I think does a poor job of marketing.

The headline is a bit confusing -- it's not obvious that something can be "embedded" and "distributed", let alone "massively." You might rewrite it to tell me what the project does, not what it is.

Besides that, as a developer, what I want to know is:

- Why should I use it? (Instant in-browser storage synced to a distributed server network)

- How is it implemented? (it seems to be a javascript project, but that's only mentioned once)

- How do I run it? (It runs daemons on servers and stores the data in S3)

- What do I have to do to configure it? (Seems to me like "The granularity and frequency of these snapshots can be tweaked by you" means "Prepare to get elbow-deep into telling gun how to shard")

If you just take the answers to these questions and make them either bullet points up above the video, or headers for your marketing essay, it would help people like me digest the project enough to be interested and read more.

Finally, while I'm being a grinch, I'd adjust your styles just slightly to make the site more readable. At the very least:

- Make `#main` narrower; perhaps 700px or even less (long lines are hard to read)

- In `body`, apply a line-height of maybe 20pt (the lines seem too close together for the font size)

- Also in `body`, drop the text-shadow. Shadow on white tends to just make things look blurry.

All that said, I am interested in learning more. A self-hosted Firebase would be a real boon for many projects.

Great suggestions! Already fixed one of the things, and working on updating the others! (Although content will take a while longer). Thank you so much.

This title is bad. It's not an embedded database engine. It's not a database at all. The author mentions this directly in the article (How are you different from every other database that is trying to reinvent the wheel? 1. Because gun is not a database (NoDB), it is a persisted distributed cache.)

There are other possible issues, but this short summary doesn't give nearly enough information about the system to make a possible determination on its efficacy. I will be interested to see further work.

Exactly. "...this isn't a database" :/

Distributed caches aren't new either. [e.g. https://github.com/golang/groupcache ]

Can we get a title change to something that labels it a Distributed Cache? [e.g. Show HN: gun – Massively Distributed Cache ]

opendais, I'm curious if these other distributed caches actually support persistence? I wasn't able to find such a system. But I'm limited in my research, I'd love to learn more about other people's work! Please share more!

Reminds me of running pouchdb in the browser and couchdb on the servers. Except for the automatic conflict resolution bit, which of course is where the real magic is.

> It gets data synchronization and conflict resolution right from the beginning, so it never has to rely on vulnerable leader election or consensus locking

Super curious about how this works.

Just as a note, conflict resolution is fairly close to automatic in PouchDB: it's a matter of listening to the changes feed and providing a function which combines conflicts into a separate document. In future versions we will make this a lot more seamless.

That aside, reading this it does sound similar to PouchDB, and we are also working on automated provisioning of servers (https://github.com/daleharvey/janus). Excited that more people are working on and becoming aware of similar infrastructure.

People are demanding answers to this the most, so it'll probably be my first follow-up post.

http://ehcache.org/ http://ehcache.org/documentation/configuration/fast-restart#...

Based on a quick googling, this looks like it would be an example of that. Frankly, I don't normally want my cache to persist so it isn't something I know much about. [e.g. I store persistent data in Cassandra or MySQL]

You hit on a really good point, in the several years that I've worked in the synchronization field (I do consulting) I've found that it has actually been harder to deal with /what you don't want to sync/ than with what you want to sync. It would be really bad if our temporary variables that are used on the client and server to do transformations on our actual data got somehow jumbled up into our real data.

I have some clean delineations for this, but wow - I didn't know that this was going to explode so quickly. If I had known, I would have prepared some more resources and materials to explain this stuff. That said, everybody who signs up on the mailing list will get notified about the posts I will write on this! I'm looking forward to getting some serious counter-arguments/refutations to challenge gun.

I would like to hear about the way you address the semantics of the data for merging. For example if the number of items in stock has a conflict because one node reports a 500 -> 1000 change and another node reports a 500 -> 400 change, then I probably want the result to be 900. If the same conflict occurs for the budget of something, then I probably want either 400 or 1000 but not 900.
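To make the stock/budget example concrete, here is a minimal sketch of the two merge semantics being contrasted. The function names are hypothetical and this is not gun's algorithm; it only illustrates why the same pair of conflicting writes can call for different answers depending on what the field means.

```python
def merge_as_deltas(base, a, b):
    """Treat each write as a relative change from the shared base and apply both."""
    return base + (a - base) + (b - base)


def merge_pick_one(base, a, b):
    """Treat each write as an absolute value and keep exactly one of them.

    Taking the larger value is an arbitrary but deterministic choice (an assumption),
    so every node ends up with the same answer.
    """
    return max(a, b)


# One node reports 500 -> 1000, another reports 500 -> 400.
stock = merge_as_deltas(500, 1000, 400)   # both changes count: +500 and -100
budget = merge_pick_one(500, 1000, 400)   # only one of the two values survives
print(stock, budget)
```

The point of the sketch is that the merge policy has to be chosen per field; nothing in the raw values says which one is right.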

I'm not entirely familiar with your example cases, but they seem to be indicative of something where you would want strong consistency. By default, gun is only eventually consistent - but this does not mean it is incapable of strong consistency, it just means you have to write/use a plugin that abstracts this on top. Obviously such a merge would be slow, because you are giving up high availability while you sit locked waiting for ACKs from other nodes. Could you expound on your example problems more so I can address them more thoroughly?

Point being, that is the beauty of eventually consistent systems, you can always combine them together to create stronger forms of consistency - but you can't go the other way. If you start with a strongly consistent system, you can't downgrade. With gun, strong consistency is not given to you out of the box - but it is capable of being built on top when needed.

Also GridGain and my personal favorite, Hazelcast.

Hey _dark_matter_,

Thanks for the comment! I'll be doing follow up posts soon on the technical details of my conflict resolution algorithms, and will be preparing Jepsen tests for gun. I just wanted to formally announce this now, though, as I have a tendency to overly-perfect stuff, and getting feedback early on is great. Mind listing things you would like me to write in-depth posts on? Thanks for the feedback!

Hey Mark, I'm interested in several different features.

First off, you make a great point about "Hosting your own database is a pain" and/or expensive. I'm sure you know about the raging debate that has been happening between MapReduce and Parallel DBs, and a big part of the reason that MapReduce (Hadoop) style has had so much support is because it's easy and cheap. In lots of instances it may run slower, but why pay for the hassle when you can just run Hadoop for free (not including server costs)? You're making the same point here. What's the threshold on money savings? How much does it cost to use a DaaS (SQL or NoSQL) compared to what you want to do with Gun/Redis? (Also, is this going to be open source?)

Next, the conflict resolution should be interesting. What kind of eventual consistency guarantees are you hoping to have? I see you're planning on addressing this.

I'm also interested in the querying. That is, are you using Javascript? My problem with what you're saying about current query languages is that it's not really that hard to get the bare usage out of them. Simple queries in SQL are extremely straightforward. In addition, declarative languages really make things easier for the programmer, not harder. Who wants to have to deal with all the nuances of a join? Certainly not me.

Realistically I won't have cost-comparisons for a while, as this project is just in the infancy and obviously needs to stabilize/mature first. Although as a quick summary, your costs should approximately be server(s) that are expensive enough to fit your smallest set of per-user active-data, plus your S3 storage amount, plus S3 API calls (I've made this part a simple options parameter, so you control how often these calls are being made - the more frequent, the more integrity from worst-case disruptions but also more expensive, less frequent and your costs go down).
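The cost summary above can be written down as a small formula. This is only a sketch of that arithmetic: the function and parameter names are hypothetical, and the numbers in the usage lines are placeholders, not real AWS prices.

```python
def monthly_cost(server_cost, storage_gb, price_per_gb,
                 snapshots_per_hour, price_per_call, hours=730):
    """Servers + S3 storage + S3 API calls, per month.

    snapshots_per_hour stands in for the 'options parameter' that controls
    how often snapshots are written (an assumption about its shape).
    """
    api_calls = snapshots_per_hour * hours
    return server_cost + storage_gb * price_per_gb + api_calls * price_per_call


# Placeholder numbers only: frequent snapshots cost more than infrequent ones.
frequent = monthly_cost(100.0, 50, 0.5, snapshots_per_hour=60, price_per_call=0.001)
infrequent = monthly_cost(100.0, 50, 0.5, snapshots_per_hour=1, price_per_call=0.001)
print(frequent, infrequent)
```

Only the API-call term changes with snapshot frequency, which is the trade-off described: more frequent snapshots give better integrity after a disruption but a higher bill.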

What is worst-case disruption? Everything goes offline simultaneously. Users don't have localstorage fallbacks in their phone/browser so retries rely on the server cache. The server cache is running the default method, not Redis - and then your server crashes or the machine goes under. And finally, of course, there is an S3 outage and you only persisted to 1 region, not multiple regions.

I'll get into hairy situations like that in some of my following posts, but too much for here.

Yes, you have another good point about query languages. I agree with you very much - somebody else commented about this, check out my reply to muaddirac. Please keep in touch with more questions/comments, or even email me!

I found the header image a bit disconcerting, and immediately hit Back (I'm at work).

Pictures of guns are offensive and dangerous at work now?


Heck, I used to work in an office where most people had similar weapons on them at all times.

Offensive may be too strong a word for how I felt about it, but I think unnecessarily aggressive would be accurate.

Even if you don't consider it offensive I think it is a really bad choice for a kind of logo.

sorry :/ I'm trying to intentionally convey that this tool is dangerously powerful. I don't mean to offend anyone though.

And here I thought you were showing your good taste in customized 1911 variants... You might want to consider a name that reflects the distributed nature over the dangerous power because of reactions like the parent (although I strongly disagree with the sentiment expressed).

Seriously though, this sounds cool and I'm looking forward to trying it out once you release code.

:P I'm coming to admire guns a lot more because of this library - respect!

I'm not in the "guns are offensive" line of thought but I think it could have a better name than an existing noun.

(Although CloudFlare has already taken "railgun"...)

Looks interesting!

> you never have to learn some silly separate query language again. A query language which just attempts to be some DSL to RPC another machine into doing the same query you could have already written in half the time it took to learn the query language.

Not sure I agree with this sentiment - most programming languages aren't declarative like query languages are, and that seems especially useful for, well, querying.

This is true. The neat thing about the modular design I have for gun is that people can always write a plugin that receives some/any query language, and then translates it directly into the appropriate algorithms. So you should be able to write your own abstractions on top - but this is only possible because you are able to write the direct queries underneath.

I'm withholding my judgment on the tech - it all seems a little too good to be true - but the copy on your page is great, I actually read every line of it which rarely ever happens when I visit a technical page.

Hey, I'm confused by the statement, "No amount of leader election and consensus algorithms can patch this without facing an unjustified amount of complexity. Gun resolves all this by biting the bullet - it solves the hard problems first, not last. It gets data synchronization and conflict resolution right from the beginning, so it never has to rely on vulnerable leader election or consensus locking".

My question is, how do you solve data synchronization and conflict resolution, without using the techniques that do that, i.e. leader election, or some other type of consensus algorithm?

Gun's conflict resolution algorithm is deterministic, meaning that it will choose the same answer on every peer without having to communicate with others which value it chose. This even works when the ordering of incoming updates is switched depending upon which server was the "I" sending updates out to "you" (pronouns, analogously, are inverse of each other depending upon which person is doing the talking). Uh, this is not clear/sounds confusing, I'll explain it better in my post that goes over how the resolution algorithm works. (Basically the gist is that sometimes for two servers to agree on the "same" answer they both have to have an algorithm which results in an inverse condition from what the other would answer - similar to how quantum-entangled particles have opposite spins of each other to "balance out")

Point being, this makes gun truly peer to peer, because it behaves correctly if it's the only one running, or if there are numerous guns interconnected with each other. No leader election, no consensus algorithm - you don't need those, because the system agrees as soon as the updates are received, since it resolves them immediately with an idempotent algorithm. Make sense? More details on this soon.
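As an illustration of what "deterministic" and "idempotent" mean here, the sketch below uses a simple stand-in rule: the later timestamp wins, and ties are broken by comparing values. This is an assumption for illustration only, not gun's actual algorithm, which has not been published in this thread.

```python
from functools import reduce
from itertools import permutations


def resolve(current, incoming):
    """Pick a winner between two (timestamp, value) updates.

    Later timestamp wins; on a tie the lexically larger value wins, so every
    peer makes the same choice without asking anyone else.
    """
    return max(current, incoming, key=lambda update: (update[0], str(update[1])))


updates = [(3, "blue"), (3, "green"), (1, "red")]

# Apply the same updates in every possible arrival order.
outcomes = {reduce(resolve, order) for order in permutations(updates)}
print(outcomes)
```

Every arrival order collapses to the same single winner, and applying an update twice changes nothing, which is the property that lets peers agree without exchanging votes.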

Then I look forward to your next post. Can you give me a preview on what type of failure model Gun can support? Consensus is easy assuming there are no faults, but if faults are possible, then leadership election, paxos, whatever, something is needed.

First let me go over the levels of redundancy:

1. In memory in the browser tab's process.

2. If available, in the browser's localstorage or fallback.

3. In the server process's memory.

4. If available, in Redis on the server.

5. If in a multi-machine setup, any other connected server that is subscribed to that data set, being in memory (3) or in Redis (4) if available.

6. If configured, in a machine log on S3.

7. Persisted to S3, which replicates and shards it for you internally.

8. If configured, in a revision file on S3.

9. If configured, in a multi-region S3 setup, redundantly in many places.

(2) is not cleared till an acknowledgment that (7) is confirmed. (1) is not cleared until an acknowledgement that (7) is confirmed or if the tab is exited. In the case of (7) it is no longer the delta/diff, but a snapshot of that current data set with that delta/diff's update. Retries from (1) ~ (5) will happen at various events, if the confirmations are not satisfied. If a conflict has already occurred by (3) the acknowledgement from (5) will include a notification that the value has already been updated, along with the standard delta/diff of that conflicting update being sent down. Meaning (5) does not guarantee that your delta/diff has "won", only that it has been saved or is already outdated.
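The "not cleared till acknowledged" rule can be sketched as a small buffer of unconfirmed deltas. The class and method names are hypothetical; the sketch assumes each delta has an id and that the durable tier (7) eventually sends an acknowledgement for that id.

```python
class PendingDeltas:
    """Holds deltas until the durable tier confirms them, like tiers (1) and (2)."""

    def __init__(self):
        self._pending = {}  # delta_id -> delta, in insertion order

    def record(self, delta_id, delta):
        """Keep a local copy before sending the delta upstream."""
        self._pending[delta_id] = delta

    def to_retry(self):
        """Everything not yet confirmed is eligible to be sent again."""
        return list(self._pending.items())

    def acknowledge(self, delta_id):
        """Drop the local copy only once persistence is confirmed."""
        self._pending.pop(delta_id, None)  # a repeated ack is harmless


buffer = PendingDeltas()
buffer.record("d1", {"name": "Alice"})
buffer.record("d2", {"age": 30})
buffer.acknowledge("d1")
print(buffer.to_retry())
```

Because an acknowledgement only says the delta was saved or is already outdated, a real implementation would also need to apply the conflicting update that comes back with it; that part is left out here.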

Worst case condition is that (2, 4, 5, 6, 8, 9) are turned off, in which case your user's data is as volatile as them preemptively leaving the page (although I suppose you could use an onbeforeunload to warn them) - however this behavior is the current norm for most http post based forms and apps. Actually, pardon me, worst case condition is that everything is offline simultaneously, however this is not really interesting because then users won't even be able to access your app in the first place.

Please correct me if I have my terms wrong:

Fail stop - since gun runs in your application process, any bugs or errors in gun should result in a standard error being thrown. In the case of javascript, your process will crash. Generally speaking you are responsible for the liveness of your app, however if you use some of my other existing open source libraries, they will respawn the process for you. When your app restarts, so will gun.

Receive omission - as mentioned before, the peer that originated the message will attempt to retry messages until a confirmation is given. So even if a process somehow does not receive a message, it eventually will, unless the origin gives up.

Send omission - this is a bit trickier, gun tries its best to keep user changes by all means possible until a confirmation is received. If for some reason gun is unable to do this, or the user's browser starts going haywire and doing weird things... you have no way to know and neither does gun. At this level, something is fundamentally wrong with the runtime or the OS, and gun has no way to check this.

Arbitrary - malicious attacks are best done from server peers, however this would require the malicious node to know your key. Currently the chance of this happening may be decently high, as all it would require is for an attacker to take over an IP that another one of your gun nodes might connect to - the connecting node will issue the key as proof of not being malicious. I definitely could use some help figuring out a more secure, yet still fully decentralized, means of authentication and trustworthiness of server nodes. In the instance of byzantine mistakes, this will only affect the system if the message actually complies with the spec - in which case the message is indistinguishable from a real request and will therefore be treated as one. And finally, what about an intentionally malicious user? The way gun's conflict resolution algorithms work makes it very difficult to be abused in any way that is actually meaningful to the malicious user, but I'm saving the details of this for a real post. That being said, such abuse will happen and I can already describe the results for you - "annoying" and "spammy" - luckily gun has filters built into the piping channels, so if you as the developer notice any particularly fishy behavior, you can always block that user out.

Recovery - when gun starts (or restarts) it goes about its usual business of checking its cache (if Redis is available) and retrying whatever is there that had failed to be cleared, listening to messages which may be retries of things that had failed to be cached (especially in the case Redis isn't enabled), and intelligently pulling things back in from S3. When any data set is loaded into gun it will also subscribe to that data set so it can be notified of updates from any other node, gun will also merge (using the same conflict resolution algorithms) the data set it got from S3 with any other node that also has that data set, and even replay some persisted logs if available. This way it boots itself back up into the most valid and live operating status that it can.

Partitions - there are 9 different layers, of which communication and networking can all fail or go out. That is about 9 factorial combinations that can, could, and probably will go wrong (although this is an overestimation since not all the layers require networking). I simply can't cover them in this comment, but will go over some of the most concerning and common cases in a separate post - I also plan on building a partition simulator, so that way people can play with causing these issues, and so that I can do testing against them. That being said, gun has a strong commitment to handling these things, since it can afford to (since gun is eventually consistent, we have time to make up for networking problems). I hope this comment was helpful in what I did go over - I'm not sure if you'll even wind up seeing it (please notify me that you did! Even if that is just an upvote) I'll probably reuse a lot of this in one of my actual posts.
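For a sense of scale, the arithmetic behind "9 factorial" is worked out below. The second figure rests on an assumption about what is being counted: if only the set of layers that are down matters, and not the order in which they fail, the count is 2 to the 9th rather than 9 factorial.

```python
import math

layers = 9

# Number of distinct orders in which all nine layers could fail.
failure_orderings = math.factorial(layers)

# Number of distinct up/down combinations if order is ignored (assumption).
failure_sets = 2 ** layers

print(failure_orderings, failure_sets)
```

Either way the space is far too large to enumerate by hand, which is the case for a partition simulator.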

Thanks again! Anything else?

Nice idea which mirrors a lot of my current thinking; far more details are needed.

For example: "All conflict resolution happens locally in each peer using a deterministic algorithm." Hrrrmm.... if the model is to use CRDTs for conflict free resolution I can believe it; if it's timestamps, for instance, I'm much more doubtful.
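For readers who have not met CRDTs, a grow-only counter (G-Counter) is one of the simplest examples. This is a textbook sketch, not something taken from gun: each node only increments its own slot, and merging takes the per-node maximum, so replicas converge no matter how often or in what order they exchange state.

```python
def increment(counter, node, amount=1):
    """Return a new counter with this node's own slot increased."""
    updated = dict(counter)
    updated[node] = updated.get(node, 0) + amount
    return updated


def merge(a, b):
    """Combine two replicas by taking the per-node maximum."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


def value(counter):
    """The counter's value is the sum over all nodes."""
    return sum(counter.values())


replica_a = increment({}, "A", 2)
replica_b = increment({}, "B", 3)
merged = merge(replica_a, replica_b)
print(value(merged))
```

No timestamps are involved, which is why this style of merge does not depend on clocks agreeing.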

noelwelsh, thanks for the comment! Timestamps do fail pretty miserably for handling conflict resolution - while my algorithm does use timestamps, the timestamp is hardly responsible for the actual synchronization. It just provides a basis to initially sort things off of, and then the conflict resolution kicks in. Obviously I'm going to need to do a pretty detailed post on how this works, because unless it can be battle-tested and skeptically investigated by others, it is only as good as snake oil. So more on this soon! Thanks for bringing it up.

Conflict resolution is the main question.

How would gun handle user auth and data that can only be seen by the authenticated user?

So I'm making sure that the concepts of piping and transforms are built right into the API, so you'll be able to easily filter out data by whitelists or blacklists as it is getting pushed down to your users. I'll also be writing plugins to gun that will allow people who like ORMs to just attach one on, and the filtering and validation will be handled for them. Does this answer your question?
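A rough sketch of what whitelist/blacklist transforms in a pipe could look like. All names here are hypothetical; gun's real API for this is not shown anywhere in this thread.

```python
def whitelist(fields):
    """Build a transform that keeps only the named fields."""
    def transform(record):
        return {key: val for key, val in record.items() if key in fields}
    return transform


def blacklist(fields):
    """Build a transform that drops the named fields."""
    def transform(record):
        return {key: val for key, val in record.items() if key not in fields}
    return transform


def pipe(record, *transforms):
    """Run a record through each transform in order before it is pushed out."""
    for transform in transforms:
        record = transform(record)
    return record


profile = {"name": "Alice", "email": "a@example.com", "password_hash": "x"}
public_view = pipe(profile, blacklist({"password_hash"}), whitelist({"name", "email"}))
print(public_view)
```

Filtering fields this way covers what a user may see; it does not by itself answer who the user is, so authentication would still have to happen before the pipe is chosen.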

Yes conceptually it does, I would be interested to see how it is implemented. Thanks!

This reminds me a little of Meteor and a bit more of ShareJS. And it reminds me of a few other things.

As far as transformations and conflict resolution, I am wondering, are you using something like operational transformations?

I had to develop my own synchronization algorithms after doing a lot of research on OT for a couple of reasons. One is that Google sometimes relies on being a centralized authority (Google's server) to resolve some conflicts, especially for collaborative rich text editing (Neil Fraser, a sync genius that Google hired, has a great talk on his own type of implementation "diff-match-patch", although Google uses different algorithms now I believe). Another is that OT sometimes requires you writing your own transformation commands, and I wanted something that was generally applicable to most data types without forcing developers to write their own. (It has been a while since I did a lot of this reading, so this comment may not necessarily be accurate/true, please somebody correct me!)

Instead, I developed my own method, called "Analytical Fluctuation" - I haven't written papers on it yet, but need to soon! One requirement is that it has to be truly peer to peer and cannot rely (even occasionally) on some centralized server or leader election setup. Another requirement is that it has to be latency-proof, which for me means "preparing for the future" when people need to collaborate on documents not only from the other side of the world but also from Mars. This is where Neil Fraser says his system breaks down, because if there is too much latency, the patches stop applying idempotently. Again, more on this in upcoming posts.

So CAP: which does gun throw off the island? Based on this statement ...

> It bridges the distance with a realtime connection, so updates propagate at the speed of the raw pipes linking them.

... I'm assuming it's the P.

CA systems don't exist so it must be CP or AP.

  However, further consideration shows that CA is not really a 
  coherent option because a system that is not Partition-
  tolerant will, by definition, be forced to give up 
  Consistency or Availability during a partition.

It could also be neither, which is a popular choice these days.

Out of the box gun only guarantees you eventual consistency, not strong consistency. The reason why is because the majority of web applications out there today are manipulating data sets that aren't particularly significant or regulated (tweets compared to medical records, profile updates compared to payment information). As a result, gun opts for high availability so response times are fast for users, and assuring their data won't be lost.

However, the strength of this design is it allows you to implement stronger forms of consistency on top - which is necessary for those occasional pieces of data (payments, medical records, etc). But if you do this then the availability of that data diminishes because servers must wait for consistency to be achieved before doing any further transformations. The sweet spot about this is that it will only be the availability of that particular property (or properties that it depends upon) that decreases, all other pieces of information will still be snappy fast because gun is capable of locking at a granular level.
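One way to picture per-property consistency is shown below. The key set, the function name, and the majority rule are all assumptions made for illustration; the thread only says that strongly consistent properties wait for ACKs from other nodes.

```python
# Hypothetical set of properties that need strong consistency.
STRONG_KEYS = {"payment", "medical_record"}


def can_apply(key, acks, peers):
    """Decide whether a write may be applied right now.

    Ordinary keys apply immediately (eventual consistency). Keys in STRONG_KEYS
    wait until a majority of peers have acknowledged (assumed rule).
    """
    if key not in STRONG_KEYS:
        return True
    return acks > peers // 2


print(can_apply("tweet", acks=0, peers=5))
print(can_apply("payment", acks=2, peers=5))
print(can_apply("payment", acks=3, peers=5))
```

Only writes to the listed keys ever block, so the rest of the data keeps its availability during a partition.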

This will not be as "easy" as it is in databases that are strongly consistent to begin with, but those databases aren't capable of being eventually consistent either (because that would violate the DB's principles). But it is possible, and you the developer get the control and flexibility over that. So you win some and you lose some, but I believe this setup is highly optimized because generally speaking the things that you need to be "strongly consistent" anyways have much slower "weakest links" than the speed of networks being up or down - such as approval by some human overseer (a loan officer, a lawyer, a doctor, etc.), and meatspace brains take much longer to process regulatory issues than the potentially couple of hours the network might be down. I'll touch more on these quirks in a separate post after I address conflict resolution.

"Because face it, any sufficiently capable query language has to be Turing complete" - No, no it doesn't. Stock SQL is not Turing complete. There are extensions that support recursion to make it so, but it is perfectly possible and capable of doing everything you could reasonably want without it.
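The recursion extension mentioned above is the recursive common table expression. A small example, run through Python's built-in sqlite3 module (this assumes the bundled SQLite is new enough to support WITH RECURSIVE, which has been the case for years):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A recursive CTE: start at 5 and keep subtracting 1 until reaching 1.
rows = conn.execute(
    """
    WITH RECURSIVE countdown(n) AS (
        SELECT 5
        UNION ALL
        SELECT n - 1 FROM countdown WHERE n > 1
    )
    SELECT n FROM countdown ORDER BY n DESC
    """
).fetchall()

numbers = [row[0] for row in rows]
print(numbers)
conn.close()
```

Without the WITH RECURSIVE clause, plain SELECT statements cannot express this kind of open-ended iteration, which is the distinction being drawn.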

Caching is hard. Consistency is hard. Peer to peer is hard. I hope the author is addressing these in a sane, verifiable way. I'm really curious to see the demo apps that come out of this.

I apologize, I wasn't saying every query language /is/ Turing complete, just that to do exponentially more complicated tasks in a single atomic query... the lines start to blur between the two.

Yes, these are hard subjects - especially things like cache invalidation. You raise good points, and I hope to answer them in the follow up posts I write - I expect your and others' good eyes to check my work. I'll let people know when I've written them. Demos are coming too!

Regardless of the title being confusing or not, this seems pretty interesting! Would like to see how it performs under some heavy load / multiple sources of data.

Not everyone is familiar with the concept of NoDB so this paper [1] is a good read to start with.

[1] http://bit.ly/RjBl9S

What license is the code distributed under? It is not clear from the site or the code that is available through npm.

I have been looking for something just like this! Fantastic

The name "gun": a) has negative connotations, b) is ungoogleable, c) can't even write about it without being ungrammatical and d) will be filtered out at some workplaces.

Dumb, de-dumb dumb dumb.
