The headline is a bit confusing -- it's not obvious that something can be "embedded" and "distributed", let alone "massively." You might rewrite it to tell me what the project does, not what it is.
Besides that, as a developer, what I want to know is:
- Why should I use it? (Instant in-browser storage synced to a distributed server network)
- How do I run it? (It runs daemons on servers and stores the data in S3)
- What do I have to do to configure it? (Seems to me like "The granularity and frequency of these snapshots can be tweaked by you" means "Prepare to get elbow-deep into telling gun how to shard")
If you just take the answers to these questions and make them either bullet points up above the video, or headers for your marketing essay, it would help people like me digest the project enough to be interested and read more.
Finally, while I'm being a grinch, I'd adjust your styles just slightly to make the site more readable. At the very least:
- Make `#main` narrower; perhaps 700px or even less (long lines are hard to read)
- In `body`, apply a line-height of maybe 20pt (the lines seem too close together for the font size)
- Also in `body`, drop the text-shadow. Shadow on white tends to just make things look blurry.
All that said, I am interested in learning more. A self-hosted Firebase would be a real boon for many projects.
There are other possible issues, but this short summary doesn't give nearly enough information about the system to make any real determination of its efficacy. I will be interested to see further work.
Distributed caches aren't new either. [e.g. https://github.com/golang/groupcache ]
Can we get a title change to something that labels it a Distributed Cache? [e.g. Show HN: gun – Massively Distributed Cache]
> It gets data synchronization and conflict resolution right
> from the beginning, so it never has to rely on vulnerable
> leader election or consensus locking
Super curious about how this works.
That aside, reading this it does sound similar to PouchDB. We are also working on automated provisioning of servers (https://github.com/daleharvey/janus). Excited that more people are working on and becoming aware of similar infrastructure.
Based on a quick googling, this looks like it would be an example of that. Frankly, I don't normally want my cache to persist so it isn't something I know much about. [e.g. I store persistent data in Cassandra or MySQL]
I have some clean delineations for this, but wow - I didn't know that this was going to explode so quickly. If I had known, I would have prepared some more resources and materials to explain this stuff. That said, everybody who signs up on the mailing list will get notified about the posts I will write on this! I'm looking forward to getting some serious counter-arguments/refutations to challenge gun.
Point being, that is the beauty of eventually consistent systems: you can always combine them together to create stronger forms of consistency - but you can't go the other way. If you start with a strongly consistent system, you can't downgrade. With gun, strong consistency is not given to you out of the box - but it is capable of being built on top when needed.
Thanks for the comment! I'll be doing follow-up posts soon on the technical details of my conflict resolution algorithms, and will be preparing Jepsen tests for gun. I just wanted to formally announce this now, though, as I have a tendency to overly-perfect stuff, and getting feedback early on is great. Mind listing things you would like me to write in-depth posts on? Thanks for the feedback!
First off, you make a great point about "Hosting your own database is a pain" and/or expensive. I'm sure you know about the raging debate that has been happening between MapReduce and Parallel DBs, and a big part of the reason that the MapReduce (Hadoop) style has had so much support is because it's easy and cheap. In lots of instances it may run slower, but why pay for the hassle when you can just run Hadoop for free (not including server costs)? You're making the same point here. What's the threshold on money savings? How much does it cost to use a DaaS (SQL or NoSQL) compared to what you want to do with Gun/Redis? (Also, is this going to be open source?)
Next, the conflict resolution should be interesting. What kind of eventual consistency guarantees are you hoping to have? I see you're planning on addressing this.
What is the worst-case disruption? Everything goes offline simultaneously. Users don't have localstorage fallbacks in their phones/browsers, so retries rely on the server cache. The server cache is running the default method, not Redis - and then your server crashes or the machine goes down. And finally, of course, there is an S3 outage and you only persisted to one region, not multiple regions.
I'll get into hairy situations like that in some of my following posts, but that's too much for here.
Yes, you have another good point about query languages. I agree with you very much - somebody else commented about this, check out my reply to muaddirac. Please keep in touch with more questions/comments, or even email me!
Seriously though, this sounds cool and I'm looking forward to trying it out once you release code.
(Although CloudFlare has already taken "railgun"...)
> you never have to learn some silly separate query language again. A query language which just attempts to be some DSL to RPC another machine into doing the same query you could have already written in half the time it took to learn the query language.
Not sure I agree with this sentiment - most programming languages aren't declarative like query languages are, and that seems especially useful for, well, querying.
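For instance (a made-up data shape, just to make the comparison concrete):

```javascript
// Declarative: SELECT name FROM users WHERE age > 30 ORDER BY age;
// The same thing by hand in plain JS - workable, but now the filtering,
// sorting, and projection logic is mine to write and maintain:
var names = users
  .filter(function (u) { return u.age > 30; })
  .sort(function (a, b) { return a.age - b.age; })
  .map(function (u) { return u.name; });
```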
My question is, how do you solve data synchronization and conflict resolution, without using the techniques that do that, i.e. leader election, or some other type of consensus algorithm?
Point being, this makes gun truly peer to peer, because it behaves correctly if it's the only one running, or if there are numerous guns interconnected with each other. No leader election, no consensus algorithm - you don't need those, because the system agrees as soon as the updates are received, since it resolves them immediately with an idempotent algorithm. Make sense? More details on this soon.
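To give a rough flavor of what I mean by idempotent (this is not my actual algorithm, just a generic last-write-wins merge with a deterministic tiebreak, where `state` stands in for whatever ordering metadata you track):

```javascript
// Generic illustration: every peer runs the same pure function on every
// update, so all peers converge without electing anyone.
function merge(current, incoming) {
  if (!current) { return incoming; }
  if (incoming.state > current.state) { return incoming; } // newer wins
  if (incoming.state < current.state) { return current; }  // older loses
  // equal states: break the tie deterministically so every peer agrees
  return JSON.stringify(incoming.value) > JSON.stringify(current.value)
    ? incoming : current;
}
// Idempotent and order-independent: merge(a, b) === merge(b, a), and
// applying the same update twice changes nothing.
```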
1. In memory in the browser tab's process.
2. If available, in the browser's localstorage or fallback.
3. In the server process's memory.
4. If available, in Redis on the server.
5. If in a multi-machine setup, any other connected server that is subscribed to that data set, whether in memory (3) or in Redis (4) if available.
6. If configured, in a machine log on S3.
7. Persisted to S3, which replicates and shards it for you internally.
8. If configured, in a revision file on S3.
9. If configured, in a multi-region S3 setup, redundantly in many places.
(2) is not cleared until an acknowledgement that (7) is confirmed. (1) is not cleared until an acknowledgement that (7) is confirmed, or the tab is exited. In the case of (7) it is no longer the delta/diff, but a snapshot of the current data set with that delta/diff's update applied. Retries from (1) ~ (5) will happen at various events if the confirmations are not satisfied. If a conflict has already occurred by (3), the acknowledgement from (5) will include a notification that the value has already been updated, along with the standard delta/diff of that conflicting update being sent down. Meaning (5) does not guarantee that your delta/diff has "won", only that it has been saved or is already outdated.
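In sketch form (made-up names - assume `socket` is an already-open WebSocket; this is not gun's actual code):

```javascript
// Keep every delta in localStorage (2) until the server acknowledges the
// S3 snapshot (7), retrying on a timer in between.
var pending = JSON.parse(localStorage.getItem('gun-pending') || '{}');

function save(id, delta) {
  pending[id] = delta;
  localStorage.setItem('gun-pending', JSON.stringify(pending));
  send(id, delta);
}

function send(id, delta) {
  socket.send(JSON.stringify({ id: id, delta: delta }));
  setTimeout(function () {
    if (pending[id]) { send(id, delta); } // no ack yet - retry
  }, 5000);
}

socket.onmessage = function (msg) {
  var ack = JSON.parse(msg.data);
  if (ack.persisted && pending[ack.id]) { // (7) confirmed
    delete pending[ack.id];               // now safe to clear (2)
    localStorage.setItem('gun-pending', JSON.stringify(pending));
  }
};
```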
The worst-case condition is that (2, 4, 5, 6, 8, 9) are turned off, in which case your user's data is as volatile as them preemptively leaving the page (although I suppose you could use an onbeforeunload to warn them) - however this behavior is the current norm for most HTTP POST based forms and apps. Actually, pardon me, the worst-case condition is that everything is offline simultaneously, but this is not really interesting because then users won't even be able to access your app in the first place.
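And the read side of those layers, as a fall-through (every function name here is made up, just labeling the tiers):

```javascript
// Check each tier in order and fall through on a miss.
function read(key, done) {
  if (memory[key]) { return done(memory[key]); } // (1) tab memory
  var local = localStorage.getItem(key);         // (2) localstorage
  if (local) { return done(JSON.parse(local)); }
  askServer(key, function (val) {                // (3)-(5) server/Redis/peers
    if (val) { return done(val); }
    fetchFromS3(key, done);                      // (6)-(9) S3
  });
}
```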
Please correct me if I have my terms wrong:
Receive omission - as mentioned before, the peer that originated the message will attempt to retry messages until a confirmation is given. So even if a process somehow does not receive a message, it eventually will, unless the origin gives up.
Send omission - this is a bit trickier; gun tries its best to keep user changes by all means possible until a confirmation is received. If for some reason gun is unable to do this, or the user's browser starts going haywire and doing weird things... you have no way to know, and neither does gun. At this level, something is fundamentally wrong with the runtime or the OS, and gun has no way to check this.
Arbitrary - malicious attacks are best done from server peers, but this would require the malicious node to know your key. Currently the likelihood of this happening may be decently high, as all it would require is for an attacker to take over an IP that another one of your gun nodes might connect to - the connecting node will issue the key as proof of not being malicious. I definitely could use some help figuring out a more secure, yet still fully decentralized, means of authentication and trustworthiness of server nodes. In the instance of byzantine mistakes, these will only affect the system if the message actually complies with the spec - in which case the message is indistinguishable from a real request, and therefore will be treated as one. And finally, what about an intentionally malicious user? The way gun's conflict resolution algorithms work makes it very difficult to abuse them in any way that is actually meaningful to the malicious user, but I'm saving the details for a real post. That being said, such abuse will happen, and I can already describe the results for you - "annoying" and "spammy" - luckily gun has filters built into the piping channels, so if you as the developer notice any particularly fishy behavior, you can always block that user out.
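In sketch form, the naive handshake I'm describing looks like this (hypothetical names, a socket-like object with on/send/close - and it shows exactly the weakness I mean):

```javascript
// Connecting node: prove trustworthiness by sending the shared key.
// Weakness: whoever controls the IP being dialed receives the key.
clientSocket.on('open', function () {
  clientSocket.send(JSON.stringify({ auth: SHARED_KEY }));
});

// Receiving node: reject anyone who doesn't present the key.
serverSocket.on('message', function (raw) {
  var msg = JSON.parse(raw);
  if (msg.auth !== SHARED_KEY) { serverSocket.close(); return; }
  // ...trusted from here on.
});
```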
Recovery - when gun starts (or restarts), it goes about its usual business: checking its cache (if Redis is available) and retrying whatever is there that had failed to be cleared, listening for messages which may be retries of things that had failed to be cached (especially in the case Redis isn't enabled), and intelligently pulling things back in from S3. When any data set is loaded into gun, it will also subscribe to that data set so it can be notified of updates from any other node. Gun will also merge (using the same conflict resolution algorithms) the data set it got from S3 with any other node that also has that data set, and even replay some persisted logs if available. This way it boots itself back up into the most valid and live operating status that it can.
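Roughly, as a boot sequence (every function name here is hypothetical, just labeling the steps above):

```javascript
function recover() {
  redisScanPending(function (deltas) {    // retry anything never cleared
    deltas.forEach(retrySend);
  });
  listenForPeerRetries();                 // peers may re-send failed deltas
  s3LoadSubscribedSets(function (set) {   // pull data sets back from S3
    subscribe(set.id);                    // stay notified of updates
    peersWith(set.id).forEach(function (peer) {
      mergeSets(set, peer.fetch(set.id)); // same conflict resolution
    });
    replayLogs(set.id);                   // persisted logs, if available
  });
}
```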
Partitions - there are 9 different layers, any of which can fail or lose networking. That is roughly 9 factorial combinations of things that can, could, and probably will go wrong (although this is an overestimation, since not all the layers require networking). I simply can't cover them all in this comment, but I will go over some of the most concerning and common cases in a separate post - I also plan on building a partition simulator, so people can play with causing these issues and I can test against them. That being said, gun has a strong commitment to handling these things, since it can afford to (gun is eventually consistent, so we have time to make up for networking problems). I hope this comment was helpful in what I did go over - I'm not sure if you'll even wind up seeing it (please notify me that you did! Even if that is just an upvote). I'll probably reuse a lot of this in one of my actual posts.
Thanks again! Anything else?
For example: "All conflict resolution happens locally in each peer using a deterministic algorithm." Hrrrmm.... if the model is to use CRDTs for conflict free resolution I can believe it; if it's timestamps, for instance, I'm much more doubtful.
As far as transformations and conflict resolution, I am wondering, are you using something like operational transformations?
Instead, I developed my own method, called "Analytical Fluctuation" - I haven't written papers on it yet, but I need to soon! One requirement is that it has to be truly peer to peer and cannot rely (even occasionally) on some centralized server or leader election setup. Another requirement is that it has to be latency-proof, which for me means "preparing for the future" when people need to collaborate on documents not only from the other side of the world but also from Mars. This is where Neil Fraser says his system breaks down, because if there is too much latency, the patches stop applying idempotently. Again, more on this in upcoming posts.
> It bridges the distance with a realtime connection, so updates propagate at the speed of the raw pipes linking them.
... I'm assuming it's the P.
> However, further consideration shows that CA is not really a
> coherent option because a system that is not Partition-
> tolerant will, by definition, be forced to give up
> Consistency or Availability during a partition.
However, the strength of this design is that it allows you to implement stronger forms of consistency on top - which is necessary for those occasional pieces of data (payments, medical records, etc.). But if you do this, then the availability of that data diminishes, because servers must wait for consistency to be achieved before doing any further transformations. The sweet spot is that only the availability of that particular property (or the properties it depends upon) decreases - all other pieces of information will still be snappy fast, because gun is capable of locking at a granular level.
This will not be as "easy" as it is in databases that are strongly consistent to begin with, but those databases aren't capable of being eventually consistent either (because that would violate the DB's principles). It is possible, though, and you the developer get the control and flexibility over it. So you win some and you lose some, but I believe this setup is highly optimized, because generally speaking the things that need to be "strongly consistent" have much slower "weakest links" than the speed of networks going up or down - such as approval by some human overseer (a loan officer, a lawyer, a doctor, etc.) - and meatspace brains take much longer to process regulatory issues than the couple of hours the network might potentially be down. I'll touch more on these quirks in a separate post after I address conflict resolution.
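To sketch the idea (none of these calls are gun's real API; the names just illustrate locking a single property while everything else stays eventually consistent):

```javascript
// Hypothetical: move money between two accounts by locking only those two
// keys. Every other key in the system remains available the whole time.
function transfer(from, to, amount, done) {
  lock(['account/' + from, 'account/' + to], function (release) {
    var a = read('account/' + from);
    var b = read('account/' + to);
    if (a.balance < amount) { release(); return done(new Error('insufficient funds')); }
    write('account/' + from, { balance: a.balance - amount });
    write('account/' + to,   { balance: b.balance + amount });
    release(); // only these two properties were ever blocked
    done(null);
  });
}
```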
Caching is hard. Consistency is hard. Peer to peer is hard. I hope the author is addressing these in a sane, verifiable way. I'm really curious to see the demo apps that come out of this.
Yes, these are hard subjects - especially things like cache invalidation. You raise good points, and I hope to answer them in the follow-up posts I write - I expect your and others' good eyes to check my work. I'll let people know when I've written them. Demos are coming too!
Dumb, de-dumb dumb dumb.