
Show HN: Gun v0.1.0 – The Easiest Database Ever - marknadal
http://gundb.io/#step1
======
barakm
From the blog:

> Because gun is not a database (NoDB), it is a persisted distributed cache.

This I believe.

> The fatal flaw with databases is that they assume some centralized
> authority. While this may be the case initially when you are small, it
> always ceases to be true when you become large enough that concurrency is
> unavoidable.

Partially true. Though that's not necessarily a "fatal flaw", and calling it
such is troubling. Yes, concurrency is unavoidable when you become large
enough but you also want your data to be, well, consistent and persistent, but
then you go on...

> No amount of leader election and consensus algorithms can patch this without
> facing an unjustified amount of complexity. Gun resolves all this by biting
> the bullet - it solves the hard problems first, not last.

Where in the code, pray tell, is it solving these problems? The fact that you
also claim to be an AP system (and conflate this with ACID) makes me strongly
wonder what your notions on Consistency actually are.

"Just a cache" needs some consistency as well, I'll point out, but you may not
care as much about stale reads.

> It gets data synchronization and conflict resolution right from the
> beginning, so it never has to rely on vulnerable leader election or
> consensus locking.

From what I'm starting to understand you're, at best, shuffling that off to S3
or "other storage engines" -- you've still got to pay the cost. You can't
really claim to do linearizability without, well, actually doing
linearizability.

So, maybe it's a cache, sure. And you seem to like to work on the developer
API, nothing wrong there. But there's nothing new under the sun and I'm really
skeptical that hard distributed database problems are solved in one large JS
file.

~~~
marknadal
Clarification, I said GUN is NOT acid compliant from your "usual"
understanding of the term, since GUN is AP. Most people assume acid means CP.

ACID is very vague though, and I'd like to explore it more by writing tests to
either confirm or deny whether GUN supports it or not (would you be interested
in helping build those tests?). I also want to get some Jepsen like tests up
as well.

Data convergence (data sync) is guaranteed by the Hypothetical Amnesia Machine
algorithm, which is completely deterministic and idempotent. There are some
details on it in the wiki, let me know if you have any questions. I also did a
tech talk on it.

In NO way does gun rely on S3 for consistency. That would be horrible. Check
out the algorithm and slam me with questions/critiques. Thanks for looking. :)

~~~
qqueue
Is the Hypothetical Amnesia Machine algorithm backed by any scholarly
research, or can you at least cite some papers with similar techniques? Blog
posts and javascript are nice, but I have a certain fondness for LaTeX-
generated PDFs whenever data integrity is involved, e.g. HyperDex's pretty
excellent papers:

[http://hyperdex.org/papers/](http://hyperdex.org/papers/)

~~~
marknadal
No papers yet, but I've been working on building up connection with academics
and hiring them. So hopefully expect to see something published, but the
process takes a while.

Meanwhile I'm actively working on building towards a simulation system and an
actual high-scale deployable battle testing environment. Think of these as
taking the theory from scholastic research, and actually implementing them in
practical settings that anybody can run.

I'd also like to get a TLA+ specification going. If you have any experience in
this stuff, please please please contact me mark@gunDB.io because this is
important to me.

~~~
roeme
When I read your responses, I can‘t shake the feeling I'm talking to some
snake oil salesman in a nice suit.¹

And to be honest, GUN‘s docs sound similar. Heavy on how to use, and how
awesome everything is, but as soon as one tries to understand stuff, it‘s
either WIP or “team up with me/us!”. _eyebrow rises_

And the claim of “building up connection with academics and hiring them” falls
perfectly in line with this. Why the hell can‘t you describe what you did by
yourself? If it's so awesome, why don‘t you just _die_ to explain it to
everyone who asks? Or, $DEITY forbid, should the “academics” lend some
credibility to GUN, even if it's with just their title? What's this HAM about?

Maybe it's just me, but all this with a rather complex naming convention
(souls...) and code...

eh, I'll go with “show us teh algoz”. Or describe it.

¹) To illustrate: « I'm actively working [...]» – I'd like to see you
passively working.

~~~
marknadal
Have you looked at the Wiki?

[https://github.com/amark/gun/wiki/Conflict-Resolution-
with-G...](https://github.com/amark/gun/wiki/Conflict-Resolution-with-Guns)

[https://github.com/amark/gun/wiki/How-to-Create-
GUN](https://github.com/amark/gun/wiki/How-to-Create-GUN)

[https://docs.google.com/presentation/d/1VIOJc0bdzUNs7yXMLKCc...](https://docs.google.com/presentation/d/1VIOJc0bdzUNs7yXMLKCcgwU8ZZqMh-G4XDJt8JRtvSA/edit#slide=id.g6c37d5900_0378)

No snake oil. It is a state machine operating over a boundary function.
However words like that sound super jargony which sounds vague, despite the
fact that people spend their entire lives working on just these problem sets
and their nuances.

I'm happy to discuss the workings, and I'd encourage you to try and use GUN
and see if it can withstand your concurrency attacks. Challenge accepted?

Edit: This person (in the comments below, please upvote him), and my reply,
best addresses the most important questions:
[https://news.ycombinator.com/item?id=9077969](https://news.ycombinator.com/item?id=9077969)

~~~
zero_iq
You're happy to discuss the workings? How about writing them down
somewhere...? All your documentation, such as it is, describes things using
terms that you never actually define or explain. Your code is just as bad.
Worse in fact, because it introduces yet further terms that are not in the
documentation.

Your Conflict-Resolution-with-Guns page simply says 'see gun.HAM' for the
explanation. No indication where this can be found. It isn't in the source
repository and it isn't in the wiki. A google search reveals nothing.

The 'algorithm' presented on How-to-Create-GUN is meaningless because you
don't define any of the return values. I can see how it maps some input values
to some output values, but nowhere do you say what any of those return values
actually mean, what I should do with them, or why they are useful.

e.g. return {amnesiaQuarantine: true} ... what does this mean? What should be
done with that return value? What is an amnesiaQuarantine?

e.g. return {quarantineState: true} ... what does this mean? How does it
differ from amnesiaQuarantine: true? What is a quarantineState? How should I
react to receiving this return value?

Your documentation says a lot, but doesn't actually define anything, and is
ultimately meaningless. This is why people are giving you a hard time and
asking so many questions.

Most people reading will not know: what amnesiaQuarantine is, what amnesiState
is, what the Hypothetical Amnesia Machine thought experiment is, what a
boundary function is (there are multiple definitions - what are you using?),
what 'converge: true' means, what 'incoming: true' means, what 'state: true'
means (given that you say other 'state' variables are _times_ -- how the
hell does a boolean represent a time?), what 'you have not properly handled
recursion through your data' means. What is a 'soul'? What happens to the data
when particular values are stored? Where are things stored? What is the data
flow? How are things shared? How does sync happen?

Imagine you don't know what any of your terminology means - like everybody
reading your documentation. Treat each term like an undefined variable. Now
try to understand your document. You can't. Those undefined terms are never
'set' anywhere. It doesn't make any sense. As soon as it gets close to
actually explaining anything it just handwaves, or leaves you with undefined
terminology.

You don't define what kind of persistence you implement or what consistency
guarantees (worse: your explanations do not seem consistent). You don't define
how your conflict resolution works (the 'explanation' given is tantamount to
Star Trek technobabble). You don't define how data is transferred. Your slides
are useless without any notes.

In your code you say that ACID is vague. It really isn't. Your explanation of
how you meet ACID _is_ extremely vague however, using what appear to be
truisms and contradictions, and yet more undefined terms that seem to have
little to do with anything mentioned in the documentation. Your code is poorly
structured, and badly commented. It uses 'cool' sounding gun-related
terminology ('shot', 'roulette', etc.) without defining what the hell those
things mean. There is nothing in the code that actually seems to do anything
with consistency.

Your HAM algorithm - the very crux of your system as stated in your
documentation, remains unexplained, and WORSE.. has a TODO: comment noting
that it might not work and needs further investigation. This comment also
mentions rollbacks.... yet nowhere else in the code or documentation says
anything about rollbacks, and it's not clear why rollbacks would even be
needed according to the (vague) explanation of HAM.

Your further explanations in these comments STILL do not actually describe
precisely what HAM is or how it works. If you cannot do this in a simple and
elegant manner, then NOBODY will be able to use or trust your database system.

If you want anybody to take you seriously, you must write a simple and concise
explanation of HAM, including definitions of all your terms.

Frankly, it is so vague, and so unclear how it works that I am starting to
think this is the product of some kind of mental illness...

Sorry to be so harsh, but nobody seems to be getting through to you.

EDIT: I'm reminded of Einstein's quote: "If you can't explain it simply, you
don't understand it well enough."

~~~
marknadal
Terms, defined here:
[https://github.com/amark/gun/wiki/semantics](https://github.com/amark/gun/wiki/semantics)

Explanation of the conflict resolution in simple terms, here:
[https://news.ycombinator.com/item?id=9077969](https://news.ycombinator.com/item?id=9077969)
.

Return values, with comments explaining their purpose, here:
[https://github.com/amark/gun/wiki/How-to-Create-
GUN](https://github.com/amark/gun/wiki/How-to-Create-GUN) (I know you
referenced this already, but did you read the comments explaining each return
value? Edit: upon further reading your comment, it looks like you did, you
just didn't like them. Perhaps I should make them more concise)

Slides (no audio/video unfortunately) on what operations to apply given the
HAM return values, here:
[https://docs.google.com/presentation/d/1VIOJc0bdzUNs7yXMLKCc...](https://docs.google.com/presentation/d/1VIOJc0bdzUNs7yXMLKCcgwU8ZZqMh-G4XDJt8JRtvSA/edit#slide=id.g6c37d5900_0378)

Persistence, currently S3 or localhost-testing-only disk. Persistence is a
plugin.

ACID: Please link me to your favorite explanation of ACID that is clear and
concise. I'll try and base my reply off that. I haven't found any good ones.
GUN is AP, not CP.

People are taking me seriously, enough that I have contributors and funding.
Some people don't take me seriously, and I'm trying hard to open up to them
and be honest.

Do I need better documentation? Yes. Do I have documentation? At least some,
yes.

What else can I handle for you?

~~~
zero_iq
You still have not described HAM except in the vaguest of terms. You have
addressed very few of my questions.

What is the Hypothetical Amnesia Machine thought experiment? Where have you
described this, or where can a description be found? What do the return values
mean? The comments are very little help. What situations do they cover?

How does this relate to your algorithm? Please explain the algorithm in simple
terms, with precise definitions.

Your slides provide NO USEFUL INFORMATION WHATSOEVER. If you cannot see that
someone who doesn't already know what HAM is will be TOTALLY UNABLE to
understand your slides, then you have a serious problem seeing things from
another's point of view and should get somebody else to do your documentation
for you.

Believe me, it's not from lack of trying on my part. I'm not stupid. I'm an
experienced developer and familiar with the internal workings of many
different database systems. It's my job and my hobby. I have maintained and
contributed to several database systems. Your slides are intriguing but
meaningless to me.

Your list of definitions ('Semantics') redefines many things that already have
perfectly good definitions, and declares new terminology for concepts that
already have perfectly good labels.

Many of the definitions are vague or even nonsensical/self-inconsistent.

For example: "soul': is the practically unique, immutable identifier for a
node".

OK, so it's an identifier for a Node. So it's a Node ID. Why don't you just
call it that?

But what does 'practically unique' mean? Something is either unique, or it
isn't. It might be unique in a particular context, e.g. only in one instance
of the database, or application, or server, ... or... what?

And what's a 'node'? "A group of no, one, some, or all fields, as they change
over time." Well, you've redefined a perfectly good piece of jargon with a new
and vague description. Node seems like a really bad word for this. In what way
is a set of fields anything like a 'node' in the general sense? How does a
node capture things over time? Is it a list, a history, an event log....?

"A group of no, one, some, or all" is better known as a 'set'. This is
universally-accepted mathematical terminology. Except you've redefined that
too.

And if something is a set of fields.... hey, how about calling it a field set?
You know, like everybody else does...? Oh, no, let's call it a node
instead....

My favourite: "Sent: proof that a message was received, might contain data
that needs no receipt." The more you study this sentence, the more nonsensical
and ambiguous it becomes. For a start, why not call it 'Received'? Or even
'Receipt', because that's the common noun for an item showing proof of
receipt. Except, that you might need to prove receipt of data that needs no
receipt... It is a _ridiculous_ definition.

I'm sorry, but I can't take you seriously.

Frankly, it sounds like you yourself don't understand the domain and concepts
you are describing, and are handwaving to cover your lack of knowledge. The
fact that you provide your own terminology for things that could quite easily
be described in standard terms betrays a lack of theoretical background, and
ignorance of the state-of-the-art.

I'd venture a guess that your being REALLY, REALLY bad at explaining things
may be correlated with the fact that you're apparently really good at
describing tiny things in the most grandiose and self-aggrandizing terms. This
seems to be ubiquitous across all your github projects. Redefining things
unnecessarily, solving things that already have simple solutions, describing
toy apps as radical revolutionary game-changers. I suspect your inability to
explain things stems from this narcissism/egocentrism.

~~~
karlgrz
Abso-fucking-lutely.

------
dang
A few disagreements in this thread have crossed over into being disrespectful.
This is a gentle reminder that you can (and on Hacker News, please do)
disagree without calling names.

[https://news.ycombinator.com/newsguidelines.html](https://news.ycombinator.com/newsguidelines.html)

[https://news.ycombinator.com/showhn.html](https://news.ycombinator.com/showhn.html)

------
bitanarch
A few questions.

1\. How do you define the operating boundaries for your time stamps? What is
too low and too high and why?

2\. What are the expected use cases for your conflict resolution algorithm?
The HAM function you proposed would just overwrite one string with another and
so for things like collaborative document editing, user intention isn't
preserved.

3\. Where is the vector clock defined in your code? I can only see
Gun.time.is() in a brief glance at your code... and it is just getting the
UNIX timestamp in milliseconds.

~~~
marknadal
Wonderful questions! Actually some of the best in the entire thread I think.

1\. See (3) but first read:

A) The upper boundary is defined by the current machine's local clock, which
could have skew or drift.

B) The lower boundary is defined by the last known update on an individual
record (down to the UUID+field).

2\. The expected use case for this conflict resolution algorithm is basic
field/value pairs (terms defined here:
[https://github.com/amark/gun/wiki/semantics](https://github.com/amark/gun/wiki/semantics),
and here: [https://github.com/amark/gun/wiki/JSON-Data-
Format](https://github.com/amark/gun/wiki/JSON-Data-Format)) within a UUID'd
object (called a node, as in a node in a graph).

This is what HAM works off and is considered the lowest level atomic pieces
(the value). In order to sync on collaborative text you need to build an OT
layer on top of this (I plan on doing this, possibly integrating with ShareJS
as another mentioned). You cannot collaboratively sync on atomic values by
themselves, you must define a CRDT for that - plugins/modules for them will be
coming later.

3\. Vector Clocks. HAM does not assume what the sort key is for state, it just
assumes it is a value it can do <, <=, ===, >=, > comparisons on.

A) Vector clocks have a vulnerability that if you are working with
temporary/ephemeral machines, the clocks will constantly get reset and have to
play "catch up". However, network partitions are highly likely, so there is no
guarantee that two machines won't issue a conflicting vector clock. If this
happens, there is no standard way of dealing with this, although there are
plenty of workarounds.

B) Timestamps also have a vulnerability, that is if you set your local clock
ahead (say 2 years in the future) then it will "always win" wiping out other
peers' valid values. However you unfortunately cannot determine in an untrusted
network whether a peer is being malicious about being 2 years in the future,
or if they are actually at a different point in timespace - like a GPS
satellite or on Mars, or went offline in the subway.

C) As a result, this is why I combine them together via the boundary function.
The upper and lower boundaries of the state machine provide the relative
"vector" for the untrusted timestamp in the delta update.

The benefits of this technique are twofold:

1) You get deterministic and idempotent resolution within a special-relativity
timeframe in a decentralized system without gossip (consensus).

2) If you do run GUN within your own trusted network, you can use the
timestamps to calculate drift between machines and then readjust the boundary
function of the state machine. Thus giving you a highly accurate "objective"
view of your data across peers, which if the latency is low enough could
indicate it is worth creating locks (but thus sacrificing Availability).
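
For readers trying to picture the boundary idea described above, here is a minimal sketch. It is not GUN's actual HAM code: the function names, the verdict strings, and the tie-break rule (lexical comparison of the JSON-encoded values) are all assumptions made for illustration. The only parts taken from the description are the upper boundary (the local machine clock) and the lower boundary (the state of the last known update on the field).

```javascript
// Hypothetical sketch; NOT GUN's actual HAM implementation.
// machineState:  this peer's local clock (upper boundary).
// currentState:  state of the last known update on this field (lower boundary).
// incomingState: the untrusted timestamp carried by the incoming update.
function resolve(machineState, incomingState, currentState, incomingValue, currentValue) {
  if (machineState < incomingState) {
    return "defer"; // claims to come from our future: hold it for later
  }
  if (incomingState < currentState) {
    return "keep"; // older than what we already have
  }
  if (currentState < incomingState) {
    return "accept"; // inside both boundaries and newer
  }
  // Same state on both sides. Assumed tie-break: compare the JSON text
  // lexically so that every peer picks the same winner.
  const a = JSON.stringify(incomingValue);
  const b = JSON.stringify(currentValue);
  if (a === b) return "keep"; // identical update, nothing to do
  return a > b ? "accept" : "keep";
}

// Applies one update to one field. A real system would queue deferred
// updates; this sketch simply leaves the field unchanged.
function applyUpdate(field, update, machineState) {
  const verdict = resolve(machineState, update.state, field.state, update.value, field.value);
  return verdict === "accept" ? { value: update.value, state: update.state } : field;
}

// Two peers receive the same conflicting updates in opposite order.
const start = { value: "foo", state: 100 };
const bar = { value: "bar", state: 200 };
const baz = { value: "baz", state: 200 };
const now = 1000;
const peerA = applyUpdate(applyUpdate(start, bar, now), baz, now);
const peerB = applyUpdate(applyUpdate(start, baz, now), bar, now);
console.log(peerA, peerB); // both end up with the same field
```

Under these assumptions both peers converge on the same value regardless of delivery order, and replaying an update changes nothing, which is the deterministic and idempotent behaviour claimed above. Whether the real algorithm behaves this way is exactly what the tests discussed in this thread would need to show.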

Hope this was clear enough! Any questions? I'm going to be reposting this in
the rest of the thread.
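
The drift calculation mentioned in benefit 2 is not spelled out here, so the following is only one way it might be done. It swaps in the well-known NTP-style offset estimate from a request/response exchange; the function names and the idea of widening the upper boundary by the largest observed offset are assumptions, not anything GUN is known to do.

```javascript
// Hypothetical sketch using an NTP-style offset estimate; not GUN code.
// t0: local send time, t1: peer receive time,
// t2: peer send time,  t3: local receive time (all in milliseconds).
function estimateOffset(t0, t1, t2, t3) {
  return ((t1 - t0) + (t2 - t3)) / 2;
}

// Assumed policy: allow incoming states up to the largest clock offset
// observed among trusted peers beyond the local clock.
function upperBoundary(localNow, offsets) {
  const slack = Math.max(0, ...offsets.map((o) => Math.abs(o)));
  return localNow + slack;
}

const offset = estimateOffset(1000, 1510, 1520, 1040);
const boundary = upperBoundary(2000, [offset, -20]);
console.log(offset, boundary);
```

This only makes sense inside a trusted network, as stated above, because an untrusted peer can report whatever timestamps it likes.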

------
SlyShy
I don't know if I should consider this "the easiest database ever" or "gun is
not a database" (from the FAQ). Github says "a distributed, embedded, graph
database engine".

I think some clarification around the marketing could do a world of good.

~~~
marknadal
Good point, oh boy - caught me red handed. shameshame.

What I'm trying to get across is that it is the easiest database because it is
not your traditional master-slave database, and it doesn't require maintaining
any database process. It is indeed just a cache, but it has all the benefits
of a database.

~~~
lberger
I'm lost. How does it have all the benefits of a database, without any
persistence? That doesn't sound like a database at all.

~~~
marknadal
Persistence is just a plugin/module/hook. Currently it plugs into a very
never-should-ever-be-deployed file on disk (for easy local testing only) and
S3.

We're going to be adding more storage engines though! Hopefully building an
open source S3 that uses fancy algorithms to store on disk and on peers.
However I don't know that stuff, somebody else is doing it (or I'm hiring -
we're funded!).

------
nolanl
Interesting project! It seems to share a lot of the goals and design choices
of PouchDB/CouchDB: distributed, offline-first, eventually consistent,
deterministic conflict resolution, etc.

One big difference I can see is that it's only using LocalStorage, which has
good cross-browser support, but only allows 5-10MB maximum. Are there plans to
add IndexedDB/WebSQL support so that users can store more data?

~~~
marknadal
Yes! Thanks.

LocalStorage implementation is just the default plugin, and I chose it first
because of its compatibility. I'd like to get IndexedDB support in there as
well. Interested in helping?

------
gkya
I wanted to read the text on the page, but the styling, with the shadow, or
the gloss, or whatever it is, it is giving my astigmatic eyes pain, so I
couldn't, I'm sorry.

~~~
marknadal
sorry about that. IDK why but it makes it easier on my eyes, maybe I should do
a survey (I'm probably just weird) and then fix it.

~~~
adambard
I definitely remember complaining about this exact thing a year ago :P. At
least you toned down the shadow a bit.

~~~
marknadal
awwwe you remember me! Happy face! For... being "that guy" that had blurry
text. Shoot, sad face. Thanks for sticking around :).

~~~
adambard
Heh, I remembered the project too, I was just reminded of the blurry text by
the blurry text.

If you find a high contrast hard on the eyes you could drop it a bit by just
making the lettering a lighter grey in lieu of the drop shadow. Just don't
overdo it or you'll get people complaining about that.

------
ArekDymalski
It looks very promising. However I wonder who is the intended user of Gun:

1\. "full-stack" developers who just want to save time and/or benefit from
NoDB aspect

2\. Beginners and front-end developers who don't know anything about
databases?

In case of group 1 your marketing seems to be insufficiently technical as many
people here have already noted.

In case of group 2 (which I belong to) things look completely different. As a
beginner whose learning efforts are constantly disheartened by tutorials and
courses which end at "locally hosted HelloWorld app" phase, I'd be more than
happy seeing:

1\. step-by-step, layman-friendly tutorial on installing Gun on S3 and other
platforms.

2\. _very_ well commented example app demonstrating how to create typical
functionalities.

With such approach you will keep the "Dropbox of databases" promise which
sounds very exciting. Actually I think that something like this should be an
obligatory feature on Codecademy or any web development MOOC dedicated to
beginners.

~~~
marknadal
Great questions.

1\. As of right now, focusing on beginner/front-end devs who just want an easy
open source Firebase like database. People building small experimental apps,
since we haven't finished our battle-testing suite yet.

However, I'd also highly encourage full stack developers to get involved and
try it out and give us feedback. For small projects it'll probably save you
time, but the plugin/modules ecosystem (aka features) isn't mature enough yet,
so you'll be writing a lot of your own logic. Which please do! We need them!

If you don't want to run GUN on localhost, I'll host a GUN server for you. :)
You are right, I need to get better docs/tutorials and information out on
this, so laymen don't get disheartened.

Is there anything I can do to help? Thanks for your comment!

~~~
ArekDymalski
Thanks Mark, I'll keep an eye on the docs page then. I'll also keep the thumbs
up :)

------
rawnlq
Have you heard of sharejs [http://sharejs.org/](http://sharejs.org/)? It's
made by an ex-Google-wave engineer and uses operational transforms for
eventual consistency. It seems like you guys are solving similar problems.

I mention this because Dropbox has their own "Dropbox for Databases" called
Datastore:
[https://www.dropbox.com/developers/datastore](https://www.dropbox.com/developers/datastore)
which is based on Operational Transforms:
[https://blogs.dropbox.com/developers/2013/07/how-the-
datasto...](https://blogs.dropbox.com/developers/2013/07/how-the-datastore-
api-handles-conflicts-part-1-basics-of-offline-conflict-handling/)

~~~
marknadal
Actually yes! I'm one of the people who accidentally sparked a long discussion
in the #1 issues thread:
[https://github.com/share/ShareJS/issues/1](https://github.com/share/ShareJS/issues/1)
that I've seen other people on HN link to.

GUN doesn't have OT-style text collaboration yet, so go with ShareJS if that
is what you need now. I do plan on implementing it on top of GUN though, or
trying to get ShareJS integrated with GUN. Joseph is a really great guy.

Yupe, I've talked to Steve Marx at Dropbox Datastore at a hackathon before.
He's a great guy as well. They're using algorithms that require some
centralized conflict resolution though. Which is great, but I'm interested in
the decentralized side.

------
theseoafs
I dislike that the webpage actually has very little information about what the
tool does, what use cases it is suitable for, what the architecture is like,
etc.

Here's an important question the homepage doesn't answer: is it ACID?

~~~
marknadal
Good point, I'll try and move the blog to another page and replace it with
more details.

The fastest summary is that it is an Open Source Firebase.

Flat up answer for ACID: honestly, not how you traditionally would think, as
it favors AP of the CAP theorem.

However, ACID terminology is actually pretty vague
([http://en.wikipedia.org/wiki/ACID](http://en.wikipedia.org/wiki/ACID)). Here
are my comments about ACID in the code:

    
    
    			A - Atomic, if you set a full node, or nodes of nodes, if any value is in error then nothing will be set.
    				If you want sets to be independent of each other, you need to set each piece of the data individually.
    
    			C - Consistency, if you use any reserved symbols or similar, the operation will be rejected as it could lead to an invalid read and thus an invalid state.
    			
    			I - Isolation, the conflict resolution algorithm guarantees idempotent transactions, across every peer, regardless of any partition,
    				including a peer acting by itself or one having been disconnected from the network.
    
    			D - Durability, if the acknowledgement receipt is received, then the state at which the final persistence hook was called on is guaranteed to have been written.
    				The live state at point of confirmation may or may not be different than when it was called.
    				If this causes any application-level concern, it can compare against the live data by immediately reading it, or accessing the logs if enabled.
    

If you have any specific further questions I am happy to answer. It has
support for vector-clock/timestamp "state" transactions.
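
The "A" and "C" comments above describe an all-or-nothing set that rejects reserved symbols. A rough sketch of that behaviour, assuming a plain in-memory store, follows. The `RESERVED` prefix, the list of allowed value types, and every function name are hypothetical and are not taken from GUN's source.

```javascript
// Hypothetical sketch of an all-or-nothing node write; not GUN's API.
const RESERVED = "_"; // assumed reserved prefix for field names

function isValid(key, value) {
  if (key.startsWith(RESERVED)) return false;
  if (value === null) return true;
  const t = typeof value;
  if (t === "number") return Number.isFinite(value);
  return t === "string" || t === "boolean";
}

// Validates every field first; writes only if all of them pass.
function atomicSet(store, soul, node) {
  for (const [key, value] of Object.entries(node)) {
    if (!isValid(key, value)) {
      return { ok: false, err: "invalid field: " + key };
    }
  }
  store[soul] = { ...(store[soul] || {}), ...node };
  return { ok: true };
}

const store = {};
const rejected = atomicSet(store, "user1", { name: "mark", age: NaN });
const accepted = atomicSet(store, "user1", { name: "mark", age: 30 });
console.log(rejected, accepted, store);
```

Note that this covers only a single local write. It says nothing about atomicity across peers, which is what the ACID questions in this thread are mostly about.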

~~~
akerl_
I'm attempting to draw a connection between your comments on ACID and what
ACID actually means, and there doesn't appear to be any parallel.

~~~
marknadal
I understand what you mean.

Could you do me a favor and point me to your favorite description of ACID?

------
tomphoolery
> 400 Bad Request

[https://github.com/amark/gun](https://github.com/amark/gun)

~~~
marknadal
oh snap, the HN "DDOS" has peaked! I'll see what I can do to get things back
online. Thanks for putting the github link in here in case others get the same
issue.

------
karlgrz
I would really love if this actually worked as promised. Way too much
skepticism and not nearly enough proof. Kudos for actually putting this out
there, though. It'd be great to prove everyone wrong, but I will not hold my
breath.

Good luck!

~~~
marknadal
You can try messing with it yourself by doing the following (if you already
have node/npm/git installed and are familiar with the terminal):

    
    
       git clone http://github.com/amark/gun
       cd gun/examples && npm install
       node express.js 8080
    

Then open it in a couple of browser tabs on different devices, change their
system clock, try refreshing data, crashing things. etc.

I'm also trying to figure out how to write simulated tests (like Jepsen) that
will do all of this for you and give you the results of what failed/succeeded.
Till then, let me know if you see anything break.

~~~
karlgrz
I'm not going to spin this up on 1000 nodes to make sure it handles the kind
of load needed to simulate actual production traffic (which is what you would
need to actually figure out if this would hold up to some kind of large scale
load that Riak or Cassandra would be able to handle). Maybe you should do that
yourself and document it to prove how good your product is!

~~~
marknadal
You don't need to spin up a 1,000 nodes.

You can just spin up a 1,000 tabs.

Since they all run the same algorithm!

Yes, I am working on more tests to prove myself wrong or right. Please bear
with me as I/we make progress, because it is literally only a few contributors
and me.

This is v0.1.0 for a reason, not v1. Lots ahead, but please play with it while
we work on developing the test suite.

~~~
karlgrz
I appreciate the suggestion and response. Understand it is v.0.1.0 but you
should also understand that when you bring something like this out with next
to no academic backing behind your theories and algorithms there is DEFINITELY
going to be skepticism and doubt.

You are exactly right, though. It's early, and I'll give you the benefit of
the doubt that you will achieve what you want.

Also know that there is a TON of research in these areas (which you clearly
are aware of based on your comments in this thread) that basically refutes a
lot of what you are claiming. I would love to see more clear documentation
along with actual proofs showing how your algorithm is sound.

Until then, good luck, and I look forward to hearing about your success!

~~~
marknadal
Thanks! :)

Quick question though: I'm claiming an AP system, not that I have all three.
What research are you referring to that suggests you can't have
idempotent/deterministic conflict resolution? CRDTs are out there in the wild
and working. Do you have any papers in mind?

~~~
karlgrz
I'm not saying you claimed to have all three.

Only paper I would have in mind is the CRDT paper from Letia, Preguiça, and
Shapiro which I'm sure you're already familiar with.

The thing that bothers me the most is that it appears your entire algorithm
(Hypothetical Amnesia Machine) has no proofs behind it. Specifically, your
wiki article here:

[https://github.com/amark/gun/wiki/Conflict-Resolution-
with-G...](https://github.com/amark/gun/wiki/Conflict-Resolution-with-Guns)

Has a giant hole where the substance would be. That bothers me because you are
putting this potentially cool thing out there WAY BEFORE you have done the
actual work.

Again, I applaud the fact that you actually put this together and you
implemented it. And I understand it's v.0.1.0. That's fine.

Claiming this: "All conflict resolution happens locally in each peer using a
deterministic algorithm. Such that eventual consistency is guaranteed across
all writes within the mesh, with fault tolerant retries built in at each step.
Data integrity is now a breeze."

without any proof that algorithm actually does this reliably and WITHOUT DATA
LOSS bothers me. There is so much snake oil out there, you don't need to be
starting off on the wrong foot.

I'm no expert at this stuff (I've only been working on distributed systems for
about 5 years) but I'm also not claiming to be an expert. I just know that
there is a lot of hand waving out there, and I think it would be important to
actually prove your algorithm.

My 2¢.

~~~
marknadal
This person (in the comments above/below, please upvote him), and my reply,
best addresses the most important questions:
[https://news.ycombinator.com/item?id=9077969](https://news.ycombinator.com/item?id=9077969)

Please don't assume I haven't done the "actual work", I have. The academic
side of the equation with proofs is going to take much longer than the
timeframe from my investors for this seed round. I openly admit that, but I'd
rather do some good by getting this into people's hands to actually play and
build stuff with.

To be honest, I'll probably want to get Jepsen tests and the sort built before
the academic side of the equation is completed. Thank you for being skeptical
(I like that), but please don't ignore or not experiment with something just
because a paper hasn't been published yet. Who knows, if you did play with it,
you might like it enough to help write the paper - but maybe that is me being
too optimistic.

Blessings.

~~~
karlgrz
That is good information, thanks for that.

------
danbruc
There is no way this can work. Merging data changes can inherently not be
automated in the general case. Deciding if a change from foo to bar should win
over a change from foo to baz depends on the semantics of those strings. There
are some cases, for example counters, with simple and clear semantics for
which you can build reusable and robust solutions. You can also handle the
general case with simple policies like last write wins. But there is no way
any algorithm will ever be able to figure out whether to choose bar or baz,
not least because I could arbitrarily declare either of the two outcomes
correct.
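
As a rough illustration of that distinction (a Python sketch with made-up names, not any particular library's API): a grow-only counter can be merged mechanically because its semantics are fixed, while last write wins on strings just picks a winner by timestamp, with a peer id as an assumed tie-breaker, without knowing whether bar or baz is the right one.

```python
def merge_counter(a, b):
    """Grow-only counter: per-peer counts, merged by taking the max per peer."""
    return {peer: max(a.get(peer, 0), b.get(peer, 0)) for peer in a.keys() | b.keys()}

def counter_value(counter):
    """The counter's value is the sum of every peer's own count."""
    return sum(counter.values())

def merge_lww(a, b):
    """Last write wins on (timestamp, peer_id, value); peer_id breaks ties."""
    return max(a, b)

# Two peers each increment their own slot; no increment is lost in the merge.
alice = {"alice": 2}
bob = {"bob": 3}
merged = merge_counter(alice, bob)

# Two peers change foo concurrently; the policy picks one and drops the other.
from_alice = (10, "alice", "bar")
from_bob = (10, "bob", "baz")
winner = merge_lww(from_alice, from_bob)
```

Both merges are deterministic, but only the counter's result is "correct" in any sense the data itself defines; the string winner is correct only by decree of the policy.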

~~~
marknadal
Every change can be preserved in a history/append-only/log/stream. So you
don't have to "lose" data from another "winning". However, the algorithm will
by default select one; you can then code it at the app level for the user to
select a new winner.
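
A minimal sketch of that idea in Python, with hypothetical names (this is not GUN's API): every write is appended to a log, the current value is chosen by a default rule, and the app can later promote a different entry. Clearing the manual pick on a newer write is an assumption made for the example.

```python
class Field:
    """One key whose writes are all kept in an append-only log."""

    def __init__(self):
        self.log = []          # every (timestamp, value) ever written
        self.override = None   # entry explicitly picked by the user, if any

    def write(self, timestamp, value):
        self.log.append((timestamp, value))
        self.override = None   # assumption: a new write supersedes a manual pick

    def current(self):
        """The user's pick if there is one, else the default winner
        (highest timestamp, ties broken by value)."""
        if self.override is not None:
            return self.override[1]
        return max(self.log)[1] if self.log else None

    def pick(self, value):
        """App-level resolution: promote an entry that is still in the log."""
        matches = [entry for entry in self.log if entry[1] == value]
        if not matches:
            raise ValueError(f"{value!r} was never written")
        self.override = max(matches)

field = Field()
field.write(1, "foo")
field.write(2, "bar")
field.write(2, "baz")
default_winner = field.current()   # chosen by the default rule
field.pick("bar")                  # the user decides otherwise
user_winner = field.current()
```

Nothing is deleted when a value "loses"; the log still holds all three writes, so the losing value remains available for the user to select.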

The general case here is very much UUID-based key/value pairs. Anything beyond
that, you should be using CRDTs and OT-like algorithms, which I will be
building on top of GUN.

However, in the meanwhile, I challenge you to try running the example folder
from the GitHub ReadMe and seeing if you can break the automated sync and
cause data divergence!

Edit: This person (in the comments below, please upvote him), and my reply,
best addresses the most important questions:
[https://news.ycombinator.com/item?id=9077969](https://news.ycombinator.com/item?id=9077969)

------
nathan7
Awesome to finally see Gun on here, Mark! What still worries me is the
reliance on external storage services, although a good local storage service
could be built for Gun. Other than that, I'm glad to finally see docs!

------
stephanfroede
Cool approach. I had some fights with Neo4J and taming IO. I did fall back on
a 2nd Level Cache, which is nothing else than a huge hash map/KV store in
memory.

~~~
marknadal
Thanks! Interested in joining and working on these types of problems? You seem
to have some pretty good skills. Shoot me an email mark@gunDB.io

------
glittershark
You guys seriously need to work on your SEO - Googling "gundb" has the page
show up with the text "Your browser does not support frames...".

~~~
marknadal
Oh my goodness #fail. Thank you for spotting this. I'll try to figure out how
to fix it (probably by not being cheap by domain masking).

------
fiatjaf
If it "is just a cache" and the data is distributed among every client, where
is the data at each time before it is persisted to S3?

~~~
marknadal
Great question, I'm going to C&P a reply I did previously:

1\. In memory in the browser tab's process.

2\. If available, in the browser's localstorage or fallback.

3\. In the server process's memory.

4\. If available, on disk in the server.

5\. If in a multi-machine setup, any other connected server that is subscribed
to that data set, being in memory (3) or in disk (4) if available.

6\. If configured, in a machine log on S3.

7\. Persisted to S3, which replicates and shards it for you internally.

8\. If configured, in a revision file on S3.

9\. If configured, in a multi-region S3 setup, redundantly in many places.

(2) is not cleared till an acknowledgment that (7) is confirmed. (1) is not
cleared until an acknowledgement that (7) is confirmed or if the tab is
exited. In the case of (7) it is no longer the delta/diff, but a snapshot of
that current data set with that delta/diff's update. Retries from (1) ~ (5)
will happen at various events, if the confirmations are not satisfied. If a
conflict has already occurred by (3) the acknowledgement from (5) will include
a notification that the value has already been updated, along with the
standard delta/diff of that conflicting update being sent down. Meaning (5)
does not guarantee that your delta/diff has "won", only that it has been saved
or is already outdated.
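
The acknowledgement rule above can be sketched roughly as follows. This is an illustration in Python with hypothetical names (`PendingDeltas`, `on_ack`, and the `"saved"` / `"outdated"` statuses), not GUN's actual code: a delta stays in the local buffer until the durable store confirms it, retries resend whatever is unconfirmed, and an ack may report that the delta was already outdated.

```python
class PendingDeltas:
    """Local buffer (think localStorage) of deltas not yet confirmed as persisted."""

    def __init__(self, send):
        self.send = send       # callable(delta_id, delta) that pushes upstream
        self.pending = {}      # delta_id -> delta, kept until acknowledged
        self.outdated = {}     # delta_id -> newer value reported by the ack

    def put(self, delta_id, delta):
        self.pending[delta_id] = delta
        self.send(delta_id, delta)

    def retry(self):
        """Resend everything still unconfirmed (e.g. on reconnect or a timer)."""
        for delta_id, delta in list(self.pending.items()):
            self.send(delta_id, delta)

    def on_ack(self, delta_id, status, newer_value=None):
        """Drop a delta only once the durable store has answered for it.

        status is "saved" or "outdated". Either way the delta can be cleared,
        but "outdated" means it did not win and a newer value came back.
        """
        if self.pending.pop(delta_id, None) is None:
            return  # duplicate or unknown ack; nothing to do
        if status == "outdated":
            self.outdated[delta_id] = newer_value

sent = []
buffer = PendingDeltas(send=lambda delta_id, delta: sent.append(delta_id))
buffer.put("d1", {"name": "bar"})
buffer.put("d2", {"name": "baz"})
buffer.retry()                                    # no acks yet: both are resent
buffer.on_ack("d1", "saved")
buffer.on_ack("d2", "outdated", {"name": "bar"})
```

Because retries can deliver the same delta more than once, a scheme like this relies on the receiving side treating repeated deltas as harmless.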

Worst case condition is that (2, 4, 5, 6, 8, 9) are turned off, in which case
your user's data is as volatile as them preemptively leaving the page (although I
suppose you could use an onbeforeunload to warn them) - however this behavior
is the current norm for most http post based forms and apps. Actually, pardon
me, worst case condition is that everything is offline simultaneously, however
this is not really interesting because then users won't even be able to access
your app in the first place.

------
kainolophobia
I've looked at your "Hypothetical Amnesia Machine algorithm" and have a few
questions.

First though, I'd like you to read this:
[http://research.microsoft.com/en-us/um/people/lamport/pubs/t...](http://research.microsoft.com/en-us/um/people/lamport/pubs/time-clocks.pdf)

~~~
marknadal
Yes, I've looked at this paper before - I should reread it though.

I've done a tech talk (not recorded though) on the pros/cons of vector-clocks
and timestamps. I have some very specific insights which I should probably
write a paper on. Or at least get the tech talk recorded or written down.
There are some slides at the bottom of:
[https://github.com/amark/gun/wiki/How-to-Create-GUN](https://github.com/amark/gun/wiki/How-to-Create-GUN) .
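
For anyone following along, one commonly cited difference between the two can be shown in a few lines of Python. This is a generic sketch (the `compare` helper and peer names are made up for the example), not something taken from the talk or the slides: plain timestamps always produce an order, while vector clocks can report that two writes were concurrent.

```python
def compare(a, b):
    """Compare two vector clocks, given as dicts of peer -> counter."""
    peers = a.keys() | b.keys()
    a_le_b = all(a.get(p, 0) <= b.get(p, 0) for p in peers)
    b_le_a = all(b.get(p, 0) <= a.get(p, 0) for p in peers)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"

# Wall-clock timestamps always yield an order, even for independent writes.
ts_alice, ts_bob = 1000, 1001
timestamp_says = "before" if ts_alice < ts_bob else "after"

# Vector clocks for the same two writes: each peer has only seen its own update.
vc_alice = {"alice": 1}
vc_bob = {"bob": 1}
relation = compare(vc_alice, vc_bob)                 # "concurrent"
later = compare(vc_alice, {"alice": 1, "bob": 1})    # "before"
```

The trade-off is that vector clocks grow with the number of peers and still leave the concurrent case for something else to resolve, whereas timestamps are compact but depend on clocks agreeing.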

What questions may I answer?

------
fiatjaf
The demo is a little confusing because it loads two iframes and seems to be
faking it, but yes it works and no, it is not faking it.

[https://dl.dropboxusercontent.com/u/4374976/gun/web/tabs.htm...](https://dl.dropboxusercontent.com/u/4374976/gun/web/tabs.html?key=random/sdcws4mU0)

~~~
marknadal
Thank you for noticing this. :)

My original tutorial actually required the user to physically open up multiple
tabs and have them be side-by-side. However it was a mess and people didn't
like it. So I opted to fake it... while still depending upon the real tech
underneath.

HOWEVER, it is just running on a freebie heroku box, so it is probably bound
to crash/fall-over soon.

------
lux
As a "self-hosted Firebase", I'd love to see something like their integrations
with various JS frameworks, for ex:

[https://www.firebase.com/docs/web/libraries/react/](https://www.firebase.com/docs/web/libraries/react/)

~~~
marknadal
YES! We're actively working on trying to get adapters built for React,
Angular, Ember, Backbone, etc. but we're a super tiny team.

Would you be interested in contributing? You could really help make a big
difference.

~~~
lux
Awesome! I've starred the project on Github. I'm in startup mode and juggling
way too many things these days, but maybe I can find a free evening :)

~~~
marknadal
sweet, shoot me an email mark@gunDB.io to talk more.

------
marknadal
Hey everyone! If you have any questions, I'll be here for the next several
hours. Also check out the GitHub Wiki:
[https://github.com/amark/gun/wiki](https://github.com/amark/gun/wiki) .

------
bhz
Have you tried redis?

[http://try.redis.io/](http://try.redis.io/)

~~~
marknadal
I love redis! My first proof of concept of GUN used redis as the
persistence/storage layer. But I moved off of it since I wanted a fully
embedded solution.

Data wise the difference is that Redis doesn't support graphs. But you could
easily build that on top of Redis, so you could argue GUN is just graph data
on top of Redis (well, not anymore) with a conflict resolution algorithm baked
in.
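
To illustrate the "graph data on top of a key/value store" point, here is a minimal Python sketch. A plain dict stands in for the store, and the `{"#": key}` link shape and helper names are assumptions made for this example rather than a description of GUN's or Redis's actual format.

```python
store = {}  # stand-in for any flat key/value store

def put_node(key, fields):
    """Store a node as a flat record; values are scalars or links to other keys."""
    store[key] = dict(fields)

def link(key):
    """A reference to another node, stored by key rather than by nesting."""
    return {"#": key}

def resolve(key, field):
    """Read a field, following the link if the value is one."""
    value = store[key][field]
    if isinstance(value, dict) and "#" in value:
        return store[value["#"]]
    return value

# Nodes can point at each other, so cycles are representable in a flat store.
put_node("user/alice", {"name": "Alice", "employer": link("company/acme")})
put_node("company/acme", {"name": "Acme", "founder": link("user/alice")})

employer_name = resolve("user/alice", "employer")["name"]
founder_name = resolve("company/acme", "founder")["name"]
```

Since every node lives under its own key, an update touches one flat record at a time, which is what makes per-key conflict resolution possible on top of such a layout.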

~~~
bhz
I'll have to give Gun a go when I have the chance. Thank you for providing the
contrast.

------
Yadi
Hey Mark! Congrats, it looks awesome, good to see this here!

------
trithagoras
...Is there a glow effect around all the text?

------
protomyth
Congrats. What is the license? I must be missing where it is and the source
code I checked doesn't have it.

~~~
marknadal
Thanks!

Honestly, I might put this up to an open-source vote.

I personally lean towards the MIT and the ZLIB license,
[http://en.wikipedia.org/wiki/Zlib_License](http://en.wikipedia.org/wiki/Zlib_License)
.

However I also know a lot of other databases are doing AGPL, I think for
monetary reasons. Which :/ I might also want to consider.

But as I said, I honestly think this should be a community decision.

Could people reply back with what license they'd like?

~~~
wongarsu
If you want everyone to use your database, MIT or ZLIB are clearly superior.
For you(r company) that would limit your monetization options to support and
similar, which is certainly not ideal.

If you value free software (as opposed to open source), AGPL is a good option
and allows you to sell more permissive licenses to everyone who needs one.

If you actually want to make money with this, it's really a question of your
business model. I would use it with either license.

~~~
lclarkmichalek
Why would using AGPL imply not valuing open source?

~~~
jackbravo
Because open source guys value having more people using your code, and using
AGPL discourages some people from using it because they can't keep their
modifications private?

------
samuelcouch
This is really awesome! Excited to use it.

------
mmagin
"With all new flavors like banana, fizzbitch, and GUN!"
[https://www.youtube.com/watch?v=t-3qncy5Qfk](https://www.youtube.com/watch?v=t-3qncy5Qfk)

------
sigmonsays
This is the best troll ever

------
curiously
this seems like a great tool. hosting firebase on my own is what I want to
build real time apps, is this possible?

I have the same concerns for meteor which I have for this as well, which is
security and scalability.

How does Gun address those two things?

~~~
marknadal
Great questions.

Hosting on your own: Yes.

Security: Currently a "Roll Your Own" approach, where you wrap GUN behind some
firewall/throttling like system.

Why? Because permissions are such app-specific behavior, I haven't figured out
how to generalize it. I don't think it is possible to do it, so in the future
we'll probably provide various security plugins that come with app specific
assumptions.

Scalability: Run the example folder in the GitHub Readme, and open up hundreds
of tabs. Gun is running individually in all of them. See how it handles that.

I'm trying to have a production-grade battle-testing suite developed soon,
such that you could just run a script, it would ask you how much you want to
spend on the test, and then it would deploy a ton of GUN peers to the cloud
and generate a ton of traffic and load. This is not available yet, but
something I'm focusing on within the next 6 months or year.

Anything I can help with?

------
eridius
Why is it called Gun? The name is a little off-putting. What's next, a
database called Kill? How about Murder? Genocide?

Edit: The fact that I'm being downvoted for voicing a concern about the naming
is really disappointing. This is a serious issue, and I would appreciate a
response, not being buried.

~~~
marknadal
I didn't downvote you, so please don't think I'm the one trying to bury you.

I'm calling it GUN because it is powerful and therefore a dangerous tool to
wield. Because I'm going with a fully decentralized/distributed system, it has
also generated some controversy with people.

Fact is, centralized/master-slave consensus based databases are incredibly
popular right now. Things like Riak, Cassandra's CRDTs are not getting as much
traction as they should - but probably because they can be difficult to set
up. I'm trying to blow this all out of the water and make distributed database
systems easy for developers.

So I'm admittedly going for an edgy name. I'm not wanting to kill anybody,
just centralized software.

~~~
eridius
Thanks for the response. I didn't think you were the one trying to bury me,
but I appreciate the fact that you care.

I'm glad to hear that you are aware of the fact that this is a loaded term and
that you intentionally chose it because you wanted an edgy name. While I'm
still not a fan of it, I feel much better about it knowing the reason behind
the naming. And I think you need to put this info somewhere on the site and
the GitHub project. I read gundb.io, and I skimmed the README of your GitHub,
and nowhere did you even acknowledge that the name was edgy, much less
indicate that this was an intentional choice. I would urge you to add a FAQ
entry on gundb.io, add a wiki page to your GitHub repo, and put a line
somewhere in the README (perhaps at the bottom) linking to that wiki page.
Otherwise, you're going to end up with more people than just me thinking that
you chose a potentially-offensive name as opposed to a deliberately edgy one.

Speaking of your GitHub repo, you should also add gundb.io as the webpage for
the repo, and probably link to it in the README.

