Show HN: Gun database – 100k ops/sec in IE6 on 2GB Atom CPU (github.com/amark)
71 points by marknadal on Dec 24, 2016 | 48 comments

"If you plan to compare Redis to something else, then it is important to evaluate the functional and technical differences, and take them in account.

- Redis is a server: all commands involve network or IPC round trips. It is meaningless to compare it to embedded data stores such as SQLite, Berkeley DB, Tokyo/Kyoto Cabinet, etc ... because the cost of most operations is primarily in network/protocol management."


It's not meaningless to compare X to Redis for a particular app in terms of performance. The choice is an architectural decision and often architectural decisions are made by non-architects based on defaults.

That's not to say that the comparison here is good or bad. But it is valid given the diversity of use cases, the complexity of the technical differences you mention, and the pressure that blogs and other media put on developers to adopt large-scale solutions for small-scale applications.

The old title claimed it was 50x faster than Redis with no analysis or discussion of tradeoffs. You had to read the fine print to even see the comparison was apples to oranges.

If the comparison were handled well, delving into the architectural tradeoffs, it could be fascinating. But this was just a cheap ploy to get clicks. Now that the title has changed, my acidic tone is no longer justified.

The former title was in place when I wrote my comment.

For what it's worth, according to the guidelines for 'Show HN', an acidic tone is inappropriate. People expose themselves to significant psychological/social risk when they post their own work online. Show HN is intended to provide a safe, productive space. More broadly, an acidic tone is a bit at odds with HN's general guideline of civility.

    expose themselves to significant psychological/social risk
Sounds too extreme. Criticism like "you are comparing apples to oranges" isn't the same as being a political dissident or losing a loved one.

Absolutely! I mention this in one of my other comments, thanks for highlighting it. The performance difference is really a matter of the cache being closer (embedded in-memory into your app), not any magical algorithms. But this is still an optimization that many people miss! And it can significantly speed up your web app.

So why did you make an HN post to share a result that you admit doesn't mean anything?

I think it means a lot: it is a brilliant optimization that more databases should take advantage of and more web developers should exploit. There are a lot of valuable insights like this that I go over in my JS perf talk (since JS perf is so bad and full of so much pulpit-preaching versus fact-checking): https://www.youtube.com/watch?v=BEqH-oZ4UXI .

Here's the important piece of information, from the conflict resolution page:

"Compare their string values with JSON.stringify, choosing the greater of the two."

In other words -- don't ever use this database in production.

on edit: Here's why. CouchDB, which is also an AP database, puts conflict resolution into the hands of the user: each document carries a revision, along with information the user can use to determine which revision is correct. Often it's just "the correct one is the most recent one", but that decision is made in the user's code.

Making this kind of assumption (stringify and string comparison) for conflict resolution is dangerous because there will be scenarios where you lose data no matter how many times you post to the database, guaranteed. Yes, you can't be consistent, but you need to tell the user that there's a data conflict and let them deal with the logic.
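For contrast, here is a minimal sketch of the "surface the conflict to the user" approach: concurrent writes are detected with version vectors and kept side by side as siblings until application code picks a winner. This is only an illustration of the idea, not CouchDB's actual algorithm (CouchDB tracks revision trees rather than version vectors), and `descends`, `merge`, `resolve` and the version-vector layout are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: a version vector maps replica id -> writes seen.
VersionVector = Dict[str, int]
Sibling = Tuple[str, VersionVector]


def descends(a: VersionVector, b: VersionVector) -> bool:
    """True if `a` has seen every write that `b` has seen."""
    return all(a.get(rid, 0) >= n for rid, n in b.items())


def merge(local: List[Sibling], remote: List[Sibling]) -> List[Sibling]:
    """Keep every version that no other version strictly descends from.

    Concurrent versions survive side by side as siblings, so nothing is
    silently discarded; the conflict stays visible to the application.
    """
    candidates = local + remote
    kept: List[Sibling] = []
    for value, vv in candidates:
        superseded = any(descends(o, vv) and o != vv for _, o in candidates)
        already_kept = any(vv == k for _, k in kept)
        if not superseded and not already_kept:
            kept.append((value, vv))
    return kept


def resolve(siblings: List[Sibling], pick: Callable[[List[str]], str],
            replica: str) -> List[Sibling]:
    """Application code chooses the winner; the result descends from all siblings."""
    merged: VersionVector = {}
    for _, vv in siblings:
        for rid, n in vv.items():
            merged[rid] = max(merged.get(rid, 0), n)
    merged[replica] = merged.get(replica, 0) + 1
    return [(pick([value for value, _ in siblings]), merged)]
```

The `pick` callback is where the business logic lives ("take the most recent", "ask the user", and so on); the store itself never guesses.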

Lexical comparison is only used when 2 writes happen at the exact same time.

Yes, it is a very basic hybrid vector/lexical/timestamp CRDT. Its naivety is also its power, because it provides strong eventual consistency guarantees.

However, it is intended to be the base case. You can build your own custom CRDTs on top to handle more specific business logic. For instance adding a DAG, a ledger, a simple linked list, etc. gun's default algorithm is not a silver bullet but it does "just work" out of the box for many web cases. See more here: https://github.com/amark/gun/wiki/Conflict-Resolution-with-G... .

Please especially read about our CAP tradeoffs (see the sidebar on the wiki).

Finally, it should be noted that because gun uses a graph data structure, JSON.stringify only ever applies to atomic primitive values (not complex/compound objects). So it is safe to apply (it is NOT safe to apply to complex objects).
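As one example of building a more specific CRDT on top of per-key merging, here is a textbook grow-only counter (G-Counter). It is not part of gun, and the names below are hypothetical. Each peer only ever writes its own key, so replicas never race on the same key, and because each entry only grows, the per-key maximum is also the latest write.

```python
from typing import Dict

# Hypothetical G-Counter: one entry per peer; each peer only increments its own entry.
GCounter = Dict[str, int]


def increment(counter: GCounter, peer: str, by: int = 1) -> GCounter:
    """Return a copy of `counter` with `peer`'s own entry increased."""
    if by < 0:
        raise ValueError("a grow-only counter cannot be decremented")
    updated = dict(counter)
    updated[peer] = updated.get(peer, 0) + by
    return updated


def merge(a: GCounter, b: GCounter) -> GCounter:
    """Per-peer maximum: commutative, associative and idempotent."""
    return {peer: max(a.get(peer, 0), b.get(peer, 0)) for peer in a.keys() | b.keys()}


def value(counter: GCounter) -> int:
    """The counter's value is the sum of every peer's contribution."""
    return sum(counter.values())
```

Because `merge` is order-insensitive and idempotent, peers can exchange state in any order, any number of times, and still agree on the total.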

> Lexical comparison is only used when 2 writes happen at the exact same time.

But you take time from the system clock. Your writes may converge, but to a nondeterministic state.

No, because we then offset it with a vector relative to the machine. Even the great Kyle Kingsbury (Aphyr, of the "Call Me Maybe" Jepsen tests) tweeted about this aspect of our design (that we treat time as loose/unsafe): https://twitter.com/aphyr/status/646302398575587332 .
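The merge rule discussed in this exchange (the greater state wins; only on an exact tie, compare the serialized primitive values and keep the greater string) can be sketched as a last-write-wins register. This is a simplified illustration, not gun's actual algorithm: it ignores the machine-relative vector mentioned above, `Write` and `merge` are hypothetical names, and Python's `json.dumps` stands in for `JSON.stringify` (the two agree on plain strings and integers but not on every value, e.g. `1.0`). What it shows is that a deterministic tie-break lets every replica land on the same value regardless of the order in which writes arrive.

```python
import json
from dataclasses import dataclass
from typing import Union

Primitive = Union[str, int, float, bool, None]


@dataclass(frozen=True)
class Write:
    state: float      # assumed timestamp-like state attached by the writer
    value: Primitive  # only atomic primitives, never nested objects


def serialize(value: Primitive) -> str:
    # Stand-in for JSON.stringify; not byte-identical for every value.
    return json.dumps(value, ensure_ascii=False)


def merge(current: Write, incoming: Write) -> Write:
    """Greater state wins; on an exact tie, the greater serialized value wins."""
    if incoming.state > current.state:
        return incoming
    if incoming.state < current.state:
        return current
    if serialize(incoming.value) > serialize(current.value):
        return incoming
    return current
```

Which value wins a tie is arbitrary but identical on every replica. That is the convergence property claimed above, and it is also the silent loss of the other write that the parent comment objects to.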

For context, this has been on Show HN before. This is the first, I believe: https://news.ycombinator.com/item?id=9076558

....I'll let everyone draw their own conclusions.

That's an awfully big claim. Under what conditions does it do well? What does it do badly? Where would I consider using this instead of Redis?

"Do work only once, then use centralized caching and cache updating to skip ever doing work again."

It looks like their benchmarks are particularly friendly to their cache size.

Redis is quite an awesome database; these claims aren't meant to belittle Redis (it is one of the fastest out there!). Check out their performance blog (and note both their warnings and mine that benchmarks are not the end game): https://redis.io/topics/benchmarks .

The biggest difference is that, even within a single machine, you still have to talk to Redis over the wire, and Redis replies from its cache. The (terrible, terrible?) choice of JavaScript gets around this: once cached, the data lives directly in the app logic's memory, so you don't have to perform any request at all. That is where the speed gain comes from.

Summary: GUN's caching is closer to the end result you are measuring, therefore it is faster. Not magical algorithms.
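The point about skipping the round trip can be illustrated by counting requests instead of timing them. In this sketch `RemoteStore` is a hypothetical stand-in for any networked store (it is not a Redis client), and `EmbeddedCache` keeps values in the application's own memory, refreshed by pushed updates in the spirit of the "cache updating" quoted earlier. Real latency numbers would depend entirely on the setup.

```python
from typing import Any, Callable, Dict, List


class RemoteStore:
    """Hypothetical stand-in for a networked store: every get/set is one round trip."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._subscribers: List[Callable[[str, Any], None]] = []
        self.round_trips = 0

    def get(self, key: str) -> Any:
        self.round_trips += 1
        return self._data.get(key)

    def set(self, key: str, value: Any) -> None:
        self.round_trips += 1
        self._data[key] = value
        for notify in self._subscribers:  # push the update to subscribed caches
            notify(key, value)

    def subscribe(self, callback: Callable[[str, Any], None]) -> None:
        self._subscribers.append(callback)


class EmbeddedCache:
    """Keeps values in the application's own memory; hits cost no round trip."""

    def __init__(self, remote: RemoteStore) -> None:
        self._remote = remote
        self._local: Dict[str, Any] = {}
        remote.subscribe(self._on_update)

    def _on_update(self, key: str, value: Any) -> None:
        self._local[key] = value

    def get(self, key: str) -> Any:
        if key in self._local:
            return self._local[key]       # served from memory, no request made
        value = self._remote.get(key)     # miss: one round trip, then cache it
        self._local[key] = value
        return value
```

After the first miss, a thousand further reads cost zero requests; that, rather than a cleverer algorithm, is the effect the summary describes.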

- What does GUN do badly?

Banking, or anything else requiring strong consistency. GUN is an AP system, which means you really shouldn't use it for anything involving accounting, money, etc. Read more about that here: https://github.com/amark/gun/wiki/CAP-Theorem
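To make the banking warning concrete, here is a hypothetical two-replica scenario for a generic AP store that merges with last-write-wins. It models no particular database (not gun's API): during a partition, each replica approves a withdrawal against its own copy of the balance, and the merge keeps only one of the two writes.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    balance: int
    state: int = 0  # hypothetical logical timestamp of the last write

    def withdraw(self, amount: int, now: int) -> bool:
        # Local check only: an AP replica answers without coordinating with peers.
        if amount > self.balance:
            return False
        self.balance -= amount
        self.state = now
        return True


def lww_merge(a: Replica, b: Replica) -> int:
    """Keep the balance from the later write (replica name breaks exact ties)."""
    winner = a if (a.state, a.name) >= (b.state, b.name) else b
    return winner.balance
```

Starting both replicas at 100 and withdrawing 80 on each during the partition succeeds twice, so 160 is paid out of an account that held 100, yet the merged balance of 20 records only one withdrawal. No conflict-resolution rule can undo money that has already left.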

Where should you use it instead of Redis? When you want realtime updates or graph data. There is a 10 minute guide on how to use graphs from a key/value, table or relational, or document oriented way here: https://github.com/amark/gun/wiki/graphs .

Developing for IE6 by itself leads to PTSD.

All the CP v. AP and algorithmic discussion aside, I'd be really interested in some actual concrete documentation on the actual peer-to-peer protocol involved (preferably without having to resort to testing my cursory understanding of Javascript by reading the source code). I'd be very interested in trying my hand at some alternate implementations (this sort of thing looks perfect for an Erlang/Elixir project), but if my only hope is to reverse-engineer the reference implementation, I'd be far more inclined to just stick to CouchDB or Barrel.

Great comment!

The P2P protocol's base case is an ad-hoc mesh network (how the messaging between peers works is explained here: https://github.com/amark/gun/wiki/Mesh-Network-Messaging-Alg... ).

Initial connections are handled statelessly: both peers exchange blobs of the requested/subscribed data to make sure they are in sync (not out of date / old / stale). On the wire, this is a GET command.

Then the connection becomes stateful, and only the changes/deltas/diffs of data are pushed over the network, allowing for realtime updates. Over the wire, this is a PUT command.
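A toy model of the two phases described above: on connect, each peer receives a full snapshot of the other's data (the GET exchange), and afterwards each write pushes only the changed entries (PUT), relayed onward until no peer learns anything new. The `Peer` class, its methods and the `(state, value)` entry format are hypothetical; this is not gun's wire format, and in-process method calls stand in for network messages.

```python
from typing import Any, Dict, List, Tuple

Entry = Tuple[int, Any]  # hypothetical (state, value) pair; state is a logical timestamp


class Peer:
    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, Entry] = {}
        self.peers: List["Peer"] = []
        self.inbox: List[Tuple[str, Dict[str, Entry]]] = []  # received messages

    def merge(self, delta: Dict[str, Entry]) -> Dict[str, Entry]:
        """Apply entries newer than ours and return only what actually changed.

        Equal states are ignored here; a tie-break rule would go in this spot.
        """
        changed: Dict[str, Entry] = {}
        for key, (state, value) in delta.items():
            if key not in self.data or state > self.data[key][0]:
                self.data[key] = (state, value)
                changed[key] = (state, value)
        return changed

    def receive(self, command: str, payload: Dict[str, Entry], sender: "Peer") -> None:
        self.inbox.append((command, payload))
        changed = self.merge(payload)
        if changed:
            # Relay only the delta; it stops spreading once nobody learns anything new.
            for peer in self.peers:
                if peer is not sender:
                    peer.receive("PUT", changed, self)

    def connect(self, other: "Peer") -> None:
        self.peers.append(other)
        other.peers.append(self)
        # Stateless phase: each side receives a full snapshot of the other's data
        # (standing in for the reply to a GET) and merges it.
        self.receive("GET", dict(other.data), other)
        other.receive("GET", dict(self.data), self)

    def put(self, key: str, value: Any, state: int) -> None:
        # Stateful phase: apply locally, then push only the change (a PUT).
        changed = self.merge({key: (state, value)})
        if changed:
            for peer in self.peers:
                peer.receive("PUT", changed, self)
```

After the initial snapshot, a single edit sends one small message per link instead of re-sending the whole data set.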

All data is represented as a graph (whether key/value, table, relational, document, or graph itself).
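A minimal sketch of representing nested document data as a graph: each nested object becomes its own flat node, and its parent stores a reference to it instead of the object itself. The `{"#": id}` reference shape and the path-style ids are assumptions for illustration, not a statement of gun's storage format, and lists are not handled. Flattening this way is also why, as noted earlier in the thread, a per-field merge only ever has to compare primitive values.

```python
from typing import Any, Dict, Optional

Graph = Dict[str, Dict[str, Any]]


def flatten(doc: Dict[str, Any], node_id: str, graph: Optional[Graph] = None) -> Graph:
    """Split a nested dict into flat nodes keyed by id (hypothetical format)."""
    if graph is None:
        graph = {}
    node: Dict[str, Any] = {}
    for key, value in doc.items():
        if isinstance(value, dict):
            child_id = f"{node_id}/{key}"          # path-style id, assumed for the example
            flatten(value, child_id, graph)
            node[key] = {"#": child_id}            # reference instead of an embedded object
        else:
            node[key] = value                      # atomic primitive stays in place
    graph[node_id] = node
    return graph
```

The same shape covers key/value data (one node), tables (one node per row) and documents (one node per nested object).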

And that is about it! Other than, of course, the AP aspects and the hybrid vector/lexical/timestamp CRDT for conflict resolution. The architecture is genuinely quite simple, intended to be the foundation for more complex data structures to be built on top (other CRDTs, or DAGs, simple linked lists, etc.).

There is a half baked "How to port GUN" guide here: https://github.com/amark/gun/wiki/porting-gun .

Honestly, I cringe every time I read about gundb.

DB made in Node.js? Check.

No mention of multiprocess? Check.

No mention of persistence? Check. (lol @ S3 persistence)

All tests failing? Check.

Optimizing if/else? Check.

Faster than Redis? Check.

Store non-optimized JSON on disk (not sure, but looks like it)? Check.

I cringe at every page of their docs (like sharding). It just feels like the TempleOS situation. Upvoted for discussion, though.

One might take the claims — and hubristic comparisons to production-ready systems like Redis, Cassandra, Firebase and Riak — more seriously if the code weren't such an inscrutable ball of spaghetti [1]. I commented on this a year ago [2], and the author's response was... not great. Anyone who considers using this project should definitely read the code first and make up their own mind.

[1] https://github.com/amark/gun/blob/master/gun.js#L584

[2] https://news.ycombinator.com/item?id=10683467

Actually, we turned S3 into a timeseries store that handled 100M+ messages for $10/day (100GB+ of data, all costs for processing, storage, and backup included); check out this screencast: https://www.youtube.com/watch?v=x_WqBuEA7s8 .

The master/stable branch is not failing. And we definitely don't recommend using the JSON dump in production (we have warnings about this everywhere); it is intended to make local development easier.

Node.js is single-threaded, but because GUN has a P2P/decentralized architecture, it can easily sync between machines or threads and handle concurrency and conflict resolution. Check out this demo: https://youtu.be/-i-11T5ZI9o .

"It just feels like the templeos situation."

Huh? TempleOS is explicitly not meant to be one's primary OS or even really anything besides an educational environment. Gun appears to actually be trying to target production use.

If anything, this is closer to a "MongoDB is web scale" situation.

I was talking about the "mental issues" angle. But he is charging 500/hour for consultancy so he is doing something right. This thread has more juice: https://www.reddit.com/r/programming/comments/5k3367/50x_fas...

"Store non-optimized json in disk(not sure but looks liek it)? Check."

How can you say "not sure" and "check" in the same line?

Because I read the docs but I didn't understand it so I wasn't sure, but had confidence that I was right based on the other points.

Then I'd change the title; it comes across as a direct comparison to Redis.

Can we please let HN be free of click bait? :(

We've updated the title from “50X Faster Than Redis, Open Source P2P Firebase”.

We also detached this subthread from https://news.ycombinator.com/item?id=13249586 and marked it off-topic.

I suggest just flagging it, as I did, it'll be off the front page soon enough.

How can I do that? I rarely comment on or moderate stuff on Hacker News.

You probably need more upvotes (karma) on your account. Then the option will show up.

I think you need 30 karma before you can flag submissions or comments.

I encourage you to act however you see best fits the HN community. My two counterarguments would be:

(1) How many people on HN do you think are aware that they can get a huge performance boost by using an embedded cache rather than relying solely on Redis or others? Personally, I think this point alone has merit; most people overlook it.

(2) I linked to Redis' performance page, and even with pipelining turned on they are doing 0.5M reads/sec, while gun (on the exact same type of test) is doing 25M to 30M on the same device (MacBook Air). So "50X" is a claim that can be backed by evidence, not link bait. If it is shocking to anybody, then that is actually why we need to talk about (1) more, to get the word/discussion out.

Look, the title was 100% click bait. The word Redis didn't even exist on the page linked. It deserved to get kicked off, as it was (certainly it wasn't just my flag that did it).

It's not a comment on your technology, architecture or approach.

But you're not sketching out a clear, 2 minute overview of what your approach is, with the pros and cons, and intended use cases. You're just throwing around random synthetic numbers, and apples to oranges comparisons. It's handwaving.

> The word Redis didn't even exist on the page linked


"Compare to Redis at 0.5M ops/sec (cached reads, Macbook Air), even with pipeline optimizations turned on, here: https://redis.io/topics/benchmarks"

The title and content have been updated since I made these posts. The old headline was clickbait claiming the technology was 50x faster than Redis with no justification or analysis on the linked page.

Could you please explain yourself more? I tried to address these points earlier but I must have done a bad job at communicating, I apologize. Here:

(A) There is a 2 minute summary at the bottom of the page, with the pro/cons/dangers of benchmarking. But that doesn't mean benchmarking should be ignored, it is a legitimate science.

(B) My previous comment's (2) addressed the apples/oranges point; it would be nice if you could explain why it is handwaving when I tried to preempt that directly (which you ignored?).

(C) There is a fair question in my previous comment's (1), which you also didn't address - would you mind answering it?

I love having conversations with people. It would be easier for me to understand what you are trying to say, though, without being accused (ad hominem) of handwaving. Thanks, I look forward to hearing your thoughts.


Sorry all - celebrating Christmas early with family. I'll be back to respond to any/all questions/concerns. Really appreciate everybody joining in and giving feedback.

> Use PTSD (Performance Testing Speed Development) to micro-optimize your code without getting lost in micro-optimizations. More on PTSD to come in the future.

I've lost friends to suicide secondary to PTSD... Maybe we don't overload that particular initialism...

I agree with this, although I know that many engineers scoff at any attempt to point out inappropriate technical names/acronyms.

And unless the dev is not a native English speaker, I strongly suspect this was done as a little attempt at humor. For many people, it's just not funny.

I mean, absolutely use jokes like this if you want, it's your right. But then don't be surprised when people discuss the inappropriate terminology instead of your project's technical merits.

Perhaps talking about PTSD without stigmatizing it is an alternative. PTSD is as much an ordinary human health condition as a torn ACL. Just less diagnosed and less treated.

Absolutely. That doesn't mean it's something that we should make light of though...

This past fall a friend and colleague suffering from PTSD took his own life with a gun... You can see why I feel the joke is in poor taste.

I think maybe we all experience PTSD to some degree or another. In 1998, my beloved and I did a video tape life review with her grandpa Fred. We turned on the camcorder, she said 'tell us about yourself,' and the first thing Grandpa Fred said was, 'My mother died when I was twelve and after that there was no picnic'. He was 86 at the time and it had been 74 years. He'd buried a wife and a son and the trauma of loss was still a life changing event. From time to time, I still grieve my friend who died in 1983.

My condolences on your loss and grief.

Just in case this is helpful: PTSD isn't simply the painful experience of trauma, which, you're right, most of us experience. PTSD is a reaction to trauma that is so intense it distorts cognition and triggers physical symptoms. The disruption of thought processes, even in moderate cases, can be so significant it's experienced as something like catatonia, or as a manic fight-or-flight response that can lead to suicidality. It's a condition you want to take very seriously.

(Not that you weren't!)

Try passing out impromptu while walking down the street.

You're not even close to the reality of it. You might not even be aware of how heavily knocked you are, and it could go on for months or years.

... "coming to" in the middle of crossing a six-lane street, narrowly missed by a vehicle... How did you get there? Will you remember the incident two minutes later? Or ever?

VR immersion and opioids play vital roles in creating non-traumatized memory paths in your brain, and since opioids can be substituted for VR immersion for pain relief in multi-amputees, to a considerable degree, that link is worth checking too.

We're all going to die. The inability to make light of death seems like a refusal to acknowledge this fact. One perfectly valid perspective is that, when facing the impermanence of everything, nothing is off the table for poking fun at. Most of us have experienced tragedy and some of us can still laugh at it.

That doesn't mean we shouldn't ever be serious, but I'm not convinced that our collective sense of humor should be modulated down to the least tolerant and most sensitive among us.

You're right in general, of course, but you're talking to someone who just told us he recently lost a loved one in a traumatic manner. Responding to a specific pain with a general argument doesn't help, and being right makes it worse.

I am well acquainted with death. It is an occupational hazard of working on an ambulance. It is that very familiarity that puts first responders at a very high risk for PTSD (like the friend I mentioned above).

Dark humor is a favorite pastime of folks working in fire/EMS/law enforcement. We are _very_ good at laughing at things that would probably be shocking to many. There is, however, a time and place for that. Making light of something like PTSD is not especially helpful to anyone.
