Show HN: PumpkinDB, an event sourcing database engine

yrashk · on Feb 26, 2017

It's a lower-level "database engine" that allows you to build different types of higher-level databases based on a very simple foundation:

1) BTree-based K/V engine (which gives you the ability to iterate over lexicographically sorted keys)
2) Strong immutability guarantees (data cannot be overwritten)
3) ACID transactions
4) Server-side executable imperative language that gives you control over querying costs
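A toy model makes the first two building blocks easier to picture. The Python sketch below is not PumpkinDB's API (PumpkinDB is written in Rust on top of LMDB); `ImmutableKV`, `commit` and `scan_prefix` are hypothetical names. It models only write-once keys, ordered prefix iteration, and an all-or-nothing batch commit; durability and isolation are out of scope.

```python
import bisect


class ImmutableKV:
    """Toy write-once key/value store with keys kept in lexicographic order.

    Hypothetical illustration only: this is not PumpkinDB's API.
    """

    def __init__(self):
        self._keys = []  # sorted list of byte-string keys
        self._data = {}  # key -> value

    def commit(self, writes):
        """Apply a list of (key, value) pairs atomically: all or nothing."""
        batch = dict(writes)
        if len(batch) != len(writes):
            raise KeyError("duplicate key within one transaction")
        for key in batch:
            if key in self._data:
                raise KeyError(f"key already exists: {key!r}")
        # Validation passed, so every write in the batch is applied.
        for key, value in batch.items():
            bisect.insort(self._keys, key)
            self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def scan_prefix(self, prefix):
        """Yield (key, value) pairs whose key starts with `prefix`, in order."""
        # Keys sharing a prefix form one contiguous run in sorted order.
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, self._data[key]


db = ImmutableKV()
db.commit([(b"user/2", b"bob"), (b"user/1", b"alice"), (b"order/7", b"x")])
print(list(db.scan_prefix(b"user/")))  # [(b'user/1', b'alice'), (b'user/2', b'bob')]

try:
    db.commit([(b"user/3", b"carol"), (b"user/1", b"mallory")])
except KeyError:
    print("rejected: user/1 is already written")
print(db.get(b"user/3"))  # None: the whole batch was rejected
```

The prefix scan is the reason lexicographic ordering matters: grouping related keys under a shared prefix turns "all users" into one contiguous range.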

In a sense, it's as much of a database constructor as different MUMPS systems (GT.M, for example: https://en.wikipedia.org/wiki/GT.M)

PumpkinDB also aims to provide a good set of standard primitives that help with building more sophisticated databases, ranging from hashing to JSON support, and more to come.

vtuulos · on Feb 26, 2017

I am a big fan of this approach: Low-level, high-performance, immutable-by-default database engine as a building block, with bindings to multiple languages for easy application development.

TrailDB (http://traildb.io), which has many elements of event sourcing in it, follows this philosophy and it has proven to be pretty successful for its intended use cases.

I was delighted to notice that PumpkinDB has an imperative query language inspired by Forth. We recently open-sourced a similarly imperative query language inspired by AWK, http://github.com/traildb/reel :)
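PumpkinDB's query language is described as imperative and inspired by Forth, and the root post says it gives you control over querying costs. As a rough illustration of how a Forth-style language can make cost explicit, here is a tiny postfix evaluator with a per-token step budget. The words (`+`, `*`, `dup`, `swap`, `drop`) and the budget mechanism are assumptions for illustration; they are not PumpkinScript's actual vocabulary or cost model.

```python
class BudgetExceeded(Exception):
    """Raised when a program tries to run more tokens than it was allowed."""


# Hypothetical word set; each word mutates the stack in place.
WORDS = {
    "+": lambda s: s.append(s.pop() + s.pop()),
    "*": lambda s: s.append(s.pop() * s.pop()),
    "dup": lambda s: s.append(s[-1]),
    "swap": lambda s: s.extend([s.pop(), s.pop()]),
    "drop": lambda s: s.pop(),
}


def run(program, budget):
    """Evaluate a whitespace-separated postfix program.

    Every token (literal or word) costs one unit of `budget`; evaluation
    stops with BudgetExceeded as soon as the budget is spent.
    """
    stack = []
    for token in program.split():
        if budget <= 0:
            raise BudgetExceeded(f"budget exhausted before {token!r}")
        budget -= 1
        if token in WORDS:
            WORDS[token](stack)
        else:
            stack.append(int(token))  # anything else is an integer literal
    return stack


print(run("2 3 + dup *", budget=10))  # [25]
print(run("1 2 swap", budget=10))  # [2, 1]
try:
    run("2 3 + dup *", budget=3)
except BudgetExceeded as err:
    print(err)  # budget exhausted before 'dup'
```

Because the program is an explicit sequence of steps, the server can meter and cap it without having to guess what a declarative query planner will do.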

I will definitely follow PumpkinDB with great interest!

ams6110 · on Feb 26, 2017

Thanks. I read the entire "Documentation" page and still didn't feel confident that I understood what this really was. "Event sourcing" to me implies that it's generating events.

makmanalp · on Feb 26, 2017

Event sourcing is a technical term. It's the idea that instead of mutating your database, you keep a log and append a new entry saying that you changed this value to that, and so on. This lets you do cool stuff like temporal queries (e.g. running a query against the entire database as it looked a month ago) or looking at historical values and how things changed. This matters a lot in some fields. Of course, there's then the matter of how to do this efficiently. You can build event sourcing on top of a regular RDBMS, but if there is database-level support (as in PumpkinDB), some things may be more efficient. Read more:

https://martinfowler.com/eaaDev/EventSourcing.html
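A minimal sketch of the idea: the log is the source of truth, current state is derived by replaying it, and a temporal query is just a replay that stops at an earlier point. The `Event` shape and `state_as_of` function are hypothetical, and the balances are made-up; a real system would index the log rather than scan it from the start.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    at: date    # when the change happened
    key: str    # which value changed
    value: int  # the new value


def state_as_of(log, when):
    """Rebuild state by replaying every event up to and including `when`."""
    state = {}
    for event in log:
        if event.at > when:
            break  # the log is assumed to be in chronological order
        state[event.key] = event.value
    return state


log = [
    Event(date(2017, 1, 5), "balance", 100),
    Event(date(2017, 1, 20), "balance", 80),
    Event(date(2017, 2, 10), "balance", 130),
]
print(state_as_of(log, date(2017, 1, 31)))  # {'balance': 80}
print(state_as_of(log, date(2017, 2, 26)))  # {'balance': 130}
```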

gritzko · on Feb 26, 2017

Virtually every db works like that under the hood. They expose it differently, though. Kafka has nothing but a log, for example.

Groxx · on Feb 27, 2017

e.g. https://dev.mysql.com/doc/refman/5.6/en/binary-log.html

solidsnack9000 · on Feb 27, 2017

They do but it's usually not exposed in a useful way. Postgres once had this feature... https://www.postgresql.org/docs/6.3/static/c0503.htm

anarazel · on Feb 27, 2017

We actually expose something to support event sourcing: https://www.postgresql.org/docs/current/static/logicaldecodi... - although that's very different from the old time travel feature.

Edit: missing word

mattdeboard · on Feb 26, 2017

This sounds like my (probably wrong/incomplete) understanding of Datomic?

grzm · on Feb 26, 2017

I believe you could use Datomic for event sourcing, though it wasn't explicitly designed for the job and may not be the best choice depending on the requirements of your system as a whole. Bobby Calderwood at Capital One has a talk on a system which includes both event sourcing (using Kafka, IIRC) and Datomic:

https://speakerdeck.com/bobbycalderwood/commander-better-dis...

mattdeboard · on Feb 27, 2017

Nice, thank you.

yrashk · on Feb 26, 2017

We are definitely looking to improve the documentation. This is 0.1, after all.

This project was started as a backend for a lazy event sourcing approach (https://blog.eventsourcing.com/lazy-event-sourcing-ed7e59007... , https://m.youtube.com/watch?v=aqv8d1pjmU8) and beyond.

The idea behind it is that it provides primitives for building systems that are designed around immutable events, journals, indexing, etc. Hence the current positioning. We thought it would be useful to be targeting fairly narrowly early on.

Either way, we will definitely need to expand on that in our materials.

elcritch · on Feb 27, 2017

Amazing-looking project! I've been daydreaming about using LMDB's memory mapping to provide a flexible low-level db primitive for quite a while, figuring a combo of Erlang-like actors and flexible data scripting would be killer. Needless to say, I love the design.

Any thoughts on whether this could be used to implement a Q/kdb+ like computation system? Seems like PumpkinScript could be extended with a library of computational array primitives. (https://news.ycombinator.com/item?id=13481824)

That being said, it'd be great to be able to read how the "actor" system is implemented. The documentation alludes to actors and pub/sub channels. Not sure I can help much at this time, but will keep an eye on it to see!

bhargav · on Feb 26, 2017

Well, it's a fancy word. That's basically what I took away from it. Also, I agree: I read the whole doc and it took way longer than it should have to understand what it does.

Clearly, they are smart, but bad copywriters :)

Please write better docs.

yrashk · on Feb 26, 2017

Doing our best, one step at a time! It's clear to me that we haven't spent enough time on documentation yet. The whole project is only 3 weeks old and our initial focus was to get a working version out with some level of documentation (a goal that we somewhat attained, I believe).

bhargav · on March 9, 2017

Fair enough. I'm gonna be done with this shitty program and have a piece of paper with some fancy seal on it in a couple of months. I'd be happy to contribute.

Bookmarked until then.

yrashk · on Feb 26, 2017

Also, if this topic is interesting to anybody who's good at writing, we are very open to contributions of all kinds!

atombender · on Feb 26, 2017

Is the idea that apps talk to PumpkinDB in order to achieve this layering, or do you see it as a library?

Not often that you see MUMPS referenced on HN, by the way. It's one of those oddly productive niche languages that are (as far as I know) alive and well, but rarely encountered except if you work in that niche, e.g. finance or healthcare.

yrashk · on Feb 26, 2017

Some of the layering in terms of basic building blocks and higher level languages will become a standard part of PumpkinDB. For the rest, yes, it's expected that end applications or frameworks will compile higher level constructs to PumpkinScript.

I've been following MUMPS and borrowing from it for some time, which is how some of its ideas became inspirations for PumpkinDB. As quirky as M is, MUMPS was indeed oddly productive and I wanted to piggyback on that.

solidsnack9000 · on Feb 27, 2017

I hope that PumpkinDB is an alliterative reference to MUMPS :)

yrashk · on Feb 27, 2017

mtdewcmu · on Feb 26, 2017

It seems to serve approximately the same purpose as Berkeley DB.

yrashk · on Feb 26, 2017

Well, that would have been true if we had no PumpkinScript. Then it would have been just a tiny wrapper around LMDB that we are using as a storage backend, indeed.

mindcrash · on Feb 26, 2017

So basically something similar to FoundationDB?

yrashk · on Feb 26, 2017

To a certain degree, yes

dozzie · on Feb 26, 2017

From what I see in the published documentation, it's way too early to make it "Show HN". If I didn't know what event sourcing is, I couldn't make heads or tails of your daemon, and even then, I don't understand how a user is supposed to interact with it in a production-like deployment.

Normal_gaussian · on Feb 26, 2017

It's never too early to Show HN.

Especially for open source projects, how else would you suggest onboarding people to move this thing forward?

The voting mechanism helps us determine whether it is interesting anyway.

yrashk · on Feb 26, 2017

What to Submit

Show HN is for something you've made that other people can play with. HN users can try it out, give you feedback, and ask questions in the thread.

simonw · on Feb 26, 2017

I like how every commit message is formatted as a problem and a solution: https://github.com/PumpkinDB/PumpkinDB/commits/master

jdiez17 · on Feb 26, 2017

Indeed, it's a neat "hack" to force yourself to write better commit messages. I think this style of commit messages originated from the zeromq community: https://github.com/zeromq/libzmq/commits/master https://github.com/zeromq/zproto/commits/master

yrashk · on Feb 27, 2017

Yes, I picked this style up from Pieter Hintjens.

DTrejo · on Feb 26, 2017

This is going to make for some awesome release notes and new feature release marketing. Very well done @yrashk.

makmanalp · on Feb 26, 2017

Very interesting! I think one thing this would benefit from is a lot of usage examples, especially around PumpkinScript. I was reading recently about MUMPS and Caché and it's interesting to see a modern implementation of similar ideas.

One question - what is the storage layout like? Do you have plans to support efficient range queries at all?

yrashk · on Feb 26, 2017

We definitely need better documentation! That's for sure. We only did a basic one just to get the basics out.

As for the layout -- everything is built around btree k/v, and the original idea behind PumpkinDB is to give primitives that are useful in building databases, indices in particular. The expectation is that, over time, we will grow our library to have more sophisticated primitives, including ready-made indices of different kinds.
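On the range-query question: one common way to get range queries out of a lexicographically ordered B-tree is to encode the indexed value into the key so that byte order matches the order you want to query in. Below is a hypothetical price index; the `idx/price/...` key layout and the function names are assumptions, not PumpkinDB's actual layout, and a sorted Python list stands in for the B-tree. A range query becomes two binary searches and one contiguous scan.

```python
import bisect


def index_key(price, order_id):
    """Encode (price, id) so that byte order matches numeric price order.

    Zero-padding to a fixed width keeps lexicographic and numeric order
    aligned; this simple scheme assumes non-negative prices below 10**10
    and order ids that contain no "/".
    """
    return b"idx/price/%010d/%s" % (price, order_id.encode())


def range_query(sorted_keys, low, high):
    """Return order ids whose price lies in [low, high], in price order."""
    start = bisect.bisect_left(sorted_keys, b"idx/price/%010d/" % low)
    end = bisect.bisect_left(sorted_keys, b"idx/price/%010d/" % (high + 1))
    # The order id is whatever follows the last "/" in the key.
    return [key.rsplit(b"/", 1)[1].decode() for key in sorted_keys[start:end]]


orders = {"o1": 250, "o2": 99, "o3": 1200, "o4": 250}
keys = sorted(index_key(price, oid) for oid, price in orders.items())
print(range_query(keys, 100, 1000))  # ['o1', 'o4']
```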

Does this help?

playing_colours · on Feb 26, 2017

Written in Rust :) Inspiring, I am learning Rust now to try implementing HDFS-like storage.

nik736 · on Feb 26, 2017

What's an actual use-case for this? I am reading the documentation but still don't see why I should use it and what the actual advantages compared to current solutions are.

yrashk · on Feb 26, 2017

It was built as a kind of database constructor for event sourced / journalled systems. Its design inspiration largely stems from MUMPS, which provided a great ("oddly productive") combination of a database and a programming language.

Being a constructor, it's also a great tool for building applications with better control over querying mechanics (since everything is actually described in PumpkinScript).

tyingq · on Feb 26, 2017

It sounds sort of like "overlayfs" for data. Which might allow a non-destructive, no-copy way to...

- Do what-if analysis. Change the price of oil at some point in history and see how your financials would have played out from that point forward.

- Fork your database and have two live copies acting on different data or rules for a live comparison...without all the plumbing overhead. Perhaps having one set work with a fiscal year that is calendar year, and another with a different fiscal boundary.
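The "overlay" analogy can be sketched with Python's standard `collections.ChainMap`: each fork keeps its own small layer of changes on top of a shared, untouched base, so nothing is copied and the original stays intact. This is an in-memory analogy only; the thread doesn't say whether PumpkinDB exposes forking like this, and the keys and values are made up.

```python
from collections import ChainMap

base = {"oil_price": 52, "fiscal_year_start": "01-01"}

# Two forks share the base without copying it; ChainMap sends every write
# to its first mapping, so each fork's changes land in its own overlay.
scenario_a = ChainMap({}, base)
scenario_b = ChainMap({}, base)

scenario_a["oil_price"] = 30
scenario_b["fiscal_year_start"] = "07-01"

print(scenario_a["oil_price"], scenario_b["oil_price"])  # 30 52
print(base["oil_price"])  # 52
```

With immutable storage underneath, the same trick is cheap at database scale: a fork is just a new layer of writes that reads through to shared history.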

fiatjaf · on Feb 26, 2017

I don't get this whole "never overwrite data" thing, including Datomic, for example.

Isn't the disk space needed for these schemes enormous?

solidsnack9000 · on Feb 27, 2017

I have reservations, too: it's important to be able to remove data even though disk is cheap.

* Removing very old data is a reasonable hedge for user privacy.

* Sometimes confidential data makes its way into the data set and needs to be removed.

* Old event data is often not useful but can impact performance or cost just the same. For example, one needs to allocate an EBS volume on AWS with a certain level of performance, but the cost of that is `IOPS * GBs`, not `IOPS * useful GBs`.

* Replicating and backing up the dataset takes longer and longer as the application grows.

yrashk · on Feb 27, 2017

I agree, this is an important aspect.

Our plan in PumpkinDB is to add key-value association retirement, subject to defined retirement policies.
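The thread doesn't say how retirement policies will be defined, so the following is only a guess at the shape of the idea: policies as predicates over a record, with two hypothetical examples matching reasons from the parent comment (age-based expiry and removing leaked confidential data). `Record`, `max_age`, `key_prefix` and `apply_policies` are made-up names, not PumpkinDB features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: str
    written_at: int  # Unix timestamp in seconds
    value: str


def max_age(seconds):
    """Hypothetical policy: retire records written more than `seconds` before `now`."""
    return lambda record, now: now - record.written_at > seconds


def key_prefix(prefix):
    """Hypothetical policy: retire every record under a prefix (e.g. leaked secrets)."""
    return lambda record, now: record.key.startswith(prefix)


def apply_policies(records, now, policies):
    """Split records into (kept, retired); any matching policy retires a record."""
    kept, retired = [], []
    for record in records:
        if any(policy(record, now) for policy in policies):
            retired.append(record)
        else:
            kept.append(record)
    return kept, retired


records = [
    Record("events/1", written_at=100, value="signup"),
    Record("events/2", written_at=900, value="login"),
    Record("secrets/api", written_at=950, value="oops"),
]
kept, retired = apply_policies(
    records, now=1000, policies=[max_age(500), key_prefix("secrets/")]
)
print([r.key for r in kept])  # ['events/2']
print([r.key for r in retired])  # ['events/1', 'secrets/api']
```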

christophilus · on Feb 27, 2017

I have the same question. Presumably, you could store it fairly efficiently using a git-like, diff-based storage scheme or something. But I would be interested in hearing an analysis of this.
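The git-like suggestion can be sketched in a few lines: keep the first version in full and, for each later version, store only the keys that changed or were removed. `DeltaStore` and its helpers are a hypothetical in-memory illustration of delta encoding in general, not how PumpkinDB or Datomic actually store data. Reading an old version means replaying deltas from the base; real systems usually add periodic snapshots to bound that cost.

```python
_MISSING = object()  # sentinel: distinguishes "absent" from a stored None


def diff(old, new):
    """Changes that turn `old` into `new`: (changed/added keys, removed keys)."""
    changed = {k: v for k, v in new.items() if old.get(k, _MISSING) != v}
    removed = [k for k in old if k not in new]
    return changed, removed


def apply_delta(state, delta):
    changed, removed = delta
    result = {k: v for k, v in state.items() if k not in removed}
    result.update(changed)
    return result


class DeltaStore:
    """Keep every version, storing only what changed since the previous one."""

    def __init__(self, initial):
        self._base = dict(initial)
        self._deltas = []
        self._latest = dict(initial)

    def commit(self, new_state):
        self._deltas.append(diff(self._latest, new_state))
        self._latest = dict(new_state)

    def version(self, n):
        """Rebuild version n (0 = initial) by replaying deltas from the base."""
        state = dict(self._base)
        for delta in self._deltas[:n]:
            state = apply_delta(state, delta)
        return state


store = DeltaStore({"name": "widget", "price": 10, "stock": 5})
store.commit({"name": "widget", "price": 12, "stock": 5})
store.commit({"name": "widget", "price": 12})
print(store.version(1))  # {'name': 'widget', 'price': 12, 'stock': 5}
print(store.version(2))  # {'name': 'widget', 'price': 12}
```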

Salgat · on Feb 27, 2017

Exactly, events only represent changes in state. Furthermore, you aren't storing everything in the events; the business logic in services can build up models that hold much more than the events themselves (for example, an event could hold an id that the service uses to populate a bunch of properties in the actual model snapshot; something as simple as an order number would map to all kinds of order details).

digitalzombie · on Feb 26, 2017

Holy cow it's in Rust.

I'm doing a thesis on Classification Trees, using R, and hoping to write the backend of the R package in Rust (it's looking like it'll be C++). I'll look through the source code of this to see its tree implementation. Probably uses the Rust standard library's implementation of BTree?

elcritch · on Feb 27, 2017

Documentation says they use LMDB for the backend. Looking over the documentation, it looks like you could readily use PumpkinDB directly to implement the database/data caching scheme and interface with it from R. Unless your thesis is on the implementation of B-trees, definitely try bootstrapping on something like this first. BTW, LMDB provides memory mapping, which can be very fast for computations.

cestith · on Feb 26, 2017

Where would I use this in place of, say, Kafka and Samza?

stonewhite · on Feb 27, 2017

Looks very interesting. I'd love to use it once it supports Akka persistence. Is this on the roadmap?

JoelSanchez · on Feb 26, 2017

What an interesting commit message format.