
Show HN: PumpkinDB, an event sourcing database engine - yrashk
http://pumpkindb.org
======
yrashk
It's a lower-level "database engine" that allows you to build different types
of higher level databases based on a very simple foundation:

1) BTree-based K/V engine (which gives you the ability to iterate over lexicographically sorted keys)
2) Strong immutability guarantees (data cannot be overwritten)
3) ACID transactions
4) Server-side executable imperative language that gives you control over querying costs
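A minimal Python sketch of the first two properties, for readers who want something concrete. It is not PumpkinDB's API: `ImmutableKV`, `put`, `get` and `scan_prefix` are hypothetical names, and a sorted in-memory list stands in for the on-disk B-tree.

```python
import bisect
from typing import Dict, Iterator, List, Tuple


class ImmutableKV:
    """Hypothetical write-once K/V store with lexicographically ordered keys.

    A sorted list of keys stands in for the B-tree an engine keeps on disk.
    """

    def __init__(self) -> None:
        self._keys: List[bytes] = []
        self._values: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        # Immutability: a key may be written once and never overwritten.
        if key in self._values:
            raise KeyError(f"key {key!r} already exists and cannot be overwritten")
        bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key: bytes) -> bytes:
        return self._values[key]

    def scan_prefix(self, prefix: bytes) -> Iterator[Tuple[bytes, bytes]]:
        # Keys are sorted bytewise, so all keys sharing a prefix are contiguous.
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            key = self._keys[i]
            yield key, self._values[key]
            i += 1
```

For example, after `put(b"user/2", ...)`, `put(b"user/1", ...)` and `put(b"post/1", ...)`, `scan_prefix(b"user/")` yields the two `user/` entries in key order, and a second `put(b"user/1", ...)` raises instead of replacing the value.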

In a sense, it's as much a database constructor as the various MUMPS systems
are (GT.M, for example:
[https://en.wikipedia.org/wiki/GT.M](https://en.wikipedia.org/wiki/GT.M))

PumpkinDB also aims to provide a good set of standard primitives that help in
building more sophisticated databases, ranging from hashing to JSON support,
and more to come.

~~~
ams6110
Thanks. I read the entire "Documentation" page and still didn't feel confident
that I understood what this really was. "Event sourcing" to me implies that
it's generating events.

~~~
makmanalp
Event Sourcing is a technical term, it's the idea that instead of mutating
your database, you should instead have a log and insert a new log entry saying
that you changed this value to that, etc. The idea is that this helps you do
cool stuff like temporal queries (e.g. querying the entire database as it
looked a month ago) or look at historical values and changes of things.
This matters a lot in some fields. Of course then there's the matter of how to
do this efficiently. You can build event sourcing on top of a regular RDBMS
but if there is database-level support (as in PumpkinDB), then maybe some
things are more efficient. Read more:

[https://martinfowler.com/eaaDev/EventSourcing.html](https://martinfowler.com/eaaDev/EventSourcing.html)
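To make the idea concrete, here is a minimal Python sketch, not tied to PumpkinDB (`Event` and `state_as_of` are hypothetical names). State is never stored directly; it is derived by replaying an append-only log, and a temporal query is just a replay that stops at a given time.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    timestamp: int  # assumed non-decreasing along the log
    key: str
    value: str


def state_as_of(log: List[Event], timestamp: int) -> Dict[str, str]:
    """Replay every event up to and including `timestamp`."""
    state: Dict[str, str] = {}
    for event in log:
        if event.timestamp > timestamp:
            break  # the log is time-ordered, so nothing later applies
        state[event.key] = event.value
    return state


log = [
    Event(100, "balance", "10"),
    Event(200, "balance", "25"),
    Event(300, "owner", "alice"),
]
```

Here `state_as_of(log, 150)` gives `{"balance": "10"}` while `state_as_of(log, 300)` gives `{"balance": "25", "owner": "alice"}`. Replaying from the start gets slower as the log grows, which is where snapshots, indices or database-level support come in.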

~~~
gritzko
Virtually every db works like that under the hood. They expose it differently,
though. Kafka has nothing but log, for example.

~~~
solidsnack9000
They do but it's usually not exposed in a useful way. Postgres once had this
feature...
[https://www.postgresql.org/docs/6.3/static/c0503.htm](https://www.postgresql.org/docs/6.3/static/c0503.htm)

~~~
anarazel
We actually expose something to support event sourcing:
[https://www.postgresql.org/docs/current/static/logicaldecodi...](https://www.postgresql.org/docs/current/static/logicaldecoding.html),
although that's very different from the old time travel feature.

Edit: missing word

------
simonw
I like how every commit message is formatted as a problem and a solution:
[https://github.com/PumpkinDB/PumpkinDB/commits/master](https://github.com/PumpkinDB/PumpkinDB/commits/master)

~~~
jdiez17
Indeed, it's a neat "hack" to force yourself to write better commit messages.
I think this style of commit messages originated from the zeromq community:
[https://github.com/zeromq/libzmq/commits/master](https://github.com/zeromq/libzmq/commits/master)
[https://github.com/zeromq/zproto/commits/master](https://github.com/zeromq/zproto/commits/master)

~~~
yrashk
Yes, I picked this style up from Pieter Hintjens.

------
makmanalp
Very interesting! I think one thing that this would benefit from is a lot of
usage examples, especially around pumpkinscript. I was reading recently about
MUMPS and Caché and it's interesting to see a modern implementation of similar
ideas.

One question - what is the storage layout like? Do you have plans to support
efficient range queries at all?

~~~
yrashk
We definitely need better documentation! That's for sure. We only did a basic
one just to get the basics out.

As for the layout -- everything is built around btree k/v, and the original
idea behind PumpkinDB is to give primitives that are useful in building
databases, indices in particular. The expectation is that, over time, we will
grow our library to have more sophisticated primitives, including ready-made
indices of different kinds.
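As an illustration of how an index can be built from nothing but ordered keys, here is a Python sketch. The key layout (`idx/<field>\x00<value>\x00<primary key>`) and the helpers `index_key`, `build_index` and `range_lookup` are hypothetical, not PumpkinDB's format; it also assumes values and primary keys contain no `\x00` bytes.

```python
import bisect
from typing import Dict, List


def index_key(field: str, value: str, primary_key: str) -> bytes:
    # Hypothetical composite key; sorting these bytes groups entries
    # by field, then by value, then by primary key.
    return b"\x00".join([b"idx/" + field.encode(), value.encode(), primary_key.encode()])


def build_index(records: Dict[str, Dict[str, str]], field: str) -> List[bytes]:
    """Index keys in lexicographic order, as a B-tree would hold them."""
    return sorted(index_key(field, rec[field], pk) for pk, rec in records.items())


def range_lookup(index: List[bytes], field: str, low: str, high: str) -> List[str]:
    """Primary keys whose `field` value v satisfies low <= v < high."""
    start = bisect.bisect_left(index, index_key(field, low, ""))
    end = bisect.bisect_left(index, index_key(field, high, ""))
    return [key.rsplit(b"\x00", 1)[1].decode() for key in index[start:end]]


records = {
    "u1": {"name": "alice", "city": "Berlin"},
    "u2": {"name": "bob", "city": "Amsterdam"},
    "u3": {"name": "carol", "city": "Cairo"},
}
```

With `index = build_index(records, "city")`, a range query such as `range_lookup(index, "city", "A", "C")` becomes two binary searches plus a contiguous scan, returning `["u2", "u1"]` in city order.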

Does this help?

------
playing_colours
Written in Rust :) Inspiring; I'm learning Rust now to try implementing
HDFS-like storage.

------
nik736
What's an actual use-case for this? I am reading the documentation but still
don't see why I should use it and what the actual advantages compared to
current solutions are.

~~~
yrashk
It was built as a kind of database constructor for event-sourced /
journalled systems. Its design inspiration stems largely from MUMPS, which
provided a great ("oddly productive") combination of a database and a
programming language.

Being a constructor, it's also a great tool for building applications with
better control over querying mechanics (since everything is actually described
in PumpkinScript).
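The thread doesn't show PumpkinScript itself, so the following is only a hypothetical Python sketch of the general idea of cost control: a toy stack-based interpreter (the instruction set is invented, not PumpkinScript's) that charges one unit per executed instruction and aborts once a caller-supplied budget is spent.

```python
from typing import List, Union

Token = Union[int, str]


class BudgetExceeded(Exception):
    pass


def run(program: List[Token], budget: int) -> List[int]:
    """Execute a toy stack program, charging one unit per executed instruction."""
    stack: List[int] = []
    for steps, token in enumerate(program, start=1):
        if steps > budget:
            raise BudgetExceeded(f"stopped after {budget} instructions")
        if isinstance(token, int):
            stack.append(token)  # literals push themselves
        elif token == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif token == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif token == "DUP":
            stack.append(stack[-1])
        else:
            raise ValueError(f"unknown instruction {token!r}")
    return stack
```

`run([2, 3, "ADD", "DUP", "MUL"], budget=10)` returns `[25]`, while the same program with `budget=3` raises `BudgetExceeded`. Without loops the cost here is just the program length; in a language with iteration, per-step metering is what lets a server bound the cost of a query before it runs away.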

------
fiatjaf
I don't get this whole "never overwrite data" thing, including Datomic, for
example.

Isn't the disk space needed for these schemes enormous?

~~~
solidsnack9000
I have reservations, too: it's important to be able to remove data even though
disk is cheap.

* Removing very old data is a reasonable hedge for user privacy.

* Sometimes confidential data makes its way into the data set and needs to be removed.

* Old event data is often not useful but can impact performance or cost just the same. For example, one needs to allocate an EBS volume on AWS with a certain level of performance; but the cost of that is `IOPs * GBs`, not `IOPs * useful GBs`.

* Replicating and backing up the dataset takes longer and longer as the application grows.

~~~
yrashk
I agree, this is an important aspect.

Our plan in PumpkinDB is to add key value association retirement, subject to
defined retirement policies.
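The thread doesn't say how retirement policies would be expressed, so here is a hypothetical Python sketch of one possible policy: keep each key's newest version plus any version younger than a maximum age, and retire the rest. `Version` and `retire` are invented names, not PumpkinDB features.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Version:
    timestamp: int
    value: str


def retire(history: Dict[str, List[Version]], now: int, max_age: int) -> Dict[str, List[Version]]:
    """Drop versions older than `max_age`, but always keep each key's latest one.

    Assumes each key's version list is sorted oldest-first.
    """
    kept: Dict[str, List[Version]] = {}
    for key, versions in history.items():
        if not versions:
            continue
        recent = [v for v in versions[:-1] if now - v.timestamp <= max_age]
        kept[key] = recent + [versions[-1]]
    return kept
```

Keeping the latest version unconditionally means current reads never change; only history shrinks. A privacy-driven policy (like the confidential-data case above) would instead need to remove the latest value too.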

------
digitalzombie
Holy cow it's in Rust.

I'm doing a thesis on Classification Trees, using R and hoping to write the
backend of the R package in Rust (it's looking like it will be C++). I'll look
through the source code of this to see its tree implementation. Probably it
uses the Rust standard library's implementation of BTree?

~~~
elcritch
Documentation says they use LMDB for the backend. Looking over the
documentation, it looks like you could readily use PumpkinDB directly to
implement the database/data-caching scheme and interface with it from R.
Unless your thesis is on the implementation of B-trees, definitely try
bootstrapping on something like this first. BTW, LMDB provides memory mapping,
which can be very fast for computations.
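LMDB's own API isn't shown in the thread; as a generic illustration of what memory-mapped access means, here is a sketch using Python's standard `mmap` module (not LMDB; `read_range` is a hypothetical helper). The file is mapped into the process's address space and pages are loaded on demand as they are touched, rather than the whole file being read up front.

```python
import mmap
import os


def read_range(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes at `offset` through a read-only memory map."""
    with open(path, "rb") as f:
        if f.seek(0, os.SEEK_END) == 0:  # mmap cannot map an empty file
            return b""
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Slicing the map returns bytes; only the touched pages are faulted in.
            return mm[offset:offset + length]
```

For a file containing `b"hello, mapped world"`, `read_range(path, 7, 6)` returns `b"mapped"`.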

------
cestith
Where would I use this in place of, say, Kafka and Samza?

------
stonewhite
Looks very interesting. I'd love to use it once it supports Akka persistence.
Is this on the roadmap?

------
JoelSanchez
What an interesting commit message format.

