1) BTree-based K/V engine (which gives you the ability to iterate over lexicographically sorted keys)
2) Strong immutability guarantees (data cannot be overwritten)
3) ACID transactions
4) Server-side executable imperative language that gives you control over querying costs
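To make point 1 concrete, here is a minimal sketch of sorted-key iteration, using Rust's standard `BTreeMap` as a stand-in for the storage engine (purely illustrative; PumpkinDB's actual engine and API differ):

```rust
use std::collections::BTreeMap;

fn main() {
    // A stand-in for a btree-backed k/v store: keys come back in
    // lexicographic (byte-wise) order, which is what makes prefix
    // and range scans cheap.
    let mut kv: BTreeMap<Vec<u8>, Vec<u8>> = BTreeMap::new();
    kv.insert(b"users/alice".to_vec(), b"1".to_vec());
    kv.insert(b"users/bob".to_vec(), b"2".to_vec());
    kv.insert(b"events/001".to_vec(), b"login".to_vec());

    // Iteration order is by key, regardless of insertion order:
    // events/001, users/alice, users/bob
    for (k, v) in &kv {
        println!("{} => {}",
                 String::from_utf8_lossy(k),
                 String::from_utf8_lossy(v));
    }
}
```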
In a sense, it's as much of a database constructor as the various MUMPS systems are (GT.M, for example: https://en.wikipedia.org/wiki/GT.M)
PumpkinDB also aims to provide a good set of standard primitives that help build more sophisticated databases, ranging from hashing to JSON support, with more to come.
TrailDB (http://traildb.io), which has many elements of event sourcing in it, follows this philosophy and has proven to be pretty successful for its intended use cases.
I was delighted to notice that PumpkinDB has an imperative query language inspired by Forth. We recently open-sourced a similarly imperative query language inspired by AWK, http://github.com/traildb/reel :)
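For anyone who hasn't met a Forth-style language: programs are flat sequences of tokens operating on a shared data stack. A toy evaluator in Rust conveys the flavor (the instruction set here is invented for illustration; it is not actual PumpkinScript or reel syntax):

```rust
// A toy stack machine illustrating the Forth style: data and
// operations interleave in one stream, and everything goes through
// the stack. Instruction names are made up for this example.
#[derive(Debug, Clone, PartialEq)]
enum Tok { Num(i64), Dup, Add, Mul }

fn eval(prog: &[Tok]) -> Vec<i64> {
    let mut stack = Vec::new();
    for tok in prog {
        match tok {
            Tok::Num(n) => stack.push(*n),
            Tok::Dup => {
                let top = *stack.last().expect("stack underflow");
                stack.push(top);
            }
            Tok::Add => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a + b);
            }
            Tok::Mul => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a * b);
            }
        }
    }
    stack
}

fn main() {
    // "3 DUP MUL" squares the top of the stack: [3] -> [3, 3] -> [9]
    let result = eval(&[Tok::Num(3), Tok::Dup, Tok::Mul]);
    println!("{:?}", result); // [9]
}
```

Part of the appeal for querying is that cost stays explicit: the program is the execution plan, so you pay for exactly the operations you wrote.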
I will definitely follow PumpkinDB with great interest!
Edit: missing word
This project was started as a backend for a lazy event sourcing approach (https://blog.eventsourcing.com/lazy-event-sourcing-ed7e59007... , https://m.youtube.com/watch?v=aqv8d1pjmU8), and for uses beyond it.
The idea behind it is that it provides primitives for building systems designed around immutable events, journals, indexing, etc. Hence the current positioning: we thought it would be useful to target fairly narrowly early on.
Either way, we will definitely need to expand on that in our materials.
Any thoughts on whether this could be used to implement a Q/kdb+ like computation system? Seems like PumpkinScript could be extended with a library of computational array primitives. (https://news.ycombinator.com/item?id=13481824)
That being said, it'd be great to be able to read about how the "actor" system is implemented. The documentation alludes to actors and pub/sub channels. Not sure I can help much at this time, but I'll keep an eye on it!
Clearly, they are smart, but bad copywriters :)
Please write better docs.
Bookmarked until then.
It's not often that you see MUMPS referenced on HN, by the way. It's one of those oddly productive niche languages that are (as far as I know) alive and well, but rarely encountered unless you work in that niche, e.g. finance or healthcare.
I've been following MUMPS and using it for some ideas for some time, and that's how some of its ideas became inspirations for PumpkinDB. As quirky as M is, MUMPS was indeed oddly productive, and I wanted to piggyback on that.
Especially for open source projects, how do you suggest onboarding people to get this thing to progress?
The voting mechanism helps us determine whether it is interesting anyway.
Show HN is for something you've made that other people can play with. HN users can try it out, give you feedback, and ask questions in the thread.
One question - what is the storage layout like? Do you have plans to support efficient range queries at all?
As for the layout -- everything is built around a btree k/v store, and the original idea behind PumpkinDB is to provide primitives that are useful in building databases, indices in particular. The expectation is that, over time, we will grow our library to include more sophisticated primitives, including ready-made indices of different kinds.
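Since keys in a btree come back sorted, range queries reduce to scanning between two key bounds. A sketch with Rust's std `BTreeMap` standing in for the storage layer (illustrative only, not PumpkinDB's actual API):

```rust
use std::collections::BTreeMap;

fn main() {
    let mut kv = BTreeMap::new();
    for (k, v) in [("event/001", "a"), ("event/002", "b"),
                   ("event/007", "c"), ("user/1", "d")] {
        kv.insert(k.to_string(), v.to_string());
    }

    // All keys under the "event/" prefix: scan from the prefix up to
    // "event0" (the smallest string greater than every "event/..."
    // key), without touching anything outside that range.
    let events: Vec<String> = kv
        .range("event/".to_string().."event0".to_string())
        .map(|(k, _)| k.clone())
        .collect();
    println!("{:?}", events); // ["event/001", "event/002", "event/007"]
}
```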
Does this help?
Being a constructor, it's also a great tool for building applications with better control over querying mechanics (since everything is actually described in PumpkinScript)
- Do what-if analysis. Change the price of oil at some point in history and see how your financials would have played out from that point forward.
- Fork your database and have two live copies acting on different data or rules for a live comparison...without all the plumbing overhead. Perhaps having one set work with a fiscal year that is calendar year, and another with a different fiscal boundary.
Isn't the disk space needed for these schemes enormous?
* Removing very old data is a reasonable hedge for user privacy.
* Sometimes confidential data makes its way into the data set and needs to be removed.
* Old event data is often not useful but can impact performance or cost just the same. For example, one needs to allocate an EBS volume on AWS with a certain level of performance, but the cost of that is `IOPs * GBs`, not `IOPs * useful GBs`.
* Replicating and backing up the dataset takes longer and longer as the application grows.
Our plan for PumpkinDB is to add key/value association retirement, subject to defined retirement policies.
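As a rough sketch of what a retirement policy could mean (my illustration, not PumpkinDB's actual design): associate a timestamp with each value and drop anything older than a cutoff:

```rust
use std::collections::BTreeMap;

// A hypothetical retirement policy: drop any association whose
// timestamp falls before a cutoff. Only a sketch of the idea --
// the real retirement mechanism may look quite different.
fn retire_before(kv: &mut BTreeMap<String, (u64, String)>, cutoff: u64) {
    kv.retain(|_k, (ts, _v)| *ts >= cutoff);
}

fn main() {
    let mut kv = BTreeMap::new();
    kv.insert("e1".to_string(), (100, "old event".to_string()));
    kv.insert("e2".to_string(), (200, "recent event".to_string()));

    retire_before(&mut kv, 150);
    assert!(!kv.contains_key("e1")); // retired
    assert!(kv.contains_key("e2"));  // kept
}
```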
I'm doing a thesis on classification trees, working in R and hoping to write the backend of the R package in Rust (otherwise it's looking to be C++). I'll look through the source code of this project to see its tree implementation. Does it use the Rust standard library's BTree implementation?