
Umbra: an ACID-compliant database built for in-memory analytics speed - pbowyer
https://umbra-db.com/
======
brenden2
There's been an explosion of new DBs, but I haven't found anything that really
beats Postgres or MariaDB for most workloads. The main advantages of these
battle-tested DBs are that they're easy to operate, well understood, full
featured, and can handle most workloads.

It does make me wonder what will be the next big leap in DB technology. Most
of the NoSQL or distributed DB implementations have a bunch of limitations
which make them impractical (or not worth the trade-offs) for most
applications, IMO. Distributed DBs are great until things go wrong, and then
you have a nightmare on your hands. It's a lot easier to optimize simple
relational DBs with caching layers, and adding read replicas scales quite
effectively too.

The only somewhat recent new DB that comes to mind which had a really
interesting model was RethinkDB, although it suffered from a variety of
issues, including scale problems.

Anyway, these days I stick with Postgres for 99% of things, and mix in Redis
where needed for key/value stuff.

~~~
heipei
The issue that I frequently run into is not that I'm looking for a fancy
distributed/sharded database for performance reasons, but because I need to
store large amounts of data in a way that allows me to grow the datastore by
"just adding boxes" while still retaining a few useful database features. I'd
love to use Postgres, but eventually my single server will run out of disk
space.

Now, one approach is to just dismiss this use-case by pointing at DynamoDB and
similar offerings. But if for some reason you can't use these hosted
platforms, what do you use instead?

For search, ElasticSearch fortunately fits the bill: the "just keep adding
boxes" concept works flawlessly, and operating it is a breeze. But you
probably don't want to use ElasticSearch as your primary datastore, so what do
you use there? I had terrible experiences operating a sharded MongoDB cluster,
and my next attempt will be something like ScyllaDB/Cassandra instead, since
operations seem to require much less work and planning. What other databases
offer that no-advance-planning scaling capability?

Somewhat unrelated, but I often wonder what one would use for a
sharded/distributed blob store that offers basic operations like "grep across
all blobs" with different query-performance characteristics than a real-time
search index like ElasticSearch. Would one have to use Hadoop, or are there
alternatives which require little operational effort?

~~~
DelightOne
What is the difference between one server with many TB and multiple servers
with less space each?

~~~
Drdrdrq
With multiple servers you can add space to each of them. With a single one
there is a much lower limit to what you can do - that's the idea behind
vertical vs. horizontal scalability. That, and systems with multiple nodes can
be made more reliable than single-node servers.

------
lichtenberger
Great work and very interesting ideas. I'm working on a versioned database
system[1] which offers similar features and benefits:

    
    
        - storage engine written from scratch
        - completely isolated read-only transactions and one read/write transaction concurrently with a single lock to guard the writer. Readers will never be blocked by the single read/write transaction and execute without any latches/locks.
        - variable-sized pages
        - lightweight buffer management with a "kind of" pointer swizzling
        - dropping the need for a write-ahead log due to atomic switching of an UberPage
        - optionally, a rolling Merkle hash tree of all nodes, built during updates
        - an ID-based diff algorithm to determine differences between revisions, optionally taking the (secure) hashes into account
        - a non-blocking REST API, which also takes the hashes into account to throw an error if a subtree has been modified concurrently during updates
        - versioning through a huge persistent and durable, variable-sized page tree using copy-on-write
        - storing delta page-fragments using a patented sliding snapshot algorithm
        - a special trie, which is especially good for storing records with numerically dense, monotonically increasing 64-bit integer IDs. We make heavy use of bit shifting to calculate the path for fetching a record (a rough sketch follows below)
        - time- or modification-counter-based auto-commit
        - versioned, user-defined secondary index structures
        - a versioned path summary
        - indexing of every revision, such that a timestamp is only stored once in a RevisionRootPage. The resources stored in SirixDB are based on a huge, persistent (functional) and durable tree
        - sophisticated time travel queries
    
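Since the trie is mentioned in the list above, here is a rough, purely
illustrative C++ sketch (SirixDB itself is written in Java, and all names
here are made up) of how a fixed fan-out lets the path to a record be derived
from its ID with shifts and masks alone:

    
    
        #include <array>
        #include <cstdint>
        #include <memory>
        
        constexpr int kBitsPerLevel = 8;             // fan-out of 256
        constexpr int kLevels = 64 / kBitsPerLevel;  // 8 levels per 64-bit ID
        
        struct TrieNode {
            std::array<std::unique_ptr<TrieNode>, 256> children;
            const void* record = nullptr;  // payload at the leaf level
        };
        
        // Descend by slicing the ID into 8-bit chunks, most significant
        // first - no key comparisons or binary search as in a B-tree.
        const void* lookup(const TrieNode* root, std::uint64_t id) {
            const TrieNode* node = root;
            for (int level = kLevels - 1; level >= 0 && node; --level) {
                unsigned idx = (id >> (level * kBitsPerLevel)) & 0xFF;
                node = node->children[idx].get();
            }
            return node ? node->record : nullptr;
        }
    

Dense, monotonically increasing IDs keep the populated subtrees tightly
clustered, so the upper levels of the trie stay cache-resident.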

I'm spending a lot of my spare time on the project and would love to spend
even more, so give it a try :-)

Any help is more than welcome.

Kind regards, Johannes

[1] [https://sirix.io](https://sirix.io) and
[https://github.com/sirixdb/sirix](https://github.com/sirixdb/sirix)

~~~
erichocean
> _\- completely isolated read-only transactions and one read/write
> transaction concurrently with a single lock to guard the writer. Readers
> will never be blocked by the single read/write transaction and execute
> without any latches/locks._

> \- _variable-sized pages_

> \- _lightweight buffer management with a "kind of" pointer swizzling_

> \- _dropping the need for a write-ahead log due to atomic switching of an
> UberPage_

LMDB made those same design choices and is extremely fast/robust.
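
For anyone unfamiliar with the pattern the two systems share: durability
without a write-ahead log comes from writing all modified pages copy-on-write
and then atomically publishing a new root. A minimal C++ sketch of that
commit path (the UberPage name is SirixDB's; write_durably and everything
else here is an illustrative stand-in, not either system's actual API):

    
    
        #include <atomic>
        #include <cstdint>
        #include <vector>
        
        struct Page { /* copy-on-write page; never modified in place */ };
        
        // The single root from which every page of a revision is
        // reachable. Committing = durably writing the new pages, then
        // swapping this one pointer.
        struct UberPage {
            Page* root = nullptr;
            std::uint64_t revision = 0;
        };
        
        // Stand-in for a durable write (write() + fsync() in a real engine).
        void write_durably(const std::vector<Page*>& pages) { /* ... */ }
        
        // Readers load this lock-free and never observe a half-written
        // revision, so they need no latches at all.
        std::atomic<UberPage*> current_uber{nullptr};
        
        void commit(UberPage* next, const std::vector<Page*>& dirty) {
            write_durably(dirty);  // 1. persist all copy-on-write pages
            // 2. persist `next` itself, e.g. to a fixed header slot (elided)
            current_uber.store(next, std::memory_order_release);  // 3. publish
        }
    

A crash before step 3 leaves the previous UberPage - and thus a fully
consistent older revision - in place, which is why no log replay is needed on
recovery.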

~~~
lichtenberger
In my case these were also design decisions, made back in 2006 or 2007. It's
designed for fast random reads from the ground up due to the versioning focus
(reading page fragments from different revisions, as it just stores fragments
of record pages). I'll change the algorithm slightly to fetch the fragments in
parallel, which should be fast on modern hardware, that is, on SSDs and, in
the future, for instance on byte-addressable non-volatile memory.

------
senderista
For some context, this project is from one of the leading research groups in
high-performance main-memory OLAP databases. Neumann’s 2011 paper, in
particular, basically invented the modern push-driven operator-collapsing
approach to query compilation.

------
jandrewrogers
This is a tidy and thoughtful database architecture. The capabilities and
design are broadly within the spectrum of the mainstream. At this point in
database evolution, it is well established that sufficiently modern storage
architecture and hardware eliminate most performance advantages of in-memory
architectures. However, many details of the design in the papers indicate that
this database will not be breaking any records for absolute performance on a
given hardware quantum.

The most interesting bit is the use of variable-size buffers (VSBs). The value
of using VSBs is well known -- it improves cache and storage bandwidth
efficiency -- but there are also reasons it is rarely seen in real-world
architectures, and those issues are not really addressed here, as far as I can
tell. Database companies have been researching this concept for decades. If
one is unwilling to sacrifice absolute performance, and most database
companies are not, the use of VSBs creates myriad devilish details and edge
cases.

There are techniques that achieve high cache and storage bandwidth efficiency
without VSBs (or their issues), but they are mostly incompatible with
B+Tree-style architectures like the one above.

~~~
lichtenberger
I think that with modern hardware, for instance the first byte-addressable
NVM now available, variable-sized pages and buffers should in theory see more
widespread use, and read/write granularity should become finer over the next
years. As of now, however, I think Intel Optane memory still has to fetch a
minimum of 256 bytes.

Variable-sized pages also allow page compression.

Can you give us some links regarding the mentioned issues, and the techniques
that achieve high cache and storage bandwidth efficiency without VSBs?

~~~
jandrewrogers
I can explain it; the methods are straightforward. As with most things in
database engine design, much of what is done in industry isn't in the
literature.

The alternative to VSBs is for each logical index node to comprise a dynamic
set of independent fixed buffers, with each buffer having an independent I/O
schedule. This enables excellent cache efficiency because 1) space is
incrementally allocated and 2) the cache only contains the parts of a logical
node that you actually use. References to the underlying buffers remain valid
even if the index node is resized. Designs vary, but 8 to 64 buffers per index
node seems to be the anecdotal range. The obvious caveat is that storage
structures that presume an index node is completely in buffer, such as ordered
trees, don't work well. Since some newer database designs have no ordered
trees at all under the hood, this is not necessarily a problem. There are fast
access methods that work well in this model.
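
For illustration, a rough C++ sketch of the shape being described (purely
hypothetical, not any particular engine's code): a logical index node that
grows by appending fixed-size buffers, each of which can be cached, evicted,
and scheduled for I/O independently:

    
    
        #include <array>
        #include <cstddef>
        #include <cstdint>
        #include <memory>
        #include <vector>
        
        constexpr std::size_t kBufferBytes = 4096;  // fixed physical buffer size
        
        // One fixed-size buffer with its own residency state and I/O
        // schedule, loadable and evictable independently of its siblings.
        struct Buffer {
            std::array<std::byte, kBufferBytes> bytes;
            bool resident = false;         // currently cached?
            std::uint64_t diskOffset = 0;  // where it lives on storage
        };
        
        // A logical index node = a dynamic set of such buffers (8 to 64
        // per node, per the anecdotal range above). Growing the node just
        // appends a buffer; existing Buffer pointers stay valid, unlike
        // with a reallocated variable-size page.
        struct LogicalNode {
            std::vector<std::unique_ptr<Buffer>> pieces;
        
            Buffer& grow(std::uint64_t diskOffset) {
                pieces.push_back(std::make_unique<Buffer>());
                pieces.back()->diskOffset = diskOffset;
                return *pieces.back();
            }
        };
    

Note how a node can be partially resident: only the pieces a query actually
touches need to occupy cache, which is where the cache-efficiency claim comes
from.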

The main issue with VSBs is that it is difficult to keep multiple references
to a page consistent, some of which may not even be in memory, since critical
metadata typically lives in the reference itself. A workaround is to allow
only a single reference to a page, but this restriction has an adverse impact
on some types of important architectural optimization. The abstract objective
makes sense, but no one who has looked into it has come up with a VSB scheme
that avoids these tradeoffs in typical design cases. That said, VSBs are
sometimes used in specialized databases where storage utilization efficiency
(but not necessarily cache efficiency or performance) is paramount, though
designed a bit differently than in Umbra.
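
To make the reference-consistency problem concrete, here is a hypothetical
swizzled reference in C++ (an illustrative layout, not Umbra's actual one) in
which the size class lives inside the reference itself:

    
    
        #include <cstdint>
        
        // Hypothetical swizzled reference to a variable-size buffer. The
        // size class and location are encoded in the reference itself, so
        // a reader can plan its I/O without consulting a central table.
        struct VsbRef {
            std::uint64_t offset    : 48;  // memory address or disk offset
            std::uint64_t sizeClass : 8;   // log2 of the buffer size
            std::uint64_t swizzled  : 1;   // 1 = in-memory pointer, 0 = on disk
        };
    

If two copies of such a reference exist - one of them perhaps swapped out to
disk inside a parent page - growing the buffer to a new size class silently
invalidates the stale copy. Restricting each page to a single inbound
reference sidesteps this, at the cost of the optimizations mentioned above.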

The reason to use larger page sizes, in addition to being more computationally
efficient, is that it gives better performance with cheaper solid-state
storage -- storage costs matter a lot. The sweet spot for price-performance is
inexpensive read-optimized flash, which works far better for mixed workloads
than you might expect if your storage engine is optimized for it. Excellent
database kernels won't see much boost from byte-addressable NVM and people
using poor database architectures don't care enough about performance to pay
for expensive storage hardware, so it is a bit of a No Man's Land.

------
rogerb
I love seeing this: there are massive opportunities to build fundamentally
differently architected databases based on evolving computer architectures
(RAM, persistent RAM, GPUs, heck - even custom hardware) as well as an
improved understanding of ACID in distributed environments. SQL remains an
important API :)

~~~
holstvoogd
PostgreSQL has some GPU (& NVMe) support already through an extension:
[https://github.com/heterodb/pg-strom](https://github.com/heterodb/pg-strom)

------
gbrown_
From the paper:

> and subsequently allow the kernel to immediately reuse the associated
> physical memory. On Linux, this can be achieved by passing the MADV_DONTNEED
> flag to the madvise system call.

Shouldn't this be MADV_FREE? This instantly reminded me of this classic Bryan
Cantrill talk
[https://youtu.be/bg6-LVCHmGM?t=3529](https://youtu.be/bg6-LVCHmGM?t=3529)

Edit: It seems that the Linux behavior is relied upon? From later in the
paper.

> Note that it is even legal for a page to be unloaded while the page content
> is being read optimistically. This is possible since the virtual memory
> region reserved for a buffer frame always remains valid (see above), and
> read accesses to a memory region that was marked with the MADV_DONTNEED flag
> simply result in zero bytes. No additional physical memory is allocated in
> this case, as all such accesses are mapped to the same zero page.

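The Linux-specific semantics are easy to demonstrate. This little C++ program
relies on MADV_DONTNEED's documented behavior for private anonymous mappings,
which is exactly what the paper depends on; on other platforms the flag is
weaker:

    
    
        #include <cassert>
        #include <cstddef>
        #include <cstdio>
        #include <sys/mman.h>
        
        int main() {
            const std::size_t len = 4096;
            // A private anonymous mapping, like a buffer-pool frame.
            void* mem = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            assert(mem != MAP_FAILED);
            char* p = static_cast<char*>(mem);
        
            p[0] = 42;                       // fault in a physical page
            madvise(p, len, MADV_DONTNEED);  // kernel may reclaim it at once
        
            // The mapping stays valid: on Linux the read below is served
            // from the shared zero page, so no physical memory is
            // allocated and no SIGSEGV occurs - exactly what the
            // optimistic readers rely on.
            std::printf("after MADV_DONTNEED: %d\n", p[0]);  // prints 0
        
            munmap(p, len);
            return 0;
        }
    

With MADV_FREE the read could still return 42 until the kernel actually
reclaims the page, which is presumably why the design leans on the zero-fill
guarantee of MADV_DONTNEED instead.
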
------
jojo2000
> It is a drop-in replacement for PostgreSQL.

Well, that's a bold claim, as pg speaks one of the richest SQL dialects out
there. Does it also mean it supports the pg WAL protocol?

The product is backed by solid research, so I suppose there must be some
powerful algorithms built in, with good coupling to the hardware [1].

So the last question is how the code is written and tested, because good
algorithms are not enough for a solid codebase. pg+(redis/memcached) is
battle-tested.

It seems to share some ideas with pg, such as query JIT compilation, but
mixes them with another approach.

> Umbra provides an efficient approach to user-defined functions.

Possible in many languages using pg.

> Umbra features fully ACID-compliant transaction execution.

A Jepsen test, maybe?

I didn't find anything about the clustering part either.

[1] [http://cidrdb.org/cidr2020/papers/p29-neumann-
cidr20.pdf](http://cidrdb.org/cidr2020/papers/p29-neumann-cidr20.pdf)

------
gavinray
Perhaps I am a bit slow, but could someone with a better understanding ELI5
what benefits this provides over Postgres?

I would really appreciate it.

The only bit I really understood was:

    
    
        The system automatically parallelizes user functions
    

Now granted, I only understand how DBs work from the user-facing side, so that
might be a barrier here.

~~~
maxmcd
The abstract here is helpful: [http://cidrdb.org/cidr2020/papers/p29-neumann-
cidr20.pdf](http://cidrdb.org/cidr2020/papers/p29-neumann-cidr20.pdf)

------
pachico
I'm still surprised that the industry barely knows about ClickHouse. Very few
times have I had the impression of adopting game-changing technology, and
that's the case with ClickHouse. We currently use it only for analytical
purposes, but it has proven to be a very valid solution for log storage or as
a time-series DB. I already have it in my roadmap to migrate our ElasticSearch
clusters (for logs) and InfluxDB to ClickHouse.

~~~
snikolaev
Does ClickHouse have inverted index capabilities yet, or how are you going to
search for logs containing "error"? Is plain LIKE performance going to be
enough? Or is that not a concern in your case?

------
polskibus
Amazing! I wonder if this is going to be acquired in a similar way to HyPer.
Commercialization of HyPer took a lot of resources; I wonder what state Umbra
is in.

~~~
killberty
Thomas Neumann told me in person that they will not sell Umbra.

~~~
KarlKemp
Isn't that exactly what the Instagram founders said?

I'm perfectly willing to believe that they have no intention of selling. But
that's really not a promise one can easily make. Even if you're capable of
withstanding the allure of whatever large sum someone is offering, it's always
possible to be faced with a choice of selling or shutting down, or selling or
not being able to afford your spouse's/child's/own sudden healthcare needs.

~~~
nicolas_t
Hi, just a quick note that your comment in the thread about the internet in
Turkey is dead (shadowbanned) despite being relevant. You should contact the
HN team at hn@ycombinator.com

~~~
mkl
You can also resurrect people's auto-mod-hidden comments (frequently new
users, especially with links) by clicking the time to get to the comment's
page, and clicking "vouch". (Needs >30 karma.)

~~~
nicolas_t
Cool, I never knew that. Thanks!

------
jayd16
Anyone know of any benchmarks or specific features this has over other DBs?
"Built for in-memory speed" might as well say "web scale."

That browser-based query analyzer is cool.

------
bbulkow
The innovation has happened at the cloud database projects: Dynamo, Redshift,
Cosmos, BigQuery, etc. They don't publish their code, but there is plenty
happening under the covers. At this point, I think anyone who acquires
machines and installs software has a desire for pain and isn't making a
sensible business trade-off - unless you are an infrastructure company or are
operating at high scale.

------
adriancooney
> Drop in replacement for PostgreSQL

Well, that's impressive. Can I just drop this into my test suite and get a
mega speed improvement? It could be worth it.

------
jwildeboer
Where’s the source code? It’s open source, I guess?

------
based2
[http://hyper-db.com/](http://hyper-db.com/)

~~~
Jweb_Guru
That group has been doing interesting and industry-relevant work for a long
time. Not surprised they're trying to commercialize it, as existing databases
didn't really pick it up.

------
mkaufmann
Hyper, which was created by the same group, can now be used for free with the
Tableau Hyper API [https://help.tableau.com/current/api/hyper_api/en-
us/index.h...](https://help.tableau.com/current/api/hyper_api/en-
us/index.html)

I especially like the super fast CSV scanning!

------
milesward
This thread makes it pretty clear to me that managed DB services from the
cloud providers are a Very Good Idea on their part.

------
mamcx
One area where rdbms development has forgotten to work is becoming a real
contender for the Access/dBase family.

You see a lot of people chasing the "Facebook wannabe" kind of workloads
instead.

I work with small/medium companies (or ones that are big, but with < 1 TB of
data). I bet 90% of them can't get past the first stages of data manipulation:

\- Most (all?) rdbms have the same datatypes, meaning: use of nulls (bad)
rather than algebraic types (sad), and very unfriendly means of modeling
data/business logic.

\- They offer only SQL, which is impractical for anything bigger than basic
queries. I worked with FoxPro before: I could build THE WHOLE APP with it,
including GUIs, reports, etc. So to say SQL is disappointing is to put it
mildly.

\- All the engine stuff is a black box. That is great, until you want to build
your own index, store columnar data, save arrays or text or whatever you want,
plug into the query executor and do your own stuff, etc.

You know, if you had JS and I told you that you can't code your own linked
list, you would ditch it quickly. If the db engine let me plug into the
storage layer, I could save my graphs for the one time I need them, instead of
hacking around by putting them in tables or, worse, bringing in ANOTHER db
engine to make my life hard.

Wait! Why?

All that stuff that people put in middleware, models, or controllers? With any
other product you would reject the tool if it couldn't do this, but rdbms
FORCE you to use something else to finish the job, despite the fact that most
of it runs on the same box.

\- Everyone adds their own auth tables/logic, because the one implemented in
the rdbms is for a use case that no longer exists. Then they do it wrong, of
course.

\- We are in the HTTP world, but rdbms need something else for that.

\- Importing/exporting data is still SAD.

\- Importing/exporting data beyond the few half-baked attempts (like CSV,
which much of the time you're better off doing with Python) is impossible in
most. Foreign adapters could work, yet you need to step outside the black box,
fetch the adapter, compile it, install it, then use it. You need to become a
C++ developer, despite intending to be a SQL one.

This is made worse because:

\- No "rdbms" package manager exists.

Look how great it would be if you could just:

    
    
        db install auth-jwt
        db install csv-import
    

and be done. Like everyone else.

\- Making forms and reports. You can bet any company (even the users!) would
kill for a db that let them create reports and forms. Yep, like Access. Yep,
that is why Access is still a thing despite having the weakest db engine in
town.

\- You want to be able to send emails, connect to external APIs, call system
services, etc.

But why? Isn't that problematic? Well, if JS (a language FOR THE BROWSER)
allows it, why not your db? I lived in that world before (FoxPro) and it
worked GREAT.

\---

A lot of this is because rdbms are looked at from too narrow a POV. It's
crazy: people use half-finished NoSQL engines and are happy building their own
query engines, yet talk of doing anything other than SQL (or plugging into the
query parser to enhance it) sounds crazy to some.

Rdbms got leapfrogged by NoSQL because, until very recently, they were stuck
in the mindset and use cases of the 80s.

NOTHING says your rdbms must be like all the others.

Broadening the view is, I think, what rdbms need to do to get reinvigorated,
and considering they are also performant, and very good, that would let them
conquer the world!

------
maitredusoi
Isn't SQLite doing the same???

