OP here. Peloton has been posted here before, but didn't get any attention.
I think this database is very interesting even if you don't care about the time-saving part of it: it claims to be a hybrid (OLTP and OLAP) system, it implements Postgres's wire protocol, and it claims to compile queries to machine code using LLVM [1].
He's joking about the rehab I'm sure...that's his style (I like it personally).
BTW, here are the video lectures to the graduate database course he mentioned in the presentation, where students were developing features for Peloton as part of the course (they're great IMO):
Side note, but I really dislike the current trend (in in-memory databases, to be clear) of not bothering to include any real provisions for durability and justifying it by saying "NVRAM exists." It effectively doesn't exist for anyone who needs to deploy to off-the-shelf environments, and it's super expensive (and if you're going for performance, like most of the research projects are, compensating by running the database in a clustered configuration would be counterproductive). Are there any cloud providers that offer NVRAM in any configuration?
But it provides a dead-easy way to publish research work, claim insane speedups, and not worry about disk journals, caches, corner cases with in-flight data when a VM is snapshotted, etc.
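To make the "disk journals" point concrete, here is a toy write-ahead-log sketch (my own illustration of the general technique, not anything from Peloton): every update is fsync'd to a journal before it is applied in memory, which is exactly the machinery that "NVRAM exists" hand-waves away.

```python
import json
import os
import tempfile

class TinyKV:
    """Toy in-memory KV store with a write-ahead log for durability.

    Each update is appended to the log and fsync'd *before* it is
    applied in memory, so a crash loses at most an unacknowledged
    write. Recovery is just replaying the journal.
    """

    def __init__(self, log_path):
        self.data = {}
        self.log_path = log_path
        self._replay()                      # recover state from the journal
        self.log = open(log_path, "a")

    def _replay(self):
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path) as f:
            for line in f:
                rec = json.loads(line)
                self.data[rec["k"]] = rec["v"]

    def put(self, key, value):
        rec = json.dumps({"k": key, "v": value})
        self.log.write(rec + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())         # durable before we acknowledge
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)

# Demo: write through one instance, then recover through a fresh one.
path = os.path.join(tempfile.mkdtemp(), "wal.log")
kv = TinyKV(path)
kv.put("alice", 100)
kv.put("bob", 250)
kv.log.close()                              # simulate process exit

recovered = TinyKV(path)                    # replays the journal
```

Real systems add group commit, checkpointing, and torn-write detection on top, but even this skeleton is more than many "NVRAM will save us" prototypes ship with.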
Does anyone know what happens after the query plan is generated in most databases? I'm assuming the individual steps, like index scans and hash joins, are already coded, and the plan steps are iterated over with the respective methods being called? So the execution steps themselves are compiled, but the step traversal is kind of interpreted. With Peloton's LLVM engine, is everything merged together into a single sequence of machine code?
How much advantage does this give you? Are there really that many steps in an execution plan? The visible steps are usually < 50, but what about the internal, actually-executed steps? Unless this allows merging and further simplification that identifies and trims redundant operations, I'm not sure where a 100x performance improvement would come from.
Though I remember seeing a Scala-based in-memory query engine that was doing this kind of simplification of the actual steps and doing very well in benchmarks; maybe this is similar.
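Here's my mental model of the two approaches, sketched in Python (purely illustrative, not Peloton's actual code): the classic Volcano/iterator style pulls tuples through a chain of operators one call at a time, while a compiled engine effectively fuses the whole plan into one tight loop with no per-tuple dispatch.

```python
# Volcano-style execution: each plan node is an iterator, and every
# tuple pays a next()/virtual-call overhead at every node.
def scan(table):
    for row in table:
        yield row

def filter_op(child, pred):
    for row in child:
        if pred(row):
            yield row

def sum_op(child, col):
    total = 0
    for row in child:
        total += row[col]
    return total

table = [{"a": i, "b": i * 2} for i in range(1000)]

# SELECT SUM(b) FROM table WHERE a % 2 = 0, as an operator tree.
interpreted = sum_op(filter_op(scan(table), lambda r: r["a"] % 2 == 0), "b")

# What LLVM-style query compilation aims for: the same plan collapsed
# into a single loop, with predicates and aggregation inlined.
def compiled_query(table):
    total = 0
    for row in table:
        if row["a"] % 2 == 0:
            total += row["b"]
    return total

assert interpreted == compiled_query(table)
```

The win isn't the number of visible plan nodes; it's removing the per-tuple interpretation overhead (function calls, branch mispredictions, cache misses) that the iterator model pays millions of times over a large scan.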
I wonder why they try to support both OLTP and OLAP workloads. Supporting both requires a lot of work (both row and columnar storage types, different algorithms for storage and querying, etc.), and they haven't even proven that autonomous systems (the main point of the project) can replace existing databases.
Great question! There happens to be an autonomous mechanism for supporting hybrid workloads (OLTP & OLAP). Peloton supports hybrid storage layouts that are automatically and dynamically adapted over time based on the workload patterns. Row and columnar storage types are special cases of hybrid storage layouts.
This is a promising area of ongoing research. If you are curious about this kind of autonomous tuning of storage layout, you might want to check this out [1].
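As a toy illustration of why the layout matters (my own example, not Peloton's storage code): the same table in row and column orientation, where point lookups (OLTP-ish) favor the row layout and column aggregates (OLAP-ish) favor the columnar one.

```python
# The same three-column table in two physical layouts.
rows = [(i, f"user{i}", i * 10) for i in range(5)]           # row store
cols = {                                                     # column store
    "id":      [r[0] for r in rows],
    "name":    [r[1] for r in rows],
    "balance": [r[2] for r in rows],
}

# OLTP-ish access: fetch one whole record. One contiguous read in the
# row store; three scattered reads in the column store.
record = rows[3]

# OLAP-ish access: aggregate one column. A tight scan over a contiguous
# array in the column store; a strided walk in the row store that drags
# unrelated fields through the cache.
total = sum(cols["balance"])

assert record == (3, "user3", 30)
assert total == sum(r[2] for r in rows)
```

A hybrid layout in the sense described above would group hot, transactionally-updated columns row-wise and keep cold, scanned columns columnar, migrating between the two as the observed workload shifts.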
I guess it is currently a trend for modern MMDBs (MemSQL, HyPer, etc.) to support both OLTP & OLAP workloads. You can check out the git repo and give it a spin to see if it holds up to the claims.
However, Peloton also aims to be an autonomous system. That's a lot for undergrad and grad students, so I'm not sure whether he expects Peloton to be stable in the near future.
This sure has a lot to live up to: trying to do two things and do them well isn't very Unix-y. There's a reason relational databases are set up with OLTP schemas (highly normalized tables for supporting transactions, etc.) and OLAP schemas (star schemas, for example, with large, sometimes flat, fact and dimension tables). Also, I'm not sure about the learning part: any decent database these days will cache frequently used data, and tables can be built as in-memory ones.
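For readers unfamiliar with the OLAP side of that distinction, here is a minimal star-schema example using Python's built-in sqlite3 (the table and column names are mine, purely for illustration): a central fact table joined out to small dimension tables and aggregated.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Star schema: one fact table referencing small dimension tables,
# shaped for scan-and-aggregate reporting rather than transactions.
cur.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE fact_sales  (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        amount     REAL
    );
""")
cur.execute("INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget')")
cur.execute("INSERT INTO dim_date VALUES (1, 2016), (2, 2017)")
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(1, 1, 10.0), (1, 2, 20.0), (2, 2, 5.0)])

# Typical OLAP query: join the fact table to its dimensions, filter
# on a dimension attribute, and aggregate.
cur.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date d    ON d.date_id = f.date_id
    WHERE d.year = 2017
    GROUP BY p.name
    ORDER BY p.name
""")
result = cur.fetchall()   # [('gadget', 5.0), ('widget', 20.0)]
```

An OLTP schema for the same business would instead normalize orders, customers, and products into narrow tables optimized for small keyed reads and writes, which is exactly why serving both shapes from one engine is hard.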
So, from my understanding, the learning part isn't caching of frequently used data; it's (attempting to be) generalized workload learning: the kind of workload understanding that every DBA should develop but usually doesn't.
If that is successful and even marginally able to predict workload skew, then the scheduling of operations can be significantly more efficient -- you're essentially reducing entropy in your database massively.
Any team of database admins/engineers worth their salary plans for capacity, fixes inefficient queries, and works with development on future goals for what they want out of the database layer.
It's very rare to have a DB that doesn't need both OLTP and OLAP workloads.
All DB-based apps quickly outgrow their transactional requirements and move on to "infinite reporting requests."
A certain ERP I worked on in the past had at least 300 reports in the base package. Most requests were for more reports, specialized for each customer. And additions to the transactional code were partly driven by the need to capture more data for the reports!
So I think having both styles is exactly what "everyone" wants. Even folks who got stuck with NoSQL databases.
---
I've been thinking about this a lot; I consider the ideal architecture to be a relational DB with decoupled modules that work like this:
We certainly do :) As mentioned above, Peloton supports hybrid storage layouts that are automatically and dynamically adapted over time based on workload patterns; if you are curious about this kind of autonomous tuning, check out [1].
Peloton is in fact the French word for platoon. I'd be highly surprised if the bike maker had the legal standing to issue infringement claims. Just as you can't trademark the word "bicycle", "peloton" is used widely enough that they should be fine. Then again, Jade the preprocessor was forced to rebrand as Pug.
Peloton is a word referring to the main group of riders in a bicycle endurance race, so it's not the same as calling your software "Wal-Mart." In this case, it seems like the fact that they are in completely different markets would be sufficient.
[1]: https://www.youtube.com/watch?v=mzMnyYdO8jk (slideshow: http://www.cs.cmu.edu/~pavlo/slides/selfdriving-nov2016.pdf)