Hacker News new | past | comments | ask | show | jobs | submit login

I always felt the value of an OLAP cube was uncovering questions you may not know to ask. If you’ve got a cube with a good interface, it means more people in your organization can inspect the data looking for trends across dimensions that might not be obvious. As an example in a previous organization, someone just playing with an olap cube once found that certain DUI searches became much more common on Weekend mornings (perhaps obvious in retrospect). That insight allowed better targeting and sales, and made concrete what otherwise is just a guess.

Sometimes just exploring the data allows stuff like that to pop out.

Are there similar interfaces with columnar stores? Or do all the analytics need to be pre compiled? The ability to slice/dice/filter/aggregate your data in perhaps non obvious ways is really the value of business analytics in my opinion.




I don’t think this benefit is unique to OLAP cubes. This is just generally the reason you have a data warehouse at all. The implementation is up to you but if you model the business process then you will be ready to answer questions that have not been asked.


But are there tools that allow non developers to analyze the data ad-hoc? If there’s a gatekeeper in either direction than it really slows down the process of understanding the data.

Cubes usually allow ad-hoc real-time queries that business users can play around with and explore.


That definition doesn’t align with my experience or perception. Maybe I am wrong. Microsoft definitely sells a suite of tools for creating OLAP Cubes and presenting the data.

What you are describing is just a data warehouse with a reporting tool. There’s dozens of options for building something like that.

A data warehouse is always “OLAP” but it isn’t often a “cube”.


The difference as I understand it is that a cube specifically is the multidimensional schema definition of your data set. So for instance typically in an OLTP system you’d have customer data, product data, sales data, performance data, etc. Usually that data is in separate tables.

OLAP is the combined methods that allow you to analyze the data, roll it up, aggregate it, combine it across dimensions etc. Usually by using a star schema.

A OLAP Cube, then, is the full multidimensional schema and relations of all the data. Typically a cube lets you define how each type of field is aggregated, whether it’s summing count fields, averaging dollar fields, or special rules for date fields etc. In addition you specify hierarchies of the data, so for instance City belongs to State belongs to Country, etc.

Once it’s all defined and the ETLs transfer your OLTP data into the OLAP system, then the cube allows for complicated ad-hoc queries. An example is a spreadsheet like cube explorer that allowed you to “slice”, “dice”, “drill”, and “pivot” in real time.

https://en.wikipedia.org/wiki/OLAP_cube#Operations

The queries are run by the tool as you change the “cube spreadsheet” real time. Typically aggregates are precomputed to speed things up.

So as I understand columnar data systems, especially if we’re talking not defining cube multidimensional schema, then some database engineer needs to program each view of the data as requested by the consumers of the data.

So as I understood the article, columnar databases let you get away with all the slicing and dicing etc in a performant way due to the database architecture and speed. Without defining the multidimensional cube schema upfront. The problem is, that seems to me to mean, those definitions are just being created later by the query designers, instead of upfront. I haven’t seen, nor does the article talk about any tools that can replace the tools that OLAP cube systems typically have that let you take advantage of that predesigned cube schema.

Here’s a single example called JPivot in action. The is running MDX queries in the cube as it’s manipulated:

https://youtu.be/dLccogT_phY


Whatever approach you take, someone or something has to build a model of the data for end users to slice and dice.

The bit you’re missing is that a “cube” usually refers to a separate data store and compute engine to the data warehouse, queried via a different language such as MDX, typically meaning a proprietary tool is now required to get access. Usually this is also a subset of the data i.e. the cube designer has to decide what data is not included in the cube.

The people saying they don’t see the value of cubes are essentially saying they prefer multidimensional analysis on directly on top of a different stack: a relational, OLAP, columnar data warehouse, queried via SQL, with access to all the data.

It looks like the article is written by a vendor of such tools and hosted on their website, and others are mentioned throughout the thread.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: