I'm investigating this right now. I can't tell you much yet, as I don't understand some of the fundamentals.
I have to say, at first glance, I like the idea of this tool a lot. I normally use lightweight spreadsheets like gnumeric when I feel the need to use them at all, and I'd love a more expressive kind of spreadsheet/database/pivot table like this.
At the outset, however, I'd be concerned about integrating a tool into my workflow that, most dangerously, is a service that could disappear and take my work with it. Second most dangerously, it is proprietary. Third most dangerously, it lacks a community beyond the initial dev. Fourth, if it is a service, it is not a 'zero knowledge' service.
So while I like the look of this tool very much, and I'm going through the tutorials and playing around with it, personally I see this as a difficult sell. At least for me. But... I dunno, I don't think I'm the kind of demographic that moves markets.
You might see some success with this by emulating the GitLab model: open source the central components for personal use, and try to get some companies looking at it with enterprise-y features?
Agreed, those four barriers are barriers for me as well. However, I'd note that enterprise startups all face and must overcome the same obstacles. (I'm sure you know this, just stating it for others who might not)
In particular, open source is not a sine qua non. Some enterprise companies open source a fig leaf to assuage concerns about proprietary software, while others simply stay proprietary. If you're a product manager and want to use Aha, you just have to use Aha and go with their pricing tiers (including a 30-day free trial) and their claims of security. There's no open source Aha. That hasn't stopped them from being popular.
Funding and courting early adopters can go a long way towards neutralising 1 ($ and momentum imply a longer lifetime) and 3 ($ means they can hire other devs). Nothing ever removes the risk of a company cancelling a product, but that's the nature of products, even from BigCos like Google and Microsoft. Even open source can be abandoned.
I've received several emails from people who'd like to install the system on their own machines. Installing Egeria is not as easy as installing a desktop application because it is a web server. I will probably add a download page with installation instructions in the coming weeks.
Install a webserver/application? Sounds like what you want is a tiny Dockerfile. I can highly recommend giving this a try; PM me with questions.
And if it's written in a scripting (non-compiled) language, you'll probably want something to keep your source safe and avoid abuse. Some random examples are SourceGuardian (PHP), ProGuard (Java) or EncloseJS (JavaScript).
Wow nice, I can easily imagine using this to manage my custom keymaps.
Currently I store everything in Excel. All the modifiers and modifier combinations (lctrl, lalt, lshift, lctrl+lalt, lctrl+ralt+lshift ...) correspond to columns, and all the modified keys (a-z, 0-9, ...) correspond to rows.
But being limited to 2 dimensions can be very inconvenient. For example, since you have at least 6 modifier keys (ctrl, alt and shift on both left and right; 8 if you include the windows keys, 9 if you include a remapped capslock), trying to cover all the modifier combinations (e.g. lctrl+ralt+lshift) leads to a combinatorial explosion and makes the number of columns totally unmanageable. (It becomes even trickier when you want to define vim-style sequences.)
This looks like it's designed just for that purpose!!
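To put numbers on the combinatorial explosion described above: with n modifier keys there are 2^n - 1 non-empty combinations, so one column per combination means 63 columns for 6 modifiers, 255 for 8 and 511 for 9. The short Python sketch below enumerates them; the modifier names are just labels chosen for illustration.

```python
from itertools import combinations

# Illustrative modifier names, following the comment above: left/right ctrl,
# alt and shift, then the two windows keys, then a remapped capslock.
MODIFIERS = ["lctrl", "rctrl", "lalt", "ralt", "lshift", "rshift",
             "lwin", "rwin", "capslock"]


def modifier_combinations(mods):
    """Return every non-empty subset of mods, e.g. ('lctrl', 'ralt', 'lshift')."""
    return [combo
            for size in range(1, len(mods) + 1)
            for combo in combinations(mods, size)]


# 2^n - 1 non-empty combinations, i.e. one spreadsheet column per combination.
for n in (6, 8, 9):
    print(f"{n} modifiers -> {len(modifier_combinations(MODIFIERS[:n]))} columns")
```

In a multidimensional sheet one could instead (an assumption about how you might model it, not a feature claim) treat each modifier as its own two-valued axis, pressed or not pressed, plus one axis for the key, so no single axis ever has more than a handful of members.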
Could you explain what you mean by multidimensional?
My first thought was that all spreadsheets are at least 2D, some like SC are explicitly 3D, but the write up seems to suggest something different?
"Multidimensional data model: worksheets are organized by business entities (SKUs, departments, years, months, scenarios and so on). The data is stored in a more structured way which has many benefits like simple and robust computations across multiple worksheets."
A multidimensional sheet is like a tensor (a multidimensional matrix). You have a cell for every combination of coordinates along its axes. You can imagine a 3-D sheet as a stack of 2-D sheets.
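A minimal sketch of that idea in plain Python, assuming a cube held as a dict keyed by coordinate tuples. The axis names and members are made up for illustration, and this is not a claim about how Egeria stores its data.

```python
from itertools import product

# Hypothetical axes for a small 3-D sheet; names and members are illustrative.
axes = {
    "year": ["2017", "2018"],
    "month": ["Jan", "Feb", "Mar"],
    "sku": ["A", "B"],
}

# One cell for every combination of coordinates: 2 * 3 * 2 = 12 cells.
cube = {coords: 0.0 for coords in product(*axes.values())}

# A cell is addressed by one coordinate per axis.
cube[("2018", "Feb", "B")] = 42.0


def sheet_for_year(cube, year):
    """Fix the 'year' axis to get an ordinary 2-D sheet (rows: months, columns: SKUs)."""
    return [[cube[(year, month, sku)] for sku in axes["sku"]]
            for month in axes["month"]]


print(sheet_for_year(cube, "2018"))  # [[0.0, 0.0], [0.0, 42.0], [0.0, 0.0]]
```

Adding a fourth axis (say, a hypothetical "scenario" axis with "actual" and "plan") is just one more entry in axes; the number of cells is always the product of the axis lengths.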
You can also check this one out: https://en.wikipedia.org/wiki/Dimensional_modeling
Yes, I get that, but: A. All spreadsheets are multidimensional, so why point it out? B. The blurb seems to conflate it with structured/typed data.
I've tried reading the link, but it isn't making much sense to me at the moment. It doesn't seem to be using dimensions in the way I would expect a spreadsheet to?
A normal spreadsheet is considered 2D. You could probably call Excel 3D if you count the multiple worksheets, but that's it. You cannot add a fourth dimension.
The dimensional model is mostly used in databases right now. I agree, it is a bit more complex than normal spreadsheets. Here is an article which describes the concept a bit better: https://en.wikipedia.org/wiki/OLAP_cube
Nice, this looks pretty similar to Quantrix Modeler and Javelin, its precursor. Sadly, Javelin suffered from MS's shenanigans with Excel bundling. I was wondering when someone would reimplement those two on the modern web stack. Before the web became popular for these types of apps, I had prototyped something similar using pandas and PyQt.
> Before the web became popular for these types of apps I had prototyped something similar using pandas and PyQt.
Have you open-sourced your prototypes? A hackable desktop app made with PyQt that exposes Pandas functionality through a nice GUI seems far more appealing to me than yet another heavy web application that depends on a third-party server and requires an Internet connection to use.
That's a really good question. I generally end up building these tools for personal use or to see if some design space is worth exploring, and so don't bother cleaning up the code or making it easy for other people to run. In addition, people now expect that anyone releasing open source code owes them some manner of improvements/support/maintenance, which I'm not interested in providing. I like David Beazley's recent tweet about releasing his code as-is: https://twitter.com/dabeaz/status/1069236767029633024
How many rows can you have in your spreadsheet before performance starts to become an issue? (I noticed the table isn't lazy-rendered, which sparked the question.)
A single view should not get too large. The system will not show more than 60,000 cells in one view; beyond that you will get an error message. The cube itself can grow very large: 10-100 million filled cells. The trick is to design the dimensions so that only a small portion of the data is visible at once. You can group large dimensions by a column to build a hierarchy (category/model/sku or region/point of sale), so that a view only ever shows one part of it.
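A rough illustration of why grouping helps, assuming the 60,000-cell limit mentioned above and made-up data: a flat view of 100,000 SKUs by 3 months is 300,000 cells and gets rejected, while the same data rolled up one level to 10 categories fits easily. The check_view and roll_up helpers are hypothetical, not Egeria's API.

```python
from collections import defaultdict

MAX_VIEW_CELLS = 60_000  # the per-view limit quoted above


def check_view(row_members, column_members):
    """Hypothetical guard: reject a 2-D view that would exceed the cell limit."""
    cells = len(row_members) * len(column_members)
    if cells > MAX_VIEW_CELLS:
        raise ValueError(f"view has {cells} cells, limit is {MAX_VIEW_CELLS}")
    return cells


def roll_up(values_by_sku, category_of):
    """Aggregate SKU-level values to their category (one level up the hierarchy)."""
    totals = defaultdict(float)
    for sku, value in values_by_sku.items():
        totals[category_of[sku]] += value
    return dict(totals)


# Made-up data: 100,000 SKUs spread over 10 categories, shown against 3 months.
skus = [f"sku{i}" for i in range(100_000)]
category_of = {sku: f"cat{i % 10}" for i, sku in enumerate(skus)}
months = ["Jan", "Feb", "Mar"]

try:
    check_view(skus, months)  # 300,000 cells: too large for one view
except ValueError as err:
    print(err)

sales = {sku: 1.0 for sku in skus}
by_category = roll_up(sales, category_of)
print(check_view(list(by_category), months))  # 10 categories x 3 months = 30
```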
One feature that would be immediately interesting for a specific userbase is to be able to import existing multi-dimensional arrays - e.g. netCDF / Zarr
There aren't tools available for being able to view those in a reasonable way currently (generally I'd use a REPL and be slicing & printing 2D chunks)
So even if it were import-only, this would be v useful
I was surprised that there is no common file standard for interchanging dimensional data. As far as I can see, netCDF and Zarr only store values/arrays and not the metadata? I will check if there is an easy way to import one of them.
To be fair, there are SOME tools for reading netCDF files (there is netCDF4 for Python, for example), but it's not all that great out of the box: you have to write your own method to walk through all the variables etc., which is kind of a pain. Same with ncdump.
Is this an Anaplan competitor? If so, what are your key differences, value props?
What are you using as a persistence layer? PostgreSQL? MySQL? Something commercial? How well would it compose with existing ETL and BI tools?
What is the concurrency model, especially with regard to the planned permissioning model? What happens when two users with different permissions run the same formulas?
- I am not really familiar with Anaplan. You could probably compare them, but as far as I understand Anaplan is a more specialized (planning) tool.
- Egeria is a database itself. I currently use SQLite as a key-value store for persistence. I will probably use Mongo in production.
- I will provide a command line tool to automate import and export with ETL tools in CSV format first. A better integration with relational databases would probably come later on.
- All formulas are computed on the server, so users don't run them. Only superusers/designers will be able to change formulas. A normal user will only be able to enter values and change metadata. The concurrency is implemented using locks and in-memory data structures on the server. Open two tabs on the same sheet and try changing values in one of the tabs to see how it works.
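A minimal sketch of the model described in the list above, assuming a single global lock and a simple designer flag on each user; CellStore and its methods are hypothetical names, not Egeria's code. Any user can enter values, only designers can change formulas, and formulas are evaluated on the server, so two users with different permissions asking for the same formula cell get the same result.

```python
import threading


class CellStore:
    """Hypothetical server-side cube: cells live in memory behind one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}    # coordinates -> value entered by a user
        self._formulas = {}  # coordinates -> function computed on the server

    def set_value(self, user, coords, value):
        # Every user may enter values; the lock serialises concurrent writers.
        with self._lock:
            self._values[coords] = value

    def set_formula(self, user, coords, formula):
        # Only designers (superusers) may change formulas.
        if not user.get("is_designer", False):
            raise PermissionError(f"{user['name']} may not change formulas")
        with self._lock:
            self._formulas[coords] = formula

    def get(self, coords):
        # Formulas run on the server while holding the lock, so the result does
        # not depend on which user asked for the cell.
        with self._lock:
            formula = self._formulas.get(coords)
            if formula is not None:
                return formula(self._values)
            return self._values.get(coords, 0.0)


designer = {"name": "bob", "is_designer": True}
viewer = {"name": "alice", "is_designer": False}
store = CellStore()

store.set_formula(designer, ("Jan", "Total"),
                  lambda v: v.get(("Jan", "A"), 0.0) + v.get(("Jan", "B"), 0.0))
store.set_value(viewer, ("Jan", "A"), 2.0)
store.set_value(viewer, ("Jan", "B"), 3.0)
print(store.get(("Jan", "Total")))  # 5.0

try:
    store.set_formula(viewer, ("Jan", "Total"), lambda v: 0.0)
except PermissionError as err:
    print(err)  # alice may not change formulas

# Many writers at once (think: several open browser tabs) all land safely.
writers = [threading.Thread(target=store.set_value,
                            args=(viewer, ("Feb", f"sku{i}"), 1.0))
           for i in range(100)]
for t in writers:
    t.start()
for t in writers:
    t.join()
```

A real system would likely lock at a finer grain (per sheet or per chunk) rather than globally; one lock keeps the sketch short.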
Just curious: if one of the product differentiators is that its semantics more closely resemble a relational database, then why not power it with a relational database?
Egeria is in-memory for performance reasons. Egeria can recompute millions of cells per second. To do so, cells are grouped into chunks and stored in binary form. Relational databases are much slower. It is possible to have a relational backend sometime in the future, but for now using a key-value store also meant less implementation effort.
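A minimal sketch of the "chunks in binary form on top of a key-value store" idea, using only the standard library: a chunk is a flat array of doubles that computations work on directly, and SQLite (which the author mentions using as a key-value store) only ever sees the packed bytes. The table layout, the key format and CHUNK_SIZE are assumptions for illustration.

```python
import sqlite3
from array import array

CHUNK_SIZE = 4  # hypothetical: cells per chunk along each of two axes


def pack_chunk(values):
    """Serialise a chunk's cell values as raw 8-byte doubles."""
    return array("d", values).tobytes()


def unpack_chunk(blob):
    """Inverse of pack_chunk: rebuild the in-memory array of doubles."""
    chunk = array("d")
    chunk.frombytes(blob)
    return chunk


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE chunks (key TEXT PRIMARY KEY, data BLOB)")

# An in-memory chunk of CHUNK_SIZE x CHUNK_SIZE cells, stored row by row.
chunk = array("d", [0.0] * (CHUNK_SIZE * CHUNK_SIZE))
chunk[1 * CHUNK_SIZE + 2] = 7.5  # cell (row 1, column 2) inside the chunk

# Computation runs on the flat array; the KV store only persists the bytes.
db.execute("INSERT OR REPLACE INTO chunks VALUES (?, ?)",
           ("sheet1:0:0", pack_chunk(chunk)))
db.commit()

blob = db.execute("SELECT data FROM chunks WHERE key = ?",
                  ("sheet1:0:0",)).fetchone()[0]
restored = unpack_chunk(blob)
print(restored[1 * CHUNK_SIZE + 2], sum(restored))  # 7.5 7.5
```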
Multidimensional calculations typically require a lot of aggregation, rotations (pivots) and joins.
Mongo is good for store-and-retrieve, and aggregation along a single dimension, and not as strong at the operations mentioned above.
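For readers unfamiliar with the terms, here is what aggregation and a pivot mean on a small cube, sketched in plain Python over an in-memory dict. It only illustrates the operations (data and axis names are invented); it says nothing about how Mongo or a relational database would execute them.

```python
from collections import defaultdict

# Invented cube: (region, month, product) -> sales.
cube = {
    ("north", "Jan", "A"): 10.0, ("north", "Jan", "B"): 5.0,
    ("north", "Feb", "A"): 7.0, ("south", "Jan", "A"): 3.0,
    ("south", "Feb", "B"): 4.0,
}
AXES = ("region", "month", "product")


def aggregate(cube, keep):
    """Sum over every axis not listed in `keep` (a roll-up along those axes)."""
    idx = [AXES.index(name) for name in keep]
    out = defaultdict(float)
    for coords, value in cube.items():
        out[tuple(coords[i] for i in idx)] += value
    return dict(out)


def pivot(table2d):
    """Swap the two axes of a 2-D result (rows become columns)."""
    return {(col, row): value for (row, col), value in table2d.items()}


by_region_month = aggregate(cube, ["region", "month"])  # product summed out
by_month_region = pivot(by_region_month)                # months now lead
print(by_region_month)
print(by_month_region)
```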
Have you benchmarked Mongo against a relational db for analytic operations on a multidimensional model? The results could be interesting and perhaps different from what you would expect.[1]
Many new OLAP products are implemented on pure relational databases for performance reasons. Some databases with columnar indices are even faster for OLAP type operations.
[1] That said, Mongo could be a good choice for latency reasons. If your spreadsheet is doing lots of small calculations, then it makes sense to use something that can return results quickly.
Could you please elaborate on how Sigma is similar to multidimensional modeling of arbitrary formulas and user writes? It seems that Sigma is just a SQL generator plus visualisations, so it only serves the read side. Spreadsheets are much more flexible, as you operate on cells rather than entire columns, and can usually model an arbitrarily long formula chain.
> Egeria is in-memory for performance reasons
I don't see how this relates, as a relational DB can run in-memory, overflow to SSD, and cache values for the "views".
Selecting rows from a database's cache is still 100-1000 times slower than reading numbers from an array. And dumping this array into a KV store is much easier than generating SQL.
Cells are grouped into chunks (similar to Minecraft) and stored in binary form. There is a complex algorithm which decides how the data is partitioned. All the operations are done in memory and the KV store is only used as data storage.
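One simple way to do Minecraft-style chunking, sketched under the assumption that every dimension member has an integer index: fixed-size chunks along each axis, with a cell's chunk found by integer division and its position inside the chunk by the remainder. The comment above says Egeria uses a more complex algorithm to decide the partitioning, so treat this fixed grid (and CHUNK_SHAPE) as a hypothetical simplification.

```python
CHUNK_SHAPE = (16, 16, 4)  # hypothetical chunk extent along each of three axes


def locate(coords, shape=CHUNK_SHAPE):
    """Map integer cell coordinates to (chunk key, flat offset inside that chunk)."""
    chunk_key = tuple(c // s for c, s in zip(coords, shape))
    local = [c % s for c, s in zip(coords, shape)]
    offset = 0
    for position, size in zip(local, shape):
        offset = offset * size + position  # row-major order within the chunk
    return chunk_key, offset


print(locate((0, 0, 0)))   # ((0, 0, 0), 0)
print(locate((17, 3, 5)))  # ((1, 0, 1), 77)
```

Reading a cell is then one lookup for the chunk plus one array index, which is the kind of array access contrasted above with selecting rows from a database cache.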
You mention that Excel is used by non-technical people (which is a generalisation), and then I see many comments on HN seeking clarification. It seems to be a complex product. What's the target market?
That's exactly what I'm trying to understand now. I think that people who work with OLAP now should be able to use the system. But I also hope that some percentage of non-database people will be able to learn it. I don't think the system is self-explanatory, but watching a 10-minute tutorial video could explain a lot.
Take a look at the target market for Quantrix Modeler (Lotus Improv's spiritual successor).
It sounds like your product is in the same space (multidimensional modeling) and the market is likely an overlapping one. I looked at Quantrix Modeler some time ago as an Excel replacement for complex multiparty reporting, and was impressed by it.
Also, Gartner considers this class of products to be in the CPM space. Take a look at this for the products you'd be competing with [1]. These are more tailored toward financial and business reporting, but the underlying foundation of all of them is similar: they all use multidimensional modeling engines.
Most of these products are so expensive that only large companies can afford them. A mid-size company has a guy who gets 30-50 Excel reports by e-mail at the end of each month and combines them into a single document manually. I hope Egeria could help in such situations...
From what I have heard of it: yes. I hope the formula language is also a bit better. AFAIK Improv could only have one formula per dimension element. Egeria allows formulas on any subspace inside the cube.
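To illustrate the difference between one formula per dimension element and formulas on arbitrary subspaces, here is a hypothetical rule format: each rule names a subspace as a set of allowed members per axis (axes left out match everything), and the first rule that matches a cell computes it. This is not Egeria's or Improv's actual formula language, just a sketch of the idea.

```python
AXES = ("scenario", "month", "measure")

rules = [
    # Only plan revenue in Q1 is derived from actuals (a 25% uplift); this rule
    # restricts three axes at once, which one-formula-per-element cannot express.
    ({"scenario": {"plan"}, "month": {"Jan", "Feb", "Mar"}, "measure": {"revenue"}},
     lambda cell, get: get(("actual",) + cell[1:]) * 1.25),
    # Every margin cell, in any scenario and month: margin = revenue - cost.
    ({"measure": {"margin"}},
     lambda cell, get: get(cell[:2] + ("revenue",)) - get(cell[:2] + ("cost",))),
]

inputs = {
    ("actual", "Jan", "revenue"): 100.0,
    ("actual", "Jan", "cost"): 60.0,
}


def matches(subspace, cell):
    """True if the cell's coordinate on every listed axis is an allowed member."""
    return all(cell[AXES.index(axis)] in members for axis, members in subspace.items())


def get(cell):
    """Compute a cell with the first matching rule, else fall back to entered data."""
    for subspace, formula in rules:
        if matches(subspace, cell):
            return formula(cell, get)
    return inputs.get(cell, 0.0)


print(get(("plan", "Jan", "revenue")))   # 125.0
print(get(("actual", "Jan", "margin")))  # 40.0
```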
At the moment I don't. It's a beta test. The commercial version will support two factor authentication. I think a local installation on the client's server/workstation would be the best option for very sensitive data.
Then how do you support it? What happens when you need to upgrade or fix a bug?
It's nice to hear that you'll support different needs but be careful as this could cause unnecessary complexities and take you away from developing the core product and improvements. Makes me cringe thinking about supporting different versions of the same product.
I'd suggest sticking to the cloud offering and getting compliant with GDPR, SOC, etc.