I think the development of Python and Jupyter, and lesser-known things like Vega, is much more interesting. Python is today the only "glue" that ties all of it together, from data to insights.
Other than the expensive part, is it really such a bad thing? I feel like relational databases are a pretty good fit for a wide set of use cases and have a huge amount of tooling.
In short, is there any solution that "does everything you could possibly want" while ensuring you _never_ need to hire a data engineer? This is a holy grail that I don't think exists.
You have to normalize data taken from various sources of various age and complexity. So you really have to understand the data. You also have to really understand the questions.
I've worked with (and on) lots of these tools and projects; the complexity is never in the frontend, it's dominated by getting the data, getting the data right and into the right format.
If all you want in the end is a good-looking dashboard on a website, then you might as well build it yourself; given the cost structure, that can even cost less than buying one of the BI frontend tools (there's not a lot of difference in development time, but the BI frontenders are more expensive because they are rarer and the licensing is high).
The people and their spreadsheets were the easy part to control.
Welcome to my reality.
Would I love a data architect and a domain expert in my team? Yeah.
Will I run around like a headless hen, booking meetings with everyone who even hints at working with data? Yeah.
Is this the normal procedure for Data Scientists in big and old companies? More so than I would like.
Oh! And I forgot that the security department will constantly deny you access to the data you need (until you force their hand).
(Disclaimer: I work in data engineering at Amazon and use those tools in my day to day)
Companies prefer well known products like Alteryx or Tableau because, despite the cost, it makes people easier to replace.
But I can't blame you for writing your own things. I'm currently replacing a large SSIS-based ETL process with Python, because I'm sick of SSIS randomly breaking.
PostgreSQL on the other hand - so good, so free!
They make it simple to get started, and even without knowing what you are doing you can easily churn out something that works, as long as it is simple, doesn't change often, doesn't need to scale, and deals with small amounts of data.
But similar to
"It represents a quagmire which starts well, gets more complicated as time passes, and before long entraps its users in a commitment that has no clear demarcation point, no clear win conditions, and no clear exit strategy."
RDBMS are the root cause.
There are no major systems out there of even moderate complexity that aren't built on an rdbms.
I don't think this claim is accurate.
Counter-intuitively, Datomic is in violent agreement with /u/rqmedes where he said "A better alternative is having the data, data model and business logic tightly bound in one place. Not separated in multiple 'tiers'". Datomic inverts/unbundles the standard database architecture: the cached database index values are distributed out and co-located with your application code, so database queries are amortized to local cost. Immutability in the database is what makes this possible without sacrificing strong consistency. Basically, if git were a database, you'd end up at Datomic.
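To make the immutability point concrete, here's a toy Python sketch of an append-only fact log in the spirit of Datomic's datom model. This is purely illustrative, not Datomic's actual API; all names and data are made up:

```python
from collections import namedtuple

# A "datom" records one fact: entity, attribute, value, the
# transaction that added it, and whether it was asserted or retracted.
Datom = namedtuple("Datom", "entity attribute value tx added")

# The database is just an immutable, ever-growing log of facts.
# Updates never overwrite anything; they append a retraction
# plus a new assertion, so history is always queryable.
log = [
    Datom("user-1", "email", "a@old.com", tx=1, added=True),
    Datom("user-1", "email", "a@old.com", tx=2, added=False),  # retraction
    Datom("user-1", "email", "a@new.com", tx=2, added=True),
]

def value_as_of(log, entity, attribute, tx):
    """Replay facts up to tx; later assertions/retractions win."""
    current = None
    for d in log:
        if d.tx <= tx and d.entity == entity and d.attribute == attribute:
            current = d.value if d.added else None
    return current
```

Because the log never mutates in place, any reader can replay it locally (`value_as_of(log, "user-1", "email", 1)` sees the old value, `tx=2` sees the new one), which is the git-like property the comment above is getting at.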
When one of these things changes, it changes the rest.
In theory, it could be used to provide that industrial strength abstraction layer between your Tableau/Looker/etc. and your bajillion weird and not-so-weird (RDBMS) data sources.
That would seem to make sense to me: I would want my data visualization/analytics-type company to be able to concentrate on data visualization/analytics, not on building some insane and never-ending data abstraction layer.
The part that surprised me was that Denodo could allegedly do a lot of smart data caching, thus speeding things up (esp hadoop-oriented data sources) and keeping costs down.
I'm guessing the other data virtualization providers can do similar.
The only barriers to Salesforce + Tableau adoption I noticed were cross-object JOINs and live vs cached data extracts.
Both issues were remedied by denormalizing the data prior to export. For example, a nightly flattened "view" of Opportunities with key related objects moved into columns.
Mulesoft is well suited to tackling the ETL challenges. Bringing them to the table could be a win for everyone.
In that case you may be interested in Dash (dash.plot.ly). It’s a free and open source library that you can use to create dashboards online with Python only.
We write our back ends with FastAPI, which is usually just a thin wrapper around our ML models, then serve both Dash and FastAPI with gunicorn. The FastAPI backend gets the uvicorn worker class via gunicorn's -k arg, which greatly improves throughput.
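That gunicorn setup looks roughly like this (module and attribute names are my assumptions, not from the post; `server` is the Flask instance a Dash app exposes as `app.server`):

```shell
# Dash is WSGI, so plain gunicorn workers are fine:
gunicorn dash_app:server --bind 0.0.0.0:8050

# FastAPI is ASGI, so hand gunicorn the uvicorn worker class via -k:
gunicorn api:app -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
```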
For personal projects you can use this stack in GCP's AppEngine standard environment to basically host your (relatively low traffic) apps for free.
The real issue has always been the organizational problems of larger teams and companies as data gets split into multiple silos and needs ETL and cleanup before it's useful. The new abilities we have gained have increased the complexity and scale which can lead to new challenges, but the tools are definitely getting better every day.
Don't forget Teradata.
I found the same thing with MicroStrategy. I spent a lot of time reverse engineering what I could from MicroStrategy jars to expose additional functions in their plugin interface (which is so incomplete it shouldn't be advertised). But the reality is it's a 20+ year old system with front-end updates; you can only put so many band-aids on it.
I think the only thing keeping MicroStrategy alive is its cube functionality and the businesses who have invested too much into it.
We seem to prefer the lite version, which is simpler.
If you look at the examples, you can click a button and go to a dynamic editor, which we rather like.
If JS and web browsers aren't your thing, they have a Python version called "altair".
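For a sense of what the lite version looks like, here's a minimal Vega-Lite spec; the data values and field names are made up for illustration:

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "values": [
      {"month": "Jan", "sales": 28},
      {"month": "Feb", "sales": 55},
      {"month": "Mar", "sales": 43}
    ]
  },
  "mark": "bar",
  "encoding": {
    "x": {"field": "month", "type": "nominal"},
    "y": {"field": "sales", "type": "quantitative"}
  }
}
```

The Altair equivalent is roughly `alt.Chart(df).mark_bar().encode(x="month", y="sales")` on a DataFrame with those columns.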
It's still really early, but feel free to have a play and create an app. Here is an example app examining using the Prophet forecasting library: https://nstack.com/apps/rdA647Q/
I'd love any feedback, and if you'd like to chat to learn more, reach out to me on email@example.com.
And next week you will have to do it all again, because it’s all manual.
None, because it's too much programming for IT to let business people have access to it, and it's not disguised as an office productivity app the way Excel is.
If they had access to it, and the kind of basic training that anyone already competent in a vaguely quantitative domain could handle, plenty of them could and would.
At least, judging by my experience with SQL shells and similar tools that are both less powerful and less friendly than Jupyter + Python; plenty of business people used them productively in enterprise environments (often right up until IT ripped them from their hands).
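For context, the Jupyter + pandas equivalent of a typical ad-hoc business query is a couple of lines (the data and column names here are made up):

```python
import pandas as pd

# The kind of two-liner a business user runs in a notebook
# instead of an ad-hoc GROUP BY in a SQL shell:
sales = pd.DataFrame({
    "region": ["EU", "EU", "US", "US"],
    "revenue": [100, 150, 200, 50],
})
by_region = sales.groupby("region")["revenue"].sum()
```

In practice the DataFrame would come from `pd.read_csv` or a database connection, but the analysis step is no harder than a spreadsheet pivot table.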
But that’s for data science which is (hopefully) the foundation of actionable BI.
Did I mention it's a nasty, fugly POS?