Self-Serve Dashboards

pcloadletter_ · 2024-06-11T15:29:13

This reminds me of a time I was at a company using a BI tool for dashboarding. The numbers weren't making much sense to me, so I looked at the query building tool. I couldn't tell for the life of me if a part of the query was doing an inner join or left join. The business analyst who built the dashboard had no idea either.

It turned out that it was doing a left join when the intent was an inner join, and the data being shown was an order of magnitude higher than it should have been. This is when I lost all faith in these kinds of abstraction layers on top of SQL targeting people who don't actually know SQL.

garciasn · 2024-06-11T16:45:28

As someone who leads a Data Engineering and Data Science team and has for 15+ years, this is exactly the problem. Too many folks with access to data who do not understand the data, its relationships, and what the outputs their individual efforts create mean.

Decentralized/embedded Engineers/Scientists; self-service dashboards; low-code BI/data tooling; and, now, LLM-driven text to SQL/viz lipstick on a pig have been floated as some of the solves to the problems seen in the analytics space over the 25 years I've worked in the space. Unfortunately, to date, nothing has actually solved the root issue: lack of data understanding and, its end result, trust in the deliverables.

But, to your specific point, SQL isn't the solve here, either. Too many folks know enough SQL to pull data and use it as they see fit, but too few folks understand the data, its structure/schemas, and valid use of those data. THAT requires time, energy, knowledge, and experience in the space. NO TOOLING, other than experience, solves for this--today (note: will LLMs get to a place where they can? Maybe; but, let's be honest--probably not).

Dashboards are great at giving quick hit information of KPIs and the ability to drill down into them; but, the most important thing to solve are always:

1. Data Management practices

2. Understanding of data, its relationships, and proper use of those data/metrics in deriving insight to drive the business forward.

I am excited to see what the future holds, but my grey beard doesn't allow me to ever, Ever, EVER trust any next-gen tooling being it hasn't held true to date.

cjk2 · 2024-06-11T21:21:54

Ah it's much higher than that.

I always ask: tell me the question you are answering with this.

99% of people can't answer that question.

oooyay · 2024-06-12T14:34:55

I'm a SRE and have used my fair share of BI tooling. I have a rule that if the dashboard has a sufficient number of consumers/users then it should actually be in an internal application. BI tools are for me to play and prototype with data, they don't produce products in themselves. Much of that has to do with what you mentioned here, the other half is that I will absolutely never devote another part of my life to debugging some dashboard query I made six months ago.

naijaboiler · 2024-06-12T23:51:33

I am data science manager at my own startup. Everywhere I go, I tell people proudly, I am ardently and stridently against data democratization.

I honestly will prefer people make decisions with their guts than with ill-understood data. Instead what I get is people who don't understand the data, its context, its meanings (and what it doesn't mean) trying to lead people down the wrong road while using the data as crutch. It is so frustrating to me.

Intuition of folks with good understanding of the business, especially when their salaries depend on it is often a much much better compass than some rubbish someone is claiming data is saying.

Another term I hate with a passion "data driven". No! data drives nothing. "Data informed" is where its at. You take the data, our best understanding, mix it our understanding of things the data doesn't cover, and use to that inform the best decisions we can make.

appplication · 2024-06-11T16:40:50

I have seen it play out over and over with low code tools. The difference between a sensible default and a footgun is just a matter of perspective.

withinboredom · 2024-06-12T08:48:44

And then when you fixed it, did everyone freak out that it was lower?

I've been at big companies where you realize a subtle bug/design causes higher revenue and nobody wants to touch it and be responsible for loss of revenue. Things like the free plan button being just below the fold on average resolutions, calculating state incorrectly that causes discounts to not be applied, someone forgetting to add a "false" that causes people to be prompted to signup when signup isn't technically needed, etc.

I wonder if you felt the same way; like people would blame you for there being lower numbers?

Gormo · 2024-06-12T12:24:16

This is a problem that's common to all "no code" solutions. The heavy lifting involved in implementing complex logic flows isn't typing code into an IDE, it's knowing how to model the problem and design an effective algorithm to solve it.

These solutions are targeted at non-technical users with the promise of not having to write code, but those users often get lost or produce erroneous results because they still don't understand the more complicated parts of engineering their solution.

Every time I see an attempt to scale up complex business processes with "no code" tools, it inevitably winds up hitting a wall after a lot of churn, then ultimately having to be handed off to actual engineers anyway, who are in turn hamstrung by the lack of any of the affordances applicable to actually writing code in a real programming language. It's almost impossible to work off a shared repo, have proper version control, do code review, automate testing, or do any kind of CI/CD when you are stuck working with visual flow builder tools.

fostware · 2024-06-12T00:46:00

Do what everyone else does - synthesize the abstract relations into views and limit the self-serve BI dashboard to the views that do make sense. Users default to only seeing it from their perspective, so have alternate methods to let them see it how they think it works.

I'm aware of a product that uses time-based attendance for education, because not every day, every school, or every campus uses the same timetable quilt and often you have to be flexible (school sports carnivals, relief swaps, joint class activities, or 14-day rolling timetables for example). Doesn't mean there isn't a view that synthesizes the quilt into class-based attendance, or even just AM/PM for those users that think that way.

kthejoker2 · 2024-06-11T20:07:42

The assumption that business users can't be taught or are too impatient (even a hint of too stupid?) to learn the relationship between their questions, the data models, and the drop downs is ridiculous.

I could flip this problem around ..

In my experience they are always willing to learn but often times the data modelers don't understand their domain well enough to capture all the nuances of the questions they ask.

Instead they just hide nuance (and increase the time it takes to answer a question) or eliminate it (and therefore produce inaccurate and misleading answers) all in the name of dumbing down self-serve.

I hate the euphemism "non-technical" you can absolutely find a middle ground between LLMs and BI query generation tools and SQL, instead of just declaring by fiat some impenetrable wall of competence.

sanderjd · 2024-06-12T13:06:35

This is related to the advice I give all the young programmer-curious people I know: I tell them that they should go ahead and follow their interests in becoming an expert on computers and programming and data analysis, but that they should try to do those things secondarily to a domain they are learning about or working in.

The solution to the problem of "the domain expert doesn't understand the data and the data expert doesn't understand the domain" is to have them be the same person.

banku_brougham · 2024-06-12T20:36:44

The best business/data outcome I have experienced in my career was with: PMs who knew SQL, a simple extract/scheduling tool, and reasonable data models maintained by a data engineering team with input (or table designs) from BI developers.

citizenpaul · 2024-06-12T03:24:41

We've reached new heights of institutional incompetency from what I've seen.

There is a dilbert comic that said similar criticism about spreadsheets but it applies to BI tools and AI tools. It was something like.

"The spreadsheets in this presentation are of course riddled with errors and incorrect information. It doesn't matter because unless they reinforce a decision that upper management has already made no one will ever look at them again." - Dilbert Comic Paraphrase

I've seen this in display in the real world. Various reports and dashboards in states of broken.

1. Totally broken. Are not updating at all for months and sometimes years and no one seems to have noticed. People are actively using them for processes/decisions/workflows.

2. Broken in a large but not obvious way. The data is not updating but is pivoting on a date/time so it changes every cycle it runs but just rearranges the same data.

4. Endless other things I'm sure others here have many stories too. 3. Various formulas and "math" that is completely incorrect and outputting made up fantasy numbers.

threeseed · 2024-06-12T03:34:47

> institutional incompetency

Bad choice of words. They are not incompetent.

It's just that the data challenge has become exponentially harder as the world moved away from centralised EDW and ERP systems.

And the level of investment hasn't caught up.

ehnto · 2024-06-12T03:52:23

The level of investment went from having a team and internal software services, to a $60/month SaaS subscription(s) in many places.

The web of SaaS many companies run on will never be as coherent as a purpose built ERP, as every service is generalized and abstracted concepts don't map perfectly to your business.

You see it often, where a business will inherit the models of the SaaS they use instead of what makes sense to their business, as an attempt to map the two entities better.

citizenpaul · 2024-06-13T19:21:30

If i can look at a report that I've never seen before and tell something is very wrong with the numbers in less than a minute. What do you call the people that are familiar with the data just going along with the incorrect data?

Value extractors?

mmcgaha · 2024-06-11T17:41:34

I have been providing data to business users for the last 24 years. It doesn't matter if we give them a query tool, MS Access, Power BI or a data cube in Excel, there are only a small number of users who will actually use the tools. My guess is these are the same users who would have done analysis with data scraped from terminals and printed reports 40 years ago. For what it is worth, execs do like a dashboard with key metrics and the new BI tools make it a lot easier to write and maintain KPI dashboards.

LeonB · 2024-06-12T00:00:12

I love discovering the people inside enterprises who don’t realise how awesome they are at coding. They might be a “personal assistant” or something but they hack on sharepoint forms or access or excel and make it sing - acknowledging their clever skills and giving them more powerful tools is awesome. Then they leave for a better job… brilliant!

Jerrrry · 2024-06-12T14:01:06

Yeah, title inflation / deflation based on DEI quotas are still the elephant in the room.

Of course the guy backpacking the contract making $20 an hour is gonna have something adverse to think of his "CSR" title relative to a few salaried watering hole loiters/"Software Engineers" who's biggest achievements are having named an Excel sheet 4 years ago and googling a Salesforce configuration option and misplaining it.

whartung · 2024-06-11T20:45:09

> My guess is these are the same users who would have done analysis with data scraped from terminals and printed reports 40 years ago.

Back in the early computing days, most of the work was done on the large, shared, mainframe computers. The computers were so costly, that the "computer division" was it's own, separate division of the corporation that had premises with the other divisions. For example the Western Division made products, but if it wanted computer resources, it contracted with the Computer Division, who handily had mainframes installed onsite.

Our group was a feisty, small internal analysis group using the new, "cheap" mini computers. One of our points of service was simply being much more reactive to the users needs, we could simply respond more easily because of how the funding worked.

To you point about "data scraped", we were walking through the plant and saw one of our users with one of our reports. They were cutting the lines out of the green bar report, taping them to another piece of paper, and photocopying it. They were sorting the report by a different criteria. We told them "You know, we can do that for you!" "Oh really!?"

People that need to Get Stuff Done, get it done. Our goals as service providers (which is what we in the computer systems groups are, service providers to our internal customers), is to make that as efficient as possible.

Another person was using a PC, our tablet digitizer, and Autocad to record the points on aircraft to generate radar profiles. It was an inventive use of the digitizer, not for fundamental CAD work, but simple data capture from the drawings in Jane's Combat Aircraft.

ehnto · 2024-06-12T03:57:10

Sometimes I am engaged to build software to integrate some systems together, and will simply configure or expose exisiting dashboards to the client. Usually the dashboard blows them away, and they can't believe they had this data all along. Integrations are necessary business operations but the stakeholders fucking love digestible data. It's a really easy value add.

datadrivenangel · 2024-06-11T17:14:49

The conventional BI interface shown in the post is Metabase, which has one of the better interfaces for doing BI out there today. Metabase also has the ability to see the GUI-generated SQL for a question, and convert that question into pure SQL, which is a great way to move from self serve to governance as it's easy to modify and validate the logic and give less technically skilled people an easy route towards improving their skills.

The post is fundamentally correct though, and I say this as a professional data person, a BI tool rarely gives more people correct understanding of the data or the technical skills needed to use it correctly. If your data is well managed, the tools are easy and people can figure things out, but the world is complicated, so your data will become complicated, and the cost of data management is very visible while the benefits are invisible.

NightMKoder · 2024-06-11T16:46:55

I’ve come to a similar conclusion about “self-service BI” myself but my solution is somewhat different. The solution I have is move the layer of abstraction higher: make extremely customizable dashboards but do not expose SQL to business users.

An example of this might be a dashboard with 20 filters and 20 parameters controlling breakdown dimensions and “assumptions used.” So asking “how did Google ads perform in the last month broken down by age group” is about changing 3-4 preset dropdowns. Parameters are also key here - this way you only expose the knobs that you’ve vetted - not arbitrary SQL.

Obviously this is a hard dashboard to build and requires quite a bit of viz expertise (eg experience with looker or tableau or excel) but the result is 70% of questions do become self service. The other 30% - abandon hope. You will need someone to translate business questions into data questions and that’s a human problem.

snowfield · 2024-06-11T18:13:14

Yes, know which questions gets asked often and make those dashboards. Now the cfo or whoever can just open those dashboards when they want answers to those questions for any given timeframe.

And tinker with basic parameters

naijaboiler · 2024-06-12T23:55:40

my experience both as a builder, and as part of people that have to sit in the C-suit is the person then comes with their ill-understanding of the data and try to convince everyone A is B when it isn't

smithcoin · 2024-06-11T17:28:05

We use Metabase (featured in the image) and for the most part non-tech people do it fact use it. What helped us implement it was having 'office hours' where I walked people through examples of how to use it "This is how to get sales at a specific location or in a state". While it hasn't solved every problem/query/export, a large portion of requests that went in the direction of engineering before no longer make it that far. Other reasons to love metabase, you can self host it and use your GSuite SSO.

Galanwe · 2024-06-11T18:35:19

From my experience, the issue is not the means to query the data, it's the actual idiosyncrasies of the data. The key metric is thus not "users seem to be more independent as we are receiving less inquiries for help", because there is a very high chance said users are pulling and interpreting completely bogus metrics from the data.

I have seen this over and over. Once low tech users have access to the data, they start building pyramids of bogus analysis, somehow convinced that after all it's not that hard.

The real blocker is all the context that is - let's be honest - always required to perform a correct analysis:

- "Oh no, you cannot use the delivery date to compute monthly sales, since the finance team refill it for recurrent sales"

- "Oh no the prices are stored in USD in the catalog but we actually adjust the rate monthly based on the `monthly_discount` table"

- "Yeah you have to remove items that have a null purchase date from the sales report we have that convention to mark last year's unsold stock"

- "No you cannot sum the sales without joining with the FX rate table since prices are in local currency"

- Etc, etc, etc

ktm5j · 2024-06-11T23:10:24

I love metabase! It's something I set up for now programmers at my company, and honestly they don't even use it beyond looking at the dashboards I've set up for them.. but it's been a hit! Super useful tool!

jppope · 2024-06-11T18:41:27

It always cracks me up that you can pay an upper level manager >$XXX,XXX and they can't run BI sql queries... SQL was invented for the purpose of making it easier for managers to query data. lol

As a former sales guy/ manager I also have no sympathy for those people...

LeonB · 2024-06-12T02:06:51

I know a CEO who “can” do sql - and better than most of the people in the business who do sql — but they refuse to do it, because they understand basic economics. If they spend half a day doing a job someone else can do — even if it takes the other person 3 days — that’s half a day when they’re not doing the stuff that they alone can do.

Similarly a competent CxO knows that the real time is spent on getting the details absolutely correct - null handling quirks, date handling, mismatched coalescing — even though sql is “high level” it still takes time and dedication to get a trustworthy answer - if there’s people who available who specialise in that, let them do that.

NoboruWataya · 2024-06-11T17:24:52

I think BI dashboards can work but only for very simple queries. If you are even at the point where you are asking non-technical people to execute joins on data I think you are in too deep and should just be writing SQL at that stage. Joins might seem very basic to some but honestly I find it difficult to wrap my head around them sometimes and with a UI like a dashboard which is less expressive than SQL I think it would just be a recipe for confusion.

It's ultimately a trade-off: you can make your tool more accessible to non-technical users than SQL, but it will necessarily be less powerful than SQL. And I still think there are plenty of use cases within that space. IME so much "BI" is just "I have two columns of data and I want to plot one against the other".

The author describes SQL as the only "self-serve" BI tool, but honestly, I think that is Excel. So many of these BI tools are just reinventing Excel with new (and therefore less familiar) interfaces. It is a meme to hate on Excel and I think that is because people have in the past tried to use it for complex stuff that really should be done in SQL. If we used SQL for complex data manipulation and Excel for "give me that as a pie chart", there would truly be no need for BI tools.

Optimal_Persona · 2024-06-12T22:49:01

I was going to post much the same regarding SQL <> Excel. Once underlying data sources are cleaned, wrangled, and access-controlled, it's amazing how far VLOOKUP and Pivots will get you.

I've also found that giving non-technical users the opportunity to self-serve when more than one data source is involved always leads to offline mashups of data and the question - "Hey data team, why doesn't 'your' data match 'mine'", always with the assumption that 'theirs' is the correct one!

kkfx · 2024-06-11T20:25:57

The main issue is that modern tools are not like classic desktops, like Smalltalk workstations or Emacs, meaning a single, fully-integrated environment, where anything is at user disposal, with end-user programming concepts built-in.

In org-mode I can create "good looking slides" in a snap, I can quickly craft some chuck of code, run it and get some results, it's damn limited in "dashboard" terms, let's say I can quickly plot some data but the plot is just a crude static image, make it glow with PGF/TikZ it's very time consuming so it's not an option either and it would be still static, because Emacs itself it's the right tool but from an older era. Modern tools offer more eye candy and quick manipulation but only for very limited actions in a very inflexible UI obviously not integrated with anything else. R probably is the quickest with R Studio/quarto to produce contents quick, dirty and still nice to see, but it's still far from the flexibility of Emacs. I think there is no solution without re-writing the entire modern software stack with the classic paradigm and modern stuff doable thanks of much more horsepower under the wood.

mehulashah · 2024-06-11T19:15:58

If this is true for BI (which I tend to agree with), then this even more true for ETL. End users have an even less of a chance of doing data wrangling. Rather, it’s the technical folks that’ll do it.

What we learned in building AWS Glue was that it wasn’t just about the context — it was also about escape valves. Escape valves are tools necessary to get out of a situation that wasn’t anticipated. When the answer doesn’t make sense, the technical users are the only ones that have the know how to debug it.

dapearce · 2024-06-11T21:04:44

I've implemented self-serve analytics at three organizations successfully. It's important your data is well organized, well labeled/defined, and the BI tool needs to allow you to configure guardrails around how data is queried. I have used Looker as the BI tool at all three organizations. End users do not need to even define joins. They just select the grouping and aggregate columns they want to return and click run, then configure their visualization from the data. It is true that some users are not data literate and still will not run their own queries, but from my experience a lot of non-technical business users love being able to easily explore data themselves, and the business receives a lot of value from it.

hamasho · 2024-06-11T18:47:57

I think sometimes encouraging business people to make SQL queries themselves can be beneficial if the database schema is well-defined.

When I worked for an ad-tech company, we built an ETL system to convert our messy, complex, and technical debt-ridden application database into a well-organized analytical database. The main purpose was to make the jobs of data scientists and machine learning engineers easier.

However, it also helped business people with some technical knowledge create the dashboards they wanted. Although some queries were a bit messy, it made it easier for them to organize their requirements and communicate effectively with the tech team. Unfortunately, it also resulted in a lot of half-baked and unused dashboards, but overall, it brought positive change to the company.

That said, I don't think it's worth developing an ETL system just for business people. It requires multiple dedicated devs, whereas writing SQL for dashboards occasionally only takes a few days per month for a single dev. I agree that the most important part is fostering a good relationship between tech and business people. If business people have a mental barrier, it becomes challenging to create new dashboards and update or fix existing ones.

Angostura · 2024-06-11T20:28:43

The author spends a lot of time convincing himself that SQL is not the problem, but that it’s the context and semantics of the data.

I disagree. Just because understanding the data is a difficult problem doesn’t mean that SQL isn’t also a problem.

I understand the data and the set up of the database just fine. Darned if I can remember SQL syntax

drinkzima · 2024-06-11T16:01:18

The straw man is always that self-serve fails because every user cannot use data well at work. The reality is some users will be inclined to solve their own problems and others will not, but self service is available to many users with deployed BI, and SQL is nearly always not the way they are doing it.

Most times I see this type of article, it's with folks that have never worked in a modeled BI tool. Salesforce data, for example, is very complex. But an ability to make a table of live opportunities with metadata and order them freely, next to usage data in an app is self service BI. It's not hard; it takes some setup; but it's self service.

The idea that folks can jump from business understanding to fully mapping the data as it lives in the data warehouse, on the other hand, is not trivial and won't be. The nuance of the real world is hard.

Different types of users need different interfaces - SQL all the way down to point and click. And there's no free lunch on modeling raw data to bring it to a consumable place for the company.

snowfield · 2024-06-11T18:15:23

Yes, provide different ways for different users to sove their own issues. Then technical people can solve the hard ones.

camjw · 2024-06-11T16:14:21

This feels like one of those spaces where everyone says LLMs and AI are just going to make it a solved problem, but no-one in the comments here is even mentioning it - maybe not a good problem for LLMs after all?

PheonixPharts · 2024-06-11T16:52:57

> but no-one in the comments here is even mentioning it

The post has an entire section discussing this.

The problem with text-to-sql is that, as the post elaborates, writing SQL is not the problem. It's understanding the context and the data:

> On the other hand, a technical person would notice that the question doesn't make sense, and they would ask for more context. They would ask for details about the business person's hypothesis and the problem at hand. Then, they would explain what type of data is available, and work with the business person to formulate a precise and useful question.

Text-to-sql in practice is a solution that nobody was asking for, despite the insane number of SV startups shipping GPT-text-to-sql wrappers as products.

There certainly is like places where LLMs can help (post touches on this briefly), and that is in semantically exploring databases/tables/etc and contexts around data, but this is a very different project and would require a lot of curation from data teams to make it happen.

mritchie712 · 2024-06-11T16:19:11

Most people that went after this tried for text-to-sql (e.g. ask a question and generate a ton of SQL to answer it). That approach has pretty much failed. The LLM could never have enough context to generate accurate SQL at a high enough rate to trust.

What we've found to actually work at Definite (I'm the founder) is text-to-semantic-query. This is an older video, but here's an example: https://www.youtube.com/watch?v=44mhLgUYOp8

camjw · 2024-06-11T16:20:41

How is this any different (text-to-sql vs text-to-semantic-query). Isn't this just comparing text-to-sql to text-to-slightly-simpler-sql?

mritchie712 · 2024-06-11T16:42:00

Yes, it's simpler, but there's a few key differences:

1. You also have complete control over what the LLM can do / access thru the semantic layer (e.g. you can remove tables that the LLM shouldn't consider for analytical questions).

2. One of the biggest choke points for text-to-sql is constructing joins. All the joins are already built into the semantic layer.

3. Calculating metrics / measures is handled in the semantic layer instead of on the fly with SQL (e.g. if you ask something like "how much revenue did we generate from product X", you wouldn't want the LLM to come up with a calculation for revenue on the fly. Instead, revenue is clearly defined in the semantic layer).

4. The query format for our semantic layer (we use cube.dev) is JSON, which is much easier to control then free form SQL.

The semantic layer gives the LLM a well defined and constrained space to operate within whereas there are hundreds of ways for it to fail writing raw SQL.

TheGRS · 2024-06-11T16:20:09

The post talked about that too: LLMs could come up with a query for any question. A technical person would know how to spot the problems with that query or even the question, a non-technical person wouldn't know that they just asked for a bunch of hot nonsense.

tlarkworthy · 2024-06-11T16:42:08

I have watched this indie hacker do quite well over the last year or so, with ai2sql.io

https://x.com/mustafaergisi/status/1793657418435432649

pcloadletter_ · 2024-06-11T16:22:03

With all the hallucinating (or "bullshitting") going on, it's hard to imagine LLMs working well for query generation. But hey, we're _very_ early days for all of this.

pelagicAustral · 2024-06-11T15:44:29

Haha Yes, this is it. I maintain a number of databases at work and I've implemented Metabase about a year ago. Started the first dashboards, translated some of the old reports, and hoped that my dear users would engage with the platform for their own needs... it never happened, and it's probably never going to happen.

I agree, a user with no grasp of the schema, or structure of the database, or with zero knowledge on how to wire a query (with or without a graphic interface) will not make the effort because even when they do, they feel like a caveman in front of an iPhone, and everybody hates that feeling.

Most of the time, even people that have worked with PowerBI, or other tools wont make the effort because they also know, push comes to shove, I will end up querying for them, so why would they?

This is a problem with no easy solution I'm afraid... me, I love writing SQL, so it's a bit of a break each time I need to produce stats of some sort, or get a report out.

sumoboy · 2024-06-11T16:45:14

Huge gap for building self service tools/dashboards for execs, just no sense of the schema and data to make them sql writers which you don't want. When metabase showed a version last year with AI, you could see right away without some knowledge of the data model your just grasping at writing the correct prompt.

DarkerIsBetter · 2024-06-11T18:15:39

Excellent post.

Unless the data model is either extremely clean or simple, users lacking deep context will struggle.

There are always tables for abstract concepts, code paths for legacy behavior, etc. Should we expect users to embed this internal business logic into their SQL queries? How do these users know when they need to change their embedded assumptions?

mritchie712 · 2024-06-11T16:16:11

You're right that text-to-sql doesn't work, but text-to-semantic-query works very well. This is an older video, but here's an example from Definite (I'm the founder)

https://www.youtube.com/watch?v=44mhLgUYOp8

ssahoo · 2024-06-11T16:40:57

Modern devs are not familiar with OLAP and cubes, therefore the argument is valid until time swings again.

qwertyuiop_ · 2024-06-11T16:45:07

Just a nitpick. The term "modern" is thrown around as equivalent to evolutionary progression to best practices. The term that could used to refer to the developers in you are describing could be "contemporary".

tobilg · 2024-06-12T07:56:21

I agree to the article, but the discussion is nearly two decades old already and more or less the same since I started in classic BI in the early 2000s.

I also strongly believe in SQL to be the "glue", or the least common denominator for accessing data, that's why I built https://sql-workbench.com which is a free SQL environment in your browser for querying and visualizing local and remote Parquet, CSV, JSON and other data via DuckDB WASM.

It also supports bringing your local LLM for Text-to-SQL generation via Ollama...

lovasoa · 2024-06-11T21:41:01

I do agree that data teams are probably not going to disappear in the short term, and their understanding of the structure of the data they work with is even more important than their SQL syntax skills.

But that said, I think we can do better than just static dashboards. What I'm trying to do with https://sql.ophir.dev is to let the same data teams write apps instead of dashboards. Apps are more flexible, and allow deep dives and navigation to a level that is not possible with just a dashboard.

heinrich5991 · 2024-06-12T10:30:42

I guess BI stands for https://en.wikipedia.org/wiki/Business_intelligence?

g4zj · 2024-06-12T13:28:14

I had to Google this as well. Is it asking so much for people to reference the full term for an acronym _just once_ at the top of an article?

Even at this moment, with 70+ comments here, not a single person has mentioned what this stands for.

antonyt · 2024-06-12T14:28:21

Would you expect every article about SQL to say "structured query language" at the top? Some abbreviations become so common they transcend the expectation of definition. BI is arguably one of those - if you're not familiar with it, you're probably just not the intended audience for the article.

g4zj · 2024-06-12T15:14:58

I would, personally. I don't think it's asking a lot.

The small amount of effort helps not only for those who are new to the concept, but also makes it easier to discover the article via search.

> Some abbreviations become so common they transcend the expectation of definition.

I agree with this, however it just seems silly to use an acronym 10+ times (as "BI" was in this article) and never once expand on its definition.

And it isn't as though the web site in question is specific to this concept, where users discovering the article are conveniently and unavoidably exposed to information regarding its meaning. In fact, according to a quick Google search, the string "business intelligence" doesn't appear at all across all of the site's indexed pages.

I'm sure I'm in the minority, but it just throws me off to have to stop reading and go on a quick side quest to learn what a key acronym used in the article might stand for. :)

pphysch · 2024-06-11T17:47:05

> That story usually starts with an engineer or data scientist who's frustrated because they spend too much time writing queries and preparing dashboards for business people. They think that if they make BI easy enough, everyone will be able to “self-serve”, but that rarely ever happens.

The solution is usually to paste/implement that query in a low-tech automated email report, web page, or Grafana/shared spreadsheet.

It's a (simple) automation problem, which should not be conflated with cross-training an MBA to be a data scientist.

XCSme · 2024-06-13T00:11:06

I think text-to-SQL is still useful for devs more than non-devs.

This https://docs.uxwizz.com/guides/ask-ai-new works sort of ok half of the time, but LLMs have to get better before they can really understand the database structure and make correct queries. At the moment, sometimes they get wrong even basic things like using a wrong table name.

mrweasel · 2024-06-12T15:16:49

Depending on the skills and needs of the people doing the queries, just provide a set of template SQL queries. My wife frequently need to query sales data from their POS system. The developers provide small SQL query window, along with a set of predefined queries. New templates can be pushed if the database changes and users can easily change parameters or make small modifications to the queries if they need to.

ThomPete · 2024-06-12T03:20:30

Dashboards and other UI are used because most people don't know how to do sql so yeah once you start using LLMs as the UI layer things start to happen.

Here is an example of an agent that is communicating with an SQL database using simple language.

https://www.youtube.com/watch?v=jh9gJoDqFoU&t=2s

nlh · 2024-06-12T05:43:03

While we're on the topic, what SQL-based BI tools do we all recommend? I'm in agreement wit OP that writing SQL is generally a best-practice, but I'm still a bit clueless about the State of the Art tools. Metabase? Looker? (it's still so expensive though). Something else?

Dyac · 2024-06-12T07:21:06

I'm a very happy customer of https://www.count.co

It's a big canvas that multiple people can use at once, with SQL and Python, text annotations, comment threads, shapes and drawing etc.

Connects into the regular cloud database / warehousing systems.

It's great because it allows you to get the flow of data quite literally visualised on a big board and it exposes the underlying SQL so that, over time, non technical users can learn by exposure and osmosis, as things aren't super hidden away "behind the scenes".

nlh · 2024-06-12T17:11:45

Oh man that looks cool - BUT - $199/month for the "starter" version and $1k/month for a team prices me wayyyyyyy out on this. I am but a humble solopreneur trying to use SQL to build some nice dashboards (and not go broke while doing it).

codexb · 2024-06-11T20:29:54

I've also found this to be the case.

That being said, I like being able to create dashboards for myself, but the interface for every tool to make these queries tries to be too clever and it ends up being painful to use -- coughJIRAcough

DaiPlusPlus · 2024-06-12T01:39:21

If Jira is a dashboarding tool then I wonder what Atlassian's ERP is like...

banku_brougham · 2024-06-12T20:33:39

Tableau is expensive, and they are trying to move into the Databricks space with managed compute, and what we are using it to build at my company is the largest hidden spaghetti ball imagineable.

jasmcole · 2024-06-11T16:49:06

If you're interested in a mix of SQL, Python, DuckDB and low-code interfaces to SQL databases, you may want to check out https://count.co

Dyac · 2024-06-11T17:58:04

MASSIVE count fan here. Been using it over a year and nothing else comes close for exploratory analysis.

I've not felt as impressed by a tool or had it so quickly change how I work ever before.

hi-v-rocknroll · 2024-06-12T08:07:36

Meta internally has a mix of both where tables, charts, and such can be created by querying almost any datasource. While it has UI-driven query builders, SQL queries can be used instead.

disgruntledphd2 · 2024-06-12T09:01:26

Meta has had tools like that since well before I joined (in 2013), they spent a lot of money making this work.

engine_y · 2024-06-11T15:15:37

I wonder what people here think. Would self serve BI work for something which is more context specific? So for example self serve BI for your own sales. What would it take? Why not?

rbetts · 2024-06-11T15:49:00

> If we assume that the problem with self-serve BI is not SQL, but the context and semantics of the data, then it follows that the solution is to teach people about the data they're querying, regardless of interface.

This has been the basic truth of any self-serve BI system I've used.

Even in smallish orgs there are often three steps - the engineer who instruments code/implements a metric, the engineer who builds the ETL pipeline into the underlying BI warehouse, and the person querying that data. So there are minimally three people in potentially three different roles who need a shared specification and understanding.

Also, self-serve BI tools can be surprisingly opaque and their output can be hard to validate/test. So even if you know accurately what data you are querying, testing that your query is what you intend is hard.

wswope · 2024-06-11T15:46:31

Hot take as a datamonkey who’s done a lot of work in this vein professionally:

General-purpose self-serve is hostile to non-technical end users. Most do not have the mental model of SQL to guide their usage of the tool, so giving them an “open-ended” option to run their own vizes ends up really being a fence-toss of some technobabble garbage that isn’t useful to them at all.

“Slice n’ Dice”-style filtering added to an existing set of reports, however, is a completely viable middle ground. Practically speaking, this means writing the basic dashboard query, then hooking a bunch of query parameters up to some front-end UI widgets to let end users pick the date range, level-of-detail, filter which trendlines or categories of data are shown, etc. Just use good taste to avoid going overboard - don’t try to make every last detail configurable.

mead5432 · 2024-06-11T16:09:04

This is something I've been wrestling with for a little bit. The challenge is that if everything is locked behind a data analyst then things only move as fast as the data analyst. If a business person needs some information now, not next quarter, the challenge is managing the queue for the data analyst which devolves into a world of "rush jobs". It's all the things highlighted by the theory of constraints (The Goal by Goldratt is sooo good). You can alleviate the DA bottleneck by getting more DAs or you can find ways to help the business users do some of the basic stuff themselves.

I wholeheartedly agree that just giving a business user access to the straight database is not ideal for all the issues mentioned - they don't know the context and the gotchas in the data combined with probably not understanding how to write SQL. I think an effective data warehouse strategy with straightforward data marts of materialized views can simplify the interaction and maybe even make it really simple for someone to generate basic visualizations. A lot of business people can make basic dashboards in Excel which, worst case, could be connected to a data mart. It's not going to handle BIG data but may cover a large number of use cases for most businesses.

I'm in favor of creating some basic dashboards We've also been experimenting with embedding dashboards in internal tools that provide some slice and dice capabilities but with a high level filter. A user can manipulate and tweak a dashboard for a specific customer but to look at a different customer they need to navigate to it in the internal tools, not via the dashboard.

internet101010 · 2024-06-12T13:53:30

Yes. Say you want to get all sales for a list of 75 products out of the 30,000 products that you sell in a list of 125 stores out of the 4500 stores you operate. Self-serve is perfect for this. So long as you can paste lists and not require people to search for things and click checkboxes.

huevosabio · 2024-06-12T14:45:36

Frankly, at Lyft, we had a very good system for dashboards.

Lyft uses mode [0] for most dashboards, it has a very well documented data catalog thanks to the system they created, Amundsen [1], and the tables are almost all free to peruse by anyone at the company.

This lets anyone with curiosity and the willingness to write some SQL build some very useful dashboards. And every dashboard is very easy to understand because the SQL can be inspected and the sources can be inspected. If you wanted to go further, you can even pin down where and from which repository the data got emitted. Mode has a very poor charting library, but it satisfies 80% of the needs and the other 20% can easily be complemented by the fact that they have an integration with Jupyter notebooks that gives you even more power.

The entire system felt very ergonomic. It also felt like it gave the opportunity for non-technical people to step up and get their hands dirty rather than wait for a data analyst.

[0] https://mode.com/ [1] https://www.amundsen.io/

esafak · 2024-06-11T18:21:12

I think the missing product the author alludes to is the data catalog, to document what the data means, and where it comes from.

zurfer · 2024-06-11T18:41:05

As someone who has build a data catalog tool and is now building one of these looked down on LLM agents(1): Data catalogs are mostly used by technical people/the data team and a handful of power users. Why? Because it's intellectually really difficult to understand data deeply and it's similarly hard to present data to make it easy, so people just ask the people who know (no self service).

An LLM agent can shine here because a) you can give all the relevant context that is needed to make sense of the data and b) it can be personalized and proactive, so the user just gets appropriate reports when they need it, instead of wading through a "Customer360 Cockpit" with hundreds of visuals and options.

(1) https://www.getdot.ai/

nosefurhairdo · 2024-06-11T18:01:57

Our self-serve dashboards feature is the most used feature on our platform according to our analytics. Our customers include many Fortune 100 companies and our company was acquired for some billions of dollars.

I largely agree with the sentiment but it's just not what we've seen in practice.

serverlord · 2024-06-11T17:40:49

My product has a GPT-4 API to convert text to SQL and it works just fine. Reason: Good data engineering and using ClickHouse to have a single materialized view to create any dashboard. It will be out soon. It works like magic.

timacles · 2024-06-11T18:46:54

As someone who has been a part of implementing BI in every possible form across several companies in the last 10 years, its always the same problem.

No one uses it for jack shit.

All the colorful graphs, charts and cool visualizations have very little actionable information for 99% of people. The other 1% is management and executives who need some random 3 KPI points charted over 3 months to look at so they can feel like they're doing their job or have something to complain about to their underlings.

I have never seen any substantial discussion about any BI metric. Its always a passing thought of curiosity, get the data, chart it, go "Oh Wow, would you look at that." and then immediately move on.

And to take a step back, the only time people actually use any sort of metric consistently, is when they have an obsessive curiosity about a particular area.

camjw · 2024-06-11T19:27:55

Why is it that $BILLIONS is spent on this problem then, if it doesn't exist. I get that, yeah, management and execs only care about 3 KPIs but presumably people lower than them in the chain care about improving those KPIs and digging into the data helps you do that?

Or is this not true?

gofreddygo · 2024-06-11T21:01:50

> No one uses it for jack shit.

Dashboards are lossy, compressed selective subset of hyper processed data. They don't show the why, or how, or when.

yet they survive in the form of a report or a chart because they shift the control of the narrative on the designer of the dashboard. You pick and choose what you want to emphasize and what to hide. Convenient, if thats what you want.

OTOH, i've never found or resolved or identified a single issue from a dashboard. I've spent hours on why something wasn't highlighted or something was and found the culprit to be the dashboard itself.

Yep, jackshit