
The Rise of SQL-Based Data Modeling - huy
https://www.holistics.io/blog/the-rise-of-sql-based-data-modeling-and-dataops/
======
dbatten
> This implies that SQL is not reusable, causing similar code with slightly
> different logic to be repeated all over the place. For example, one cannot
> easily write a SQL ‘library’ for accounting purposes and distribute it to
> the rest of your team for reuse whenever accounting-related analysis is
> required.

Data scientist here. I think some of this problem is handled by effective use
of views. Oh, everybody is constantly joining these three accounting-related
tables and aggregating by, say, order number? Have your Data
Engineer/DBA/analyst/whoever create a view that takes care of that. Boom. Now
everybody's using the same data, calculated the same way, nobody's reinventing
the wheel, and you don't have to worry about somebody fat-fingering something
when they re-write that query for the 10th time.
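
For illustration, a minimal sketch of that kind of shared view (table and column names are hypothetical):

    -- One canonical definition of the join + aggregation, reused by everyone.
    CREATE VIEW order_totals AS
    SELECT o.order_number,
           c.name AS customer_name,
           SUM(li.quantity * li.unit_price) AS gross_revenue
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    JOIN line_items li ON li.order_id = o.id
    GROUP BY o.order_number, c.name;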

With that being said, I still think there's some truth to this criticism, in
that it's not as easy/common to be able to build an abstract query that does a
common operation on arbitrary data. You can't import trend_forecast.sql, hand
it arbitrary time-series data, and generate an N-month linear forecast from
your historical data points. At least, not easily in ANSI SQL.
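
To be fair, standard SQL does have linear regression aggregates, so a one-off forecast is doable; the catch is that the query is welded to one concrete table rather than being a reusable library function. A sketch against a hypothetical metrics(month_number, value) table:

    -- Fit value = slope * month_number + intercept over the history.
    -- There is no way to hand this query an arbitrary table at call time.
    SELECT regr_slope(value, month_number)     AS slope,
           regr_intercept(value, month_number) AS intercept
    FROM metrics;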

~~~
wodenokoto
My problem with views is that you do the join before the filtering.

~~~
mumblemumble
I'm curious, what DBMS are you using whose optimizer can't see through views?

For what it's worth, MSSQL, PostgreSQL and Oracle don't have that limitation.

~~~
wodenokoto
BigQuery.

If I make a view that joins tables a and b, and I query that view with a
filter, BigQuery won't push the filter down into a and b before joining.
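
A minimal illustration of the pattern (hypothetical tables):

    CREATE VIEW ab AS
    SELECT a.id, a.created_at, b.attr
    FROM a
    JOIN b ON b.a_id = a.id;

    -- Ideally the predicate below is pushed into the scans of a and b
    -- before the join; whether that happens is engine-specific.
    SELECT * FROM ab WHERE created_at >= '2020-01-01';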

~~~
mumblemumble
Ah. That's not too surprising, then, is it? BigQuery's column-oriented, so I
imagine efficient row-selective queries aren't really what it's for in the
first place.

------
cjf4
One of the things I'm constantly baffled by given the growth of data science
and analytics is that data modelling isn't treated as a first class concern.
It's absolutely fundamental to doing quantitative work efficiently at an
organization with any amount of complexity, yet the majority of people in the
field seem to be unaware of the concepts.

This ignorance is especially surprising given that it's essentially a solved
problem (Kimball), yet if you talk about data modelling, people usually think
regression, not schema.

~~~
boublepop
Could you elaborate on why you think data modeling is a solved problem? It
seems like an opinion warranting more than a one-word explanation.

~~~
mumblemumble
I'm guessing that the implication was that Kimball's dimensional modeling
techniques (so, in a nutshell, snowflake and star schemata) will get the job
done in almost any case.

I'm not sure I would 100% agree with that - e.g., denormalization, while
useful for many things, isn't always your best option. But I would say that
there are a lot of tools in the box, and that is absolutely one of the
critical ones to know.

~~~
atwebb
> denormalization, while useful for many things, isn't always your best
> option.

From my view, it is generally not a good option for cases it wasn't designed
for, an example being non-analytical reporting. If you are running operational
support, getting the source data immediately and aggregating/displaying it can
be more helpful than modeling for analytics workloads. The line between these
is blurred in most orgs. To the OP's point, data modeling seems like a
sidenote in most analytical discussions. You can accomplish a lot using the
star model, which is essentially (a minimal sketch follows the list):

Prepare things to be fast by sorting them into proper groups (fact/dim/bridge)

Rely on ints

Store atomic data

Provide summaries/aggregates

Model after the questions you ask; not the system it comes from
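
A minimal DDL sketch of that shape (all names hypothetical):

    -- Dimensions: descriptive attributes keyed by integer surrogate keys.
    CREATE TABLE dim_product (
        product_key INT PRIMARY KEY,
        sku         VARCHAR(32),
        category    VARCHAR(64)
    );

    CREATE TABLE dim_date (
        date_key INT PRIMARY KEY,  -- e.g. 20200131
        day      DATE,
        month    INT,
        year     INT
    );

    -- Fact: atomic grain (one row per order line), integer foreign keys.
    CREATE TABLE fact_order_line (
        date_key    INT REFERENCES dim_date (date_key),
        product_key INT REFERENCES dim_product (product_key),
        quantity    INT,
        amount      DECIMAL(12, 2)
    );

    -- Summaries/aggregates are layered on top, e.g. as a view.
    CREATE VIEW monthly_revenue AS
    SELECT d.year, d.month, SUM(f.amount) AS revenue
    FROM fact_order_line f
    JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY d.year, d.month;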

------
swalsh
" Up till a few years ago, the traditional way of managing data (in SQL-based
RDBMSs) was considered a relic of the past, as these systems couldn't scale to
cope with such a huge amount of data. "

I must have missed the boat on this one. I remember in 2010 there was a brief
period of time where NoSQL was in fashion, but it rightfully died back pretty
quickly to a small set of specialized use cases. There have been some cases
where big data systems have replaced more traditional RDBMS systems, but now
you can use SQL for those too (like Hive SQL).

SQL is the one skill that has not become obsolete in the course of my career.
Frankly, I've started relying on it more, because it never goes obsolete. Also
it's fast as hell. When I first started my career I did C#, and .NET
Framework 2 was fairly new. Since then WinForms and WebForms have gone away.
ORMs changed, JavaScript changed. Then I moved to Ruby, and Python, and PHP.
Those ecosystems have evolved too. But the one thing that I learned 15 years
ago, and that I still use every day, is SQL.

~~~
gwbas1c
> I must have missed the boat on this one. I remember in 2010 there was a
> brief period of time where NoSQL was in fashion, but it rightfully died back
> pretty quickly to a small set of specialized use cases. There have been some
> cases where big data systems have replaced more traditional RDBMS systems,
> but now you can use SQL for those too (like Hive SQL).

A few weeks ago I had a lead server-side architect call me up and ask if my
client could tolerate eventual consistency. My answer was that "he could get
away with that 10 years ago but that NoSQL went out of fashion and SQL came
back in again."

To put it in a different context: Some people just go through periods in their
life where they don't keep up and don't realize what was "new and exciting" 10
years ago turned out to be smoke and mirrors.

~~~
pdexter
What's the connection between NoSQL vs SQL and eventual consistency?

~~~
dragonwriter
NoSQL is a bad name that really doesn't have anything to do with SQL per se,
but is a shorthand for non-relational datastores (no one calls non-SQL
relational systems “NoSQL”), which, while they depart from the relational
model in a number of different ways, typically include eventual rather than
strong consistency to support liveness in a distributed context.

------
harshaw
This quote irks me: "Any tool that relies heavily on SQL is off-limits to
business users who want more than viewing static charts and dashboards but
don't know the language"

The good business folks know SQL and aren't afraid of it. I wasn't sure of
this until I worked in an organization where most PM types used SQL
comfortably.

~~~
anbotero
_know SQL_ is kind of hard to pin down. I also have business people who know
SQL, yet don't use indexes (or request them, if they know there are none). So
we get this amalgamation of monstrous subqueries all hitting the same
unoptimized fields and, of course, blocking the database.

I created a mini-training for the onboarding process, so they could at least
look things up without killing all our read replicas. But that's just one
step; some people are just not _good_ at it.

Fear... fear is having everyone blocked because one person is killing all the
replicas everyone relies on to work.

~~~
nemothekid
That sounds more like an ops problem than a business one. I've worked with a
number of non-technical roles that dabble in SQL, and almost always they are
querying a data warehousing solution (e.g. Redshift, Pig, BigQuery) that
doesn't support indexes to begin with, and most times they just have to live
with the performance.

It's not their job to optimize the database, and it's no different from a dev
pushing a new feature that causes bad performance.

~~~
anbotero
It's not their job to optimize the database, but it sure is their job to
recognize the basic bottlenecks in their scripts. My training addresses the
things they have to take into account, and how to approach us when they think
what they are doing needs optimization. It's just that some of them don't do
it. I made it part of their onboarding process, so they should know who to
contact for these things, yet some of them literally start complaining instead
of contacting the team that could help them out.

------
trollied
Don't really get this article. SQL never went away or got less popular. Always
been in the background doing its awesome thing.

~~~
mnky9800n
In data science it has been an uphill battle between the people who think only
CSV exists and those who switch to a new NoSQL flavor every 6 months.

~~~
overcast
CSV/Excel has run the world since the beginning; that's going to be a never-
ending battle. Large orgs are still entrenched in SQL when their enterprise
ERP/MRP systems run off Oracle, and their analysts are extracting reports to
Excel.

~~~
tartoran
Right after flat files, which are still in use on mainframes.

------
blowski
I don't really understand what this article is trying to say, other than
"isn't Looker great!".

Having timely and detailed data available to a wide range of people in the
organisation is now seen as a competitive advantage in many industries.
There's a lot of tech out there to help with this.

But I could have written this article talking about how companies like Reltio
are relying on NoSQL solutions to "empower the enterprise", or how Firebase is
allowing startups to not worry about data structures, or how HSBC is deploying
blockchain solutions, or how Spark means you can combine data from all
different sources. It would still be just as accurate and meaningful. As it
is, it just sounds like an infomercial for Looker.

~~~
contravariant
The weirdest part to me is that they went from 'Everyone is back to using SQL
because it's good at querying and you don't need to learn a proprietary
language' to 'Let's use LookML instead'.

~~~
buremba
Correct, but the problem with raw SQL is that it's not easily composable.
LookML essentially provides that composability on top of SQL.

~~~
garyclarke27
I don't agree - I find SQL easy to compose using CTEs and functions. Postgres
12 is amazing for this because chained CTEs are now just as fast as
traditional nested joins (which can be hard to decipher). Also, lateral joins
are great for composing set-returning functions - Postgres joins them
automatically - like magic.
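
For instance, a rough sketch of both techniques (hypothetical tables; in
Postgres, set-returning functions in FROM are implicitly lateral):

    -- Chained CTEs: each step reads the previous one.
    WITH recent_orders AS (
        SELECT * FROM orders
        WHERE created_at >= now() - interval '30 days'
    ),
    order_revenue AS (
        SELECT r.id, SUM(li.quantity * li.unit_price) AS revenue
        FROM recent_orders r
        JOIN line_items li ON li.order_id = r.id
        GROUP BY r.id
    )
    SELECT * FROM order_revenue WHERE revenue > 100;

    -- Lateral join: the set-returning function runs once per outer row.
    SELECT o.id, s.day
    FROM orders o,
         LATERAL generate_series(o.created_at, o.shipped_at,
                                 interval '1 day') AS s(day);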

~~~
buremba
CTEs solve the problem only to some extent, and only for the data sources. You
often need snippets, expression functions, the ability to compose multiple
expressions, and the ability to compile SQL queries that are both efficient
and readable, if you want to cooperate with different data analysts. It's not
that SQL is limited; it works the way it was designed.

Take this example: [https://github.com/rakam-io/recipes/tree/master/segment/warehouse/page](https://github.com/rakam-io/recipes/tree/master/segment/warehouse/page)
It also makes use of CTEs and lateral joins, but it also needs a templating
engine (Jinja) for data models and Jsonnet for analysis models, similar to
LookML, in order to provide that composability.

------
oneofthose
dbt (data build tool) [0] embraces this idea - it's like make for data
transformation. Just like make, its syntax is sub-optimal, but that's the only
drawback. There is an open source version, it generates documentation
automatically, you can materialize tables on a schedule, and it allows you to
write unit tests for your data ("this select must return 0 rows" kind of
tests). I'm not affiliated with them, just a happy user.

[0] [https://www.getdbt.com/](https://www.getdbt.com/)

~~~
mmeasic
Yes, that was my first thought when I opened the article. dbt is amazing and
basically converts your SELECT statements to CREATE TABLE statements.
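
For reference, a dbt model is just a SELECT in a file; a minimal sketch (model
and source names hypothetical):

    -- models/order_totals.sql
    -- dbt wraps this SELECT in CREATE TABLE AS (or CREATE VIEW AS) at run time.
    {{ config(materialized='table') }}

    SELECT order_number,
           SUM(amount) AS revenue
    FROM {{ ref('stg_orders') }}
    GROUP BY order_number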

I guess they also target the E and L of ELT, while dbt relies on the data
already being in your data warehouse.

------
iblaine
The problem with modeling data isn't the lack of tools or the need to approach
this problem differently. Plenty of solutions exist. People just don't care,
and this is ok.

It used to be that you needed tools like Informatica and Kimball-inspired data
marts, but databases are now bigger and faster. Whatever data modeling
problems you may have, you can easily clean them up in an ETL or BI layer with
relatively little effort. This makes tools like Looker, dbt, and Holistics a
luxury and not something you need to have. I wish the industry would put more
effort into defining clean data models, but I think that ship has sailed. The
prevailing trend seems to be to create data lakes, add a BI layer, then call
it a day.

[edit] Also... some database points. The industry never shifted to using NoSQL
to replace RDBMS systems. But event processing matured, and NoSQL DBs are
ideal for storing unstructured data, so you see them in data engineering
stacks. Greenplum is a free MPP database that has been around for nearly a
decade. The point about Spanner SQL is interesting in that Spanner evolved
from NoSQL-like methods to a SQL-like dialect, but Spanner is a unique flower
in the industry, due to being an HTAP DB.

~~~
huy
Can you elaborate more on what you mean by “creating clean data models”?

~~~
iblaine
Using Inmon or Kimball's approach to modeling data structures is clean. The
thing with modeling data is there's no one right way but there are many wrong
ways.

------
ckastner
The article doesn't mention performance.

I haven't benchmarked this yet, but after my first experiences with somewhat
complex data transformations in numpy and pandas, I was left with the feeling
that despite them being optimized, any modern RDBMS would still have run
circles around them.

They've been optimized for this kind of stuff for decades, after all.

~~~
barrkel
Most modern RDBMS are optimized for point-lookup queries and have a very hard
time choosing good plans when you're selecting millions of rows amongst
billions.

Remove the assumption that the end result is going to be a row or ten (which
usually have good obvious plans with clear indexes), and the query planner is
forced to make decisions about choosing the best plan amongst very costly
alternatives. The exact costs are sensitive to assumptions about distribution,
I/O costs, memory costs etc. and those assumptions need delicate tuning,
especially when you don't have hinting as a tool (e.g. Postgres basically
doesn't let you give many hints - you can kind of force join order, but the
biggest tool is the CTE optimization barrier - I can get 100x speedups by
moving subqueries out into CTEs and ensuring they're opaque).
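
A sketch of that barrier trick; note that since PostgreSQL 12 plain CTEs can
be inlined, so the barrier has to be requested explicitly:

    -- MATERIALIZED forces the CTE to be computed once and treated as
    -- opaque, rather than inlined into the outer query's plan.
    WITH heavy_users AS MATERIALIZED (
        SELECT user_id
        FROM events
        WHERE event_time >= now() - interval '1 day'
        GROUP BY user_id
        HAVING COUNT(*) > 1000
    )
    SELECT u.*
    FROM users u
    JOIN heavy_users h ON h.user_id = u.id;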

To write efficient big queries, you need to think in terms of data flow and
write declarative SQL with an execution plan in mind; it reminds me of my days
tweaking source code to encourage a compiler to make certain register
allocation decisions.

PG is a bitch here. It's often easier to get good performance by fetching the
results of one query, transforming them into a comma-delimited string, and
then injecting that into the subsequent query (possibly in batches) to be sure
you're getting the plan you need.

MySQL's planner is kind of stupid, but it's predictably stupid and there's
lots of hinting tools available. I find it a lot easier to make it work fast
for big queries. Sometimes it simply doesn't implement the best strategy (e.g.
no hash joins), of course, so sometimes it's not as fast as you could get with
PG, but it's usually easier to outperform.

~~~
keanzu
> To write efficient big queries, you need to think in terms of data flow and
> write declarative SQL with an execution plan in mind; it reminds me of my
> days tweaking source code to encourage a compiler to make certain register
> allocation decisions.

That mirrors my experience with Oracle. The only difference is that there's a
hinting mechanism provided which will _almost_ always allow you to force the
execution plan you want.

~~~
iimblack
Ugh hints. The things they almost always recommend you don’t use because the
optimizer should know better. Unsarcastically, this is usually true. More
often I see people who think they’re being clever killing performance with
undocumented hints instead of either letting the engine do its work or looking
for other ways to optimize.

------
Twisell
Having been a total stubborn asshole to my boss over the last decade because I
refused to even investigate how we could replace our old and "complicated" SQL
integration/export workflows with a modern and "intuitive" visual proprietary
ETL, this article is a real relief for me!

I'm now officially a bleeding edge DevOps with 10 years expertise on the brand
new "old school" ELT (Extract,Load, Transform).

LoL

~~~
barrkel
Complicated SQL is definitely complicated, though. Get an expected 1:1 join
wrong (so it ends up 1:n) and duplicates muck things up; and typically unit
tests don't catch these kinds of errors because the test set is minimal.

Long scripts of SQL with intermediate steps (e.g. building temp tables) are
hard to reason about and debug, and especially painful when they fail only in
rare production scenarios.

Developers find it a lot easier to reason on an item by item basis, and not on
a set or batch basis, and it's very tricky to get error handling and
transactions right on a batch basis unless you want to fail the whole batch.

Putting all the work on the database also means you need to scale up that
layer. From another POV, it can be simpler to have cheap point lookup,
secondary indexes in stores that are tuned for the specifics of the queries,
and get performance back by scaling out the compute.

~~~
Twisell
1. If you rely on an ETL to automagically remove duplicates, you are lying to
yourself, because you are treating external symptoms: the illness is the
underlying query, and hiding this fact can only lead to more headaches later
on.

2. Testing for duplicates is a SQL one-liner (see the sketch after this list).

3. If your data model is well conceived, you can actually prevent unwanted 1:n
relationships from being inserted in the first place. Yeah, sharding databases
came late to the game, but ACID compliance is a killer feature. Code to manage
inconsistencies in NoSQL is way less trivial as far as I know... unless it's
OK to have an inconsistent database... Guess it depends on the use case, but
ACID seems nice for any important dataset.
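
The one-liner in question, roughly (table and key hypothetical):

    -- Any rows returned mean order_id is not unique.
    SELECT order_id, COUNT(*)
    FROM order_totals
    GROUP BY order_id
    HAVING COUNT(*) > 1;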

~~~
barrkel
I think you misunderstood me on the duplicates. I'm talking about how easy it
is to e.g. left join an attribute from another table and under-qualify the
association so that you accidentally pull in unwanted rows. It's not that you
have duplicates in the data; the data is correct and the model is correct.
It's rather that the query is incorrect and the set of tests, owing to them
typically being written by the same person who wrote the code, doesn't
populate the association table with more rows than expected.

I mention this because it's a bug I've seen crop up several times in
production. When you write 'join' you're typically fetching data to add extra
columns to your result set; it's relatively unusual to want a row product in
the absence of group by & aggregates, or an ORM doing eager loading of a
detail table. But nothing in SQL requires you to state your product
expectations on the join, apart from left/right outer vs inner join. I
personally think SQL could do with forcing people to be more explicit on this
point.
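
A sketch of the failure mode being described (hypothetical schema, where
addresses holds one row per customer per address type):

    -- Intent: decorate each order with the customer's shipping city.
    -- Bug: the join predicate is under-qualified, so a customer with a
    -- billing AND a shipping address yields two rows per order, silently
    -- inflating any downstream SUM or COUNT.
    SELECT o.order_number, o.amount, a.city
    FROM orders o
    LEFT JOIN addresses a ON a.customer_id = o.customer_id;
    -- Missing: AND a.address_type = 'shipping'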

I fully agree on the inconsistencies in NoSQL; I think the most sensible way
to use NoSQL is as a document store without relations, and keep the relations
somewhere else. Manually maintaining invariants (especially of the
denormalized bidirectional kind that some stores require) in an environment
with less than complete transaction support isn't my idea of fun.

~~~
Twisell
It's actually worse than that, because the cartesian product is the default
mode of SQL. JOIN clauses are already a kind of mitigation that forces you to
be more explicit.

With modern SQL, even if you mention no table at all in the FROM clause, you
can get "duplicates" if you use a set-returning function (see:
[https://www.postgresql.org/docs/current/functions-srf.html](https://www.postgresql.org/docs/current/functions-srf.html)).
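
For instance, in PostgreSQL, with no base table anywhere:

    -- Two set-returning functions in FROM multiply out like any
    -- other tables: 2 x 3 = 6 rows.
    SELECT a.n, b.m
    FROM generate_series(1, 2) AS a(n),
         generate_series(1, 3) AS b(m);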

This is both why you should be careful and why this is a really powerful tool
for manipulating datasets. But usually "everything is a cartesian product" is
one of the first (and yes, very important) things you learn about SQL.

~~~
barrkel
Yes, I know, everybody knows. My point is that SQL doesn't help you catch
those errors where your join clause doesn't match rows uniquely. You can know
that the product is the default and still forget that extra bit in the
predicate.

------
sixdimensional
It’s a little bit off topic, but the tone of this discussion brings up an
interesting conversation I’ve been having with some of my colleagues in the
same age group - a perceived skills gap between the newest developers and
ourselves.

I don’t mean that in a negative way. I mean it in the sense that many of the
newest developers don’t know where their cloud based NoSQL database came from
(for example). They never were taught the history of what came before, during
and after RDBMS. They are only now rediscovering some of what the “old” tools
could do.

Many of these developers seem eager to learn, and I am happy to mentor them
and teach them the history that I know.

But it has surprised me; it almost feels like so many years of waves of
marketing and hype have actually been a real detriment to teaching people what
is real, what the best tool for the job is in different cases, etc.

I have no real evidence other than my anecdotal experience, but this article
lends credence to the argument that some never learned or never were given the
time to understand the discipline of databases.

Possibly, the discipline of databases and related development has just been
continually developing and never settled, so that is why the curriculum hasn’t
kept up. But, it really does concern me when a new developer doesn’t even
understand what a JOIN is.

Edit: Or even more so that SQL is an interface to a data engine, and was not
necessarily always tightly coupled to relational databases (although it
evolved largely in lock step with them, which is why you see them together
more often).

------
0x5002
> Instead, NoSQL systems like HBase, Cassandra, and MongoDB became fashionable
> as they marketed themselves as being scalable and easier to use

HBase did not - the project has always been very clear that it caters to a
very specific set of use cases: fast writes with few schema constraints, and
fast single-key and range/fuzzy lookups, not big ETL pipelines.

Even during the rise of Hadoop (everything is a file... I mean file based!)
and the subsequent absorption of that into the public cloud vendors, SQL has
always been there, just wrapped in different tools. These days someone else
hosts it and it's called Athena instead of Hive, but it's fundamentally the
same thing and always has been.

Even Apache Spark's entire Dataset/DataFrame interface yields SQL-like
execution plans, exposing the same functions that an RDBMS would, just in
Scala/Python/R.

------
code4tee
Yes. Plus all the SQL-like things you can do now. For example storing data on
S3 and then querying it with AWS Athena is a simple and powerful way to keep a
huge archive of data that you may want to query at some point.
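
A rough sketch of that pattern (bucket, columns, and format are hypothetical):

    -- Define a table over files already sitting in S3, then query it
    -- with plain SQL; Athena charges per data scanned.
    CREATE EXTERNAL TABLE access_logs (
        ts      timestamp,
        user_id string,
        path    string
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    LOCATION 's3://my-archive-bucket/access-logs/';

    SELECT path, COUNT(*) AS hits
    FROM access_logs
    WHERE ts >= timestamp '2019-01-01 00:00:00'
    GROUP BY path;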

Also, never underestimate the power and speed of a well tuned SQL-family
server/cluster, even at surprisingly large scale. A lot of use cases for the
older “Hadoop Cluster” type stuff have been overtaken by these approaches.
I’ve seen a lot of operations spend silly sums to build ultimately quite
clunky Hadoop-based systems when really they probably just needed one half
decent SQL admin and a well tuned cluster.

------
__ian__
BigQuery is good. Looker is OK. This reads like paid-for PR content.

~~~
keanzu
It's a blog page on holistics.io so it is paid-for PR content. Or a blogvert.

------
davewritescode
I think the real shift back to SQL is in part because modern streaming
platforms like Kafka give you relatively simple mechanisms to implement
eventually consistent databases. Aside from the simple key-value store use
case, that was often a very good reason to use something like Cassandra
instead of MySQL.

------
lostsoul8282
Great article. SQL is so widely known and relatively simple to implement that
I'm baffled businesses don't start with it as a solution and then work their
way into other solutions if they find it doesn't meet their needs.

It's so easy and quick to get started, it should be most people's first
choice.

------
lincpa
[Everything is RMDB](https://github.com/linpengcheng/PurefunctionPipelineDataflow/blob/master/doc/Everything_is_RMDB.md)

~~~
threeseed
Except when you have large, telemetry-style datasets, e.g. web/product
analytics, which won't fit. Or when you are trying to build a wide table and
you run out of columns. Or when your favourite SaaS product gives you highly
nested JSON data.

RDBMS works great up until the point that it doesn't.

~~~
jacques_chester
> _Except when you have large, telemetry-style datasets, e.g. web/product
> analytics, which won't fit._

Web analytics was one of the first applications for Greenplum. My
understanding is that Yahoo collected tens of billions of events per day in
the mid-2000s.

> _Or when you are trying to build a wide table and you run out of columns._

HAWQ can run SQL queries over Hadoop clusters. ClickHouse's table width is
limited by how much RAM you give it.

> _Or when your favourite SaaS product gives you highly nested JSON data._

This is why major databases have JSON querying capabilities and why it's been
added to the next SQL standard. PostgreSQL even allows you to define indices
on fields inside your JSON structures.

Better yet: decompose the highly nested data. Relational databases begin to
shine when you get past at least first normal form.
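
For example, in PostgreSQL, assuming a hypothetical events table with a jsonb
payload column:

    -- Expression index on a single field inside the JSON document.
    CREATE INDEX idx_events_user ON events ((payload ->> 'user_id'));

    -- Or a GIN index to accelerate containment queries over the whole document.
    CREATE INDEX idx_events_payload ON events USING gin (payload);

    -- Both of the following can now use an index.
    SELECT * FROM events WHERE payload ->> 'user_id' = '42';
    SELECT * FROM events WHERE payload @> '{"country": "NZ"}';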

> _RDBMS works great up until the point that it doesn't._

RDBMSes _do_ work great until they don't. Which means they are almost always
the best solution and almost always _remain_ so.

Folks regularly overestimate the size of their problem and underestimate the
capabilities of the literally dozens of RDBMSes now available for use. Yes, it
irks me.

Disclosure: I work for VMware, which sponsors Greenplum development.

------
kohtatsu
A few days ago I designed a DSL for CREATE TABLE statements.

[https://gist.github.com/nomsolence/69bc0b5fe1ba943d82cd37fdb...](https://gist.github.com/nomsolence/69bc0b5fe1ba943d82cd37fdbd23df6d#file-db-png)

Being able to focus on the relationships without worrying about commas is
nice.

I'm still writing the compiler (it's my first, I'm sure it will be awful), but
I'm starting from the finish so it's been easy for me to pick up where I left
off; I started by deciding the language, then writing the outputs by hand for
the tokenizer, my two stages of AST, and the actual SQL.

------
cube2222
Hey, just wanted to chime in with a tool I'm a co-author of, OctoSQL[1].

I too really like a common interface to various data sources. OctoSQL allows
you to use plain SQL to transform, analyse and join data from various data
sources, currently including MySQL, PostgreSQL, Redis, JSON files, CSV, and
Excel.

However, we're very inspired by yesteryear's paper "One SQL to rule them all"
and should have ergonomic streaming support, with Kafka as a new data source,
available very soon.

[1]:
[https://github.com/cube2222/octosql](https://github.com/cube2222/octosql)

~~~
marvel_boy
Do you plan support for SQLite?

~~~
cube2222
I see no reason not to add it, but we're currently busy, so it'll have to wait
till we either have time or somebody contributes it. It should be an hour of
copy-pasting one of the SQL data sources and adapting it for SQLite.

------
exabrial
Does anyone have any ML packages that directly integrate with SQL databases? I
feel like the ETL market is well covered with a myriad of tools, but at the
end of the day our data scientists still want to extract giant CSVs because
their Python programs "just don't work natively" on an RDBMS (they probably
do, but that's not the way it's done, and programming isn't these folks' first
job anyway).

~~~
olooney
They exist, they're just not really state-of-the-art.

Apache MADlib[1] for PostgreSQL and Greenplum. Everything is done inside of
SQL; for example, to fit a linear regression model, you execute the query
"SELECT madlib.linregr_train(...)".

MLlib[2] for Spark now has a DataFrame-based API (spark.ml) for easy
integration with Spark SQL.

Every such library I've seen has been a little behind the times, usually
offering nothing more advanced than logistic regression, random forest, maybe
a multi-layer neural network. I've also never seen a SQL-based ML library that
offered GPU training.

For reference, state-of-the-art neural network libraries like TensorFlow or
PyTorch support GPU-based training (something like an order of magnitude
improvement in training performance) and automatic differentiation (allowing
for easy specification of complex models). For trees, state-of-the-art now
includes boosted tree algorithms like XGBoost or CatBoost. Random forest is
still a workhorse, but a library which _only_ provides random forest (like the
two mentioned above) is a little behind the curve.

It seems that libraries that focus on doing one thing well (ML) are able to
offer more features and have an easier time keeping up with the times than
libraries that also take on the burden of SQL integration. Which isn't to say
MADlib and Spark MLlib are _bad_ - not everyone needs state-of-the-art
algorithms all the time, and it can be convenient to be able to do simple
things fully inside the SQL environment - but they're never going to be
cutting edge, so they're not going to get much attention.

[1]
[https://madlib.apache.org/index.html](https://madlib.apache.org/index.html)

[2] [http://spark.apache.org/docs/latest/ml-
guide.html](http://spark.apache.org/docs/latest/ml-guide.html)

------
32gbsd
Looks like a promo

------
threeseed
Just a slight conflict of interest: a company that makes SQL-based data
modelling tools telling us that SQL-based data modelling is on the rise.

But given how many companies have set up data lakes with unstructured and
semi-structured data (think SaaS exports), and how SQL layers have largely
been unimpressive, I'm not sure it's the case.

~~~
tkyjonathan
And a lot of those lakes remain untouched for years..

~~~
EdwardDiego
It's not too far from datalake to dataswamp.

------
sashavingardt2
This article was written by yet another kid a few years out of college who
doesn't know the history or the tooling, and who thinks he and his teammates
are providing a solution to an existing problem. What he doesn't realize is
that SQL reusability and data modeling were solved 20 years ago.

~~~
jszymborski
Have you seen their tool? Without having used it, it looks pretty slick. I
think you're being a little too dismissive, considering "It's All Been Done
Before(TM)" and that they seem to be building a rather nice UI for BI... a
business with multiple players all making a pretty penny.

------
IpV8
I'm surprised to see only a cursory mention of Snowflake in this article. In
my experience, they are really the pioneers of the new distributed,
cloud-first database. They really enabled large-scale relational data
warehouses, and are still miles ahead of even the big cloud players.

~~~
edmundsauto
What is it you like about them? IIRC, their base plan starts at $1500/mo, so
it's a high end offering that I haven't played with too much.

------
tkyjonathan
Already started a DataOps team in my company!

------
cryptonector
That's because SQL is pretty awesome.

