Data Fabric vs. Data Mesh: What's the Difference?

DrBenCarson · on Nov 18, 2021

These are two buzzwords that I still, for whatever reason, feel no momentum for.

recursive · on Nov 18, 2021

So is data lake over now?

F_J_H · on Nov 19, 2021

Now hearing about "data lake houses"....

djohnston · on Nov 19, 2021

Dried up in 2019

claytonjy · on Nov 18, 2021

So Data Fabric is a bolt-on solution that we can't even honestly attempt today, while Data Mesh requires everyone in the engineering org to embrace data products?

Sounds like startups might adopt Data Mesh while it's still easy to reorient org-level behavior, but big enterprises are doomed to carry forward their current messes until AI magically delivers us Data Fabric as a viable option.

Anyone having more success with these approaches than my pessimistic take implies? Is it easier to adopt Data Mesh in a large org than I realize? Is Data Fabric a more viable option than the author considers?

apahwa · on Nov 19, 2021

it isn't either-or. it is both.

if you look at the data pipeline, you start with Data Mesh: individual teams create and maintain the datasets for their own domain. the data fabric comes next, automatically blending those domain-specific datasets together based on a defined data model and the declared needs of consumers. a centralized team then owns the resulting tables but not the business logic.

an example of this is Airbnb's data stack. You can read about it here: https://link.medium.com/qzqciW7Milb

ekzhu · on Nov 18, 2021

There is no reason the two approaches cannot coexist: a centralized catalog managed by a small team, setting the “gold standard” for the many decentralized data producers and curators, who are incentivized to maximize their impact (i.e., usage) by publishing higher-quality data that follows the standard.

Another thing to point out: besides relying on the future promises of ML, there are already many signals that a centralized catalog can use for data discovery. For example: data sketches (MinHash, HyperLogLog) for finding joinable datasets, social signals (likes, comments, stars, etc.; see Alation and Select Star SQL), and lineage derived from data movements (e.g., Azure Data Factory and Azure Purview). If the centralized catalog uses those signals, then the data producers are incentivized to provide them for better visibility.
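
To make the data-sketch signal concrete, here is a minimal sketch of how MinHash could flag two columns as join candidates. It is not how any particular catalog does it. The function names, the column names, and the choice of BLAKE2b with a per-seed salt are all assumptions made for illustration, and it assumes each column has at least one value.

```python
import hashlib


def minhash_signature(values, num_hashes=128):
    """Build a MinHash signature for a column (hypothetical helper).

    For each of num_hashes salted hash functions, keep the smallest
    hash seen over the column's distinct values. Assumes the column
    is non-empty.
    """
    distinct = {str(v).encode("utf-8") for v in values}
    signature = []
    for seed in range(num_hashes):
        # Using the seed as a salt gives a different hash function per slot.
        salt = seed.to_bytes(8, "little")
        signature.append(
            min(
                int.from_bytes(
                    hashlib.blake2b(v, digest_size=8, salt=salt).digest(), "big"
                )
                for v in distinct
            )
        )
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching slots."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


# Two made-up key columns that share 800 of their 1,200 distinct values.
orders_customer_id = range(0, 1000)
customers_id = range(200, 1200)

similarity = estimated_jaccard(
    minhash_signature(orders_customer_id),
    minhash_signature(customers_id),
)
print(f"estimated overlap: {similarity:.2f}")
```

The catalog only needs to store the fixed-size signature per column, so it can compare every pair of columns without rescanning the underlying tables. A high estimate is a hint that the columns might join, not proof that the join is meaningful.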

abadid · on Nov 18, 2021

I agree that in theory they could both co-exist for the reasons you state, but in practice I think it's unlikely a company that invests in a data fabric (which is largely a technology cost) is going to simultaneously invest in the incentives for the data product creators that are necessary for the data mesh not to become the wild west.

chrisjc · on Nov 18, 2021

I had never heard of Data Fabric before. Now that I have, I'm not sure they can exist without each other. In fact, I would imagine that the metadata accumulated through the data fabric would/could end up driving the data mesh implementation.

Perhaps apps and services will end up having to go through data-coverage and data-quality verification steps before being released. Analytics (and caching, joins, etc.) as an afterthought is unacceptable in this day and age.

nerdponx · on Nov 18, 2021

Maybe it's good to think of "fabric", "mesh", "warehouse", and "lake" as design patterns for data.

oconnore · on Nov 18, 2021

This reminds me of David Mindell's work on situated autonomy: https://www.youtube.com/watch?v=M0-tafxh7gc https://www.robotics.org/userAssets/riaUploads/file/20-OurRo...

> The highest levels of technology are not necessarily full autonomy, but situated autonomy

> All autonomous systems are joint human-machine cognitive systems

Fundamental questions: Where are the people? Which people are they? What are they doing? When are they doing it?

star-trek-fleet · on Nov 18, 2021

A surprisingly well-organized and meaningful description of two marketing concepts. It seems targeted at CXOs making IT solution buying decisions.

AtlasBarfed · on Nov 19, 2021

They both get muddy when dropped in the Data Lake.

geoduck14 · on Nov 19, 2021

I haven't heard of a compelling reason why the fabric and mesh are superior to the Data Lake.

Decentralized adding of data? Sure, but at the expense of giving up enterprise-wide access governance and performance. It makes me uncomfortable.

DemocracyFTW · on Nov 21, 2021

> At its core, the Data Fabric is about eliminating humans from the process as much as possible.