
The case for data lakes shows up in large enough organizations, where it becomes exceedingly likely that some data useful to you is maintained by people you'll never meet, in a department you don't know about, and where it's impractical or even impossible to get them involved in the project that would consume this data.

It's not so much about the data itself as an attempt to solve an organizational problem of communication and coordination. You decouple the sources of data from its consumers (not the technical systems/databases, but the people and organizational units) into a 'hub-and-spoke' model, where the providers just supply raw data without getting pulled into a multinational project that takes a year just to identify the potential stakeholders for that data across a distributed organization with tens of thousands of employees.



Yeah, data lakes are a tech solution to an org problem. Good or bad, it is what it is.


Done right, they are awesome though. As a data scientist you can iterate a lot faster if you don't have to access multiple different stores for features or data.



