How We Deal with Data Quality Using Circuit Breakers (intuit.com)
66 points by r4um 62 days ago | 4 comments

SO many angles on this topic -- aside from the professional and clearly competent article on data engineering. But I won't say 'modern' engineering, to underscore the following..

The personal computer became wildly popular decades ago by empowering individuals, not by feeding the dark or dysfunctional central-server patterns that existed at the time. The technical differences between designing what used to be Quicken and what is now QuickBooks underscore the larger shift from user-centric software to stream- and server-based software.

When the Intuit/Quicken products were first implemented decades ago, it was a user-centric software problem: a Graphical User Interface (GUI) meets a human user who has context and goals, and the software executes on the operating system beneath it. The engineering involved required accuracy and consistency for the purpose of human activities.

Fast-forward more than twenty years, and this engineering is centered on tens or hundreds of thousands of 'streams' flowing into a central service. The smarts go toward the categorization, classification, and filtering of streams for the purpose of the whole -- much more like an ant colony, where the individual user is not at all the point and is, to some extent, disposable. Many, many corollaries are possible here.

Again, great work by the engineering teams and this author. However, it is not at all certain that the enterprise, law enforcement, and oversight here are trustworthy over time. History has shown that humans do bad things to other humans, for many reasons. Putting money flows into concentrated streams like this creates efficiencies, but it is also highly susceptible to manipulation -- not on the moment-to-moment data-ingestion side, but on the long-term management side.
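For anyone who hasn't read the article, the pattern it's named after can be sketched roughly like this. This is an illustrative toy, not Intuit's implementation -- the class name, thresholds, and tripping rule are all assumptions; the general idea is just to stop propagating a data stream downstream once validation failures exceed some rate, instead of silently ingesting bad records:

```python
# Toy sketch of a data-quality circuit breaker (illustrative only,
# not the article's actual implementation).
class DataQualityCircuitBreaker:
    def __init__(self, failure_rate_threshold=0.05, min_samples=100):
        self.failure_rate_threshold = failure_rate_threshold
        self.min_samples = min_samples  # don't trip on tiny samples
        self.seen = 0
        self.failed = 0
        self.open = False  # "open" = tripped, stop the flow

    def record(self, is_valid):
        """Feed one record's validation result into the breaker."""
        self.seen += 1
        if not is_valid:
            self.failed += 1
        if self.seen >= self.min_samples:
            if self.failed / self.seen > self.failure_rate_threshold:
                self.open = True

    def allow(self):
        """Should downstream consumers keep receiving this stream?"""
        return not self.open

breaker = DataQualityCircuitBreaker(failure_rate_threshold=0.1, min_samples=10)
for record in range(20):
    is_valid = record % 2 == 0  # pretend half the records fail validation
    breaker.record(is_valid)
assert not breaker.allow()  # 50% failure rate trips the breaker
```

The point is that the decision is made per stream at ingestion time, so one bad source doesn't poison everything downstream.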

Am I the only one who finds it concerning that they capture nearly everything ("Data entered by customers in using the products" and "Clickstream data capturing usage of the product") and persist it just for the sake of having it, hoping to find a use for it later? They also collect data directly from hundreds of relational databases; turning your DB schema into an API for data collection sounds like a terrible idea.
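To make the schema-coupling concern concrete, here's a small sketch of the usual mitigation (table and view names are made up, nothing from the article): have collectors read from a stable view that acts as the extraction contract, rather than from the raw tables, so internal schema changes only require updating the view.

```python
# Sketch: decoupling data collection from the internal DB schema via a view.
import sqlite3

conn = sqlite3.connect(":memory:")

# Internal table -- free to be renamed, split, or migrated by the app team.
conn.execute("CREATE TABLE users_v2 (uid INTEGER, full_name TEXT, email TEXT)")
conn.execute("INSERT INTO users_v2 VALUES (1, 'Ada Lovelace', 'ada@example.com')")

# The collector reads this view, never the table underneath it. When the
# table changes, only the view definition is updated, not every consumer.
conn.execute("""
    CREATE VIEW export_users AS
    SELECT uid AS user_id, full_name AS name FROM users_v2
""")

rows = conn.execute("SELECT user_id, name FROM export_users").fetchall()
assert rows == [(1, 'Ada Lovelace')]
```

Without that indirection, every table rename becomes a breaking change for the data-collection pipeline.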

> Am I the only one who finds it concerning that they capture nearly everything and persist it just for the sake of having it, hoping to find a use for it later?

I am pretty sure this has been a standard practice for at least a decade now. Isn't that what the "big data" meme is about? Store everything, because you can always get more computational power and statistical techniques to extract value from it later on.

Also, you sometimes don't have the capacity to do all the analyses you want at the start. You can throw all the computational power you want at it, but if you don't have enough humans downstream to make decisions from it, then storing that data for later use makes a lot of sense.

