
How We Deal with Data Quality Using Circuit Breakers - r4um
https://quickbooks-engineering.intuit.com/taming-data-quality-with-circuit-breakers-dbe550d3ca78
======
mistrial9
SO many angles on this topic -- aside from the professional and clearly
competent article on data engineering .. though I won't say 'modern'
engineering, to underscore the following..

The personal computer became wildly popular decades ago by empowering
_individuals_ (not feeding the dark or dysfunctional central-server patterns
that existed at the time). The technical differences between designing what
used to be Quicken and what is now Quickbooks underscore the larger shift from
user-centric to stream- and server-based software.

When the Intuit/Quicken products were first implemented decades ago, it was a
_user-centric_ software problem.. the Graphical User Interface (GUI) meets the
human user, with their context and goals, and executes on the Operating System
the software sits on.. The engineering involved required accuracy and
consistency for the purpose of _human_ activities.

Fast-forward more than twenty years, and this engineering is centered on tens
and hundreds of thousands of 'streams' to a central service. The smarts are
going toward the categorization, classification and filtering of streams, for
the purpose of the whole.. much more like an ant colony or similar.. where the
individual user is not at all the point, and in fact is disposable to some
extent. Many, many corollaries are possible here..

Again, great work by the engineering teams and this author; however, it is not
at all certain that the enterprise, law enforcement, and oversight involved
here are trustworthy over time. History has shown humans to do bad things to
other humans, for many reasons. Putting money flows into concentrated streams
like this creates efficiencies, and is also highly susceptible to manipulation,
not at the moment-to-moment data ingestion side, but rather at the long-term
management side.

------
mkreis
Am I the only one who finds it concerning that they capture basically
everything ("Data entered by customers in using the products" and "Clickstream
data capturing usage of the product") and persist it just for the sake of
having it and finding a use for it later? Also, they collect data directly
from 100s of relational databases; that sounds like a terrible idea, making
your DB schema an API for data collection.

~~~
tree_of_item
> Am I the only one who finds it concerning that they capture basically
> everything and persist it just for the sake of having it and finding a use
> for it later?

I am pretty sure this has been a standard practice for at least a decade now.
Isn't that what the "big data" meme is about? Store everything, because you
can always get more computational power and statistical techniques to extract
value from it later on.

~~~
veritas3241
Also, you sometimes don't have the capacity to do all the analyses you want at
the start. Throw all the computational power you want at it, but if you don't
have enough humans downstream to make decisions from it, then storing that
data for later use makes a lot of sense.

