Data Cascades in Machine Learning (googleblog.com)
52 points by theafh 7 months ago | 2 comments

These kinds of papers are my favorite ML papers (see also: 'Machine Learning: The High Interest Credit Card of Technical Debt'). The org design and project aspects of ML projects are some of the most pernicious issues I face, while the modeling and other fun stuff often ends up not being that hard once the right pieces are in place.

Good article. I would suggest following it up with https://pair.withgoogle.com/chapter/data-collection/

Monitoring drift in the inputs, predictions, and performance is crucial for any model in production. Personally, for input/target drift, I prefer using the D statistic from the two-sample Kolmogorov–Smirnov test to look for distribution changes between a reference window and recent data.
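
For anyone who wants to try this, here is a minimal sketch of a KS-based drift check on a single numeric feature, using `scipy.stats.ks_2samp`. The helper name `ks_drift`, the 0.1 threshold on D, and the synthetic data are my own assumptions for illustration, not something from the article; a sensible threshold depends on the feature and on sample size.

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_drift(reference, current, d_threshold=0.1):
    """Compare two 1-D samples with the two-sample KS test.

    Returns (D, p_value, drifted). `d_threshold` is a hypothetical
    cutoff on the D statistic; tune it per feature.
    """
    result = ks_2samp(reference, current)
    drifted = bool(result.statistic > d_threshold)
    return float(result.statistic), float(result.pvalue), drifted


# Synthetic stand-ins for a training-time window and two production windows.
rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=5000)
same_dist = rng.normal(loc=0.0, scale=1.0, size=5000)  # no drift
shifted = rng.normal(loc=0.5, scale=1.0, size=5000)    # mean shifted by 0.5

d_same, p_same, drift_same = ks_drift(reference, same_dist)
d_shift, p_shift, drift_shift = ks_drift(reference, shifted)

print(f"same distribution: D={d_same:.3f}, drifted={drift_same}")
print(f"shifted mean:      D={d_shift:.3f}, drifted={drift_shift}")
```

Thresholding on D rather than on the p-value is deliberate here: with large production samples the p-value becomes tiny even for negligible shifts, while D stays interpretable as the largest gap between the two empirical CDFs.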

