I agree with this - when there's an economic incentive to get clean data, you get clean data.
For instance, there's a lot of manual cleanup work put into things like training data sets for speech recognition, because there has been a lot of investment there. Same with self-driving, I assume, because so much $$$ got invested there.
Radiology scans, cough-based COVID detectors, or medical claims, on the other hand? I wouldn't expect it. It's just researchers trying to get a quick paper without adequate funding.