Hacker News

How do you detect the ground truth for training the model? Do you manually label it?


Yes, simple classification. Nothing fancy.

Basically, I pulled the database into a CSV file, and anything that was published before the bad content appeared was classified as HAM.

We had content that was known to be OK, so we marked it as HAM, and then all of our new bad content was marked as SPAM.

When it was first deployed to production, for some hours HAM content got wrongly marked, and the model got trained on those misclassifications as well, which caused a lot of confusion. But the problem was taken care of once the model was properly tuned and it was safe to let it run automated.
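The approach described above (export labeled rows, train a simple text classifier) can be sketched roughly like this. This is only an illustrative toy, not the commenter's actual setup: the data, labels, and the use of a tiny Naive Bayes classifier are all assumptions standing in for whatever model they really trained.

```python
# Toy sketch: label exported rows HAM/SPAM, train a tiny multinomial
# Naive Bayes classifier on them. Hypothetical data, not the real system.
import math
from collections import Counter

# Rows as they might look after exporting the DB to CSV: (text, label).
# Anything published before the spam wave is HAM; the new bad content is SPAM.
rows = [
    ("great article thanks for sharing", "HAM"),
    ("interesting discussion on databases", "HAM"),
    ("buy cheap pills now click here", "SPAM"),
    ("free money click this link now", "SPAM"),
]

# Count words per class and documents per class.
word_counts = {"HAM": Counter(), "SPAM": Counter()}
doc_counts = Counter()
for text, label in rows:
    doc_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def classify(text):
    """Return the class with the highest log-probability for `text`."""
    scores = {}
    for label in ("HAM", "SPAM"):
        total = sum(word_counts[label].values())
        # Log prior: fraction of training docs in this class.
        score = math.log(doc_counts[label] / sum(doc_counts.values()))
        for w in text.split():
            # Laplace smoothing so unseen words don't zero out the score.
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("click here for free pills"))   # → SPAM on this toy data
```

Note the failure mode the thread mentions: if mislabeled production content is fed back into `rows`, the counts drift and the classifier reinforces its own mistakes, which is why the feedback loop had to be tuned before automating it.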


Hmm, I wonder if it picked up timestamps as its initial filter.



