I wouldn't call it 'tweaking' the data collection. He is simply normalizing the ...

waqf · on Feb 22, 2011

He is right to normalize the results, but the parent's point is that he is wrong to do so by modifying his data collection.

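He should just collect as many commit messages as possible, then divide the profanity count for each language by that language's commit message count. That estimate has a lower standard error (and no more bias) than what he did, because it uses every message available instead of discarding some to equalize the samples.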
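To make that concrete, here is a rough sketch in Python. It assumes the "profanity count" is the number of commit messages containing at least one profanity, so each language's rate is a simple proportion and its standard error can be approximated with the binomial formula. The function name and the counts are made up for illustration; they are not the author's data.

```python
import math

def profanity_rate(profane_count, total_count):
    """Return (rate, standard_error) for one language's commit messages."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    rate = profane_count / total_count
    # Binomial standard error of a proportion: sqrt(p * (1 - p) / n)
    std_err = math.sqrt(rate * (1 - rate) / total_count)
    return rate, std_err

# Hypothetical counts: (messages with profanity, total messages collected)
counts = {"lang_a": (50, 1000), "lang_b": (500, 10000)}
rates = {lang: profanity_rate(p, n) for lang, (p, n) in counts.items()}

for lang, (rate, std_err) in rates.items():
    print(f"{lang}: rate={rate:.4f}, standard error={std_err:.4f}")
```

Both made-up languages have the same rate, but the one with ten times as many messages has a smaller standard error. Unequal sample sizes are handled by the division itself, so there is no need to throw messages away to make the samples match.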