
I mean it is extremely common to mislead (often unintentionally and with the best motives) and to use the performance of collecting data to lend credence to the result. The alternative is to be up front about your methodology, which means not making assumptions at multiple stages in the process, and not shading the conclusions by 'looking for other factors' and the like. When you do multiple rounds of 'fixing' data you are just injecting assumptions about the true distribution, which defeats the entire point of collecting data at all. If you 'know' what the answer should look like, just write down that assumption and skip the extra steps, OR ensure the methodology will allow the data to prove you wrong, or allow the data to show a lack of a conclusion (including by lack of data).

I realize I'm taking a very harsh stance here, but I've seen again and again people 'fixing' data in multiple rounds, the effect of which is that any actual insight is removed in favor of reinforcing the assumptions held before collecting data. When you do this at multiple steps in the process it becomes very hard to have a good intuition about whether you've done things that invalidate the conclusion (or the ability to draw any conclusion at all).



