
"Not to mention that you can also do individual analysis as well."

This is the key point to argue against in the context of people, privacy and mass surveillance.

It is the touchstone of privacy, anonymity and crowd protection.

Regarding noise suppression: yes, the more queries you have (available data, whether raw or extracted), the more you can filter (ask anyone who has studied Kalman filtering) to shrink your error bars and margins. This is one reason why DP is overhyped. Also, if there are no differences between queries, then the data is redundant. See deduplication (database) or scaling (measurement).
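
To make the filtering point concrete, here is a minimal sketch of the simplest case: plain averaging rather than a Kalman filter. It assumes each query returns the true value plus fresh, independent Laplace noise and that nothing limits how many times you can ask; real deployments typically cap this with a privacy budget. The names `noisy_query`, `averaged_estimate` and `rms_error` are made up for illustration.

```python
import random
import statistics

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_query(true_value, scale, rng):
    # Hypothetical query: the true answer plus fresh independent noise.
    return true_value + laplace_noise(scale, rng)

def averaged_estimate(true_value, scale, n_queries, rng):
    # Simplest possible "filter": average n independent noisy answers.
    return statistics.fmean(
        noisy_query(true_value, scale, rng) for _ in range(n_queries)
    )

def rms_error(true_value, scale, n_queries, trials, rng):
    # Root-mean-square error of the averaged estimate over many trials.
    squared = [
        (averaged_estimate(true_value, scale, n_queries, rng) - true_value) ** 2
        for _ in range(trials)
    ]
    return statistics.fmean(squared) ** 0.5

if __name__ == "__main__":
    rng = random.Random(0)
    for n in (1, 10, 100, 1000):
        print(n, rms_error(42.0, 1.0, n, 200, rng))
```

Under these assumptions the error should fall roughly as 1/sqrt(n), so a hundred repeated queries buy about a tenfold tighter estimate.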

About the analysis pipeline: this is why the mantra is "know your detector". Incidentally, this is also why releasing only the recorded datasets is next to useless for people outside the given research group. You would need to capture detailed knowledge of your data-taking operations and instruments, which happens rarely, if ever. Please cite a thing such as "the NASA pipeline", perhaps you mean a given mission/experiment? In any case, detector recalibration is a usual, almost daily activity...




> Please cite a thing such as "the NASA pipeline", perhaps you mean a given mission/experiment?

The specific pipeline I was referring to is the Kepler pipeline, which NASA uses to turn its raw pixel data into the photon counts that everyone uses for their research. This wasn't a detector issue; it was a software bug at the final stage of the data publishing process. The point was not the pipeline issue itself, but that noise is everywhere.

But as to your point, yeah okay. Maybe I shouldn't talk about statistics when that's not my field. :D



