Adding noise and fuzzing data has a long history in statistics going back to the '70s, and while it does work for large aggregates, it almost always messes up the details, i.e. the error bars.
C.D. DP is essentially a cheap ripoff of the ideas implemented in ARGUS.
See Dalenius (1977), the "Do Not Fold, Spindle or Mutilate" movement, and earlier:
Disagree. Data is why databases exist.
> Any "average" is simply readily apparent, therefore irrelevant for serious in depth analysis.
I said "aggregate", not "average". There are many kinds of aggregate analysis useful (in Astrophysics, you can take many different samples from different stars and use the aggregate to compute commonalities in the sample that you would've detect with a single measurement). There is more to aggregate analysis than averaging data.
As for the rest of your points, I'm not a statistician so I can't comment. Also, I didn't downvote you (HN rules).
But as you say: your "aggregate analysis" NEEDS "many different samples from different stars". Commonality is the result of your analysis across different samples. But precisely because they are common, you can go and sample a subset and get the same result without doing mass surveillance on every star.
PS: I am fully aware of photo stacking, but also note that stars are not humans; see the context of privacy. Please look at ARGUS or sdcMicroGUI from CRAN to get a feeling for data utility vs. re-identification risk.
"Mass surveilance" reduces noise and lets you get more data in a shorter period of time (telescopes have large fields of view, but they can't make time pass faster). Stacking (which is what the technique is called in Astrophysics) is very useful in this case. Not to mention that you can also do individual analysis as well.
Actually, most interesting of all is that you can do this type of analysis on objects like neutron stars that we can't observe directly because they're too faint. Because noise in telescopes can be modelled as a Poisson process, stacking actually increases S/N in a way you can't do without making much bigger telescopes.
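A minimal numpy sketch of that stacking argument (the source, background, and frame counts are purely illustrative, not from any real pipeline): photon counts are Poisson, so stacking n frames grows the signal linearly while the noise grows only as sqrt(n), improving S/N by roughly sqrt(n).

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical faint source: mean of 4 source counts per exposure on top
# of a mean background of 100 counts (illustrative numbers only).
signal, background, n_frames = 4.0, 100.0, 400

# Each frame's photon count is Poisson-distributed around the true rate.
frames = rng.poisson(background + signal, size=n_frames)  # on-source
sky = rng.poisson(background, size=n_frames)              # off-source reference

# Single-frame S/N: signal over Poisson noise (sqrt of total counts).
snr_single = signal / np.sqrt(background + signal)

# Stacked S/N: signal adds linearly, noise adds in quadrature.
snr_stacked = n_frames * signal / np.sqrt(n_frames * (background + signal))

print(round(snr_stacked / snr_single, 1))  # → 20.0, i.e. sqrt(n_frames)

# Empirically, subtracting the stacked sky from the stacked frames
# recovers the faint per-exposure signal that no single frame shows.
estimated_signal = frames.mean() - sky.mean()
```

The gain is exactly why stacking substitutes for a bigger telescope: you buy S/N with exposure count instead of aperture.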
PS. I'm not a statistician, so I can only speak to what I know. But my whole point is that researchers do know how to deal with noisy data, regardless of whether that noise is man-made. Interestingly enough, I found out recently that the NASA pipeline actually breaks certain data sets they have released (which have papers written about them), so man-made noise is a problem whether or not it's intentional.
This is the key point to argue against in the context of people, privacy and mass surveillance.
It is the touchstone of privacy, anonymity and crowd protection.
Regarding noise suppression: yes, the more queries (available data, whether raw or extracted), the more you can filter (ask a Kalman student) to reduce your error bars and margins. This is one reason why DP is overhyped. Also, if there are no differences between queries, then the data is redundant; see deduplication (databases) or scaling (measurement).
About the analysis pipeline: this is why the mantra is "know your detector". Coincidentally, this is why releasing only the recorded datasets is next to useless for people outside the given research group: you would also need to capture detailed knowledge of the data-taking operations and instruments, which happens rarely, if ever. Please be more specific than "the NASA pipeline"; perhaps you mean a given mission/experiment? In any case, detector recalibration is a routine, almost daily activity...
The specific pipeline I was referring to is the Kepler pipeline that NASA uses to turn their raw pixel data into the photon counts that everyone uses for their research (this wasn't a detector issue; it was a software bug at the final stage of the data-publishing process). The point was not the pipeline issue itself; it was that noise is everywhere.
But as to your point, yeah okay. Maybe I shouldn't talk about statistics when that's not my field. :D