A Plan Made to Shield Big Tobacco from Facts Is Now EPA Policy (nytimes.com)
111 points by justin66 on Jan 5, 2021 | hide | past | favorite | 35 comments


"More transparency" and "only making policy based on public, not proprietary data" sounded great in theory (and aligned with how some scientific journals are leaning), so I was confused at first. But the crux of it was that

> The new rule, public health experts and medical organizations said, essentially blocks the use of population studies in which subjects offer medical histories, lifestyle information and other personal data only on the condition of privacy. Such studies have served as the scientific underpinnings of some of the most important clean air and water regulations of the past half century.


Here's hoping that this will lead to better (properly anonymized) public health datasets.

Open data has provided tremendous value in recent years, not just for transparency and civic engagement, but for innovation and progress. The fact that we can find large, detailed datasets for topics like sports and birdwatching but not for health is a travesty.
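For what it's worth, "properly anonymized" is itself a technical bar, not just a promise. A rough sketch (hypothetical column names, plain Python) of the kind of check a data publisher might run before release — verifying k-anonymity, i.e. that no combination of quasi-identifiers is rare enough to single someone out:

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears at least k times in the dataset."""
    combos = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in combos.values())

# Hypothetical survey records: zip code + birth year are quasi-identifiers
# that could re-identify subjects when joined with outside data.
records = [
    {"zip": "02138", "birth_year": 1965, "diagnosis": "asthma"},
    {"zip": "02138", "birth_year": 1965, "diagnosis": "none"},
    {"zip": "02139", "birth_year": 1972, "diagnosis": "copd"},
]

# False: the 02139/1972 subject is unique and trivially re-identifiable.
print(is_k_anonymous(records, ["zip", "birth_year"], k=2))
```

This is only the simplest bar; real health datasets usually need stronger protections (generalizing values, suppressing rows, or differential privacy), which is part of why good public health data is so rare.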


I don't know. I'm pretty good with this change. There is abundant evidence of activism in certain scientific fields in the last ten years (particularly on diet). Verifiable data seems like an appropriate hurdle to me for expensive and disruptive regulation.


You really can't take this kind of thing at face value. Do you think the party of "climate change is a hoax", "covid is a hoax", "the earth is 5000 years old", and "we should cut all regulations and taxes" just suddenly cares about scientific integrity? Or is it more likely that they found a new way to gut the rules that force business to pay for the negative externalities they create?


Maybe the bias should be that we take public health seriously, and if that costs some corporations some money for no improvement, it isn't actually the worst thing in the world.

Something like regulations to reduce lead in the air could be the biggest material improvement in people's lives since we eliminated lead from paint and gas. Bigger than any improvements we've made in education directly. But rules like these just get in the way.


Surely making everyone's health data public is a fair price to pay to allow regulations on smoking.


Because there are no regulations on smoking?

What would showing off private data do? Put a minimum age on smoking? Stop it from being advertised to minors? Ban ads in a majority of media? Put health warnings on packages? Put high taxes on cigarettes?

Oh wait, we already have that.

Short of prohibition, what new law do you want at the price of health data privacy? Health data is pretty much our last legally protected sacred data category in the US, and it's already weak enough as is. HIPAA is great and all, but it's still not enforced that well. And given that we're decriminalizing weed and that alcohol prohibition didn't work... no. Everyone's health data is not a fucking fair price to pay for your short-sighted, emotional knee-jerk reaction.


okay real talk is this (a) a serious suggestion that health data privacy should be eliminated to buy us a regulation that we already have, or (b) sarcasm?

because I'm leaning (b) on principle from my own analysis of the tradeoff but there aren't powerful sarcasm flags in this post that I see, nor is there much in the way of arguments to convince me


Not OP, but scientific conclusions need to be verifiable, especially if they are driving policy. IMO you'd need a very good reason to keep any data related to such studies secret, and that's only if the data being kept secret isn't meaningful to reproducing and verifying the results. Knowing who a given data point belongs to is most certainly relevant to independently verifying whether the study's data is accurate, so I have a very hard time seeing how that should be kept secret.


Most Americans don’t realize HIPAA is less than 25 years old. They’ve never thought about why it might not be in their best interests or why their lives might be better without it.

The typical counterargument is that insurance providers would maliciously charge higher premiums based on what they see. However, that's silly, since they already have that data today.


Smoking regulations can be substantiated with studies that do not rely on anonymous public health surveys.


Two separate issues here -

1. Make the underlying model public (e.g., you ran such-and-such regression; here's what you included and here's a table of coefficients). Inarguably a good idea. Totally unobjectionable. You should rightly be suspicious if the model (and the R, Stata, SAS, or whatever code to produce it) is not available.

2. Make the underlying data public. Completely different! I have done research with (e.g.) detailed birth certificate level data. Such data are confidential by state and federal laws generally. Other researchers work with detailed income histories from the IRS. As another example, I have worked with electronic health records, which include names, addresses, health conditions, etc. Almost no one thinks this stuff should be public.

It is fair IMO to require researchers to describe what data they use and how someone else might get it (even if that includes the line “find a collaborator at the IRS”). But it is unreasonable to require researchers to make data itself public, when they may not have the right to do so.

What’s worse is that the reason this regulation is being proposed is to prevent research informed by private, confidential health data (eg birth certs) from being used to make public policy, especially in areas around pollution and environmental regulation.


I think this separation falls down badly in some cases: if someone claims a method worked but the model is dubious and you can't possibly reproduce their results, then there's no way to know whether their model actually worked even once, or whether they just made an entirely stupid mistake (like using the complete data set for training).

Concretely, there are applied ML papers on confidential military data where I'm fairly sure the results are just bogus, cherry-picked, or produced with incorrect procedure. But with the data unobtainable (even for those with clearance, the data isn't materialized in some DoD database for reproduction), the published results are entirely unreproducible and unchallengeable.
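The "complete data set for training" failure mode is easy to demonstrate with a toy memorizing model (entirely hypothetical data, plain Python): evaluated on its own training set it looks perfect, while held-out data exposes that nothing transferable was learned:

```python
# A "model" that memorizes its training examples looks flawless when
# evaluated on the data it was trained on, and falls to chance on
# held-out data. (Hypothetical data, for illustration only.)

data = [(i, i % 2) for i in range(200)]  # (example id, label)
train, test = data[:150], data[150:]

memory = {x: y for x, y in train}  # memorize the training set verbatim

def predict(x):
    return memory.get(x, 0)  # unseen inputs: fall back to a constant guess

def accuracy(dataset):
    return sum(predict(x) == y for x, y in dataset) / len(dataset)

print(accuracy(train))  # 1.0 -- evaluating on training data looks perfect
print(accuracy(test))   # 0.5 -- held-out data reveals the lack of learning
```

Without access to the original data, a reviewer can't even run this kind of sanity check against the published numbers.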


Point 1 might be less clear cut in some settings. Obviously, if you're just doing a simple regression, it's no big deal, but more powerful models can memorize and expose parts of their training data. See for example: https://arxiv.org/abs/1802.08232


I have a tendency to believe that any research, in any field, being used for any purpose, should be discounted if it doesn't present its source data. And in computer science, the source code. The fact that this doesn't happen 100% of the time already is baffling to me; it feels like science is broken.


This is the kind of black-and-white dogmatic thinking that is the tell-tale sign of a software engineer applying their software engineering context to a context where it doesn't fit. I have no doubt many people on this board (not necessarily you) share a similar view, yet are similarly disdainful of privacy protections in the US around one's use of technology (e.g., cell phone tracking, DNS monitoring, cookies, ad tech).


I suspect they're all for privacy. They don't want people's data made public, either. Rather, they're OK with the fact that policy can't be made, because they're not especially interested in the policy implications.

They likely assume that any regulation will be bad by default unless it's proven to them -- and won't put any effort into reading that proof when offered.

Software engineers get used to the notion that things either work or they don't. Policy is a much messier process, where you do the best that you can with incomplete information. And software engineers are generally comfortable enough under the status quo that they will set high standards for change -- and aren't interested in how that affects anybody not in their immediate circle.


That's definitely not me. The point is that an effort should be made to make the data available to the fullest extent possible. I don't think it's unreasonable to expect anonymised data to be made available. If the data cannot meaningfully be anonymised, that might be a special case, but I find it hard to imagine such a scenario unless the study involved only a very small number of participants (which would also make it less valuable). Maybe there are such scenarios; I'm not stating categorically that there aren't, just that they seem unlikely, and if they exist I'd expect them to be a small minority.


When your subjects are human beings, the rules are different - both for ethical reasons, and because if you violate your subjects' privacy you won't have any more subjects for next time.


Part of the EPA's mission is to protect human health. It is often illegal to release medical data underlying studies that link environmental factors to health problems.


So, we discount the research, and just go by what private industry says? Because that's the alternative, as industry will massage and cherry pick the data until it fits whatever narrative it wants. It has a lot of money at stake.


>massage and cherry pick the data until it fits whatever narrative it wants. It has a lot of money at stake.

People already do this when there's basically nothing at stake though.


Does public research do it even remotely to the same extent as what gets us advertisements like "Four out of five dentists recommend Marlboro cigarettes"?

Policy has to be based on something. Public research is being attacked because it's not perfect. Okay. If you're going to do that, you should apply the same expectations to private research, and also require that firms prove that their products are safe, to the same rigorous standards that you expect out of public researchers.


>"Four out of five dentists recommend Marlboro cigarettes"?

I don't see how that's any different from when congressmen get interviewed by the talking heads and say "the research is clear; there's been study X by Y and P by Q that show that what we need to do is enact <insert some asinine extremist legislation that that congressman supports>". They're both lying in a plausibly deniable manner in order to market something to the public. Anyone who bothers to verify either claim will find it's all crap.

The problem isn't the research. It's that we blindly trust "the research," and people know this, so the research gets manipulated and cherry-picked and whatnot. Two hundred years ago these sorts of people and entities all invoked God's name in order to peddle crap. I have no idea what the solution is, but I think whatever it is will involve the public being more disapproving of parties that behave in a slimy manner like this.


> I have no idea what the solution is but I think whatever it is will involve the public being more disapproving of parties that behave in a slimy manner like this.

One of the two parties is currently seriously considering not accepting the results of an election they lost. The leader of that party is busy calling up election officials, and threatening them into magically finding thousands of votes that will give him the victory.

And the rank and file of the party isn't distancing themselves from this behavior. Instead, they are turning their ire at the person who leaked it.

On the order of slimy political behavior that voters don't punish politicians for, this issue we're discussing won't even register. We're currently so far down the rabbit hole, I don't think we'll ever find our way out.

> I have no idea what the solution is

That's because there is no perfect solution. We don't live in a perfect world. We just have to find the best solution out of a space of sub-optimal ones. And in this case, public research is a much better starting point for policy than private research, or no research (which is what would happen if we applied your standard.)


If they upload the raw data publicly along with the source code any and all cherry picking will be pretty damn evident.


They don't, though.

You have two sets of incomplete data. One from public research, where some of the data is missing, and one from private research, where nearly all of the data is missing.

Which do you base policy on? Doing nothing, by the way, is also making a policy decision.

I will also point out that changing the rules post-facto, like we're doing here, is like asking an open source project to verify that every single contributor to it has consented that the project migrate to a new, incompatible license.



> A spokesman for President-elect Joseph R. Biden declined last week to comment on the expected rule, but activists said they expected him to quickly work to suspend and then repeal it.


"Activists said they expect"; which activists? Why do they have that expectation? Language like this would be flagged in places like Wikipedia.


> places like wikipedia

Did you mean NY Times?

I also hate when this sort of language is used. Typically it's "Twitter thinks" or "Facebook reacts" when the correct phrasing should be "these 3 users on Twitter" or "this specific person on Facebook".


I've lost count of how many times I've seen news people claim that a video has "gone viral" only to find out that it has something like 80,000 views.

80k views is nothing to sniff at, but that's hardly "viral".

Not really the same as what you're talking about, but I was reminded because it's another example of lazy use of language by journalists.


The very nature of "gone viral" is based on the idea that the content was created and then spread by a minority of users. Context matters. If your town's population is 5K and a video of its town administrator accepting a bribe gets 80K views, that's viral relative to the townspeople.

I think we're in agreement here; stop lazy journalism.


I joke about this a lot, but if anything is a sign of the state of journalism, it's "Twitter is outraged" articles, which consist of a "journalist" finding 2-3 tweets that agree with their stance and have moderate to high engagement, writing up a few paragraphs, and embedding the tweets.


Unfortunately the words of a complete and utter hack can be delivered with the exact same servers, fonts, and layouts as actual journalists.



