Hacker News
What to do when you don’t trust your data anymore (ucdavis.edu)
89 points by di4na 8 months ago | 20 comments



I've run into a similar problem, not in research but with production data at a company. Decisions were made on wrong data, and so they were bad decisions. It's agonizing and hard to swallow. But the best thing to do is what you did: face it, be honest with yourself and everyone else, and admit the error.

I think this is part of being a data scientist.


It's amazing how common it is for people to see something in the data that doesn't make sense and, instead of investigating, just move on. I'm guilty too sometimes. But almost every time I dig, I find something that invalidates the analysis. Maybe an ETL somewhere is putting an empty string instead of NULL in particular cases. Maybe my join is duplicating certain rows. Maybe you got a table (or spreadsheet) that doesn't make sense. It all violates the assumptions of an analysis, but collectively we just say "can't eliminate all bugs." I hope the future is better.
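A minimal sketch of what checking for two of these problems could look like, using only Python's built-in sqlite3 module. The `visits` and `sites` tables, their columns, and the helper names are all made up for illustration; they are not from the article or from any real pipeline.

```python
import sqlite3

# Hypothetical toy data: one empty-string observer, one NULL observer,
# and a duplicated row in the lookup table that will fan out a join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visits (visit_id INTEGER, site_id INTEGER, observer TEXT);
    CREATE TABLE sites (site_id INTEGER, name TEXT);
    INSERT INTO visits VALUES (1, 10, 'ana'), (2, 10, ''), (3, 20, NULL);
    INSERT INTO sites VALUES (10, 'north'), (10, 'north'), (20, 'south');
""")


def count_empty_strings(conn, table, column):
    """Count rows where the column holds '' rather than NULL."""
    # Identifiers are interpolated directly; fine for a sketch, not for untrusted input.
    query = f"SELECT COUNT(*) FROM {table} WHERE {column} = ''"
    return conn.execute(query).fetchone()[0]


def count_nulls(conn, table, column):
    """Count rows where the column is a real NULL."""
    query = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    return conn.execute(query).fetchone()[0]


def join_row_counts(conn):
    """Return (rows in visits, rows after joining to sites)."""
    before = conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
    after = conn.execute(
        "SELECT COUNT(*) FROM visits JOIN sites USING (site_id)"
    ).fetchone()[0]
    return before, after


empty_observers = count_empty_strings(conn, "visits", "observer")
null_observers = count_nulls(conn, "visits", "observer")
rows_before, rows_after = join_row_counts(conn)

if rows_after != rows_before:
    print(f"join changed the row count: {rows_before} -> {rows_after}")
```

If the join is meant to be a lookup, the row count should not change, so a mismatch is worth chasing before any analysis runs on the joined table.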


Some places, they do QA, and when they find something in a random sample, they fix that instance and continue, rather than looking for all the places it happened and why.
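One way to do the opposite, sketched under assumptions: turn the single bad instance into a predicate, run it over every row, and group the hits by where they came from. The row layout, the `source` field, and the negative-mass defect are hypothetical examples, not anything from the article.

```python
from collections import Counter


def find_all_instances(rows, is_defect):
    """Return every row matching the defect predicate, not just the sampled one."""
    return [row for row in rows if is_defect(row)]


def defects_by_source(rows, is_defect, source_key="source"):
    """Count defective rows per source, as a hint at where the problem originates."""
    return Counter(row[source_key] for row in rows if is_defect(row))


def negative_mass(row):
    """Hypothetical defect found during QA: a physically impossible measurement."""
    return row["mass_mg"] < 0


# Made-up rows for illustration only.
rows = [
    {"id": 1, "mass_mg": 12.5, "source": "batch_a"},
    {"id": 2, "mass_mg": -1.0, "source": "batch_b"},
    {"id": 3, "mass_mg": 11.9, "source": "batch_a"},
    {"id": 4, "mass_mg": -1.0, "source": "batch_b"},
]

bad_rows = find_all_instances(rows, negative_mass)
by_source = defects_by_source(rows, negative_mass)
```

Here every hit comes from the same batch, which points at one upstream cause rather than scattered typos.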


> “Why Sheet 2 exists is an interesting question,” was Jonathan’s response and he agreed that “it is well that the paper is being retracted” when I asked him directly about this and that perfectly sums up my feelings as well. ... While I’m not sure what Sheet 2 means or why it exists; I do know that the data in this paper also suffer from inexplicable irregularities rendering any results untrustworthy.

> Given the problems in my data sets, these folks are proactively investigating data that they received from Jonathan ...

It's great that she is sticking solely to the facts and being very scientific. But I'd love to read a journalist's in-depth investigation into Jonathan's motivations. Everything points to this being a deliberate, dishonest fabrication of data.


Some additional details about these and related retractions from Pruitt's work can be found here:

https://retractionwatch.com/2020/01/29/authors-questioning-p...


Glad she came clean, and glad the retractions are coming in.

And now she has a tenure track position at a UC.

While the scientific record might be corrected, the historic impact on a cohort of people who got less because of this remains unacknowledged and uncorrected.

Walk it all back.


Did you read the article? Sounds like she didn't do anything wrong, but was sent falsified data by a more senior scientist that she trusted. After discovering that, she took steps to correct the record, alert other scientists that they need to double check their papers, and build automated tools to catch similar issues going forward. That sounds appropriate, and high in integrity, why should she be punished for doing the right thing?


That sounds reasonable as far as it goes, but if you were the person next in line for that UC Davis position and your research wasn't based on falsified data, I think you'd be feeling pretty unhappy about this.

(I hope that in reality there's a lot more to the author's research than the retracted papers, but of course in such a competitive job market, every bit helps.)

Look at it another way: the author sure was lucky they found out about the problem after they were securely in their tenure-track position, and not just before.


> Look at it another way: the author sure was lucky they found out about the problem after they were securely in their tenure-track position, and not just before.

Look at it another way: The author was sure unlucky to have based their research on shoddy data from a trusted colleague. And it took guts and integrity to react in the way they did.


That is also true.


They have proven their integrity. That seems worth a bit more than being right on these spiders' specific kinks.

Not because I don’t care about spiders, but because the first is entirely within their control, while whether a hypothesis finds the data to support it comes down to luck far too often for a single case to be a meaningful measure of an individual.


> That sounds reasonable as far as it goes, but if you were the person next in line for that UC Davis position and your research wasn't based on falsified data, I think you'd be feeling pretty unhappy about this.

Her technique was presumably (not my field) otherwise quite good and she didn't know at the time the source data were bad. Beyond her willingness to follow up on a query about an old paper, her approach to the follow-up was excellent. And she seems to have learnt from the experience.

All in all this sounds like what you want from a good scientist. After all once you have tenure, you can just ignore all that "old stuff" if you are so inclined.

As far as not getting the position: there are more people than tenure track positions these days so "luck of the draw" is also pretty significant.


Don't make "did you read the article" style comments. It's rude and against the HN guidelines. Even if you think someone has such a bad take on an article that it stretches your credulity to believe that they actually read the article, there is probably something more productive you can say instead.

https://news.ycombinator.com/newsguidelines.html


To be clear, I'm more angry at Jonathan N. Pruitt, the fraud.

Looks like he hasn't retracted these papers from his Pubs list: https://labs.eemb.ucsb.edu/pruitt/jonathan/publications

But still, there is no fucking way that you get three papers into your research and then figure this out. Having done an MS, PhD, and postdoc in evolutionary biology, including sociality in insects, I can attest that one scrutinizes every (insignificant) data point.

And fuck off Did I read the article.

If you see fraud and don't say fraud, you are a fraud. ~Taleb (I think)


Agreed, this reads like a pretty serious indictment of Pruitt. It sounds like he had no spine when they went back with questions about the data he provided. There didn't seem to be pushback from him about why his data was correct, simply "it's good that you're retracting". Reading between the lines it seems like the author didn't get this answer easily.

Also curious: how did no one question the data earlier, when some guy, albeit respected, sends you a data file and you write several papers based on it? No one knew what sheet #2 was, and we're writing scientific papers based on this Excel file? I think we need to revisit correct data hygiene and reasonable suspicion.


For some reason, I can't access Pruitt's papers anymore. That link just redirects me to the main UCSB site. But here are some of his papers in Google Scholar: https://scholar.google.com/citations?user=ryyIEucAAAAJ&hl=en


To be honest, I'd much more happily hire someone with the guts to admit their (or their co-authors') mistakes than someone who seems, and claims, to have done everything 100% correctly during their previous career.

At least according to my experience as a researcher in geosciences, I'd be ready to claim that borderline fraud (i.e. being too sloppy about the data, even if not with explicit bad intentions) is waaay more common than generally thought or admitted. She at least had the guts to do something about it.

This is also a great story about how important it is to fund research that tries to verify previous research. That is virtually non-existent in the current academic world.


If anything this proves that UC made the right choice. People of integrity should be in positions of real research. Walking back the research in this one area doesn't invalidate the entire academic history of this person.


> historic impact on a cohort of people who got less because of this remains unacknowledged and uncorrected.

On whom? It's a paper about the sociology of spiders, it's not a Reinhart-Rogoff scale disaster.


If your criticism of this position is "who cares, it's just spiders" then you're essentially saying it's not a big deal because it doesn't involve something you give a shit about, never mind that it's the principle -- not the subject matter -- that is being argued here.

What's the name of this logical fallacy? Ad idiotim?



