I wouldn’t be surprised if a significant portion of published computational research has bugs that totally invalidate the conclusions. I think we need to push hard to require all taxpayer-funded research to make any code that results in a journal article publicly available.
> We also noticed significant improvements in performance of RND every time we discovered and fixed a bug [...]. Getting such details right was a significant part of achieving high performance even with algorithms conceptually similar to prior work.
They call bugs 'details' which, I find, is a frightening state of mind for someone publishing an algorithm.
It really depends on the algorithm. For example, a bug in the random number generator of a stochastic search algorithm that affects, say, the variance of a distribution won't have a relevant impact on the outcome.
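To make that concrete with a toy example of my own: a greedy random search is fairly insensitive to the exact width of its proposal distribution, so a variance bug mostly changes convergence speed, not the final answer. A minimal sketch (all names and numbers are mine, not from any particular paper):

```python
import random

def random_search(sigma, seed=0, steps=2000):
    """Greedy random search minimizing f(x) = x^2 with Gaussian proposals."""
    rng = random.Random(seed)
    x, fx = 5.0, 25.0
    for _ in range(steps):
        cand = x + rng.gauss(0, sigma)
        if cand * cand < fx:  # keep the candidate only if it improves f
            x, fx = cand, cand * cand
    return fx

good = random_search(sigma=0.5)    # intended proposal width
buggy = random_search(sigma=0.25)  # "bug": variance off by a factor of 4
# Both runs still drive f(x) from 25.0 down toward 0.
```

Both variants find (essentially) the same minimum; the bug only costs iterations.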
I also had a few do-overs at the end of my thesis, but fortunately had a cluster standing by...
Well, there's also this notion of testing and regression. As I said in another comment a few days ago:
>A few weeks ago I had a conversation with a friend of mine who is wrapping up his PhD. He pointed out that not one of his colleagues is concerned whether anyone can reproduce their work. They use a home grown simulation suite which only they have access to, and is constantly being updated with the worst software practices you can think of. No one in their team believes that the tool will give the same results they did 4 years ago. The troubling part is, no one sees that as being a problem. They got their papers published, and so the SW did its job.
An independent implementation of the experiment is necessary for a full reproduction anyway. If you just run their code again, you'll end up with all their bugs again.
(But, don't get me wrong. I like when researchers release their code. It's still very useful.)
1. People will not bother. It took a lot of minds to come up with the software used (in my friend's case, several PhDs' worth of work). No one is going to invest that much effort inventing their own software libraries to get it to work.
2. Even when you do write your own version of the software, there are a lot of subtleties involved in, say, computational physics. Choices you make (inadvertently) affect the convergence and accuracy. My producing software that gives different results could mean I had a bug. It could mean they did. It could mean we both did. Until both our codes are in the open, no one can know.
It is very unlikely that you'll have a case of one group's software giving one result and everyone else's giving another. More like everyone else giving different results.
Case in point:
If there's one thing I could convey to the world from what I learned from my time in academia, it is this: Most scientists at universities do not care about reproducibility. Not only that, many people intentionally omit details from papers so that it is hard for rivals to reproduce their work - they want the edge so they can publish without competition. This isn't a shadowy conspiracy theory - this is what advisors openly tell their students. Search around on HN and reddit and you'll see people saying it.
 My experience is in condensed matter physics - it may not apply to all of academia.
I can believe that. I have seen code in academia and it was all one- or two-letter variables without even line breaks between statements.
Ideally the code would be part of the peer review process, but code review is really expensive, so who knows how that would play out.
Yes, but closed source helps ensure that low quality code is hidden from sight. It also means that people who distrust or doubt the conclusions have no chance to identify any bug(s) and disprove the results or conclusions.
We stop publishing in papers, and instead adopt smaller chunks of our work as the core publishing units.
Each figure should be an individually published entity which contains the entire computational pipeline.
Figures are our observations on which we apply logic/philosophy/whatyouwannacallit.
Publishing them alongside their relevant code makes the process transparent, reproducible and individually reviewable,
as it should be.
We can then "publish" comments, observations, conclusions etc on those Figures as a separate thing.
Now the logic of the conclusions can be reviewed separately from the statistics and code of the figure.
As it is, research that yields a "failure" is buried. That means wheels are being reinvented and re-failed. That means there's no opportunity to compare similar "failures", be inspired, and come up with the magic that others overlooked.
Unfortunately, I would imagine, even if you can get researchers to agree to this the lawyers are going to have a shit fit. Imagine Google using an IBM "failure" for something truly innovative.
I agree in principle. But, for the experimental sciences, we need better publication infrastructure to make this practically possible.
For example, consider a figure that compares, across several groups, the mechanical strain of tensile test specimens for a given load. Strain is measured by digital image correlation of video of the test. Some pain points:
1. There are a few hundred GB of test video underlying the figure. Where should the author put this where it will remain publicly accessible for the useful lifetime of the paper? How long should it remain accessible, anyway? The scientific record is ostensibly permanent, but relying on authors to personally maintain cloud hosting accounts for data distribution will seldom provide more than a couple of years of data availability.
2. Open data hosts that aim for permanent archival of scientific data do exist (e.g., the Open Science Framework), but their infrastructure is a poor match with reproducible practices. I haven't found an open data host that both accepts uploads via git + git annex or git + git LFS and has permissive repository size limits. Often the provided file upload tool can't even handle folders, requiring all files to be uploaded individually. Publishing open data usually requires reorganizing it according to the data host's worldview or publishing a subset of the data, which breaks the existing computational analysis pipeline.
3. Proprietary software was used in the analysis pipeline. The particular version of the software that was used is no longer sold. It's unclear how someone without the software license would reproduce the analysis.
Finally, there's the issue of computational literacy of scientists. In most cases, the "computational pipeline" is a grad student clicking through a GUI a couple hundred times, and occasionally copying the results into an MS Office document for publication. No version control. Generally, an interactive analysis session cannot be stored and reproduced later. How do we change this? Can we make version control (including of large binary files) user-friendly enough that non-programmers will use it? And make it easy to update Word / PowerPoint documents from the data analysis pipeline instead of relying on copy & paste?
If any of these pain points are in fact solved and my information is out of date, I would be thrilled to hear it.
3: Analysis that uses proprietary software gets marked appropriately as second-class.
> computational literacy of scientists
Research just jumped onto Jupyter notebooks; that's halfway there. Now someone needs to help with the remaining step.
Science should prove things...
Logical proofs will never happen for software development, but surely standards for scientific programming can be tightened up a few levels!
I think I heard of some reform proposals from the Reproducibility Crisis reformers.
There's also the fact that we have had a pretty solid grasp about the chemical reactions of greenhouse gases since long before computers, both theoretically and empirically, and we know roughly how much is put out in the atmosphere.
Where the models diverge is on far finer points than what is needed to make the basic policy changes that seems to be where we are stuck right now.
Entertaining badpun’s skepticism, it is not necessary that they have identical bugs, only that their bugs yield similarly biased results.
For example, if a significant number of bugs are identified by their effect on the results, then bugs contributing to “wrong” results might be more likely to be identified and fixed.
 In the Aristotelian sense.
If global warming is actually wrong, it's most likely because of bad/corrupt data in the datasets used.
Since it was sound science in 2009, why would it not be now?
This will be the least of our problems. We understand too little about nature to have any reasonable prediction model. We don't know the inflection point that will cause massive collapse of the major ecosystems we depend on.
All we know is that things are changing fast, faster than many non-human organisms are able to adapt.
Are you talking about predictions (and not measurements, you don't need a model to measure temperature)? Assuming you are, there's unfortunately a huge problem with modeling (and heavy math and stats-based science in general), in that researchers tend to stop looking for bugs in the model when it returns the results that they expect. In other words, if a bug in the model tells the researcher that Earth's temperature will decrease by 4 C by 2100, he will look over the model until he finds the bug, but if the model tells him that the temperature will increase by 2 C, thus confirming his inner bias, he'll declare it correct and move on to writing a paper based on the "finding".
Alternatively, as a thought experiment, imagine if math research were done in the way climate science is done. We would have proofs that are millions of pages long and were never verified by anyone. We would trust in them only because the author says that they are correct. Is this science?
>>Researchers tend to stop looking for bugs in the model when it returns the results that they expect.
Again... they are _ALL_ biased?
Also, how many independent, comprehensive models (with codebases) for global climate change are there in the science world?
False. Lots of past climate changes were due to changes on Earth and its atmosphere (and sometimes, specifically, life on Earth), not changes in solar output (e.g., notably, the Huronian glaciation believed to have resulted from the Great Oxygenation Event, which resulted from the exponential growth of photosynthetic life.)
That the post-industrial rise in atmospheric CO2 is of anthropogenic origin is hopefully not a point of dispute, but it is demonstrable if necessary. Thus it remains to show that this must raise the equilibrium temperature. So, as you say, CO2 selectively absorbs outgoing IR. In the lower atmosphere, this actually does not have as much of an effect as you might think. Water vapor blocks quite a bit of the absorption spectrum, and the effect of CO2 is more-or-less saturated already.
The mean free path of an outgoing IR photon in the lower atmosphere is quite short. Absorbed photons are re-emitted in a random direction, but take an overall upward course, the mean free path rising with altitude. At the (radiative) top-of-atmosphere, the mean free path is infinite: the photon is more likely to leave Earth. At the edge of space, there is essentially no water vapor, so the action of CO2 is greater. The effect of increasing the amount of CO2 in the atmosphere is to push the CO2-dense region of the atmosphere further out into space. Photons must take a longer path out of the atmosphere, and this must raise the overall temperature of the Earth proportionally, specifically by 3.7 W/m^2 per doubling of CO2, which is commonly held to be equivalent to 1 degree C of global temperature. This must be the case unless our understanding of thermodynamics is very wrong (and if you have an issue with thermodynamics then you have some pretty serious issues).
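For reference, the 3.7 W/m^2-per-doubling figure drops out of the standard simplified forcing expression from Myhre et al. (1998); a quick sanity check:

```python
import math

# Simplified CO2 radiative forcing (Myhre et al., 1998):
#   dF = 5.35 * ln(C / C0)  [W/m^2]
def co2_forcing(c, c0):
    return 5.35 * math.log(c / c0)

per_doubling = co2_forcing(2.0, 1.0)  # ~3.7 W/m^2, as stated above
```

Note the logarithm: each doubling adds the same forcing, which is why the effect is quoted "per doubling" rather than per ppm.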
So, one degree C ain't so bad, right? Well, it wouldn't be if that were it. However, there are several problematic feedbacks. One is that melting a lot of ice lowers the Earth's albedo, which causes it to absorb more heat. Another issue is that there is a lot of this "water" stuff around, which is very readily absorbed by the atmosphere, in a manner that increases very sharply with temperature. Water vapor is a much better greenhouse gas than CO2 by all accounts.
Climate science is not an extrapolation from the temperature record. There is a solid minimum bound on the temperature effects of doubling atmospheric CO2, and a variety of amplifying positive feedback effects. So far, in the last twelve decades, we have not managed to find anything which would reduce those effects to something manageable. At this point, the effect would need to be both very large, in order to offset the strong H2O feedbacks, and very small, to not have been noticed. The most plausible option would be "something poorly understood about the H2O feedbacks". I believe the most successful of such theories would be Dr. Richard Lindzen's Iris Hypothesis, which has generally failed to find support. At this point, there are no particularly plausible mechanisms which would transfer this extra energy to space, and if those did exist, then they would not necessarily be a non-issue: even if thermodynamics and optics are entirely wrong, the planet is warming, and we will have to deal with that even if it can't be prevented.
If you have any other questions, or would like citations for any of the above, do feel free to ask.
Interestingly, the original paper proposing AGW (in 1896) was actually intended to explain Ice Ages:
There could be lots of reasons for that. Anyway, temperatures were not always rising: a clear rise was observed in the 1930s-40s and the 1980s-90s, with cooling in the 1960s-70s. And yes, a prediction has to be precise. If a model predicts a rise of 3 K in 100 years and you measure 1.3 K, your model is wrong. It's even more wrong if it doesn't take into account any of the natural cycles, even if its prediction is accidentally correct.
I think at this point the group think is so strong that if anyone came out with evidence against global warming they would commit career suicide by publishing it
Oh good. So you'd rather see millions of people be displaced. We're fucked as a species when "I'm a contrarian" is a valid reason to disagree.
> they would commit career suicide by publishing it
Meanwhile some researchers build their careers by publishing this sort of stuff, even when it isn't well researched
The bugs, if serious enough, can make it not scientific, in the same way that a paper with grave errors in it is wrong and thus obviously not scientific. So, until we have done a thorough review of the code, we should treat it with due scepticism.
EDIT: this can also be expressed in terms of risk analysis. For typical software, the consequence of a bug is low - most software is commercial, so the impact of the bug will be limited to that company's bottom line (with exceptions of software that can kill, but these people already are serious about bugs). Also, most bugs in most software are either highly evident (the button does not work, you get random segfaults etc.) or have limited impact.
On the other hand, the bugs in climate change models, given their "pipeline" nature (a wrong result from one module is propagated downstream all the way to the final prediction of expected temperature change), can quite often have severe impact. They can also be not evident at all - they can for example change the final outcome prediction by 1 C.
Compound that with the fact these predictions are used to make trillion dollar decisions on global policies, and you can see that the actual damage done by bugs is not unlikely to be in the trillions. And that's why I say it's probably wise to subject the models to extreme scrutiny.
"Already 18824 SIGNATURES – sign the open letter now!"
I don't see how a research scientist could say this; something seems culturally wrong there. Nature is a pretty serious publication. You spent months on it, maybe someone else will. How can it not be worth your time to 'try to correct the issue or write to the journal'?
I don't think you realize the step it takes to actually write to Nature and say: "Excuse me, that paper you published is wrong and you should retract it." You're liable to alienate all of the authors as well as piss off the editorial board for pointing out that they let a mistake slip through. You should be prepared to face vigorous backlash, and be completely confident in your own results if you don't want the repercussions to overwhelm you. Many communities related to specific fields are small and niche, so making enemies with one team often means that you've actually cut yourself off from a good part of that community. You should be prepared for awkward moments in conferences when you meet each other, incendiary questions when giving talks and scathing "anonymous" peer-reviews for your future submitted work. It's far easier to assume good faith, give the authors the benefit of the doubt ("yeah the implementation is buggy but what software isn't? The main idea is probably sound") and move on.
I don't know about other fields, but in mine, there's an unofficial accepted consensus on a set of very high-profile papers that happen to be either complete bunk or utterly useless. Again, we're talking about Science, Nature, etc. These papers made a lot of noise when they came out years ago, and now no one in their right mind would base their work on them. Again, you can't just knock on Science's editorial board and go "Excuse me, that hyped paper from a very big shot is useless and you shouldn't have published it", so it's just something people in the community know and whisper among themselves. I guess it may seem strange to outsiders though.
Of course it seems strange. The output of the scientific/academic community is often presented to people as being a source of truth and scientists do little to dissuade this. To learn that it's all rotten to the core is upsetting.
What I try to do when reading academic papers is, if there are multiple on the same topic, read them and try to understand the differences. If different papers come to the same conclusion, that should make you a lot more confident in the result. If there aren't many papers on the subject then you just need to understand that this is kind of a "best effort" thing that is more likely to be true than a random guess, but not certainly accurate.
Different labs also do different things - mine is currently reworking a paper that was basically done, because something we did might matter, and my students have writing tests as part of their workload when developing code.
Unfortunately, you can't build a career out of doing QA for science. Nobody will fund you.
It's all about trust, really. Many people choose to trust state-of-the-art medical research when they do vaccinations on their kids; some do not. You can indeed choose not to trust us as a 'source of truth', and it seems a fair chunk of the US political landscape is being led down that path; we'll see where it goes.
The core principles of science are in conflict with natural human behavior (which is one reason it took so long to invent science), so saying “they’re only human” is really no excuse.
And believe me I hate being defeatist, it made me a lot angrier when I was younger and I consider it to be one of the things that stopped me trying to go down the academic route so far.
While coders (who aren't just being boosters for particular technologies) have to live with the fact that the world runs on mountains of bad code, scientists, or anyone engaging with academia or journal papers, have to sleep at night knowing how much bad science is being done and how much the publication process is skewed toward publishing and politics over quality or fact-checking.
I can no more fix it than I can fix all software bugs or politics in big corporations...
I think you're making it out to be a bigger deal than it actually is. Most of these things are just noise. We just ignore it. If someone rises to prominence with outrageously fake results, rest assured that they will get shot down quickly by competitors (all the incentive to publish fake results lies within highly competitive fields). If you know what you're doing, and work in a lab where people know what they're doing too, you can still do pretty good science. I would guess that in all communities, academic or not, there's a nonzero amount of bogus, over-hype, nepotism, politics and what not, and a subset of people who actually get things done and make the whole field advance. After a while, the test of time truly determines what was actually useful from what is bunk. That's how it's always been.
> people in the community know and whisper among themselves
If no one else, tenured academics should be calling out bunk in their fields; if they aren't, they're failing, IMO.
Scientific papers are made public for a reason. The correct information should not be accessible only to a small circle of people in the know. If incorrect papers are out there, those who can correct them have a duty to do so.
Often it's not so much false as empty or useless. Nevertheless, since all people in a community know the values of individual papers, and these are the same people 'making progress' in that field, I don't see how science is so much harmed. I agree the situation is kinda ridiculous, but it's not a "sky is falling" situation either.
>If incorrect papers are out there, those who can correct them have a duty to do so.
Sure. How much are you willing to pay me for me to prioritize this 'duty' over the literal hundreds of duties I already have at my lab?
Better rewards for uncovering bad science keeps it from devolving into a political contest. If you're not seeing the right incentives in place, you're either missing something or you're seeing an opportunity to capture some value by implementing those incentives.
No. If you increase rewards for uncovering bad science, it will just lead to false accusations.
> If you're not seeing the right incentives in place
Some people think that the very existence of incentives is the problem here.
Isn't that what this study suggests is already happening, in as much as false accusations are just more bad science?
> Some people think that the very existence of incentives is the problem here. https://hbr.org/1993/09/why-incentive-plans-cannot-work
This article is full of opaque generalizations about human behavior that more accurately describe the baser impulses of individuals gaming systems they don't understand (participants in psychological experiments) than they reflect the actions of self-conscious professionals with any semblance of dedication to their fields.
It's Saturday morning (and I got students' papers to mark) :)
I can tell you haven't spent time as a grad student in a top university in the US. There is almost no incentive for a research scientist to pursue this. They're very busy and stressed out, and this will not help them in any way. Your argument, that one person wasted a lot of time and so others could be spared that pointless effort, is a sound one, but you have to realize that in some (sub)disciplines, scientists view a good bulk of the research as problematic and a waste of time. Why go through the trouble for this particular case?
Also, I doubt a simple email to Nature will change much. There would likely be a somewhat lengthy process, which will suck more of your time. And to be brutally honest, the chances are higher that the code did produce those results, but the grad students/post docs have been modifying the code for their next batch of research. Even something as basic as version control is unheard of in much of scientific research.
Definitely a cultural problem, as you describe it.
I'm sorry, what? It almost sounds like everyone in your lab is scum. Hopefully YOU at least spent the 60 seconds to write an email?
Even if you do manage to get a tenure track job, you pretty much have to keep your head down for 7 years in order to secure your position.
And once you have tenure, you still get attacked vociferously. Look at what happened when Andrew Gelman rightly pointed out that Susan Fiske (and other social psychologists) have been abusing statistics for years. Rather than a hearty "congratulations", he was called a "methodological terrorist" and a great hubbub ensued.
When framed against these circumstances, it should be evident that there is literally nothing to gain and everything to lose from sending out a short e-mail pointing out that someone's model doesn't work.
I really believe we need a better way. Privately funded / bootstrapped OPEN research comes to mind as a potential solution to bring some healthy competition to this potentially corrupt system. Mathematicians are starting to do this, I think computational researchers have the potential to be next.
The question is, would additional incentives promote good behavior or just lead to more measurement dysfunction. Some people think that just giving the "right" incentives is needed, but actual research shows otherwise.
There is near infinite evidence to the contrary. That said, constructing a system with "the right incentives" can of course be devilishly hard or even impossible.
I don't think there is any doubt that humans follow incentives.
But working out what the core incentive problems are, and actually changing them might be both (1) intellectually difficult, and (2) challenge some sacred beliefs and strong power structures, thus making it practically impossible.
With regard to the issue of grad students being unwilling to come forward and report mistakes, incentives wouldn't be added, but rather positive punishment would be removed, which would then allow rewards for intrinsically motivated actions.
Any change in precision or numerical methods that affects results surely must be well within the error margins.
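For a sense of scale, here is a toy comparison (my own example, not from the paper in question) of naive versus compensated summation; the discrepancy introduced by changing the numerical method sits many orders of magnitude below any plausible experimental error margin:

```python
import math

xs = [0.1] * 10**6          # a million float additions
naive = sum(xs)             # plain left-to-right summation
accurate = math.fsum(xs)    # compensated (correctly rounded) summation

# The two "methods" disagree only at the round-off level:
diff = abs(naive - accurate)
```

If a change of numerical method moves a result by more than this kind of round-off noise, that itself is a sign something was unstable.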
Containerisation is fairly mature and simple to use. Many in other fields struggle with these exact same issues and are able to create reproducible environments just fine.
I find it amazing that those publishing don't include their implementation, all that work locked away on a rusty hdd.
Hell, I'm pretty sure in my department, the first response would be "What's that?"
I'm thinking of making a VCS that simply runs in the background and
- Automatically records every file save (effectively a git commit without a message)
- Allows adding messages through tagging (like git tag)
- Handles 'branching' just by asking you to make a copy of the directory with a different name, and properly understands how to diff/merge/etc. copied files/directories that have since diverged.
- Has built in support for large files
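A rough sketch of the first bullet (polling rather than OS-level save hooks; the function names and the plain `git` shell-outs are my own assumptions, not an existing tool):

```python
import os
import subprocess
import time

def changed_files(root, mtimes):
    """Return files whose mtime changed since the last scan; update the cache."""
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # skip git metadata
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = os.path.getmtime(path)
            if mtimes.get(path) != mtime:
                mtimes[path] = mtime
                changed.append(path)
    return changed

def watch(root, interval=2.0):
    """Record every detected save as a message-less ('autosave') git commit."""
    mtimes = {}
    changed_files(root, mtimes)  # prime the cache with the current state
    while True:
        time.sleep(interval)
        if changed_files(root, mtimes):
            subprocess.run(["git", "-C", root, "add", "-A"], check=True)
            subprocess.run(["git", "-C", root, "commit", "-m", "autosave"],
                           check=True)
```

A real version would want OS file-watch APIs instead of polling, plus git-annex or LFS for the large-file bullet, but the core loop really is this small.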
Candidly, I probably would have told you to present it as is, and add a caveat to the last slide that this was work in progress (most internal presentations are assumed to be so) and you're still chasing down some problems in your code. The reason for this is two-fold:
1) It's the night before. Many students I have known and mentored aren't at the point in their career that they can "wing" a major presentation. It would be setting them up to fail in a way I couldn't shield them from (I can deal with changing results, but a bad presentation is largely on the student).
2) The quality of your checking is likely to be poor the night before. There are a number of times I've found an error as something was being prepped for presentation/poster printing/etc. and been convinced it changed everything, only to discover after 48 hours of thought and more checking that the difference in results was pretty negligible - especially in the sense of the qualitative takeaways from a presentation.
This, of course, may not have been your exchange with your PI. But I thought it was worthwhile that there are reasons not to have you change everything the night before that aren't the result of villainous fraud.
I was told to reverse my conclusions - refused - and, hey, I no longer work in bioinformatics.
Or am I mistaken in reading "present" as "stand up in front of a crowd of people"?
I'm not saying it should have been preventable. It just looks like there may be opportunity to improve practices.
Admittedly I'm completely naive to the domain. Are there no forms of validation checkpoints you can reach where your foundations are rock solid and well backed with tests and such?
I had a Prof in grad school who lost years of cryospheric research data because an external hard drive was stolen. This was in 2010. It was a head scratcher, especially being faculty-adjacent to some of the best CS and engineering faculties in Canada.
That stuff just isn't taught at universities; it's assumed, like oh so many things, that you pick it up along the way.
If you have been working 18 months on a topic you will have substantial knowledge you can communicate to your peers and often some work inspires other work even if they are not using your findings.
-> http://www.biomath.gatech.edu/people/ ->
Jake Boggan (Advisor: Bunimovich)
Jake, ask the mods if you can remove your post.
In the bigger picture, it drives me crazy that these results aren't welcomed. We learn from failures too. Funding sources should recognize this.
First of all, a lot of the people writing it are inexperienced in SW best practices. Then, turning formulas into code is hard, not to mention some articles' hand-waving, like pseudocode that's almost OK except for a "do hard thing in this line" step that expands into a lot of code.
I've also had some weird bugs in code like that (but nothing that would invalidate 18 months of results - btw, did you know RAND_MAX is as low as 32767 with some compilers, like older versions of VS?)
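For anyone who hasn't hit this one: the constants below simulate the widely documented MSVC-era `rand()` (a 32-bit LCG that exposes only 15 bits of state), and the scaling bug is my own illustration of how code assuming a large RAND_MAX quietly squashes its "uniform" samples toward zero:

```python
def msvc_rand(state):
    """One step of the MSVC-style LCG; returns (new_state, 15-bit sample)."""
    state = (state * 214013 + 2531011) & 0xFFFFFFFF
    return state, (state >> 16) & 0x7FFF  # samples lie in [0, 32767]

ASSUMED_RAND_MAX = 2**31 - 1  # what portable-looking code might assume

state, samples = 1, []
for _ in range(10_000):
    state, r = msvc_rand(state)
    samples.append(r)

# Every "uniform in [0, 1)" value ends up below ~1.6e-5:
u = [r / ASSUMED_RAND_MAX for r in samples]
```

The C standard only guarantees RAND_MAX >= 32767, so the portable fix is to divide by the platform's actual RAND_MAX (or to use a better generator entirely).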
(And computation times, though the cloud helps a lot with this)
People, including professors, understandably react poorly to other people trying to end their careers. It's important to recognize that something like that is coming out swinging.
Also, in my experience, there's a tendency with many graduate students to conflate "unethical" with "I don't like this". Not saying that's true in this case, but an "incentive to root out professors" is likely going to result in some pretty strong undesirable outcomes as well.
Ironic given that this is the primary power of being a thesis advisor.
It's also something the project officers on my grants watch.
It also means I lose whatever I've invested in that student.
One does not idly destroy their graduate students, regardless of what HN occasionally thinks.
There is little incentive to root out professors for any reason. The process of becoming a professor (grad school -> postdoc (N times) -> tenure track faculty -> tenured professor) is generally believed (by tenured professors, of course) to root out anyone unworthy of the position. You can believe what you want about the efficiency of such a process.
Public shame requires public understanding of scientific (mal)practice, so, good luck communicating that. Most of the time, the bad actors in question have already gotten papers past referees; what makes you think the public is capable of more thorough review?
Fraud is considered a serious allegation and as a result accusing someone of it requires going through a thorough process involving a host of university administrators, whose incentives are aligned with the profit motives of the university system.
Transferring graduate schools is essentially impossible, and even in the exceptionally rare circumstances that it happens, it always involves burnt bridges and often has to do with bigger fish (i.e. your advisor being offered a position elsewhere, and you're lucky enough they take you with them.) Without external funding to support you, you are usually replaceable. All graduate departments receive applications far in excess of the number of students they can support. They certainly will not consider taking on another from a school at which you've proven to be a problem. Academia has already established a quite successful leaky pipeline; the beginning (graduate school) is no different.
In academia, hierarchy is the rule, flat organizations the exception. You must purchase your influence, usually at significant cost (and luck is a significant component). As an undergraduate, the system is designed to cater to your interests; as a graduate student, you cater to the university's interests. Scientific integrity is a noble notion, and in some corners of academia, it survives, but it does so in spite of bad actors who thrive in a system designed to produce ten times the number of qualified applicants for each job, all of whom are judged according to easily gamed metrics. It would be nice if things weren't this way.
But the problem is, typically... if you decide to get a PhD in science, it is possible that you're already too obsessed with the subject to ever, truly, give it up, especially if it's "just" over working conditions. I can't speak for everyone, but most people leave because they were forced to.
Like Gresham's law, bad science drives out good, because it is much faster to do bad science than good. Those who try to maintain quality can't pump out as many papers as the bad actors, and so lose on the grant treadmill.
Any solution that does not address the incentives is doomed to fail. Not fixing this problem will kill science.
About the only point at which you can actually slow down enough to care about quality is the emeritus stage, but even then, if you don't produce, your lab will be moved into a broom cupboard.
If you cheat, you get "high ceilings" and a "marble fireplace" on the "upper east side":
The destruction of the reputation of science is probably the most dangerous thing going on in our society right now. If we lose science we have nothing.
We surely don't know the answer yet, but pre-registration of experiments/methods in exchange for a guaranteed publication is one quite interesting approach.
But there are others, and I applaud any initiative in these areas, because it's at its core about the hardest problem of all:
How do we overcome bias in a large system under a lot of financial pressure?
It's such a worthwhile issue to pursue, but also usually quite thankless, so anyone fighting for it is doing something great even if it's in seemingly small ways!
What kind of failures should we celebrate? There are two kinds that I can see: a failure to produce a result because of a lack of skill or knowledge, and a failure to produce a result for a demonstrable or provable reason. The latter is actually a success, because that kind of failure produces new knowledge. But in far too many fields we have a culture where this is seen as not publishable.
He eventually succeeded.
I almost think we should have a magic card you get to play once in your career that resets your track record. This way you could take risk without totally destroying your career.
PhD students are taken advantage of for the same reasons school teachers in America are:
People want the jobs, and America doesn't believe in regulating supply and demand
Several times when I've worked on a project at work that involves data analysis, I get really impatient responses when I don't come to a firm conclusion. We're not even talking about highly charged subjects. I'm not aware if my stakeholders are biased toward one conclusion or another -- I think they're just very upset that my analysis contains uncertainty. They expect me to be able to massage any kind of data to derive clear and obvious facts.
Thankfully, I can refuse this without much consequence, but it's really opened my eyes to the potential pressures to corrupt the integrity of fact-finding in even mundane circumstances.
Not to be rude, but if that's in your job description, then I can understand where they are coming from. As someone in IT, it's always my fault that the computer is not working, even if a ten-year-old device simply broke. That doesn't mean they are trying to get me to do an imprudent analysis.
Sure, there is a difference between operations and trying to find correct answers to questions (i.e. science), but they do have money to make, and if they're not asking you (even slightly) to bias your work, I can understand them putting a bit of pressure on getting clear results.
Edit: As u/jeremyjh points out, I completely misread this post. If it's about uncertainty due to not enough data, then please completely disregard what I said. (I can't delete the post anymore because there exists a reply to it.)
There are a lot of people who think the whole point of a data scientist is to look at the numbers and say, with certainty, what they see. But sometimes the numbers don't say anything, and I think that's what the OP was getting at: sometimes a clear correlation just doesn't exist, at which point the person who hired you probably thinks you're not looking hard enough, and might encourage you to 'clean the data' until you get a result with high confidence.
I often get that with business partners. "The data says <likely X, but with caveats/nuance/uncertainty/under certain assumptions we can't justify>" to which they respond "Can we just say X?" Or "can we get numbers on Y to support a presentation on Z?" when Y seems to support Z, but actually you can't draw that connection, so it's misleading.
Stuff like this happens because people treat extra rigor as pedantry and are comfortable making supporting assumptions that aren't supported by data. The people making fraudulent requests aren't aware that they're fraudulent (usually). In my experience, they just think they're being practical.
"Picking battles" is one way to describe my counterpoint: caveat exactly as much as needed so that a proper decision can be made with the risks involved.
If you want to query your whole team for a joint lunch location but coworker X is out, it is still appropriate to say (assuming the team is more than just you and X) that you asked the team and decided on lunch place Y. It's not rigorous (X was left out), but it's still accurate.
This is very different from, say, regulatory or securities reporting where ambiguity is not appropriate.
My friend was halfway through her five-year vesting period at a startup when the CFO asked her to help improve their numbers ahead of an anticipated investment round. The idea was basically to bump revenue by basing it solely on GMV, masking returns and leaving out discounts and shipping. They also wanted to hide running costs by forcing vendors to agree to a 90+ day billing cycle; they pushed the CTO to turn off parts of the system during weekends and holidays, and forbade PTO until the deal was closed.
She refused to do the number masking and was asked to leave.
And, of course, if the consultant didn't play ball, the insurance company could always consult with a different actuarial firm.
Humanity has built up an enormous amount of legitimate scientific knowledge and understanding. There has been much difficulty and confusion, and many dead ends, along the way, and it seems this continues to be the case. But it really is an incredible thing, how much we know at this point. It took me a long time and a lot of self-learning to develop a rich appreciation for this.
Of course, there is still much for us to figure out. And we should keep doing so, difficulties be damned (and hopefully mitigated over time).
This was one of many times that stats made someone look bad and they made me change them.
Most of the time, support metrics are altered to make it look like support contracts are meeting all their contractual targets.
This is daily business everywhere in tech, ranging from fudging to downright lying; it just depends on the importance of the data and whether money is tied to it.
I don't approve of this shit, but luckily I've never been asked to commit fraud, like reporting sales that don't exist...
Luckily, our board was pretty understanding; the reduction in infrastructure strain took a nice percentage out of operating costs.
1) Did the statisticians refuse or comply with the request?
2) When they refused, how did the requester react? Were the requesters actually malicious, or just bad at statistics? If everything was fine once the statisticians explained that removing "just an outlier" wasn't a valid option, then this report isn't quite as concerning, and is maybe just an indication that more researchers need to hire statisticians to help them out.
I got out of academia and stopped trusting most university research because I observed too much of this culture of fraud.
1 in 4 statisticians weren't asked to commit scientific fraud. From the article:
"Researchers often make "inappropriate requests" to statisticians. And by "inappropriate," the authors aren't referring to accidental requests for incorrect statistical analyses; instead, they're referring to requests for unscrupulous data manipulation or even fraud."
This isn't even remotely close to what the title of the article claims.
I kind of have a mental box where I squirrel away little tidbits that meet two criteria: 1. They seem to come from rigorous studies and 2. They also fit with my general understanding of how life works.
Over time, I mentally group things -- a la A and B seem related -- without assuming that I know how they relate. I seem to have an inordinately high tolerance for ambiguity.
Most people seem to need An Answer even if it's wrong. They have two categories -- black and white -- and when presented with purple or pink or blue, they force fit it to one of their existing categories and don't confuse them with the facts.
I wasn't trying to prove anything to anyone. I was just trying to deal with my life. But having gotten substantially healthier, I now wonder how to talk about it. It seems a wasted opportunity for the world for me to not share, but the world has treated me pretty horribly and done all in its power to tell me to STFU, I'm just crazy and spouting nonsense.
So I sometimes think I should write what I understand to be true and then carefully back it up with citations to try to support it. Then I read articles like this, throw my hands in the air and go "Why bother?"
The way I have been treated seems particularly unfair when you learn how much "real scientists" cook the books. Like why? It feels like pure prejudice when I read things like this.
In fact there is a 25% chance that the person involved in working that out has been asked to commit fraud at some point.
But there's only a 10% chance of that.
The bottom line is that there's not much incentive for doing the boring statistical validations (I can tell you that no one likes doing statistics apart from statisticians, and not even all of them) and verifying that everything is reproducible, and a huge incentive to, let's say, 'arrange' a thing or two so that the paper looks better. So many people just kinda do it, and it slips through the cracks because:
-In many fields, the reviewers are not statisticians themselves
-No one really bothers to download the data, set up the environment and libraries, and reproduce the exact steps taken to obtain the very same figures in a paper, which is understandable given what a chore all of that can be. Anyway, most of the time the exact steps aren't even described. No, Jupyter notebooks aren't a solution either (it would take too long to explain why, and this post is already long).
-In many cases, the results turn out to be true anyway so people don't notice they were initially put forward with fraudulent validations
-When they turn out to be false, people just shrug and move on with what's actually true. Bogus results often fail to stand the test of time and get forgotten quickly despite the initial hype. No one bothers to say: "Hey, that paper from 8 years ago is bullshit and its authors are hacks!" because nobody cares.
-There's a huge psychological barrier to actually call out one of your peers and affirm that they're an impostor and their work is bogus. Especially when said impostor happens to be a big name in the field, part of you still doesn't believe they would commit fraud, not to mention the social repercussions and backlash of making such an accusation. We scientists aren't a very confrontational bunch.
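On the point above about nobody reproducing the exact steps: even when code is shared, an unrecorded random seed alone can make "the very same figures" unreachable. A toy sketch (data and function invented for illustration):

```python
# Toy bootstrap (data invented): the estimate is only exactly
# reproducible if the random seed used in the paper was recorded.
import random

def bootstrap_mean(data, n_resamples, seed=None):
    # Resample with replacement and average the resample means.
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        sample = [rng.choice(data) for _ in data]
        means.append(sum(sample) / len(sample))
    return sum(means) / n_resamples

data = [2.1, 3.5, 2.9, 4.0, 3.3]
# With a recorded seed, anyone can regenerate the exact figure:
print(bootstrap_mean(data, 1000, seed=42))
# Without one, two honest re-runs of the "same" analysis will
# almost surely print slightly different numbers:
print(bootstrap_mean(data, 1000))
print(bootstrap_mean(data, 1000))
```

Nothing here is fraud; it just shows how far short "the code is on GitHub" can fall of actual reproducibility.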
So most of the time it kinda works, and we're all bumbling along hoping to find some modicum of truth at the end of the road with minimal harm done. But of course, sometimes you get these guys who kick off their entire career with a high-profile, much-hyped fabricated result (Woo-Suk) or even a series of bogus papers (Sato), and that may lead to long-term harm. The good news is that hyped papers and very prominent figures quickly attract scrutiny from their peers, and sooner or later reality catches up with them as labs around the world fail to replicate their 'breakthroughs'.
All in all, I'd say we're doing fine. We're just not the ethereal source of truth that some people hold us to be, the very same people who, after claiming that 'God is dead', are very quick to replace Him with His silicon-based equivalent, around which we would act like priests, except with lab coats in lieu of clerical garments.
On top of that, many steps needed to reproduce a pipeline typically involve loading enormous datasets: terabytes of simulated protein structures, hundreds of gigabytes of sequencing reads, phylogenetic trees, alignment files, what have you. Once you somehow acquire such a dataset, you need the appropriate tools, many of which have to be specifically compiled for your platform, and then a powerful machine to run them on if you don't want the pipeline to take months to complete, etc. (And that's if you didn't use any proprietary software or any GUI-based application with no command-line interface.) You can't exactly load all of that into Github+Mybinder and call it a day. You can't ask a community of people that doesn't like coding to learn about Docker containers either.
Nevertheless, we (at our lab) do use notebooks when we can, because we know they're fashionable. We can only present parts of our results that way, due to the aforementioned constraints, but the notebooks look pretty and people like them, so we write short demos using them.
This is a pervasive problem. I was asked to comment on a review paper on multiple testing corrections for a biology journal. The paper was so bad, so completely misunderstood the technical underpinnings of multiple testing corrections, that I sent the recommendation that it be rejected and that they not even try to do revisions. The authors were not competent to write such a paper. At the same time, they were probably considered the experts in their corner of the scientific world.
> -There's a huge psychological barrier to actually call out one of your peers and affirm that they're an impostor and their work is bogus.
I think it's less a psychological barrier than fear of retribution and there being no effect when someone does call them out.
I always knew I could walk out the door of my university and get a software job making twice as much as my advisor, which rather messed up the power imbalance, and I pissed off a lot of people by speaking up. It had no effect. Lazlo Barabasi is still publishing his nonsense of about scale free networks. Kim Lewis is still churning out papers on toxin-antitoxin complexes in antibiotic persistence, when they are irrelevant to any case of persistence in nature. There are many more names to name, but I've largely forgotten them at this point. The only reason to remember them is to have good counterexamples of how to do science, and most of them are repetitive.
This sounds depressing. I assume your recommendation was promptly ignored and the paper was published anyway?
It would be hilarious if this report were debunked on the grounds that they inflated the number of statisticians surveyed
/haha, only serious.
In other words, the incentive to just go along if one has evidence to the contrary is low relative to the opportunity cost.
In even fewer words: money can be made if strong counter-evidence to these topics can be verified.
Conclusion: it's unlikely that these examples are really areas of hidden wisdom that run counter to the prevailing understanding.
On the other hand, 1/4 is a large number; we'd have to break it down by discipline.