Faked Crystallography (science.org)
91 points by mhb on July 31, 2022 | 64 comments



While pursuing my Ph.D. in chemistry, I was asked to peer review an inorganic chemistry publication in which the author attempted to fake crystallography data. In this case the telltale sign was that they left the hydrogen atoms in the structure - X-ray diffraction is unable to resolve the hydrogens, you need a more sophisticated technique like neutron diffraction for that.

What it turned out they had done was to optimize a predicted structure of their molecule using density functional theory (DFT), then draw it like a crystal structure would be displayed (a 3D plot with error ellipsoids at each atom, with probable bonds shown), and try to pass it off as a true crystal structure. My advisor missed this when he reviewed it first - if you're reading quickly, you just see a crystal structure and accept it for what it is. So I wonder how many like this slip through the cracks every year. Apparently a lot, if the CCDC identified ca. 1000 of them.


Does this not immediately destroy one's scientific career and invalidate all past papers? It seems like it should.

Maybe there should be a bounty on uncovering these?

This shit has to stop. If publish or perish is going to continue to be a thing, maybe just switch to publishing negative results. It's not sexy, but it's valuable too.


>This shit has to stop. If publish or perish is going to continue to be a thing, maybe just switch to publishing negative results. It's not sexy, but it's valuable too.

I had an argument recently about why fundamental research was important. The other person insisted that money should only be spent on endeavors where the outcome is known to benefit humanity. They couldn't accept, despite not having any science or math training, that sometimes we study things that start off seemingly useless and become important, or that we sometimes stumble upon things. (And this is why we should spend money on science in general).

Think about all the scientifically illiterate politicians who control money and funding who probably have similar views.

I can totally understand how we let things get this bad. If you don't provide a positive result, you might be facing unemployment. It's not about advancing our knowledge anymore.


Why spend money on the military if we’re not at war?


> If publish or perish is going to continue to be a thing, maybe just switch to publishing negative results. It's not sexy, but it's valuable too.

If we are to be allowed to publish papers from negative results then I should go and dig out my PhD notes and get writing!

Seriously, though, I started noticing the many broken incentives early on, after an explanation about "impact factor" from an older PhD student in our lab.


Same here - I went into science thinking that it would be all about trying new ideas and pushing the boundaries of what we know. And while in a certain sense that is still present, it takes a back seat to the perverse incentives you describe.

This was one of the reasons (and unfortunately not the only one) I decided to leave academia after my Ph.D., leaving my field entirely and going into software engineering rather than taking a postdoc position. Academia isn't for everyone, and neither is industry... but wow, I sure am happier where I am now.


Yes, they'd get retracted if caught, and it would be a black mark that could destroy a career if it were proven they intentionally faked it.

What often happens though is that the paper will get retracted but then the authors play hot potato in blaming different people for faking it and it turns into a whole mess where you don't always know (i.e. cannot prove) exactly who did the faking.

"Someone else faked it and I missed it, just like all the reviewers!!"


I published a number of crystal structures using a powerful X-ray diffractometer and high quality crystals where H density was both found and modeled. Modeling H density in and of itself is in no way a sign of fake data; it is just a rare occurrence.


Hang on, isn't a model by necessity not the same as the thing it's modeling? Obviously a photo is sort of a model, but depending on what the subject is, you can visually verify it. If it's a software model, and say you enter lots of parameters, I have to wonder how valid that model is. How can it be verified?


The way you solve x-ray diffraction structures is that you refine your model until the electron density matches.


Thanks... But until your electron density matches what?


You change the position and identity of the atoms in your model until there is very little residual electron density remaining.
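
To make this concrete, here is a toy numpy sketch of the idea (a 1D "crystal" with Gaussian atoms; purely illustrative, not real refinement code). You compute structure factors from the model, compare their amplitudes to the measured ones (the R-factor), and inspect the residual difference map to see where the model is still wrong:

    import numpy as np

    # Toy 1D "crystal": electron density on a grid. Real refinement is 3D,
    # with symmetry, thermal (B-factor) terms, restraints, etc.
    def density(positions, heights, n=256):
        x = np.arange(n)
        rho = np.zeros(n)
        for p, h in zip(positions, heights):
            rho += h * np.exp(-0.5 * ((x - p) / 2.0) ** 2)  # Gaussian "atoms"
        return rho

    true_rho = density([60, 128, 190], [8, 6, 8])   # the actual structure
    model_rho = density([62, 128, 185], [8, 6, 8])  # our current guess

    F_obs = np.abs(np.fft.fft(true_rho))  # experiment measures amplitudes only
    F_calc = np.fft.fft(model_rho)        # the model gives amplitudes AND phases

    # R-factor: how well the model explains the measured amplitudes.
    R = np.sum(np.abs(F_obs - np.abs(F_calc))) / np.sum(F_obs)

    # Difference map: residual electron density where the model is wrong.
    resid = np.fft.ifft((F_obs - np.abs(F_calc))
                        * np.exp(1j * np.angle(F_calc))).real
    print(f"R = {R:.3f}, peak residual = {resid.max():.2f}")

Refinement is then just nudging the atoms (positions and identities) until both R and the residual map get small.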


These are words, but I don't see how they prove the model. I guess it's like some sort of check after the operation is complete?

Why does a low residual electron density prove the model? What is electron density anyway? And how do we measure it?


Yes, humans communicate using "words". Nothing is ever "proven" in science, since there is always an arbitrarily complex hypothesis that can explain captured data. Instead, the weight of evidence is considered.

Anyway it seems like this whole conversation is out of your depth currently. There are many good X-ray crystallography texts available if your interest continues.


Sure. But in this case there was no claim of a sufficiently strong X-ray source for this, so it fell flat on its face.


The X-ray source I used was "in house". The OP didn't mention what X-ray source was used, so the absence of any "sufficiently strong" claim isn't proof.


"optimize a predicted structure of their molecule using density functional theory (DFT), then draw it like a crystal structure would be displayed (a 3D plot with error ellipsoids at each atom, with probable bonds shown)"

Why not just publish this instead? Surely an optimized model is still valuable.


Typically not on their own - it's very easy to calculate an optimized structure that is incorrect, so you usually want some sort of justification for your structure. Normally you would be calculating one of these geometries along the way to calculating other interesting properties, either to compare with experiment or to make new predictions (or both). The "gold standard" is... comparison with a crystal structure from X-ray diffraction.

In my field (small inorganic molecules used as catalysts), I typically calculated the structure along the way to calculating the molecular orbitals for describing the electronic structure (important for understanding many properties of the catalyst and its reactivity) and predicting the infrared and visible spectra (useful for comparison with experiment and for explaining the electronic and magnetic properties of the molecule). It's the whole picture that's valuable, not just the structure - and the more experimental properties you include, the harder it is to fake the whole thing.


I absolutely agree. Most of the time, getting very good DFT results is actually far more difficult than getting the experimental crystallography results themselves. They're both quite hard, and it depends on which techniques are applied, but getting computational methods to match experiment with high accuracy is very, *very* difficult. I know more about condensed matter physics and GW/Bethe-Salpeter approaches etc. than proteins, but getting good computational methods to work is still out of reach for ~95% of scientists even in that field.


Not to say this disproves anything, but you can definitely see hydrogen atoms in extremely high resolution crystal structures - typically with resolution below 1A, which is rare for proteins but should not be uncommon in inorganic crystals (right?)

Any time you have a 3D structure (model) you can run the equations in forward mode to create the structure factors of the crystal that would have created that model.
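
For the curious, that forward calculation is just a Fourier sum over the atoms of the model. A minimal sketch, with made-up coordinates and constant per-atom scattering factors (real codes use angle-dependent form factors plus thermal terms):

    import numpy as np

    # F(hkl) = sum_j f_j * exp(2*pi*i * (h*x_j + k*y_j + l*z_j))
    def structure_factor(hkl, frac_coords, f):
        phases = 2j * np.pi * (frac_coords @ np.asarray(hkl, dtype=float))
        return np.sum(f * np.exp(phases))

    # Toy model: three atoms at fractional coordinates in the unit cell.
    coords = np.array([[0.00, 0.00, 0.00],
                       [0.50, 0.50, 0.00],
                       [0.25, 0.25, 0.50]])
    f = np.array([26.0, 26.0, 8.0])  # roughly Z-proportional (e.g. Fe, Fe, O)

    F = structure_factor((1, 1, 0), coords, f)
    print(abs(F) ** 2)  # intensity the detector would record for this reflection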


Yes, with a powerful enough source you can resolve protons, that's true. They didn't make such a claim - they listed a run-of-the-mill instrument in their methods section that would not have been capable of this.


Most of the time hydrogens are placed in calculated positions, because you know where they are and it also reduces the number of parameters needing refinement. In good datasets hydrogens will show up in the density map (aren't all datasets good now, in the age of area detectors?), and you can refine (possibly with restraints) those hydrogens that you are uncertain about.


Rayleigh's criterion says hydrogens should be just on the cusp, at least with home-source X-rays. Not sure about synchrotron.


I mean, I know this is possible - I've definitely seen maps from very high quality (<.8A) crystals that clearly had defined bumps on the heavy atoms where you'd expect the hydrogens to be, and later on, neutron scattering confirmed it. But the only times I've seen this, it was world-class competitive groups trying to elucidate the specific enzymatic pathway for a protein of industrial interest.


> the author attempted to fake crystallography data

That is not what fake data is. It might be a misrepresentation if the author tries to pass it off as experimental results, but it is not fake, i.e. it came from a real study and actually has some meaning (though not much if it is indeed an unstable structure that can only be observed in a computer, and it is grounds for rejection depending on the journal).

I mean, I don’t have much patience with people over-hyping their results, but fake data of the common sort is either photoshopped graphs, re-used spectra coming from other studies or compounds, or things like that. And this definitely happens. But the “it came from a simulation, therefore it’s fake” argument is very unhelpful in general.

A spectrum from a simulation is not fake, it’s just a simulation result.


It is fake if it's being misrepresented as experimental results instead of just a simulation


Again, that is not what this word means. It’s misrepresented data, not fake. And yes, the distinction matters.

This is also not what the article is about. The problem they face is completely fabricated data, not DFT structures in a poorly worded paper.

Of all places, this is probably the last one where I would expect this argument.


by this metric no data is fake. It's just misrepresented and was actually numbers I pulled out of my ass.


Intent to deceive is what makes it a fake/fraud. No?


It is arguable that it is more than a misrepresentation if the author's intent is to fool the reviewers. If so, it is "fraudulent data".


Most of these are probably not even possible to catch; if the perpetrator has a basic understanding of the theory, they could generate those correctly, no problem.


Hence why fake it till you make it is so popular


You can see the hydrogens if you have enough resolution (below 1A). These days you start to see such good resolutions pretty often.


Here are two articles supporting my point as you don't seem to believe it. I also recommend you go look at PDB and select structures below 1A so you can see for yourself.

https://bmcresnotes.biomedcentral.com/articles/10.1186/1756-...

https://www.science.org/doi/10.1126/sciadv.1600192 """ Textbooks and teaching classes still emphasize that hydrogen atoms cannot be located with x-rays close to heavy elements; instead, neutron diffraction is needed. We show that, contrary to widespread expectation, hydrogen atoms can be located very accurately using x-ray diffraction, yielding bond lengths involving hydrogen atoms (A–H) that are in agreement with results from neutron diffraction mostly within a single standard deviation"""


> One final note: the "authors" of all these current 992 papers are from Chinese medical institutions, most of them appearing only once.

I’ve observed something sociologically fascinating about China. In multiple fields - gaming, research, academics, etc. - the proportion of the population likely to cheat seems to be higher in China. Examples include:

https://www.theatlantic.com/education/archive/2016/03/how-so...

https://www.ign.com/articles/2018/02/16/99-of-pubgs-banned-c...

https://www.reuters.com/investigates/special-report/college-...

https://www.nature.com/articles/d41586-021-02587-3

There are cheating scandals everywhere, but, as an outsider, it seems to be somewhat more endemic in China. The anomaly raises the question of why.

Being born in China doesn’t radically wire your brain in a different way. People seem to cheat everywhere, but more people are more likely to cheat here, in this specific environment. Why is that? What’s causing them to adopt this behavior as an acceptable strategy?

I can’t seem to find any interesting research on the topic beyond idle speculation. Does anyone know of anyone investigating this?

Note - this comment is not meant to be denigrating nor a judgement, but an observation and a question. The question calls for a deeper attempt to ascertain a) if it’s true and b) the factors driving it.


If your peers in online classes are cheating and you’re not, you’re at a material disadvantage when it comes to grading. Unless the teacher decides to start reporting your peers for academic integrity violations [1].

Same goes for academia I suppose. When incentives for academia are aligned towards publishing, and if there is minimal or no enforcement of cheating at an organizational level, you’re at a disadvantage if you don’t cheat.

I don’t think Chinese academic institutions have robust academic integrity and enforcement policies and unless that changes, you’re likely to see Chinese academics stuck in a Nash Equilibrium.

[1] https://news.ycombinator.com/item?id=31544634


In grad school I had to do a term paper; my three partners were Chinese. Two of them approached me and asked what we were going to do when the third one cheated/plagiarized his part of the paper. Apparently, cheating is a phenomenon specific to subcultures in China — and this one student belonged to that culture.

When we got his part of the paper it had just been copied wholesale from academic sources; the professor had already been forewarned. We all got an "F", and the one student was put on probation and lost his funding. It really sucked. The rest of us were given a chance to rewrite the term paper.


> the professor had already been forewarned

> We all got an "F"

> The rest of us were given a chance to rewrite the term paper.

That seems really unfair. Why didn't the rest of you just get a grade based on the quality of the rest of the paper?


There is a strong prejudice against the Fujianese. Cheating is an endemic problem throughout China due to extreme competitive pressure and lax oversight, but when someone from Fujian is found to be cheating, the explanation will of course be that they are Fujianese.


Cheating on this scale doesn't (usually) happen because a thousand people independently come up with this idea. Something about this environment makes it possible for prolific cheaters to scale farther than they do elsewhere. There's a deep dive that could be done here, but at a high level, it comes out to: there's money in it, incentive structures drive money here. Follow the money.


this is a good place to start: https://scholar.google.com/scholar?hl=en&as_sdt=0%2C47&q=aca...

e.g. https://eric.ed.gov/?id=EJ1178415

which describes cultural differences in cheaters


Assuming this is actually true, my guess would be the social pressure.


A friend of mine got his first programming job in 2010 as a programming assistant to a physics professor who needed to automate the process of using graph theory to prove that crystal structures from mass-produced PhDs were impossible. He was finding so many fakes, mostly coming from Chinese PhD students, that he needed to write (and optimize) software to disprove them at scale.

These PhD papers were apparently the easiest to quickly edit from someone else's and push past uninterested reviewers.


I remember in grad school a principal investigator who was notorious for driving his grad students and postdocs hard: he had set up a cot for them to sleep on in the office, and if the bioreactors weren't turning out protein he would ask wtf was going on (got to keep the pipeline filled).

Anyways they reported a structure of a very important protein, the pump involved in exporting antibiotics from a bacterium. But none of the data made sense, there were known mutations that changed activity that were in strange places in the molecule. He chalked it up to "allosteric interactions".

The problem with crystallography is that we can't directly measure the phases of the diffraction pattern (we can measure the intensities), and as anyone who does Fourier transforms knows, the phases contain more information than the intensities. So we typically bootstrap a guess of the phases using a similar protein, then calculate a rough structure, then assign more detailed locations for specific atoms based on chemical constraints, then recalculate phases of the model, then apply these fake phases to the known intensities to get a refined electron density, and so on iteratively.
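
As a toy illustration of that loop (a 1D numpy sketch, not anyone's actual pipeline): keep the measured amplitudes fixed, borrow phases from the current model, transform back, rebuild, repeat. The danger described above is visible in the structure of the loop itself - the phases always come from the model, so a wrong model can keep confirming itself:

    import numpy as np

    def phase_refine(model_rho, I_obs, n_iter=20):
        amp_obs = np.sqrt(I_obs)  # detector measures intensities |F|^2
        rho = model_rho.copy()
        for _ in range(n_iter):
            phases = np.angle(np.fft.fft(rho))  # phases guessed from the model
            F = amp_obs * np.exp(1j * phases)   # grafted onto observed amplitudes
            rho = np.fft.ifft(F).real           # "refined" electron density
            # A real pipeline would rebuild atoms into this map under chemical
            # constraints here - the step where model bias creeps in.
        return rho

    # e.g. starting from a wrong model, the output stays biased toward it:
    # refined = phase_refine(guess_density, measured_intensities)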

The risk is you can think your shit smells like roses, and use math to support your claim.

Anyways, someone didn't buy the bs, and actually redid all of the phases using three orthogonal methods (xenon soak, gold soak, selenomethionine) and showed that three transmembrane helices were totally misassigned. The original PI wrote an apology, chalking it up to a hastily written Python script that flipped the signs of some of the rows.

This is bullshit. I cannot imagine how flipping some of the signs could have resulted in this error. What is more likely is that the PI bootstrapped with a premature model of the protein and made threading errors - threading the sequence through the electron density completely wrong - as a guess, to beat competitors to the punch.

Anyways, the PI, Geoffrey Chang, was never called out on this bullshit or his toxic workplace. He's still a professor in academia, at a highly ranked institution.


Here is a link to some commentary on the ABC transporter debacle.

http://xray0.princeton.edu/~phil/Facility/Guides/ABCtranspor...

I'm not 100% sold that it was malicious, but imo that doesn't change much. Chang's job was to be a PI and vet these things and whether he failed in his duties intentionally or unintentionally, he should lose his job.

The X-ray data, model bias, and handedness are so complex and hard to deconvolute that since that incident I've always wondered if we are doing enough to vet the huge number of X-ray structures on deposit. I'm very glad to be primarily in cryo instead...


Just to be clear: I don't think it was malicious, I just think the excuse he gave was bs. I could be wrong - I'm not a crystallographer, just a nonprofessional expert at the Fourier transform (I have given talks). I just can't see how a centrosymmetrically inverted structure would get some of the helices correct.

Ok, downvoted. Can someone explain to me how a centrosymmetrically inverted structure would get some of the helices correct?


That's unfortunately all too common a story. And with the way peer review works - where the leaders of any niche topic are the ones who will be asked to peer review articles in the same area - the system makes it very easy for an established professor to shoot down dissenting voices.

Now, sure, if you can prove that there was deliberate misconduct on the part of the PI, then you can go through channels and hopefully get an investigation opened. There was an article on HN recently (linked in another comment here) where this happened/is happening with Alzheimer's research. But it's often not so clear cut, as you've pointed out - you leave enough room for ambiguity, then "apologize" and retract the paper. There's no (significant) negative consequences for the retraction if you're already an established leader of your field (and you can still use your influence over the peer review process to shape the field to continue favoring your research). Meanwhile, the accuser is all but guaranteed to be blacklisted from their field for the same reasons - and if they don't have tenure yet, then game over, hope you had a backup career in mind.

The current incentives in academic research are broken. Sometimes I wonder if the tenure system is partly to blame - does it go too far in creating these protected positions of power that can be abused? Then again, tenure exists for a very good reason: to protect dissenting voices from being pushed out by these same forces. It seems more and more, however, that tenure is no longer sufficient to protect against this (or maybe it never was), and we need broader reform of the peer review system in order to foster healthy and open discussion in the research community.


I don't think it's tenure. Chang did not have tenure at the time.


>One final note: the "authors" of all these current 992 papers are from Chinese medical institutions, most of them appearing only once. If anyone got a raise or promotion based on their publication record off this stuff, what a waste of money that was. . .

This is the result of mandatory publication requirements placed on Chinese medical professionals. Gaming the system with paid fake publications is just another industry that popped up to meet this requirement.


Reminiscent of "Two decades of Alzheimer’s research was based on deliberate fraud" (also published by science.org):

https://news.ycombinator.com/item?id=32212719

https://news.ycombinator.com/item?id=32183302


Eh, I'm gonna posit something other than what's proposed (that these articles are intended to raise someone's publishing record).

I'd say there are two reasons one might pollute the datastream:

1. Obfuscate competitors' data queries

2. Use "approved" articles as a tool to trick regulators in a grander series of fraud.


What's staggering is that small-molecule crystallography is so cheap. An X-ray diffractometer with an area detector is maybe 50 kUSD, and after 8 hours you have a dataset, the full hemisphere. It's baffling why anyone would cheat!


Growing good, X-ray quality crystals is more art and luck than science.

It's particularly difficult with most proteins and other biomolecules. [0] But even for small molecular compounds, it isn't always easy. I'm not sure about growing high quality crystals of metal-organic frameworks, but they could easily be limited by defects. Some compounds might be too waxy if there are long aliphatic chains involved.

Even if you do get decent-looking crystals, they might be twinned. If you are working with air or water sensitive compounds, they need to be handled with extreme care and might degrade before the data collection is complete. Some might also have temperature sensitivity, such that they need to be kept at -20 C to prevent melting or another phase transition.

Overall it can be a real make-or-break issue for someone engaged in chemical research.

[0] Check out this PDF of a PowerPoint on the topic: https://hamptonresearch.com/uploads/documents/ramc/RAMC2011_...


Growing high quality crystals of organometallic compounds is easier than in many other disciplines of chemistry - orders of magnitude easier than for proteins, as a comparison.


Depends. A random metallo-organic compound vs. a random protein? Yes. But the problem can be further up the chain. Synthesis can be expensive and difficult, and you need a relatively large crystal (at least for X-ray). If you have low symmetry, you need several relatively large crystals, all of which need to yield enough data before they die so you can integrate across the datasets. It typically isn't thought to be as challenging, but it can be.


In my experience it's always been easier both to grow and to solve organometallic compounds. Borrowing the phase from previously solved structures is trivial for organometallic compounds, which simplifies the density modeling, while the fact that most organometallic compounds lack the chirality found in every protein generally means they pack into higher symmetry groups, facilitating the growth of larger high quality crystals.


I guess the hard part is making the real substance and then making nice crystals to put in the device. I think they are just lying about everything.

Also, it looks like they are changing the topic each year to follow the current fashion, and they are publishing something like 200 papers per year, so $50K is like $250 per paper. My (totally unsupported) guess is that they are charging a few hundred bucks per paper, so using a real diffractometer to make a new unique fake image for each paper would double the price of each paper.


Unfortunately Chinese academic output is heading towards default ignore mode. What's the alternative? Sad for all the great scientists there. This sort of thing is absolute cancer for research productivity. Think about the long term effects it has on good honest people doing research, and the ability of those people to attract funding.

Behind the scenes I imagine journals are finding it increasingly difficult to find people to review papers from inside China.


I'm not a specialist in the field, though it may be obvious to one: what are the criteria that make a paper a fake one? There's no process described for checking against some criteria, except for the weird generated wording.

There should be some border between an obviously fake paper and an obviously real one.

Fake papers could be used by someone to get real papers removed and cancelled along with them, if no good criteria are used to separate them. A Chinese author name or address does not seem to be a good one.


I wonder if those structures were compared to AlphaFold's output and significant differences found.


These aren't protein crystal structures, they are metal-organic frameworks (MOFs), so AlphaFold probably wouldn't work well on these ones.

It would be really interesting to see an equivalent model trained to predict these structures. The physical chemistry of transition metal complexes, especially when multiple metals are in close proximity and connected by shared ligands, is much more complicated than that of proteins. The reason is multireference effects - essentially the quantum entanglement of multiple possible electron configurations. These are exceedingly difficult calculations to perform - common approximations are O(n^8) or worse and require highly specialized knowledge to apply correctly - so an ML model that can efficiently make predictions in this space would be a major, transformative breakthrough.



That one was posted later, while this one was (independently) picked by a mod to go in the second-chance pool (https://news.ycombinator.com/pool, explained at https://news.ycombinator.com/item?id=26998308), so it got a random placement on HN's front page.

Probably the thread is interesting enough to deserve a merge, so we'll move the comments thence hither.


Someone did this with intent to prevent a discovery



