- "A key limitation of the study is its reliance on authors themselves revealing what instruments they used in figure captions and not cropping out relevant image metadata, Richardson said. "
2,400 out of "more than 1 million" is a much smaller fraction than 2,400 out of 8,515. Could there be an explanation along the lines of "neglecting to strip SEM instrument captions strongly correlates with quality issues elsewhere"? Is it a social norm to remove these metadata captions from images before publishing them in journals? I'd suspect something like that, since >99% of these >1 million papers did just that.
The title of the post here says “thousands” and the original preprint says “widespread”. 2400/8515 is a pretty significant fraction. My guess is that the metadata was stripped out of images in the publishing workflow. You don’t necessarily need that metadata banner to interpret the image, and if you remove it, you could make the image physically bigger (in print). But, nowadays, journals won’t accept the original images without the banner intact. The journal may remove it in preprocessing, but the authors still have to provide the full image.
Thus, the lack of removal of the metadata is probably also an indication of minimal editorial effort. Cropping the images takes time, and it reformats the paper. Getting a paper “camera ready” (which is a misnomer today) takes time and effort, which means money. If the banners aren’t stripped, it was probably a lower-quality journal. (The crop itself is trivial to automate; see the sketch at the end of this comment.)
I suspect that out of the 900,000+ other papers (the ones they couldn’t analyze), there is still some rate of misidentification of the instrument (even just by mistake), but nowhere near 28% (2,400/8,515).
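A minimal sketch of the crop with Pillow, assuming the data bar is a fixed-height strip across the bottom (the 64-pixel height and filenames are purely illustrative; real data bars vary by instrument and settings):

```python
from PIL import Image

BANNER_HEIGHT = 64  # purely illustrative; real data bars vary by vendor and settings

img = Image.open("sem_figure.tif")  # hypothetical filename
# crop box is (left, upper, right, lower); keep everything above the data bar
cropped = img.crop((0, 0, img.width, img.height - BANNER_HEIGHT))
cropped.save("sem_figure_cropped.tif")
```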
They aren’t talking about metadata records from a photograph. I believe they are referring to information that’s included in the image itself as a banner overlay. In the link, it’s the box at the bottom of the image with the date, scale, settings, etc. Each manufacturer likely has their own information they include, thus making it obvious which instrument was used. The name of the manufacturer is also included on the image.
One reason why metadata like this is included in a medical/research image itself is so that it can’t be stripped unless the image is obviously cropped.
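That burned-in banner is also what makes an automated check possible: you can OCR the bottom strip and compare the manufacturer it names against the one claimed in the caption. A rough sketch of the idea, not the preprint's actual pipeline, assuming pytesseract is available and using a purely illustrative banner height and vendor list:

```python
from PIL import Image
import pytesseract  # requires the Tesseract OCR binary to be installed

# Illustrative vendor list; real banners typically print the manufacturer's name.
VENDORS = ["JEOL", "Hitachi", "Zeiss", "FEI", "TESCAN", "Thermo"]

def banner_vendor(path, banner_height=64):  # banner_height is illustrative
    """OCR the bottom strip of an SEM image and look for a vendor name."""
    img = Image.open(path)
    strip = img.crop((0, img.height - banner_height, img.width, img.height))
    text = pytesseract.image_to_string(strip).lower()
    return next((v for v in VENDORS if v.lower() in text), None)

claimed = "JEOL"                         # instrument named in the figure caption
found = banner_vendor("sem_figure.tif")  # hypothetical filename
if found and found != claimed:
    print(f"possible mismatch: caption says {claimed}, banner says {found}")
```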
Intent, usually. People tend not to realize they can reveal their GPS location down to the centimeter when uploading a candid phone shot to Facebook, so the major services tend to strip off anything that's not the image itself. (Less bandwidth, as a bonus.)
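And stripping is easy to do wholesale. A minimal sketch with Pillow that rebuilds the image from pixel data alone, so no EXIF (GPS included) survives; the filename is hypothetical, but 34853 really is the GPSInfo EXIF tag:

```python
from PIL import Image

img = Image.open("photo.jpg")    # hypothetical filename
print(img.getexif().get(34853))  # 34853 = GPSInfo EXIF tag, if present

# Rebuild the image from raw pixel data only, dropping all metadata.
clean = Image.new(img.mode, img.size)
clean.putdata(list(img.getdata()))
clean.save("photo_clean.jpg")
```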
Curious if you might have situations where the really good image for publication is taken on a higher-quality device, while the day-to-day work is done with an array of cheaper devices.
Tempest in a teapot. The SEM photos are likely taken by a specialist technician who is not an author but who is known to the authors of the paper.
A grad student who is n-th author on the paper copied and pasted the instrument name from a previous paper, because (a) that paper used photos from the same SEM, (b) the grad student has never seen the SEM, and (c) the name and model number of the SEM doesn't actually matter (gasp!). It's included more for tradition than anything else, because the contents of the image are what matter. No one who tries to replicate the experiment will care what SEM model was used; they'll use the SEM they have at their own lab, period.
Of course copy-and-paste damage happens in an area that no one but this "researcher" cares about or bothers to read.
This is about turning data mining into drama factories, nothing more.
In an earlier life, I did a deep dive into academic publications on simulating camera lens blur computationally.
Nearly every paper that mentioned the topic cited the same original work for the same numerical result. The result in question was clearly a reasonable result (you could derive it yourself with a quick hand-waving calculation). The problem was it wasn't actually mentioned in that original paper. And yes, many of these papers with the incorrect citations were written by the greats of the computer graphics field.
At the time, I talked about this with friends who were academics from a wide range of different disciplines. No one was surprised. This isn't at all unusual.
The reality is copy-and-paste damage happens all the time in the published academic literature.
It's not a sign that every author is an evil malicious fake news generating bad actor. It's a sign that writing papers is a huge amount of work, frequently with tight deadlines, and every human faced with that context (a) is imperfect and (b) focuses their attention on what matters.
If you want to build a data mining machine to find witches or communists or whatever word you want to apply to supposedly "bad" people, your data mining machine will find people to blame, not because all people are bad but because the real world is full of real errors.
Related: I also went deep diving into witness encryption only to find some glaring issues with the most cited paper. You wouldn't notice it at first glance, as it appears sound. It isn't until you actually get into implementation that you find that it doesn't actually do anything at all.
Other papers on this topic apparently haven't actually implemented it either. The real implementations barely resemble the papers.
> A grad student who is n-th author on the paper copied and pasted the instrument name from a previous paper, because (a) that paper used photos from the same SEM, (b) the grad student has never seen the SEM, and (c) the name and model number of the SEM doesn't actually matter (gasp!). It's included more for tradition than anything else, because the contents of the image are what matter. No one who tries to replicate the experiment will care what SEM model was used; they'll use the SEM they have at their own lab, period.
If this is true, then is the tradition counterproductive, in that it's encouraging false assertions to be put into scientific papers, for no good reason?
If so, why hasn't the tradition been reconsidered, and changed?
Perhaps because researchers in other disciplines secretly wish they were Physicists, since Physicists live in a world where this kind of equipment detail actually does matter?
I don't think the right takeaway is to stop trusting science. But a degree of checking for yourself, and a certain suspicion of brand-new results that are surprising, isn't completely misplaced. A great way to learn this is to buy an engineering book from the early 1900s: you'll often find chapters that were completely debunked, not merely superseded. We seem to forget this and assume everything we know now must be true, even though we accept that people even 50 years ago believed wrong things and put them in engineering and science books.
2,400 out of "more than 1 million" is a much smaller fraction than 2,400 out of 8,515. Could there be an explanation along the lines of "neglecting to strip SEM instrument captions strongly correlates with quality issues elsewhere"? Is it a social norm to remove these metadata captions from images before publishing them in journals? I'd suspect something like that, since >99% of these >1 million papers did just that.