>“It will make it very easy for any researcher or group of researchers to create fake measurements on non-existent patients, fake answers to questionnaires or to generate a large data set on animal experiments.”
Perhaps I'm naive, but I think the people that want to fake data were already doing it without tools like chatgpt. Especially since a ton of biological data is normally distributed, so it's exceedingly easy to generate plausible fake results for such data without a system as advanced as chatgpt
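To make that concrete, here's roughly all the tooling that takes; a minimal sketch assuming numpy, with an invented "serum marker" and made-up mean/SD:

```python
# Hypothetical sketch: fabricate "measurements" for an invented serum marker
# by sampling a normal distribution. No LLM needed, just numpy.
import numpy as np

rng = np.random.default_rng(42)

n_patients = 120
fake_marker = rng.normal(loc=5.2, scale=1.1, size=n_patients)  # mean/SD are made up
fake_marker = np.clip(fake_marker, 0.0, None)  # keep values physiologically plausible

print(fake_marker[:5].round(2))
```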
Cases of fabrication have been caught because the fabrication is done so badly, i.e. implausibly. It's often just imputing some "random" numbers or repeating samples, etc. Many would be amazed at how technically and mathematically illiterate scientists often are, and the ones who fabricate are probably even more so.
Maybe it will increase and/or get a bit higher quality with LLM fakery. But as with many "AI bad" themes, the problem isn't that "AI" can fabricate the data. The problem is fucked up institutions and cultures.
The issue is that we consider passing peer review to be enough for something to be taken as truth when it should really be reproducibility/replicability. That, and the incentives currently driving academia are absolutely ridiculous. Paper mills are merely a symptom of academia's ills, not the cause of them.
A null result, or the absence of evidence, is not evidence of absence. If you fake a null result, you’re not asserting anything other than you could not measure and collect supporting data using your experiment to prove or disprove a hypothesis. It is difficult for someone doing replication to accuse you of ill-intent, as opposed to faked data that proves your point when anyone else can replicate your experiment and get totally different or even contradictory results.
You still have to give out the statistics that show the null result. E.g. something with a high p-value. You are in fact "confirming the null hypothesis". They aren't any more difficult to replicate than results supporting "the alternative hypothesis".
(The whole binary hypothesis-testing system and culture is a mess, but that's beside the point.)
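For concreteness, a minimal sketch (assuming scipy; the group sizes and distribution parameters are made up) of the statistics a faked null result would still have to report:

```python
# Two fabricated groups drawn from the same distribution yield the kind of
# high p-value a "null result" would report. Parameters here are arbitrary.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(loc=100.0, scale=15.0, size=50)
treatment = rng.normal(loc=100.0, scale=15.0, size=50)  # no effect built in

t_stat, p_value = stats.ttest_ind(control, treatment)
print(f"t = {t_stat:.2f}, p = {p_value:.2f}")  # p will usually land well above 0.05
```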
However, I think that no one will do the scut work necessary to find that a null result was faked, and even if they do, since you, the researcher, got very little status out of it, it's believable that you made a mistake rather than falsified data.
I'm not sure this is much better than the state of the art. Training a model on data and then having it generate new, fake data, is not only easy, it's a standard tool for model boosting.
I wouldn't immediately call creating synthetic data 'poisoning the well' unless it is actually distributed as such. For training models with a minimal amount of quality data, it is a viable method for generating more data to increase the quality of the models. But any legit organization will obviously label synthetic data as such.
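As a rough illustration of how routine this already is, here's a sketch (assuming scikit-learn; the "real" measurements are just random placeholders) of fitting a simple generative model and sampling extra synthetic points from it:

```python
# Fit a simple generative model to a small sample and draw extra synthetic
# points from it: standard data augmentation, clearly labeled as synthetic.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
real_data = rng.normal(loc=[5.0, 1.3], scale=[1.0, 0.2], size=(40, 2))  # placeholder "measurements"

gmm = GaussianMixture(n_components=2, random_state=1).fit(real_data)
synthetic, _ = gmm.sample(n_samples=200)  # new points that mimic the original distribution

print(synthetic[:3].round(2))
```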
This is so worrying to me: the amount of digital garbage that can now be generated that is not obviously garbage, which one must read and discern, first, whether it is nonsensical bad text generation and, second, whether it is factual. Is truth becoming a needle in the haystack? How can this be cleaned up?
GPT definitely wins out if you want more novelty/variety in your fake data and are willing to accept extraordinarily higher cost, less rigor, and less reliability. I'm sure there's some occasion when those criteria win out, but Faker's pretty decent most of the time.
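For reference, Faker's bread and butter looks something like this (a sketch; the record fields are invented for illustration):

```python
# Faker churns out boring-but-plausible fake records deterministically and
# for free; the field names here are invented for illustration.
from faker import Faker

Faker.seed(7)
fake = Faker()

records = [
    {
        "patient_id": fake.uuid4(),
        "name": fake.name(),
        "date_of_birth": fake.date_of_birth(minimum_age=18, maximum_age=90),
        "visit_date": fake.date_this_decade(),
    }
    for _ in range(3)
]

for record in records:
    print(record)
```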
Sure, but what do you expect? It's like blaming Boeing for genocide and then trying to get rid of 747s.
There's plenty of research into AI safety. There were some damn coups going on over AI safety. The general public defines AI safety as Skynet and homemade bombs, but it's also things like this - political manipulation, astroturfing, fake data, the risk of another industrial revolution.
It's something we should be slamming the brakes on, but most of the people calling out AI safety are also building their own B-52 bombers, so nobody takes them seriously either.
80000 Hours has been telling people to get into AI and nuclear policy for years now. Hopefully we have some competent people in governments who will do something.
>You have already gotten the basics of faking data incorrect.
Let's get past the unnecessary antagonism. Can you please explain the above?
Or actually, forget it. Your history suggests you don't respond, and something like 99/100 of your comments are dead. Maybe this isn't the place for you?
No one believed those who said "make it idiot proof and they'll just make a better idiot", but we should have heeded this warning. They did it! They finally did it! They made a better idiot!
Roughly speaking, the whole point of an LLM is to create plausible sounding text, without regard to the truth (which it cannot determine or derive), so it's a tool that is perfectly suited to this sort of malfeasance.
I’ve read this over and over, and believe it - but it’s so easy to forget when it absolutely nails something like debugging or suggesting a CMake edit or finding 5 letter combinations that are reversible with a vowel in the middle, or etc.
It’s just still mind blowing I can get a sarcastic summary of an email in the theme of GlaDOS from Portal and in the same screen get an email proofread.
It’s funny how far “what should the next word be” can go.
It might seem that there are cases where the ability to generate heaps of plausible-sounding text on demand is helpful, but there are also cases where it is pretty much the worst possible capability to have.
I don't agree. The way most LLMs currently generate responses might work through this modality, but the purpose of LLMs is to get closer to simulating how human minds and languages operate. A lot of models have been trying to overcome the problem of fabrications; GPT bots like SciSpace's ResearchGPT are trying to do precisely that.
"Gordon’s great insight was to design a program which allowed you to specify in advance what decision you wished it to reach, and only then to give it all the facts. The program’s task, which it was able to accomplish with consummate ease, was simply to construct a plausible series of logical-sounding steps to connect the premises with the conclusion."
From the thumping good detective-ghost-horror-who dunnit-time travel-romantic-musical-comedy-epic: Dirk Gently's Holistic Detective Agency
It's only a matter of time until someone comes up with a GPT that takes whatever off-axis theories a research paper writer wishes to promulgate, and searches the entire corpus of academic literature for references that can be strung together in such a way as to support any argument one likes.
A quack's dream come true: substantiating an argument by backsolving from its feeble or malevolent conclusion to a set of well-known premises, but with citations; converting untenable speculation into something that passes many superficial tests of legitimacy, which is more than enough to boost it into broader and less critical visibility.
"thick with citations, therefore truthy" is a big blind spot in the casual heuristic used ro gauge the quality of a given piece of research writing, especially at the undergrad level where this tool, lets call it CheatGPT, would be stupendously popular.
Have you been watching American politics for the last 20 years? Tea Party to Q/Maga, they will cite any quack economist, quack scientist, or quack doctor that provides a theory that enforces their narrative, data be damned.
There is in fact a treasure trove of high-level research that is too controversial for any expert to go near.
Wild things will happen if screaming hoax from ignorance can no longer shut down constructive efforts.
It will simply combine what is written about germ theory or heavier-than-air flying machines and produce sensible responses.
The patent DBs are full of treasures, if you have, oh, 1,000 years to study them? Maybe 10,000?
It should also be possible to take a seemingly unworkable idea that makes no sense and gather just what is needed to bring it into reality.
For stuff you can build or otherwise test properly it makes no difference what people think is possible.
People think very little is possible; we always did! "Everything that can be discovered has been discovered" has been the mantra for thousands of years, all while the things people actually accomplish seem to get more and more astonishing.
I understand what you describe, and its possible consequences. But I would put forward two arguments:
1. Those who want to believe in bullshit conspiracies will do it regardless of the number of citations in a research article terribly summarised by a clickbait web page of which they only read the headline. Anyone who can read more than 10 words has been using and abusing Google Scholar for years to support their nonsense; the ability to find a reference does not equate to the ability to critically appraise the contents.
2. Science is a small world. In each specific field you get to know the big names and institutes, and those are used as a better gauge of the quality of a paper. Peer review, for all its pitfalls, including its dependence on volunteers, does a good job at stemming a lot of bullshit. I'd proffer it's not the articles but rather the multiple for-profit publishing houses setting up multiple journals through which they funnel pay-to-publish articles that are contributing to the dilution of trust in published science. Again on that note, scientists in each field know which journals to trust and which to double-check.
> scientists in each field know which journals to trust and which to double check
I've not seen much evidence of this in my own reading. Maybe in some fields, but certainly not all. During the COVID years I read a lot of epidemiological and public health papers. They all had dozens of references and would be published in well known journals like Nature, BMJ, the Lancet etc. Yet when checked many of the referenced papers would simply not validate. For example, they existed but wouldn't actually support the claim being made. Sometimes they wouldn't even be related, or would actually contradict the claim. Sometimes the claim would appear in the abstract, but the body of the paper would admit it wasn't actually true. That was only one of the many kinds of problems peer reviewed published papers would routinely have.
It became painfully apparent that nobody is actually reading papers in the health world adversarially, despite what we're told about peer review. The "a statement having a citation = it's true" assumption is very much held by many [academic] scientists.
It's a subcomponent of the very strong belief in academia that everyone within it is totally honest all the time. This is how you end up with the Lancet publishing the Surgisphere papers (a paper using an apparently fictional dataset), without anyone within the field noticing anything is wrong. Instead it got noticed by a journalist. It needs some sort of systematic fix because otherwise more and more people will just react to scientific claims by ignoring them.
This is already possible with search engines, there is enough information on the internet that you can substantiate just about any claim regardless of how much evidence there is to the contrary. (see flat-earth, plenty of plausible sounding claims with real, albeit, cherry picked evidence).
Yes of course, this is already possible with AI writing assistance as well, if you're willing to plug some of the phrases they come up with into a search engine to figure out where they may have come from. But you still have to do the work of stringing the arguments together into a cohesive structure and figuring out how to find research that may be well outside the domains you're familiar with.
But I'm talking about writing a thesis statement, "eating cat boogers makes you live 10 years longer for Science Reasons", and having it string together a completely passable and formally structured argument along with any necessary data to convince enough people to give your cat-booger startup revenue to secure the next round, because that seems to be where all these games are headed. The winner is the one who can outrun the truth by hashing together a lighter-weight version of it, and though it won't stand up to a collision with the real thing, you'll be very far from the explosion by the time it happens.
AI criticism is essentially people claiming that having access to something they don't like will end the world. As you say, we already have a good example of this and while it is mostly bad and getting worse it's not world-ending.
As an academic researcher, I find analysis of GPT-4 itself — as it pertains to other fields — to be essentially meaningless. There are no guarantees that the version of the model that was used will be available in the future (the API endpoints seem to have a ~12 month future-looking guarantee at most).
Don't get me wrong:
1. GPT-4 is incredibly interesting
2. Studying GPT-4 is interesting for people working in that field
But when I see people writing about how GPT-4 can pass the USMLE (etc), it has no lasting meaning. It might as well be marketing for OpenAI, and to me it has roughly that amount of academic importance.
Yes, and that's interesting both popularly and to people doing research in developing better models.
But for people who are nominally using this to conduct scientific inquiries in other domains, the specific performance characteristics are what actually matter. When I am writing about the results of my semantic segmentation model, the characteristics of that specific model are more important than the notion that future models will be at least as good.
Hence my critique being pretty narrow (the academic use of GPT-4 for downstream science).
If you haven't read the breakdown of the epic Jan Hendrik Schön scandal in which he published about a half-dozen fraudulent papers in Science and Nature based on fabricated results regarding organic (chemically speaking) semiconductor devices cooked up out of thin air, start here. Required reading for any young graduate student IMO. The shame of Bell Labs, Science and Nature, all taken for a ride:
If that fraudster had started out with ChatGPT4, the fraud might have persisted for another decade (because organic semiconductors don't seem to have the capabilities he believed they had), because he was only detected via replicated datasets. If he'd had ChatGPT4 to generate new plausible datasets, well...?
I guarantee you that a significant fraction of the people in academia who 'got there first' on significant discoveries in science did so by fabricating data along the lines of Schön. They just guessed right, and fabricated data, and then more serious careful scientists were able to replicate their bogus work later.
Schön guessed wrong, and every effort to replicate his work failed, and Bell Labs, Science and Nature were left with egg on their face, which they're still trying to wipe off. ChatGPT4 and its shady parents and affiliates will only make this problem worse, not better.
"Benefit to humanity" my ass.
[edit: if you wonder why I sound so salty I read all those Schön papers with interest and fascination when I was a young graduate student myself, now I'm older and seriously jaded.]
"The authors asked GPT-4 ADA to create a data set concerning people with an eye condition called keratoconus"
That had me confused for a moment, since there's no GPT-4 model called Ada (the current embeddings model is called that, and there was a GPT-3 LLM model with that name too).
Then I realized they were using ADA as an acronym for Advanced Data Analysis.
A long time ago, I ran into the books The Art of Deception and How to Lie With Statistics. They seemed like a good start on training people to spot deceptions. There were articles here on things like p-hacking (a quick simulation of that follows this comment). Then, the replication crisis.
While I have no time now, I'm still interested in making a list of resources (esp. free) that tells how to construct good studies, has a comprehensive presentation of all the categories of mistakes/lies we see in them, with examples of each, and practice studies with known errors. Anyone here got good books or URLs that could go in a resource like that? That could train new reviewers quickly?
If I return to AI or ever work in it, I also planned to teach all of that to AI models to automatically review scientific papers. Might contribute to solving the replication crisis. Anyone who’s doing AI now feel free to jump on that. Get a startup or Ph.D. with a tool that tells us which of the rest are fake.
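On the p-hacking entry above, here's the sort of worked example I have in mind: a minimal simulation (assuming only numpy and scipy; the sample size and number of tests are arbitrary) showing how testing many noise-only variables almost guarantees a "significant" finding:

```python
# Minimal p-hacking simulation: correlate 20 pure-noise variables with an
# outcome and keep the best p-value; a "significant" hit is nearly guaranteed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 100
outcome = rng.normal(size=n)

p_values = []
for _ in range(20):
    noise_variable = rng.normal(size=n)  # genuinely unrelated to the outcome
    _, p = stats.pearsonr(noise_variable, outcome)
    p_values.append(p)

print(f"smallest p-value out of 20 tests: {min(p_values):.3f}")
# With a 0.05 threshold, the chance of at least one false "hit" is about 64% (1 - 0.95**20).
```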
The paper[1] itself reads like marketing material that was itself written by GPT-4. Why was this published? And why is Nature reporting on it? As someone who generates fake/simulated datasets for a living, the scientific value of this paper is totally lost on me.
I asked ChatGPT to come up with a list of references for a topic I was researching. Every single citation it came up with was hallucinated except for one which turned out to be extremely useful.
The framing makes it sound like it's a "bug" or something. From my understanding it's not, because it's hardly a reliable reasoning tool: whether using statements or using data. Unless we come up with or advance a better architecture, similar "panic porn" is useless, not to mention this reeks of a hit piece. Just verify everything and stop with the blind trust.
People will find such dumb reasons to get outraged. Folks have been photoshopping GFP expression for years. Then some halfwit will post it "there's some evidence that X is Y". Try to replicate it, you can't. Because the science is fake.
This is not particularly surprising, nor is it really bad - generating data that appears valid is really powerful. Although it was a bit funny since I happen to have keratoconus, and I don't hear much about it often.
If the criteria for truth is a convincing argument, and we have a machine for generating convincing arguments to fit anything, and analyzing the argument is too much trouble, then what?
There are plenty of such papers out there, now obviously disproven. At some point smoking was prescribed as a cure for various health issues. Letting you know so you don't feel impressed by what was "created" when a chat bot generates such a paper.
If Arxiv was used in training, some of those scientists were the ones that taught it to do this. Well, a good chunk of whatever they scraped off the Internet, too.
You literally have to put every number into the training data for it to do mathematics correctly...
It's as stupid as that. Some try to get around it by having only the 10 different digits as tokens and gluing them together, but it's a hallucination to think that works.
An important point about generalization is that you have to teach it something in the first place. This is literally important:
"ycombinator is a website" is a completion that is almost impossible if ycombinator is not in your training set.
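For anyone curious what "gluing digits together" looks like in practice, here's a quick peek at a GPT-style tokenizer (a sketch assuming the tiktoken package; the sample strings are arbitrary):

```python
# Peek at how a GPT-style tokenizer splits numbers into multi-digit chunks,
# which is part of why arithmetic generalizes poorly from training data.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-era models

for text in ["1234567890", "3.14159", "ycombinator is a website"]:
    tokens = enc.encode(text)
    pieces = [enc.decode([t]) for t in tokens]
    print(f"{text!r} -> {pieces}")
```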
It's easy to mistake entropy for novelty. Computers don't create: they compute. Calling an LLM "Artificial Intelligence" is a bit like mistaking a pseudorandom number generator for true noise.
I think that by that same logic, you could say human artists, writers, etc. don't create either, they just move existing matter (which typically isn't created nor destroyed) from one place to another. You could also say an electric company doesn't generate electricity but merely converts other energy into it -- one of the most popular uses of the word "generator" yet the atoms/energy are already here; we just arrange/convert and call it generation. I must insist that sorting some things a certain way is widely known as generation/creation.
I can see how other words are a bit more precise, though. Synthesize, perhaps?
This is a human supremacy argument, essentially claiming that because we have a "soul" or because of something inherent in us that cannot be proven we are better than something else. You are free to believe this, but it is a matter of faith. Not of any sort of reasoning.
Yep. I'm not much of a religious person, but if I were, I might say: not only does AI not create stuff, humans don't either, only the Creator did! I suppose the big bang theory doesn't stray far from this either. Point is, maybe we use this notion metaphorically/imprecisely even for human output, and therefore we might as well extend it to machines.
It's insane how they think they can have their AGI and eat it too! The closest thing we have to AGI is a human child, even though that's Synthetic (man-made) General Intelligence, and children can't really be relied upon not to learn or formulate sentences you don't like.
Somewhere in the mind's eye of science fiction is a world where we have near-unlimited productivity and knowledge and we're all free to pursue our self-actualized lives as we see fit.
But as productivity increases, and as AI improves, in both cases individual greed holds back lifting up the many. And so we end up asking for caution on automating away someone's job, or caution on rapid AI progress.
Does anyone know any good writing on how humanity might fight its way through all these mires of progress to the other side - that science fiction world that may or may not even be possible? How the world might look as these things continue to progress over time? Either fiction or a serious analysis is fine.
Most sci-fi skips straight to "There is no more need for University, we simply ask the AI", missing the "students are using AI to cheat" phase entirely.
The world of Star Trek is one where humanity learns from the devastation of world war three and over a century, creates a society where almost everything is run by computers, money doesn't exist, and people work primarily for their personal satisfaction. But I think true AGI is frowned upon there.
Firm disagree! Example: take a tour of the tower of London and learn about all the nasty medieval tortures we used to inflict on people. Now we don't do that anymore.
Humans learn things collectively via culture and cultural transmission has been an extremely effective tool of knowledge preservation over the generations.
Who is "we"? Torturing people to death in equally horrific ways is still routine practice among radical Islamists and Mexican drug cartels, among others.
This has been the popular narrative in almost all common written history forever. It's basically the biggest recurring theme in diaries from the Middle Ages, especially among religious texts.
Things are never an upward hockey stick but they also aren’t saw waves skirting a baseline.
Yes, but my point is that repeating "humanity learns" can lead us down garden paths into thinking there is some species-level recollection, when history reveals a mixed bag at best.
Completely agree transmission of knowledge is a sticky wicket. Good reminder to stay logistically grounded. I have found many of the ambitious thinkers close to me tend to aspire to a "humanity learns" moment but if they're anywhere near politics they tend to be tempered fairly well by the logistical realities of bringing ideas to pass
For anyone jumping in who hasn't read the article, this is not about hallucinations and some researchers being surprised that GPT-4 doesn't always respond with absolute truth.
Instead it's about how easily it can be used to generate plausible looking datasets that would confirm a hypothesis. It's a warning note to journals about how fake data can more easily be created.
Exactly. It's no different from the fake data sets created by hand in scientific misconduct cases for years, just, I guess, easier. That's not a good thing, but given that even making a fake data set by hand is far easier than generating real data, I'm not sure this will suddenly make more people fake data.
Having spent significant time implementing machine learning papers before the LLM age, I can promise you over 90% of papers you'll find are full of shit. The claims they make are true in only the most contrived of circumstances and don't hold up under any kind of scrutiny. How exactly they came to these lies (data lies, result lies, omitting lies) is really immaterial. The concept everyone is apparently struggling with is that producing a paper that is entirely lies is not doing the scientific world a disservice: It is not unusual and already happens at scale. Making it even easier might motivate someone to actually figure out a way to ensure papers are reproducible and not full of shit. In essence this is a good thing.
The volume of bogus research is already growing non-linearly. It suggests that there is a market for fake datasets, which will lead to better AI training to fix this problem.
Easier cheating tools allow more people to cheat. There is no particular reason it would lead to less fake data, and with the ability to get fake data in a few sentences rather than a few hours of work is apt to lead more people astray.
Ironically, I tried to ask ChatGPT to generate a signal plus some plausible background for a potential supersymmetry particle discovery (with some details about model-independent searches), and it started hallucinating like someone whose heart was broken and who spent the night drinking at the bar.
I chose the wrong field to be able to fake data /S.
It's awesome to know that the scientific process that's been in use for centuries, based on peer review and the importance of replication, is still so powerful that AI generated fake data can't do anything to undermine it. Reality doesn't care how the data was faked when you replicate an experiment!
Now, if only scientists and institutions would actually bother using those tools we developed centuries ago. Unfortunately if they don't - you don't exactly need chatgpt to fake data you know? Replication crisis etc etc.
Long story short: Man is it awesome that the scientific method is resilient to this! Too bad nobody uses it.
Ironically, one of the major papers that showed that many findings couldn't be replicated couldn't itself be replicated. Academia is packed to the brim with perverse incentives. It's amazing that scientists are creative and capable enough to keep consistently changing the world and gaining deep, extraordinary, and replicable insights into our universe in a broken system. It would be nice if they didn't need to fight the system though; see the story of the mRNA vaccine Nobel and how hard she had to fight against the establishment to do the research that would end up saving a truly staggering number of lives.