>“It will make it very easy for any researcher or group of researchers to create fake measurements on non-existent patients, fake answers to questionnaires or to generate a large data set on animal experiments.”
Perhaps I'm naive, but I think the people that want to fake data were already doing it without tools like chatgpt. Especially since a ton of biological data is normally distributed, so it's exceedingly easy to generate plausible fake results for such data without a system as advanced as chatgpt
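To make that concrete, here's roughly all the tooling that takes; a minimal sketch assuming numpy, with an invented "serum marker" and made-up mean/SD:

```python
# Hypothetical sketch: fabricate "measurements" for an invented serum marker
# by sampling a normal distribution. No LLM needed, just numpy.
import numpy as np

rng = np.random.default_rng(42)

n_patients = 120
fake_marker = rng.normal(loc=5.2, scale=1.1, size=n_patients)  # mean/SD are made up
fake_marker = np.clip(fake_marker, 0.0, None)  # keep values physiologically plausible

print(fake_marker[:5].round(2))
```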
Cases of fabrication have been caught because the fabrication is done so badly, i.e. implausibly. It's often just imputing some "random" numbers or repeating samples, etc. Many would be amazed at how technically and mathematically illiterate scientists often are, and the ones who fabricate are probably even more so.
Maybe it will increase and/or get a bit higher quality with LLM fakery. But as with many "AI bad" themes, the problem isn't that "AI" can fabricate the data. The problem is fucked up institutions and cultures.
The issue is that we consider passing peer review to be enough for something to be taken as truth when it should really be reproducibility/replicability. That, and the incentives currently driving academia are absolutely ridiculous. Paper mills are merely a symptom of academia's ills, not the cause of them.
A null result, or the absence of evidence, is not evidence of absence. If you fake a null result, you’re not asserting anything other than you could not measure and collect supporting data using your experiment to prove or disprove a hypothesis. It is difficult for someone doing replication to accuse you of ill-intent, as opposed to faked data that proves your point when anyone else can replicate your experiment and get totally different or even contradictory results.
You still have to give out the statistics that show the null result. E.g. something with a high p-value. You are in fact "confirming the null hypothesis". They aren't any more difficult to replicate than results supporting "the alternative hypothesis".
(The whole binary hypothesis-testing system and culture is a mess, but that's beside the point.)
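For concreteness, a minimal sketch (assuming scipy; the group sizes and distribution parameters are made up) of the statistics a faked null result would still have to report:

```python
# Two fabricated groups drawn from the same distribution yield the kind of
# high p-value a "null result" would report. Parameters here are arbitrary.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(loc=100.0, scale=15.0, size=50)
treatment = rng.normal(loc=100.0, scale=15.0, size=50)  # no effect built in

t_stat, p_value = stats.ttest_ind(control, treatment)
print(f"t = {t_stat:.2f}, p = {p_value:.2f}")  # p will usually land well above 0.05
```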
However, I think that no one will do the scut work necessary to find that a null result was faked, and even if they do, since you, the researcher, got very little status out of it, it's believable that you made a mistake rather than falsified data.
I'm not sure this is much better than the state of the art. Training a model on data and then having it generate new, fake data, is not only easy, it's a standard tool for model boosting.
I wouldn't immediately call creating synthetic data 'poisoning the well' unless it is actually distributed as such. For training models with a minimal amount of quality data, it is a viable method for generating more data to increase the quality of the models. But any legit organization will obviously label synthetic data as such.
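As a rough illustration of how routine this already is, here's a sketch (assuming scikit-learn; the "real" measurements are just random placeholders) of fitting a simple generative model and sampling extra synthetic points from it:

```python
# Fit a simple generative model to a small sample and draw extra synthetic
# points from it: standard data augmentation, clearly labeled as synthetic.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
real_data = rng.normal(loc=[5.0, 1.3], scale=[1.0, 0.2], size=(40, 2))  # placeholder "measurements"

gmm = GaussianMixture(n_components=2, random_state=1).fit(real_data)
synthetic, _ = gmm.sample(n_samples=200)  # new points that mimic the original distribution

print(synthetic[:3].round(2))
```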
This is so worrying to me: the amount of digital garbage that can now be generated that is not obviously garbage, which one must read and discern, first, whether it is nonsensical bad text generation and, second, whether it is factual. Is truth becoming a needle in the haystack? How can this be cleaned up?
GPT definitely wins out if you want more novelty/variety in your fake data and are willing to accept extraordinarily higher cost, less rigor, and less reliability. I'm sure there's some occasion when those criteria win out, but Faker's pretty decent most of the time.
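For reference, Faker's bread and butter looks something like this (a sketch; the record fields are invented for illustration):

```python
# Faker churns out boring-but-plausible fake records deterministically and
# for free; the field names here are invented for illustration.
from faker import Faker

Faker.seed(7)
fake = Faker()

records = [
    {
        "patient_id": fake.uuid4(),
        "name": fake.name(),
        "date_of_birth": fake.date_of_birth(minimum_age=18, maximum_age=90),
        "visit_date": fake.date_this_decade(),
    }
    for _ in range(3)
]

for record in records:
    print(record)
```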
Sure, but what do you expect? It's like blaming Boeing for genocide and then trying to get rid of 747s.
There's plenty of research into AI safety. There were some damn coups going on over AI safety. The general public defines AI safety as Skynet and homemade bombs, but it's also things like this - political manipulation, astroturfing, fake data, the risk of another industrial revolution.
It's something we should be slamming the brakes on, but most of the people calling out AI safety are also building their own B-52 bombers, so nobody takes them seriously either.
80000 Hours has been telling people to get into AI and nuclear policy for years now. Hopefully we have some competent people in governments who will do something.
>You have already gotten the basics of faking data incorrect.
Let's get past the unnecessary antagonism. Can you please explain the above?
Or actually, forget it. Your history suggests you don't respond, and something like 99/100 of your comments are dead. Maybe this isn't the place for you?
No one believed those who said "make it idiot proof and they'll just make a better idiot", but we should have heeded this warning. They did it! They finally did it! They made a better idiot!
Roughly speaking, the whole point of an LLM is to create plausible sounding text, without regard to the truth (which it cannot determine or derive), so it's a tool that is perfectly suited to this sort of malfeasance.
I’ve read this over and over, and believe it - but it’s so easy to forget when it absolutely nails something like debugging or suggesting a CMake edit or finding 5 letter combinations that are reversible with a vowel in the middle, or etc.
It’s just still mind blowing I can get a sarcastic summary of an email in the theme of GlaDOS from Portal and in the same screen get an email proofread.
It’s funny how far “what should the next word be” can go.
It might seem that there are cases where the ability to generate heaps of plausible-sounding text on demand is helpful, but there are also cases where it is pretty much the worst possible capability to have.
I don't agree. The way most LLMs currently generate responses might work through this modality, but the purpose of LLMs is to get closer to simulating how human minds and languages operate. A lot of models have been trying to overcome the problem of fabrications; GPT bots like SciSpace's ResearchGPT are trying to do precisely that.
"Gordon’s great insight was to design a program which allowed you to specify in advance what decision you wished it to reach, and only then to give it all the facts. The program’s task, which it was able to accomplish with consummate ease, was simply to construct a plausible series of logical-sounding steps to connect the premises with the conclusion."
From the thumping good detective-ghost-horror-who dunnit-time travel-romantic-musical-comedy-epic: Dirk Gently's Holistic Detective Agency
It's only a matter of time until someone comes up with a GPT that takes whatever off-axis theories a research paper writer wishes to promulgate, and searches the entire corpus of academic literature for references that can be strung together in such a way as to support any argument one likes.
A quack's dream come true: substantiating an argument by backsolving from its feeble or malevolent conclusion to a set of well-known premises, but with citations; converting untenable speculation into something that passes many superficial tests of legitimacy, which is more than enough to boost it into broader and less critical visibility.
"thick with citations, therefore truthy" is a big blind spot in the casual heuristic used ro gauge the quality of a given piece of research writing, especially at the undergrad level where this tool, lets call it CheatGPT, would be stupendously popular.
Have you been watching American politics for the last 20 years? Tea Party to Q/Maga, they will cite any quack economist, quack scientist, or quack doctor that provides a theory that enforces their narrative, data be damned.
There is in fact a treasure trove of high-level research that is too controversial for any expert to go near.
Wild things will happen if screaming hoax from ignorance can no longer shut down constructive efforts.
It will simply combine what is written about germ theory or heavier-than-air flying machines and produce sensible responses.
The patent DBs are full of treasures, if you have, oh, 1,000 years to study them? Maybe 10,000?
It should also be possible to take a seemingly unworkable idea that makes no sense and gather just what is needed to bring it into reality.
For stuff you can build or otherwise test properly it makes no difference what people think is possible.
People think very little is possible; we always did! "Everything that can be discovered has been discovered" has been the mantra for thousands of years, all while the things people actually accomplish seem to get more and more astonishing.
I understand what you describe, and its possible consequences. But I would put forward two arguments:
1. Those who want to believe in bullshit conspiracies will do it regardless of the number of citations in a research article terribly summarised by a clickbait web page of which they only read the headline. Anyone who can read more than 10 words has been using and abusing Google Scholar for years to support their nonsense; the ability to find a reference does not equate to the ability to critically appraise the contents.
2. Science is a small world. In each specific field you get to know the big names and institutes, and those are used as a better gauge of the quality of a paper. Peer review, for all its pitfalls, including its dependence on volunteers, does a good job at stemming a lot of bullshit. I'd proffer it's not the articles but rather the multiple for-profit publishing houses setting up multiple journals through which they funnel pay-to-publish articles that are contributing to the dilution of trust in published science. Again on that note, scientists in each field know which journals to trust and which to double-check.
> scientists in each field know which journals to trust and which to double check
I've not seen much evidence of this in my own reading. Maybe in some fields, but certainly not all. During the COVID years I read a lot of epidemiological and public health papers. They all had dozens of references and would be published in well known journals like Nature, BMJ, the Lancet etc. Yet when checked many of the referenced papers would simply not validate. For example, they existed but wouldn't actually support the claim being made. Sometimes they wouldn't even be related, or would actually contradict the claim. Sometimes the claim would appear in the abstract, but the body of the paper would admit it wasn't actually true. That was only one of the many kinds of problems peer reviewed published papers would routinely have.
It became painfully apparent that nobody is actually reading papers in the health world adversarially, despite what we're told about peer review. The "a statement having a citation = it's true" assumption is very much held by many [academic] scientists.
It's a subcomponent of the very strong belief in academia that everyone within it is totally honest all the time. This is how you end up with the Lancet publishing the Surgisphere papers (a paper using an apparently fictional dataset), without anyone within the field noticing anything is wrong. Instead it got noticed by a journalist. It needs some sort of systematic fix because otherwise more and more people will just react to scientific claims by ignoring them.
This is already possible with search engines, there is enough information on the internet that you can substantiate just about any claim regardless of how much evidence there is to the contrary. (see flat-earth, plenty of plausible sounding claims with real, albeit, cherry picked evidence).
Yes of course, this is already possible with AI writing assistance as well, if you're willing to plug some of the phrases they come up with into a search engine to figure out where they may have come from. But you still have to do the work of stringing the arguments together into a cohesive structure and figuring out how to find research that may be well outside the domains you're familiar with.
But I'm talking about writing a thesis statement, "eating cat boogers makes you live 10 years longer for Science Reasons", and having it string together a completely passable and formally structured argument along with any necessary data to convince enough people to give your cat-booger startup revenue to secure the next round, because that seems to be where all these games are headed. The winner is the one who can outrun the truth by hashing together a lighter-weight version of it, and though it won't stand up to a collision with the real thing, you'll be very far from the explosion by the time it happens.
AI criticism is essentially people claiming that having access to something they don't like will end the world. As you say, we already have a good example of this and while it is mostly bad and getting worse it's not world-ending.
As an academic researcher, I find analysis of GPT-4 itself — as it pertains to other fields — to be essentially meaningless. There are no guarantees that the version of the model that was used will be available in the future (the API endpoints seem to have a ~12 month future-looking guarantee at most).
Don't get me wrong:
1. GPT-4 is incredibly interesting
2. Studying GPT-4 is interesting for people working in that field
But when I see people writing about how GPT-4 can pass the USMLE (etc), it has no lasting meaning. It might as well be marketing for OpenAI, and to me it has roughly that amount of academic importance.
Yes, and that's interesting both popularly and to people doing research in developing better models.
But for people who are nominally using this to conduct scientific inquiries in other domains, the specific performance characteristics are what actually matter. When I am writing about the results of my semantic segmentation model, the characteristics of that specific model are more important than the notion that future models will be at least as good.
Hence my critique being pretty narrow (the academic use of GPT-4 for downstream science).
If you haven't read the breakdown of the epic Jan Hendrik Schön scandal in which he published about a half-dozen fraudulent papers in Science and Nature based on fabricated results regarding organic (chemically speaking) semiconductor devices cooked up out of thin air, start here. Required reading for any young graduate student IMO. The shame of Bell Labs, Science and Nature, all taken for a ride:
If that fraudster had started out with ChatGPT4, the fraud might have persisted for another decade (because organic semiconductors don't seem to have the capabilities he believed they had), because he was only detected via replicated datasets. If he'd had ChatGPT4 to generate new plausible datasets, well...?
I guarantee you that a significant fraction of the people in academia who 'got there first' on significant discoveries in science did so by fabricating data along the lines of Schön. They just guessed right, and fabricated data, and then more serious careful scientists were able to replicate their bogus work later.
Schön guessed wrong, and every effort to replicate his work failed, and Bell Labs, Science and Nature were left with egg on their face, which they're still trying to wipe off. ChatGPT4 and its shady parents and affiliates will only make this problem worse, not better.
"Benefit to humanity" my ass.
[edit: if you wonder why I sound so salty I read all those Schön papers with interest and fascination when I was a young graduate student myself, now I'm older and seriously jaded.]
"The authors asked GPT-4 ADA to create a data set concerning people with an eye condition called keratoconus"
That had me confused for a moment, since there's no GPT-4 model called Ada (the current embeddings model is called that, and there was a GPT-3 LLM model with that name too).
Then I realized they were using ADA as an acronym for Advanced Data Analysis.
A long time ago, I ran into the books The Art of Deception and How to Lie With Statistics. They seemed like a good start on training people to spot deceptions. There were articles here on things like p-hacking (a quick simulation of that follows this comment). Then, the replication crisis.
While I have no time now, I'm still interested in making a list of resources (esp. free) that tells how to construct good studies, has a comprehensive presentation of all the categories of mistakes/lies we see in them, with examples of each, and practice studies with known errors. Anyone here got good books or URLs that could go in a resource like that? That could train new reviewers quickly?
If I return to AI or ever work in it, I also planned to teach all of that to AI models to automatically review scientific papers. Might contribute to solving the replication crisis. Anyone who’s doing AI now feel free to jump on that. Get a startup or Ph.D. with a tool that tells us which of the rest are fake.
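On the p-hacking entry above, here's the sort of worked example I have in mind: a minimal simulation (assuming only numpy and scipy; the sample size and number of tests are arbitrary) showing how testing many noise-only variables almost guarantees a "significant" finding:

```python
# Minimal p-hacking simulation: correlate 20 pure-noise variables with an
# outcome and keep the best p-value; a "significant" hit is nearly guaranteed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 100
outcome = rng.normal(size=n)

p_values = []
for _ in range(20):
    noise_variable = rng.normal(size=n)  # genuinely unrelated to the outcome
    _, p = stats.pearsonr(noise_variable, outcome)
    p_values.append(p)

print(f"smallest p-value out of 20 tests: {min(p_values):.3f}")
# With a 0.05 threshold, the chance of at least one false "hit" is about 64% (1 - 0.95**20).
```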
The paper[1] itself reads like marketing material that was itself written by GPT-4. Why was this published? And why is Nature reporting on it? As someone who generates fake/simulated datasets for a living, the scientific value of this paper is totally lost on me.
I asked ChatGPT to come up with a list of references for a topic I was researching. Every single citation it came up with was hallucinated except for one which turned out to be extremely useful.
The framing makes it sound like it's a "bug" or something. From my understanding it's not, because it's hardly a reliable reasoning tool: whether using statements or using data. Unless we come up with or advance a better architecture, similar "panic porn" is useless, not to mention this reeks of a hit piece. Just verify everything and stop with the blind trust.
People will find such dumb reasons to get outraged. Folks have been photoshopping GFP expression for years. Then some halfwit will post it "there's some evidence that X is Y". Try to replicate it, you can't. Because the science is fake.
This is not particularly surprising, nor is it really bad - generating data that appears valid is really powerful. Although it was a bit funny since I happen to have keratoconus, and I don't hear much about it often.
If the criteria for truth is a convincing argument, and we have a machine for generating convincing arguments to fit anything, and analyzing the argument is too much trouble, then what?
There are plenty of such papers out there, now obviously disproven. At some point smoking was prescribed as a cure for various health issues. Letting you know so you don't feel impressed by what was "created" when a chat bot generates such a paper.
If Arxiv was used in training, some of those scientists were the ones that taught it to do this. Well, a good chunk of whatever they scraped off the Internet, too.
You literally have to put every number into the training data for it to do mathematics correctly...
It's as stupid as that. Some try to get around it by having only the 10 different digits as tokens and gluing them together, but it's a hallucination to think that works.
An important point about generalization is that you have to teach it something in the first place. This is literally important:
"ycombinator is a website" is a completion that is almost impossible if ycombinator is not in your training set.
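For anyone curious what "gluing digits together" looks like in practice, here's a quick peek at a GPT-style tokenizer (a sketch assuming the tiktoken package; the sample strings are arbitrary):

```python
# Peek at how a GPT-style tokenizer splits numbers into multi-digit chunks,
# which is part of why arithmetic generalizes poorly from training data.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-era models

for text in ["1234567890", "3.14159", "ycombinator is a website"]:
    tokens = enc.encode(text)
    pieces = [enc.decode([t]) for t in tokens]
    print(f"{text!r} -> {pieces}")
```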
It's easy to mistake entropy for novelty. Computers don't create: they compute. Calling an LLM "Artificial Intelligence" is a bit like mistaking a pseudorandom number generator for true noise.
I think that by that same logic, you could say human artists, writers, etc. don't create either, they just move existing matter (which typically isn't created nor destroyed) from one place to another. You could also say an electric company doesn't generate electricity but merely converts other energy into it -- one of the most popular uses of the word "generator" yet the atoms/energy are already here; we just arrange/convert and call it generation. I must insist that sorting some things a certain way is widely known as generation/creation.
I can see how other words are a bit more precise, though. Synthesize, perhaps?
This is a human supremacy argument, essentially claiming that because we have a "soul" or because of something inherent in us that cannot be proven we are better than something else. You are free to believe this, but it is a matter of faith. Not of any sort of reasoning.
Yep. I'm not much of a religious person, but if I were, I might say: not only does AI not create stuff, humans don't either, only the Creator did! I suppose the big bang theory doesn't stray far from this either. Point is, maybe we use this notion metaphorically/imprecisely even for human output, and therefore we might as well extend it to machines.
It's insane how they think they can have their AGI and eat it too! The closest thing we have to AGI is a human child, even though that's Synthetic (man-made) General Intelligence, and children can't really be relied upon not to learn or formulate sentences you don't like.
Somewhere in the mind's eye of science fiction is a world where we have near-unlimited productivity and knowledge and we're all free to pursue our self-actualized lives as we see fit.
But as productivity increases, and as AI improves, in both cases individual greed holds back lifting up the many. And so we end up asking for caution on automating away someone's job, or caution on rapid AI progress.
Does anyone know any good writing on how humanity might fight its way through all these mires of progress to the other side - that science fiction world that may or may not even be possible? How the world might look as these things continue to progress over time? Either fiction or a serious analysis is fine.
Most sci-fi skips straight to "There is no more need for University, we simply ask the AI", missing the "students are using AI to cheat" phase entirely.
The world of Star Trek is one where humanity learns from the devastation of world war three and over a century, creates a society where almost everything is run by computers, money doesn't exist, and people work primarily for their personal satisfaction. But I think true AGI is frowned upon there.
Firm disagree! Example: take a tour of the tower of London and learn about all the nasty medieval tortures we used to inflict on people. Now we don't do that anymore.
Humans learn things collectively via culture and cultural transmission has been an extremely effective tool of knowledge preservation over the generations.
Who is "we"? Torturing people to death in equally horrific ways is still routine practice among radical Islamists and Mexican drug cartels, among others.
This has been the popular narrative in almost all common written history forever. It's basically the biggest recurring theme in diaries from the Middle Ages, especially among religious texts.
Things are never an upward hockey stick but they also aren’t saw waves skirting a baseline.
Yes, but my point is that repeating "humanity learns" can lead us down garden paths into thinking there is some species-level recollection, when history reveals a mixed bag at best.
Completely agree transmission of knowledge is a sticky wicket. Good reminder to stay logistically grounded. I have found many of the ambitious thinkers close to me tend to aspire to a "humanity learns" moment but if they're anywhere near politics they tend to be tempered fairly well by the logistical realities of bringing ideas to pass
For anyone jumping in who hasn't read the article, this is not about hallucinations and some researchers being surprised that GPT-4 doesn't always respond with absolute truth.
Instead it's about how easily it can be used to generate plausible looking datasets that would confirm a hypothesis. It's a warning note to journals about how fake data can more easily be created.
Exactly. It's no different from the fake data sets created by hand in scientific misconduct cases for years, just, I guess, easier. That's not a good thing, but given that even making a fake data set by hand is far easier than generating real data, I'm not sure this will suddenly make more people fake data.
Having spent significant time implementing machine learning papers before the LLM age, I can promise you over 90% of papers you'll find are full of shit. The claims they make are true in only the most contrived of circumstances and don't hold up under any kind of scrutiny. How exactly they came to these lies (data lies, result lies, omitting lies) is really immaterial. The concept everyone is apparently struggling with is that producing a paper that is entirely lies is not doing the scientific world a disservice: It is not unusual and already happens at scale. Making it even easier might motivate someone to actually figure out a way to ensure papers are reproducible and not full of shit. In essence this is a good thing.
The volume of bogus research is already growing non-linearly. It suggests that there is a market for fake datasets, which will lead to better AI training to fix this problem.
Easier cheating tools allow more people to cheat. There is no particular reason it would lead to less fake data, and with the ability to get fake data in a few sentences rather than a few hours of work is apt to lead more people astray.
Ironically, I tried to ask ChatGPT to generate a signal plus some plausible background for a potential supersymmetry particle discovery (with some details about model-independent searches), and it started hallucinating like someone whose heart was broken and who spent the night drinking at the bar.
I chose the wrong field to be able to fake data /S.
It's awesome to know that the scientific process that's been in use for centuries, based on peer review and the importance of replication, is still so powerful that AI generated fake data can't do anything to undermine it. Reality doesn't care how the data was faked when you replicate an experiment!
Now, if only scientists and institutions would actually bother using those tools we developed centuries ago. Unfortunately if they don't - you don't exactly need chatgpt to fake data you know? Replication crisis etc etc.
Long story short: Man is it awesome that the scientific method is resilient to this! Too bad nobody uses it.
Ironically, one of the major papers that showed that many findings couldn't be replicated couldn't itself be replicated. Academia is packed to the brim with perverse incentives. It's amazing that scientists are creative and capable enough to keep consistently changing the world and gaining deep, extraordinary, and replicable insights into our universe in a broken system. It would be nice if they didn't need to fight the system though; see the story of the mRNA vaccine Nobel and how hard she had to fight against the establishment to do the research that would end up saving a truly staggering number of lives.