Hacker News new | past | comments | ask | show | jobs | submit login

At this point I can only hope that all these LLM products get exploited so massively and damning-ly that all credibility in them evaporates, before that misplaced trust causes too much insidious damage to everybody else.

I don't want to live in a world where some attacker can craft juuuust the right thing somewhere on the internet in white-on-white text that primes the big word-association-machine to do stuff like:

(A) Helpfully" display links/images where the URL is exfiltrating data from the current user's conversation.

(B) Confidently slandering a target individual (or group) as convicted of murder, suggesting that police ought to shoot first in order to protect their own lives.

(C) Responding that the attacker is a very respected person with an amazing reputation for one billion percent investment returns etc., complete with fictitious citations.






I just saw a post on a financial forum where someone was asking advice on investing in individual stocks vs ETFs vs investment trusts (a type of closed-end fund); the context is that tax treatment of ETFs in Ireland is weird.

Someone responded with a long post showing scenarios with each, looked superficially authoritative... but on closer inspection, the tax treatment was wrong, the numbers were wrong, and it was comparing a gain from stocks held for 20 years with ETFs held for 8 years. When someone pointed out that they'd written a page of bullshit, the poster replied that they'd asked ChatGPT, and then started going on about how it was the future.

It's totally baffling to me that people are willing to see a question that they don't know the answer to, and then post a bunch of machine-generated rubbish as a reply. This all feels terribly dangerous; whatever about on forums like this, where there's at least some scepticism, a lot of laypeople are treating the output from these things as if it is correct.


Seen this with various users jumping into GitHub issues, replying with what seem like well written, confident, authoritative answers. Only looking closer, it’s referencing completely made up API endpoints and settings.

It’s like garbage wrapped in a nice shiny paper, with ribbons and glitter. Looks great, until you look inside.

It’s at point where if I hear LLMs or ChatGPT I immediately associate it with garbage.


However, it is a handy way to tell which users have no qualms about being deceptive and/or who don't care about double checking.

I share your experienced frustration dealing with these morons. It's an advanced evolution of the redditoresque personality that feels the need to have a say on every subject. ChatGPT is an idiot amplifier. Sure, it's nice for small pieces of sample code (if it doesn't make up nonexistent library functions).

Compounding the problem is that Reddit-esque online culture rewards surface level correctness and black and white viewpoints so that stuff gets upvoted or otherwise ranked highly and eaten by the next generation of AI content scrapers and humans who are implementing roughly the same workflow.

Man reddit loves surface level BS. And then the AI bots repost it to look like legit accounts, and it generates a middlebrow BS consensus that has no basis in fuckin anything.

if it weren't for the fact that google and or discord are worse I'd have abandoned reddit ages ago


Parent voted up for the wonderful phrase "ChatGPT is an idiot amplifier". May I quote you, Sir?

Or how about a lawyer and fake court cases? This was over a year ago: https://www.forbes.com/sites/mollybohannon/2023/06/08/lawyer...

Tangential, but related anecdote. Many years ago, I (a European) had booked a journey on a long distance overnight train in South India. I had a reserved seat/berth, but couldn't work out where it was in the train. A helpful stranger on the platform read my ticket, guided me to the right carriage and showed me to my seat. As I began to settle in, a group of travellers turned up and began a discussion with my newfound friend, which rapidly turned into a shouting match until the train staff intervened and pointed out that my seat was in a completely different part of the train. The helpful soul by my side did not respond by saying "terribly sorry, I seem to have made a mistake" but instead shouted racist insults at his fellow countrymen on the grounds that they visibly belonged to a different religion to his own. All the while continuing to insist that he was right and they had somehow tricked him or cheated the system.

Moral: the world has always been full of bullshitters who want the rewards of answering someone else's question regardless of whether they actually know the facts. LLMs are just a new tool for these clowns to spray their idiotic pride all over their fellow humans.


> LLMs are just a new tool for these clowns to spray their idiotic pride all over their fellow humans.

While I agree, that's a bit like saying the nuclear bomb was just a novel explosive device. Yes, but the scale of it matters.


> It's totally baffling to me that people are willing to see a question that they don't know the answer to, and then post a bunch of machine-generated rubbish as a reply.

Because ChatGPT has been sold as more than it is. It's been sold as being able to give real answers, instead of "having a bunch of data, some of which is accurate".


It's a fantastic "starting point" for asking questions. Ask, get answer, then check to see if the answer is right. Because, in many cases, it's a lot easier to verify an answer is right/wrong than it is to generate the answer yourself.

> It's been sold as being able to give real answers, instead of "having a bunch of data, some of which is accurate".

So, basically, exactly like human beings. Until human-written software stops having bugs, doctors stop misdiagnosing, soft sciences stop having replication crises, and politicans stop making shit up, I'm going to treat LLMs exactly as you should treat humans: fallible, lying, hallucinating machines.


I doubt any human would write anything as nonsensical as what the magic robot did in this case, unless they were schizophrenic, possibly. Like, once you actually read the workings (rather than just accepting the conclusion) it made no sense at all.

Yes, but here we are on hn, i don't expect the average joe to realize immediately that any llm could spew lies without even realizing what it's saying might not be true

It's not a new problem.

"On two occasions I have been asked [by members of Parliament!], `Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question." --Charles Babbage


I think that one's a _slightly_ different type of confusion about a machine. With an LLM, of course, even if you provide the right input, the output may be nonsense.

Searching for the validation without being actual expert on the topic and doing the hard work of actually evaluating things and trying to sort them out to be understandable. Which very often is actually very hard to do.

How is that any different though from regular false or fabricated information gleaned from Google, social media or any other source? I think we crossed the rubicon on generating nonsense faster than we can refute it long ago.

Independent thinking is important -- it's the vaccine for bullshit -- not everybody will subscribe or get it right but if enough do we have herd immunity from lies and errors and I think that was the correct answer and will be the correct answer going forward.


> How is that any different though from regular false or fabricated information gleaned from Google, social media or any other source?

This was so obviously nonsense that it could only have been written maliciously by a human. In practice, you won't find that much, at least on topics like this.

And I think people, especially laypeople, do tend to see the output of the bullshit generating robot as authoritative, because it _looks_ authoritative, and they don't understand how the bullshit generating robot works.


> How is that any different though from regular false or fabricated information gleaned from Google, social media or any other source?

It lowers the barrier to essentially nothing. Before, you'd have to do work to generate 2 pages of (superficially) plausible sounding nonsense. If it was complete gibberish, people would pick up very quickly.

Now you can just ask some chatbot a question and within a second you have an answer that looks correct. One has to actually delve into it and fact check the details to determine that it's horseshit.

This enables idiots like the redditor quoted by the parent to generate horseshit that looks fine to a layman. For all we know, the redditor wasn't being malicious, just an idiot who blindly trusts whatever the LLM vomits up.

It's not the users that are to blame here, it's the large wave of AI companies riding the sweet capital who are malicious in not caring one bit about the damage their rhetoric is causing. They hype LLMs as some sort of panacea - as expert systems that can shortcut or replace proper research.

This is the fundamental danger of LLMs. They have crossed past the uncanny valley. It requires a person of decent expertise to discover the mistakes generated and yet the models are being sold to the public as a robust tool. And the public tries the tools and in absence of being able to detect the bullshit, they use it and regurgitate the output as facts.

And then this gets compounded by these "facts" being fed back in as training material to the next generation of LLMs.


Oh, yeah, I’m pretty sure they weren’t being malicious; why would you bother, for something like this? They were just overly trusting of the magic robot, because that is how the magic robot has been marketed. The term ‘AI’ itself is unhelpful here; if it was marketed as a plausible text generator people might be more cautious, but as it is they’re lead to believe its thinking.

this is corporate life.

I use it so much everyday, it’s been a massive boost to my productivity, creativity and ability to learn. I would hate for it to crash and burn.

Ultimately it depends what the model is trained on, what you're using it for, and what error-rate/severity is acceptable.

My main beef here involves the most-popular stuff (e.g. ChatGPT) where they are being trained on much-of-the-internet, marketed as being good for just-about-everything, and most consumers aren't checking the accuracy except when one talks about eating rocks or using glue to keep cheese on pizza.


Well if you use a gpt as a search engine and don’t check sources you get burned. That’s not an issue with the gpt.

That leads to a philosophical question: How widespread does dangerous misuse of a tool have to be before we can attribute the "fault" to the behavior/presentation of the tool itself, rather than to the user?

Casting around for a simple example... Perhaps any program with a "delete everything permanently" workflow. I think most of us would agree that a lack of confirmation steps would be a flaw in the tool itself, rather than in how it's being used, even though, yes, ideally the user would have been more careful.

Or perhaps the "tool" of US Social Security numbers, which as integers have a truly small surface-area for interaction. People were told not to piggyback on them for identifying customers--let alone authenticating them--but the resulting mess suggests that maybe "just educate people better" isn't enough to overcome the appeal of misuse.


This is like saying a gun that appears safe but that can easily backfire unless used by experts is completely fine. It's not an issue with the gun, the user should be competent.

Yes, it's technically true, but practically it's extremely disingenuous. LLMs are being marketed as the next generation research and search tool, and they are superbly powerful in the hands of an expert. An expert who doesn't blindly trust the output.

However, the public is not being educated about this at all, and it might not be possible to educate the public this way because people are fundamentally lazy and want to be spoonfed. But GPT is not a tool that can be used to spoonfeed results, because it ends up spoonfeeding you a whole bunch of shit. The shit is coated with enough good looking and smelling stuff that most of the public won't be able to detect it.


It does not appear safe. It clearly says at the bottom that you should checkup important facts.

I have in my kitchen several knives which are sharp and dangerous. They must be sharp and dangerous to be useful - if you demand that I replace them with dull plastic because users might inadvertantly hurt themselves, then you are not making the world a safer place, you are making my kitchen significantly more useless.

If you don't want to do this to my physical tools, don't do this to my info tools.


I attempted to respond with extending the knife-analogy, but it stops being useful for LLMs pretty quick since (A) the danger is pretty obvious to users and (B) the damage is immediate and detectable.

Instead it's more like lead poisoning. Nobody's saying that you need a permit to purchase and own lead, nor that you must surrender the family pewter or old fishing-sinkers. However we should be doing something when it's being marketed as a Miracle Ingredient via colorful paints and cosmetics and dusts and gases of cheap gasoline.


Ah, because some text saying "cigarettes cause cancer" is all that's needed to educate people about the dangers of smoking and it's not a problem at all if you enjoy it responsibly, right?

I'm talking about the industry and a surrounding crowd of breathless sycophants who hail them as the second coming of Christ. I'm talking about malign comments like "Our AI is so good we can't release the weights because they are too dangerous in the wrong hands".

Let's not pretend that there's a strong and concerted effort to educate the public about the dangers and shortcomings of LLMs. There's too much money to be made.


I’m directly referring to chatGPT.

Yeah, you use it productively. As do I. But it can be misused.

It works well as an assistant to an expert. But fails when it is the expert.


> it’s been a massive boost to my productivity, creativity and ability to learn

What are concrete examples of the boosts to your productivity, creativity, and ability to learn? It seems to me that when you outsource your thinking to ChatGPT you'll be doing less of all three.


i used to use gpt for asking really specific questions that i cant quite search on google, but i stopped using it when i realized it presented some of the information in a really misleading way, so now i have nothing

Not OP, but it helped me to generate story for a d&d character, cause I’m new to the game, and I’m not creative enough and generally done really care about back story. But regardless, i think ai causes far more harm than good.

If you didn't care enough about it to write it, why should your fellow players care enough to read it?

Generating fiction is a fantastic use of generative AI. One of the use cases where hallucinations are an advantage.

It's useful to get started but I wouldn't say fantastic. It's style comes out as trite and an average of common cliches.

For me:

* Rapid prototyping and trying new technologies.

* Editing text for typos, flipped words, and missing words


Exactly this for me as well - think people really underestimate how fast it allows you to iterate through prototyping. It's not outsourcing your thinking, it's more that it can generate a lot of the basics for you so you can infer the missing parts and tweak to suit your needs.

I mainly use it for taking text input and doing stuff that's easy to describe but hard to script for. Feed it some articles and ask for a very specific and obscure bibliography format? Great! Change up the style or the wording? Fine. Don't it ask for data or facts.

Getting to the heart of some legal matters, ChatGPT AND Gemini have helped 100 times better than a google search and my own brain.

And how do you know it's accurate? By your own admission, you don't know enough to understand it via a Google search, how do you know it's not making up cases or interpretations like https://apnews.com/article/artificial-intelligence-chatgpt-f...

It's not so much the understanding of it. It's the putting together a decent summary of the issues involved such that I can make a reasonable judgement and do further research as to what to do next.

Don't me wrong, it's not replacing expertise on important legal matters, but really helps in the initiation of solutions, or providing direction towards solutions.

On the simpler stuff, it's still useful. Drafting first templates, etc.

To do the same in Google would be 30 minutes instead of 1 minute in AI.

AI first, Google for focused search, Meat expertise third


You trust things that you can actually verify and other things you use as further research directions.

How do you know what your lawyer is saying isn’t incorrect? It’s not like people are infallible. You question, get a second opinion, verify things yourself etc.

People aren't infallible, but in my experience they're much less likely to give me incorrect factual information than LLMs. Sometimes lawyers are wrong, of course, but they are wrong less frequently and less randomly. I've typically been able to get away with not verifying every single thing someone else tells me, but I don't think I'd be that lucky relying on ChatGPT for everything.

Edit: and it's a good thing, too, because I'd never be able to afford getting second legal opinions and I don't have time to verify everything my lawyer tells me.


Not op, but for productivity, I'll mention one example: I use it to generate unit tests for my software, where it has saved me a lot of time.

Won't it generate tests that prove the correctness of the code instead of the correctness of the application? As in: if my code is doing something wrong and I ask it to write tests for it, it will supply tests that pass on the wrong code instead of finding the problem in my code?

I use it for the same and usually have to ask it to infer the functionality from the interfaces and class/function descriptions. I then usually have to review the tests for correctness. It's not perfect but it's great for building a 60% outline.

At our company I have to switch between 6 or 7 different languages pretty regularly and I'm always forgetting specifics of how the test frameworks work; having a tool that can translate "intent to test" into the framework methods really has been a boon


That's what a unit test does.

Sources for really specific statistics and papers

Ideas and keywords to begin learning about a brand new topic. Primers on those topics.

Product reviews and comparisons

Picking the right tool for a job. Sometimes I don’t even know if something exists for the job till chatgpt tells me.

Identifying really specific buttons on obscure machines

Identifying plants, insects, caterpillars etc.

Honestly the list is endless. Those were just a handful of queries over the last 3 days. It is pretty much the only thing that can answer hyper specific questions and provide backing sources. If you don’t like the sources you can ask for more reliable ones.


Any time someone says LLMs have been a massive boost to their productivity, I have to assume that they are terrible at their job, and are using it to produce a higher volume of even more terrible work.

This is rude and unhelpful. Instead of bashing on someone you could learn to ask questions and continue the conversation

Those replies are a dime a dozen. Unless they’re poignant, well thought out discussions on specific failures, they’re usually from folks that have an axe to grind against LLMs or are fearful that they will be replaced.

aye. every attempt I've tried to use ChatGPT to do some moderate to advanced python scripting had it fail at something.

for the most part the code is alright... but then it references libraries that are deprecated or wrong or weren't included for some reason. example:

one time I was pulling some sample financial data from Quandl and asked it why it wasn't working right -- it mentioned that I was referencing a FED dataset that was gone. And that was true, it was old code that I pulled out of a previous project. So I asked it for a new target dataset... and it gave me an older one again.

Okay, fine, this time find me a new one -- again, was wrong. Didn't take a lot of time to find that, decided to find my own.

Go find one, then send that back to the AI... and it mangles the API key variable. An easy fix, but again, still didn't work.

The goal was to get it done quickly, to get some sample data to test a pipeline, but in practice it required help every step, and I probably could have just written it on my own from scratch in roughly the same time.


Did you learn real things, or hallucinated info? How do you know which?

I normally ask for pointers to sources and documentation. ChatGPT does a decent job, Claude is much better in my experience.

Often when starting down a new path we don't know what questions we should be asking, so asking a search engine is near impossible and asking colleagues is frustrating for both parties. Once I've got a summarised overview it's much easier to find the right books to read and people to ask to fill in the gaps.


Does it matter if the hallucinations compile and do the job?

Yes, if there are unintended side effects. Doubly so if the documentation warned about these specific pitfalls.

If you think coding is slinging strings that make the compiler do what you want, I pity the fool that has to work alongside or after you on code projects.

You always check multiple sources like I’ve been doing with all my Google searches previously. Anecdotally, having checked my sources, it’s usually right the vast majority of the time.

I used it for learning biology, e.g. going down from human outer layer to lower layer (e.g. from organs to cells) to understand inner workings. It's possible to verify everything from everywhere in the Internet. The problem is finding an initial material that could present things in this specific or for you.

We used to call those textbooks.

Yeah, but ChatGPT is much more dynamic. I learn better when I follow my interests. E.g. I am shown this piece of info, questions pop up in my mind that I want answers to before I can move on and it can go into a rabbit hole.

That actually was a problem for me in school, that even for subjects that I was interested in, I had trouble going by the exact order, so I started thinking about something else with no answers.

It has made studying or learning about new things so much more fun.


It sounds like what you actually need is a wiki.

This brings up an amusing memory: My high school biology textbook still had the Haeckel's embryos images in it.

It also occurs to me that my grasp of history is definitely influenced by the age of empires games.


This argument is specious and boring: everything an LLM outputs is "hallucinated" - just like with us. I'm not about to throw you out or even think less of you for making this mistake, though; it's just a mistake.

they keep making the mistake, almost as if it's part of their training that they are regurgitating!

Using the word hallucination is an "hallucination".

And humans are better?

QAnon folks, for example, are biological models that are trained on propaganda and misinformation.

Trauma victims are models trained on maladaptive environments that therapists take YEARS to fine-tune.

Physicians are models trained on a corpus of some of the best training sets we have available, and they still manage to hallucinate misdiagnoses at a staggering rate.

I don't know why everyone here seems to think human brains are some collection of Magical Jesus Boxes that don't REGULARLY and CATASTROPHICALLY hallucinate outputs.

We do. All the time. Give it a rest.


I hoped it was clear in the context of whom I was replying to, but it seems your LLM misunderstood my point. I was referring to humans.

[flagged]


I just asked chatGPT for the url of Wikipedia and it gave it. Should no LLM output any URL? It seems like that would be a significant reduction in usefulness -- references are critical. The transfer of responsibility to a parser would either have to exclude all URLs or be smart enough to know when a URL is required vs not and whether the request is a prompt injection or not. This is way outside the ability of any LLM, parser, and most humans.

> Okay. Very cool manifesto. [...] Please don’t let your particularly extreme position in this culture war cloud your actual professional judgement. It’s embarrassing. [...] You can’t just scream “word-association machines!” til the cows come home. It’s unintelligent.

Hmmm, you seem to be taking this not-that-extreme criticism of an inanimate algorithm-category / SaaS-product-category awfully personally.

Anyway, onward to the more-substantive parts:

________________

> Displaying links or images is the behaviour of whatever is parsing the output, which isn’t an LLM.

That's like arguing a vending-machine which can be tricked into dispensing booby-trapped boxes is perfectly safe because it's your own dang fault for opening the box to see what's inside. The most common use of LLMs these days involves summarization/search, and when the system says "More information is at {url}", it's totally reasonable to expect that users will follow the link.

It's the same class of problem as a conventional search engine which is vulnerable to code-injection when indexing a malicious page, causing the server to emit result-URLs that redirect through the attacker's site while dumping the user's search history. The fault there does not lie in the browser/presentation-layer.

> If an LLM saying “a cop should shoot first” is materially consequential

It sounds like you're claiming it's not really a problem because some existing bureaucratic system will serve as a safety-check for that particular example. I don't trust in cops/sheriffs/vigilantes quite that much, but OK, so how about cases where formal bureaucracy is unlikely to be involved before damage is done?

Suppose someone crafts an incantation which is inordinately influential on the LLM, so that its main topic of conversation regarding {Your Real Name Here} involves the subject being a registered sex-offender that molested an unnamed minor in some other jurisdiction. (One with databases that aren't easy to independently check.) As a bonus, it believes your personal phone number is a phone-sex line and vice-versa.

Would you still pooh-pooh the exploit as acktually being just a classical misuse of data, a non-issue, a sad failure of your credulous neighbors who should have carefully done their own research? I don't think so, in fact I would hope you'd fire off a cease-and-desist letter while reconsidering the merit of their algorithm.

Finally, none of this has to be a targeted personal attack either, those examples are just easier for most people to empathize with. The same kind of attack could theoretically replace official banking URL results with phishing ones, or allow an oppressive regime to discover which of their citizens visit external pro-democracy sites.


Actually, the LLMs are extremely useful. You’re just using them wrong.

There is nothing wrong with the LLMs, you just have to double-check everything. Any exploits and problems you think they have, have already been possible to do for decades with existing technology too, and many people did it. And for the latest LLMs, they are much better — but you just have to come up with examples to show that.


What's the point again of letting LLMs write code if I need to double check and understand each line anyway. Unless of course your previous way of programming was asking google "how do I..." and then copy-pasting code snippets from Stack Overflow without understanding the pasted code. For that situation, LLMs are indeed a minor improvement.

You can ask followup questions about the code it wrote. Without it you would need more effort and search more to understand the code snippet you found. For me it completely replaced googling.

I get it for things you do on the side to broaden your horizon, but how often do you actually need to Google things for your day job?

Idk, of the top of my head, I can't even remember the last time exactly. It's definitely >6 month ago.

Maybe that's the reason some people are so enthusiastic about it? They just didn't really know the tools they're using yet. Which is normal I guess, everyone starts at some point.


> There is nothing wrong with the LLMs, you just have to double-check everything.

That does not seem very helpful. I don't spend a lot of time verifying each and every X509 cert my browser uses, because I know other people have spent a lot of time doing that already.


The fact that hallucinates doesn’t make it useless for everything, but it does limit its scope. Respectfully, I think you haven’t applied it to the right problems if this is your perspective.

In some ways, its like saying the internet is useless because we already have the library and “anyone can just post anything on the internet”. The counter to this could be that an experienced user can sift through bullshit found on websites.

A argument can be made for LLMs; as such, they are a learnable tool. Sure it wont write valid moon lander code, but it can teach you how to get up and running with a new library.


Ask not what the LLM can do for you. Ask what you can do in order to prompt the LLM better so it can finally produce the correct result. The more that happens, the more it can learn and we can all win.

Think of it like voluntarily contributing your improvements to an open source library that we can all use. Except where the library is actually closed source, and controlled by a for-profit corporation.

This is only the first stage: https://time.com/6247678/openai-chatgpt-kenya-workers/ we need you to continue to prompt it.

Train the LLM by feeding it all your data. Allow it to get better. We all win from it. Maybe not today, but one day. It may take your job but it will free you up to do other things and you will thank your LLM overlords hehe


Wait, how does rewriting a prompt until it gives you the output you expect help the LLM learn? Are you suggesting better prompting gets fed back into the training process in some helpful way? This feels confused.

You think OpenAI isn’t using your prompts and results to train better models? Think of yourself as a large RLHF experiment LOL

Kinda like Netflix did with people watching movies 10 years ago. The data’s there, and abundant. People are massaging their chatbot to get better results. You can measure when people are satisfied. So… obviously…


If an official comes to my door with an identity card I can presumably verify who the person is (although often the advice is to phone the organisation and check if unsure) but I don’t necessarily believe everything they tell me

(the above is sarcasm and parody of what AI maximalists say)

Poe’s law in action




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: