
Can anyone comment on the issue of hallucinations? The author only mentions them briefly and I cannot gather how big of a problem this is. Apart from the literal quote the LLM hallucinated, wouldn’t all the other information have to be double-checked as well?



IMO, hallucinations make it basically unusable for things it should be very good at. For example, I have asked two different AIs what the option is for changing the block size with HashBackup (I'm the author). This is clearly documented in many places on the HashBackup site.

The first time, the AI said to use the -b option to the backup program, with examples etc. But there is no -b option, and never has been.

The second time, the AI said to set the blocksize parameter ("or something similar"; WTF good is it to say that?) in hashbackup.conf. But there has never been a hashbackup.conf file.

From examples I've seen, AI does a passable job on the kind of question where asking several different humans would also get you long-winded responses full of judgement and opinion, some of it valid and some of it not.


It is documented on your site.

But before it shows your site, Google 'features' ChatGPT's hallucination, pulled from one of your earlier HN comments [0].

https://i.imgur.com/yxXo3GI.png

[0] https://news.ycombinator.com/item?id=38321168#:~:text=To%20c....


I'll echo this and say that I've run into very similar issues when evaluating local LLMs as the author of a popular-ish .NET package for Shopify's API. They almost always spit out things that look correct but don't actually work, either because they're using incorrect parameters or they've just made up classes/API calls out of whole cloth.

But if I set aside my own hubris and assume that my documentation just sucks (it does) or that the LLM just gets confused with the documentation for Shopify's official JS package, my favorite method for testing LLMs is to ask them something about F#. They fall flat on their faces with this language and will fabricate the most grandiose code you've ever seen if you give them half a chance.

Even ChatGPT using GPT-4 gets things wrong here, such as when I asked it about covariant types in F# a couple of days ago. It made up an entire spiel about covariance, complete with example code and a "+" operator that could supposedly enable covariance. It was a flat-out hallucination as far as I can tell.

https://chat.openai.com/share/6166dd9f-cf67-4d9a-a334-0ba30d...
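For what it's worth, F# simply has no declaration-site variance annotation at all (no "+", no "out"), so code like the hallucinated example can't exist; my guess is the model grafted Scala-style variance syntax onto F#. A minimal sketch of what you actually do instead, with made-up Animal/Dog types just for illustration:

    // F# generic types are invariant and there is no variance annotation,
    // so you either upcast elements explicitly or use a flexible type (#T).
    type Animal() =
        member _.Name = "animal"

    type Dog() =
        inherit Animal()

    let dogs : Dog list = [ Dog(); Dog() ]

    // Does not compile: 'a list is invariant.
    // let animals : Animal list = dogs

    // Workaround 1: upcast each element explicitly.
    let animals : Animal list = dogs |> List.map (fun d -> d :> Animal)

    // Workaround 2: accept any subtype via a flexible type.
    let describeAll (xs: seq<#Animal>) =
        xs |> Seq.iter (fun a -> printfn "%s" a.Name)

    describeAll dogs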


Yes, this. If the form of a plausible answer is known, one is likely to be invented: API method names, fields in structures, command-line options, anything with a recognizable shape.

Similarly, references of any kind that have a known form: case law, literature, scientific papers, even URLs.


I’ve had very similar problems asking technical things. I wish it would do what humans do and say, “Not sure, but have you tried an option that might be called Foo?” A good human tech rep doesn’t always have the precise answer, and knows it. Unfortunately, LLMs have mostly been trained on text that isn’t likely to contain these kinds of clues that the info might not be as accurate as you’d like.

For technical things, I’ve found I’m happier with the results if I treat the output as clues toward the right answer rather than looking for an exact string to copy and paste.


There are many recently published or preprint research papers on this, and they are not necessarily hard to read. As a consultant, this completely prevents me from making any professional use of LLMs at the moment (edit: aside from actual creative work, but then you may hit copyright issues). Even setting hallucination aside, training on non-scholarly sources is also a problem: Wikipedia is great for common knowledge but becomes harmful past the point where you need nuanced and precise expertise.


The other problem with Wikipedia is that it is a target of hostile, politically motivated attacks that attempt to rewrite history. It will normally self-correct, but from time to time there are pieces of information that are maliciously incorrect.



