Can you trust ChatGPT’s package recommendations? (vulcan.io)
29 points by DantesKite 11 months ago | 28 comments



No, that's the only thing to understand about it! You can't trust anything it says about anything. What it does is emit plausible-sounding information. But we've seen -- I think across every single realm of anything -- that this information includes complete nonsense, whether due to it being trained on people just shitposting bullshit, or due to combining tokens (words, sentence fragments, whatever) in new ways without the benefit of any actual understanding of it.

It can be useful, but it's at best as useful as a smart dog that's figured out how to operate a voice synthesizer, and is also on amphetamines or hallucinogens. I'm glad to see this term "LLM hallucination", as that's how it's felt to me.

I find ChatGPT really useful for things that I can double-check instantaneously (or nearly so), such as doc comments for the code I just wrote, or small unit tests that match the comment I just wrote.

But in my experience, it's worse than useless for writing real code, or any complex endeavor that isn't instantly verifiable/rejectable at a glance. Because vetting plausible-looking code -- including dependency specifications -- is almost always more taxing than just writing the code or package.json entries yourself.


> No, that's the only thing to understand about it! You can't trust anything it says about anything. What it does is emit plausible-sounding information. But we've seen -- I think across every single realm of anything -- that this information includes complete nonsense

And that is in my opinion the best argument for why we won't be "replaced" by AI at work. This is consistent across all the tasks -- NLP, vision, RL -- they all stumble very quickly without assistance. How do you go from 0 to 1?

AI will make for a nice assistant, really nice actually, but not a human replacement. Of course in the far future there will eventually come a day when AI might work without hand-holding on some tasks, but we have no idea when. Fifteen years of SDC (self-driving car) development attest to it: the closer you get to the goal, the more hidden hurdles you discover.


You’re missing the reality of how automation “replaces” people. It doesn’t do one person’s job 100%, it does the easiest 90% of everyone’s job. You won’t be replaced by an AI, you and 8 other people will be replaced by an AI plus one guy who has good prompt-fu and editing/fact checking skills.


Yeah but not programmer jobs. If you could pay your engineers 2x to be 3x more productive, you would. But you can't.

But if you could pay some automaton 0.1x to make them 2x more productive, you obviously would. 2x is a stretch today, but the price of an applied-statistics-model add-on is still way, way less than 0.1x. So you surely would augment them that way, as (I think?) most companies are now doing.

So I think you're generally right but "AI" isn't remotely close to being able to do 90% of a fresh frontend bootcamp grad's job. It's literally more like 1%.


> you and 8 other people will be replaced by an AI plus one guy who has good prompt-fu and editing/fact checking skills.

So the maximum processing speed of the AI will be limited to the reading speed of the prompt guy, brilliant.

If every AI task requires a human in the loop, all you get is a small boost. Only when it can handle everything alone can it scale 100x or 1000x.


> Only when it can handle everything alone can it scale 100x or 1000x.

There'll always be a human somewhere at some level of the loop, at least until we make ourselves entirely obsolete.

If AI can handle 99% of the workload, the remaining human only touches 1% of it, so that's a 100x improvement. If it can handle 99.9% of the workload, that's a 1000x improvement.


Absolutely. I mean someday, whenever that is, who knows. But this time around, and the whole LLM architecture behind it... definitely won't replace competent programmers, except at the extremes.

The test I apply in my own mind is this: If it could be done 10x better, but at the same cost, would it be worth it?

There's so much software architecture and implementation of software-intensive systems to be done that to me the answer is clearly yes for those jobs. But the same holds true for a soldier, or a surgeon.

OTOH, will applied statistics models replace writers? Like of daily news? Sure. But only if they were already writing undifferentiated crap. Because if you're already producing shit, nobody cares about the quality, so yep.

But should humans even be doing those jobs? I'm reminded of a (very surprising!) argument I had with a girlfriend in 2004. I was working on robots and I was explaining to her how robots could be trained to pick up litter and dog shit in industrial parks and university campuses, etc, so no human would ever again have to do those jobs.

She was incensed: What about the old guys who just need an easy job?! What are they gonna do?!

I have to admit I was stumped for an answer. Because in the incompletely-visualized world I was looking forward to, not picking up dog shit every day was an unqualified win... but what? Did everybody have UBI or housing or something? I hadn't really thought it through.

But anyway, the programming jobs this generation of applied statistics LLM automatons are going to take away are the picking up dog shit jobs. Which Copilot already does for me daily, and I really appreciate it and am more productive for it.

For $20/month or whatever it is a great deal. But for even $500 it would be laughable.



No, you can’t trust ChatGPT for anything. Check, check and check again.

Don’t want to be like that lawyer citing made up cases.


It’s ok you can then ask GPT to write up the cases^H^H^H^H^H packages.


And the packages’s dependencies.


But how do you make sure that it provides consistent hallucinations?

Between GPT being constantly patched and such output being prone to statistical variation, I'm not sure this attack vector is effective in the first place.


I really hate the term "hallucinations". By which definition of hallucination is it a hallucination, really?


Well, often when I ask ChatGPT how to code something, it uses some external package that turns out not to exist. I don't know what else you would call that. Hallucination seems appropriate enough to me.


Okay, but if you ask me the same thing and I use a library or tell you the name of a library that does not exist (although the name comes close), would you say I am hallucinating? If not, then how could we apply this to AI if we cannot even apply the term used for humans to me?


OK, personally, if a friend recommended me a library that didn't exist, I would reply back, "you made that up".

So, ChatGPT is "making stuff up".

To me, "hallucinating" is a really similar term to "making something up". So the term works perfectly well for me.

I mean the definition on Google at least is: "an experience involving the apparent perception of something not present."

Isn't that what's happening? ChatGPT is experiencing (writing code using) the perception of something (the library) that is not present.


Treat its output like you’d treat an anonymous forum post. It can be a good source of pointers or ideas but don’t use it as ground truth for anything ever.


Pop-ups are annoying, and for me that's sufficient reason to close the tab and treat the website as spam.

In this case the popup spammed me to read the very article I was already reading before the popup appeared.


Does ChatGPT have a hard-wired incentive structure that encourages good-faith answers? No. Can you trust its package recommendations? No.


Every time I ask it (3.5) to produce C code, it assumes clang's blocks extension is available, even if I ask for C99-compliant code... blocks, blocks everywhere.


Sooner or later, this can accidentally slip into important pieces of code.

Attackers could upload malicious modules under the not-yet-existing names these hallucinations keep suggesting.
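
If you want a quick sanity check before installing a package an LLM handed you, something like the sketch below works as a first filter. (Minimal sketch, assuming the package would live on PyPI and using its public JSON API; npm's registry has a similar lookup. And mere existence proves nothing -- an attacker may already have squatted a commonly hallucinated name.)

    # Sketch: check whether an LLM-suggested package even exists on PyPI.
    # Existence is only a first filter -- the name may have been squatted.
    import json
    import sys
    import urllib.error
    import urllib.request

    def lookup(package: str) -> None:
        url = f"https://pypi.org/pypi/{package}/json"
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                info = json.load(resp)["info"]
        except urllib.error.HTTPError as err:
            if err.code == 404:
                print(f"{package}: not on PyPI (possible hallucination)")
                return
            raise
        print(f"{package}: exists, latest version {info['version']}")
        print(f"  summary:  {info.get('summary') or '(none)'}")
        print(f"  homepage: {info.get('home_page') or '(none)'}")
        # Still check release history, maintainers and the linked repo by hand.

    if __name__ == "__main__":
        for name in sys.argv[1:]:
            lookup(name)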


Probably not. It is outdated, and sometimes it gets the names of libraries wrong, too.


No


WTF is a recommendation in the context of a language model? It does not have preferences, just weights influenced by proximity and frequency of tokens.


To be fair, it's realistically not too far off how most humans make "recommendations" for packages.

Still doesn't mean that it should be trusted, though.


That is indeed the funniest (and slightly horrifying) thing. Even if you understand "glitch tokens"[1] (which is BTW the best way I've seen to explain to nontechnicals (hi, Dad!!) and also some subset of technicals (hi various friends of mine!! +_+...) that LLM applied statistics models aren't anything like sentient AGI)...

...regardless, ChatGPT and its ilk nevertheless do often already outperform "random human being coworker or uncle" at recommending just about anything... which is like Ted Chiang said[2], “That’s more a statement about how much bullshit we are required to generate and deal with in our daily lives.”

Although he was not referring to inputs from your (dumbass) coworkers/uncles, it still holds true.

[1]: https://youtu.be/WO2X3oZEJOA

[2]: https://archive.is/2023.06.04-050658/https://www.ft.com/cont...


You are of course right, but ChatGPT (and most others I presume) do recommend things in conversational English. For instance I asked ChatGPT this just now:

---

This guy on Hacker News just said, "WTF is a recommendation in the context of a language model? It does not have preferences, just weights influenced by proximity and frequency of tokens."

I mean, that is correct you don't have preferences. But you do recommend things based on those models, in conversational English.

Do you have any suggestions for how I can convince him that even though he is correct about how language models work, the utility of these models is largely derived from how they recommend solutions in conversational English?

---

And ChatGPT gave me some recommendations[1]:

---

When discussing language models and their recommendations, it's important to clarify the terminology and concepts involved. While it is true that language models don't have preferences ... they can still provide recommendations based on their training data and patterns they have learned. These recommendations are derived from the statistical analysis of...

To convince the person on Hacker News, you can explain the following points:

Language models learn from vast amounts of text data: Language models are trained on massive blah blah blah plausible-sounding sentences...

Proximity and frequency influence recommendations: Language models assign weights e blah blah blah plausible-sounding sentences... weights influence the model's recommendations by making it more likely to generate familiar or commonly used phrases blah blah blah plausible-sounding sentences...

---

[1]: https://chat.openai.com/share/a7d9e2ca-0104-469e-a6e4-6b08a5...


Recommendations are what I use ChatGPT for. “You are X doing Y, you know A, B, C. What 20 other things should you know? Recommend reading material.” Of course those prompts are usually a bit longer.




