
It’s definitely good not to put all our eggs in the LLM basket. Even OpenAI says it’s only part of the solution - but it has scaled more than almost anyone expected. Will that continue forever, and will we keep getting more impressive emergent behaviors from it? I doubt that, but I could certainly be wrong.

It still seems to be missing any sense of what is true. I'm not sure whether that's possible to embed in this kind of model, or whether we'll just have some human-feedback hacks and eventually get a better model.



> Will that continue forever, and will we keep getting more impressive emergent behaviors from it? I doubt that

I feel like putting the word 'forever' ruins the point. It's the most extreme strawman.

It's amazing how scaling has unlocked these emergent behaviors! When I look at the scaling graphs, I see that the ability to reduce 'perplexity' keeps improving with scale and capital investment, with no sign of slowing yet (it will slow eventually). I also see that reducing perplexity keeps unlocking new emergent behaviors. So I would guess that scaling will unlock many more emergent behaviors before it eventually plateaus!
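
(For anyone who hasn't run into the term: perplexity is just the exponential of the average per-token negative log-likelihood on held-out text. A rough Python sketch, assuming you already have the model's per-token log-probs:)

    import math

    def perplexity(token_logprobs):
        # token_logprobs: natural-log probabilities the model assigned
        # to each observed token of some held-out text.
        avg_nll = -sum(token_logprobs) / len(token_logprobs)
        return math.exp(avg_nll)

    # Lower is better: a model that puts higher probability on the
    # actual next tokens has lower perplexity.
    print(perplexity([-0.7, -2.3, -0.1]))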


That debate is becoming irrelevant. Sure, it would be great if GPT-6 cost $50k to train instead of $100M, but even if it costs $100M it will produce many billions of dollars of value, so the original investment is still relatively small.


Yeah, but the difference between those two numbers is that one is accessible to individuals (albeit wealthy ones), while the other is accessible only to large-ish companies.


IMO, the capabilities have not improved that much between GPT-3 and GPT-4; the primary difference is just the size of the context the model can handle at any one time (max tokens).

While I definitely get "longer" and slightly more "in-depth" answers from GPT-4 vs 3, it already feels like that capability growth curve is starting to plateau.


> IMO, the capabilities have not improved that much between GPT-3 and GPT-4

I strongly disagree. Anyone who wants to look for themselves can see the GPT-4 technical report.

https://arxiv.org/pdf/2303.08774.pdf


Page 9 pretty much stopped me dead in my tracks. There's no shortage of humans, even technically savvy ones, who wouldn't answer that question correctly.


That is incredibly impressive. Reminds me of these other examples from Microsoft’s “sparks of AGI” paper (summary link): https://ibragimov.org/2023/03/24/sparks-of-agi.html


The distinction may appear subtle in many cases, but in others GPT-4 blows 3 away. E.g. recognizing when it doesn't know and admitting it's hypothesising is something I've run into with 4 but not even with 3.5.

The capability curve will necessarily appear to plateau when the starting point is as good as it is now. The improvements we recognize will be subtler. Halving the remaining error rate will look less impressive for each step.
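
(To make that concrete, here's a toy sketch - the starting number is invented, but each step halves the remaining error rate and the visible accuracy gain shrinks every time:)

    # Invented starting point; each step halves the remaining error,
    # yet the headline accuracy moves less and less.
    error = 0.20
    for step in range(1, 6):
        error /= 2
        print(f"step {step}: accuracy {1 - error:.1%}")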


It depends on what you consider good. The capabilities of GPT-3.5/ChatGPT are not close to perfect.

ChatGPT is quite good at something like "putting together facts using logic" - things like medical diagnosis or legal argument. However, if the activity is "reconciling a summary with details", ChatGPT is pretty reliably terrible. My general recipe is "ask for a summary of a work of fiction, then ask about the relationship of the summary to details you know in the work." The thing reliably spits out falsehoods in this situation.
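
(If anyone wants to try that recipe themselves, here's a rough sketch against the older 0.x openai Python client's ChatCompletion API; the model name, prompts, and the particular book are just examples, nothing special. It expects OPENAI_API_KEY in the environment.)

    import openai  # assumes the older 0.x client with ChatCompletion

    def ask(messages):
        resp = openai.ChatCompletion.create(model="gpt-4", messages=messages)
        return resp["choices"][0]["message"]["content"]

    # Step 1: ask for a summary of a work of fiction you know well.
    history = [{"role": "user",
                "content": "Summarize the plot of Moby-Dick."}]
    summary = ask(history)
    history.append({"role": "assistant", "content": summary})

    # Step 2: ask it to reconcile the summary with a detail you can
    # verify yourself, then check the answer against the actual text.
    history.append({"role": "user", "content":
        "How does the Town-Ho's story (chapter 54) fit into your summary?"})
    print(ask(history))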


I have yet to see GPT-4 admit to not knowing something rather than just hallucinating made-up shit.

If it could reliably tell me when it DOESN'T know something, I'd have a lot more respect for its capabilities. As it stands today, I feel I need to fact-check nearly anything it gives me if I'm in an environment that requires a high level of factual accuracy.

Edit: To be clear, I mean it telling me it doesn't know something BEFORE hallucinating something incorrect and being caught out on it by me. It will admit that it lied AFTER being caught, but it will never (in my experience) state upfront that it doesn't have an answer for something; it will instead default to hallucinating.

Also - even when it does admit to lying, it will often then correct itself with an equally convincing, but just as untrue, "correction" to its original lie. Honestly, anyone who wants to learn how to gaslight people just needs to spend a decent amount of time around GPT-4.


Well, I have. It certainly still goes the way you're describing often enough that I was flabbergasted when it happened, but it did happen. Specifically, I asked it to give me examples of using an NPM module published after the knowledge cutoff.

[To be clear, I did not tell it that the module was from after the cutoff]



