Ask HN: What do you expect from GPT-5?
10 points by p1esk 10 months ago | 17 comments
Every new generation of GPT models has brought a significant improvement: GPT-1 demonstrated that unsupervised pretraining of a language model on a large text corpus yields strong performance on a variety of specialized tasks; GPT-2, which was simply a scale-up of the model and an expansion of the text corpus, produced a huge jump in output quality (the English-speaking-unicorns story); further scaling in GPT-3 resulted in the emergence of unprecedented generalization abilities; and finally GPT-4 achieved an understanding of the world and reasoning capabilities that would probably have qualified as AGI just 10 years ago.

In the latest Lex Fridman interview [1], Sam Altman said he expects (hopes for?) a similar level of improvement in GPT-5 as GPT-4 showed over GPT-3. He said GPT-5 "just feels smarter" overall.

What do you expect, or hope, to see in the next big model from OpenAI? What do you think its impact on the world will be?

[1] https://www.youtube.com/watch?v=jvqFAi7vkBc




> and finally GPT-4 achieved an understanding of the world and reasoning capabilities that would probably have qualified as AGI just 10 years ago.

I keep hearing this: "if you showed GPT-4 to people 10 years ago, they would think it was AGI!" What complete rubbish. What exactly do you think has happened over those 10 years that stops you from having that response now? Nothing, really. LLMs were released, you gradually understood how they worked, and (hopefully) quickly came to the conclusion that they aren't AGI.

Some people thought GPT4 was AGI when it was released, so of course some people would have had the same initial reaction 10 years ago. But just like today, reasonably intelligent people would quite quickly conclude that there was no AGI... Why on earth would you think it would be any different?

The goalposts for what constitutes AGI have not changed in many, many decades.


>But just like today, reasonably intelligent people would quite quickly conclude that there was no AGI...

So Peter Norvig is not reasonably intelligent? Lol

https://www.noemamag.com/artificial-general-intelligence-is-...


GPT 3.5 blew me away.

Unfortunately, GPT-4 and even other coding copilots like Phind kinda suck at anything beyond "noob tier" / beginner-level questions. In fact, it got so bad that I ended up cancelling my OpenAI subscription.

It falls apart at anything with complexity, and I feel that just taking out a sheet of printer paper or going to the whiteboard is always far more effective than trying to get GPT to come up with something that at the bare minimum has no errors.


I also cancelled my gpt-4 subscription. It just wasn't good enough for the price


> and finally GPT-4 achieved an understanding of the world and reasoning capabilities that would probably have qualified as AGI just 10 years ago.

...not really. GPT-4 is just MoE, so it's good at more things, but it's not really a huge upgrade over 3.5, whose quality is directly a result of much larger parameter scaling.

I expect GPT-5 to just be more multimodal, perhaps with some added capabilities through prompt engineering done internally. For example, when asked a question about code, it would execute the code and ask itself whether the answer is what it would expect or whether there are any issues. Or it could do things like automated web searching, put the HTML into its context, and provide answers based on that. And for any of those modes, there are countless optimizations you can do.
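As a very rough illustration of the web-search mode, a minimal sketch might look like the following, assuming a hypothetical llm() helper that wraps whatever completion API is available (the fetching and prompt layout are just one possible shape, not how OpenAI actually does it):

    # Toy sketch of "search, pull the page into context, then answer".
    # llm() is a hypothetical helper around your completion API of choice.
    import requests

    def llm(prompt: str) -> str:
        raise NotImplementedError("wrap your favourite completion API here")

    def answer_with_web_context(question: str, url: str) -> str:
        # Fetch the page and truncate the raw HTML so it fits in the context window.
        html = requests.get(url, timeout=10).text[:20_000]
        prompt = (
            "Use only the page content below to answer the question.\n"
            f"PAGE HTML:\n{html}\n\n"
            f"QUESTION: {question}\n"
            "If the page does not contain the answer, say so."
        )
        return llm(prompt)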

The thing about LLMs is that they are basically just the next Google, i.e. efficient information lookup. There are two main things missing before they can actually reason.

1. Self-guided recurrent loops. RNNs are difficult to train correctly, which is why the transformer architecture took over, but for reasoning there needs to be some sort of recurrent loop, determined by the model itself, to iterate toward the correct answer.

2. Information ingestion without the need for training loops, which looks like some sort of short-term memory. Context windows aren't it. There needs to be some way for an LLM to look at a piece of text and, with minimal passes, auto-configure its parameters to "remember" that piece of text (a toy sketch of that idea follows below).
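For item 2, one loose way to picture "remembering with minimal passes" is test-time training: run a handful of gradient steps on a new passage so the weights absorb it instead of it sitting in a context window. This is only a toy character-level sketch, not a claim about how a production LLM would or should do it:

    # Toy sketch of "remembering" a passage with a few weight updates
    # (test-time training), instead of holding it in a context window.
    import torch
    import torch.nn as nn

    text = "GPT-5 was discussed on Hacker News."
    vocab = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(vocab)}
    ids = torch.tensor([stoi[ch] for ch in text])

    class TinyLM(nn.Module):
        def __init__(self, vocab_size, dim=32):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, dim)
            self.out = nn.Linear(dim, vocab_size)

        def forward(self, x):
            # Predict the next character from the current one.
            return self.out(self.emb(x))

    model = TinyLM(len(vocab))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)

    # "Minimal passes": a handful of gradient steps on the new passage.
    for step in range(20):
        logits = model(ids[:-1])
        loss = nn.functional.cross_entropy(logits, ids[1:])
        opt.zero_grad()
        loss.backward()
        opt.step()

    print(f"loss after a few passes: {loss.item():.3f}")  # drops as the text is absorbed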


I think exactly the same.

This is what sounds most reasonable.

I wonder if we are (again) just carrying over some preconceptions about how humans do things, and instead AGI will come from a completely different direction...


> 1. Self-guided recurrent loops. RNNs are difficult to train correctly, which is why the transformer architecture took over, but for reasoning there needs to be some sort of recurrent loop, determined by the model itself, to iterate toward the correct answer.

Can you explain this some more?


Right now, when an LLM generates code, it's all statistically likely tokens based on the training set. As it happens, when you ask a question about stuff it doesn't know about (i.e. not in the training set), it sometimes hallucinates or gives wrong answers.

A major improvement would be if LLMs had some feedback loop for deciding what to ask themselves in order to validate answers, much like humans "second-guess" themselves to make sure they are correct. For example, you could most likely make it simulate a computer by having a recurrent check of each individual line of code it produces, asking a question like "If I have input x and this line of code, what would it produce?", and then carrying any errors it finds back to the beginning.

Notionally, this would look like self prompt injection; however, that would likely be super slow compared to an RNN that works post-tokenization/embedding and at the output of individual transformer heads.
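A crude approximation of that self-check is already possible with plain prompt chaining. The sketch below assumes a hypothetical llm() wrapper and an invented generate_and_self_check() helper; the point is only the shape of the loop (check each line, carry errors back to the start), not a real product:

    # Toy sketch of "self prompt injection": ask the model to simulate each line
    # of its own code and carry any reported problem back to the start.
    # llm() is a hypothetical wrapper around whatever completion API is available.
    def llm(prompt: str) -> str:
        raise NotImplementedError("plug in a completion API here")

    def generate_and_self_check(task: str, max_rounds: int = 3) -> str:
        feedback = ""
        for _ in range(max_rounds):
            code = llm(f"Write Python code for: {task}\n{feedback}")
            issues = []
            for line in code.splitlines():
                verdict = llm(
                    f"Given input x, what does this line do, and is it correct "
                    f"for the task '{task}'?\nLINE: {line}\n"
                    "Answer OK or describe the problem."
                )
                if not verdict.strip().startswith("OK"):
                    issues.append(f"{line} -> {verdict}")
            if not issues:
                return code  # every line passed the self-check
            feedback = "Known problems from the last attempt:\n" + "\n".join(issues)
        return code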


What you wrote makes sense, but I'm not sure I understand what an RNN has to do with it. For example, what does this mean: "an RNN that works post-tokenization/embedding and at the output of individual transformer heads"?


Currently LLMs do forward passes only, once per token. You get a final set of data that is statistical in nature, i.e. basically a fancy lookup table of sorts.

To get more capability, you have to post-process that data. For example, you can get current LLMs into situations where you ask a question and get an answer, then ask a clarifying question and get something slightly different, and then, when you ask it to reconcile the two answers, it says something like "my apologies..." and gives you the final answer.

You could do some clever prompt engineering to automate this whole process (i.e. have scripts that call the LLM with certain prompts based on its answers), but that's a manual step that someone has to code. Ideally, this process should exist inside the LLM and be learned.

Now, you could just train a bunch of LLMs on different concepts and then train a selector for which LLM to use, and do everything with forward passes, but then you have a bunch of redundant models taking up space, without any way for those models to talk to each other.

That's why figuring out how to train RNNs (where a layer can feed data back into a previous layer) in the context of LLMs would be pretty much the only way to do this.
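For what it's worth, a toy PyTorch sketch of that kind of loop might look like the following: one block is applied repeatedly to its own output, with a learned halting signal deciding how many iterations to run. It only illustrates "a layer feeding data back into a previous layer"; nothing here says it would train well at LLM scale:

    # Toy sketch of a self-determined recurrent refinement loop: one block is
    # applied repeatedly to its own output, and a learned halting head decides
    # when to stop.
    import torch
    import torch.nn as nn

    class RecurrentRefiner(nn.Module):
        def __init__(self, dim=64, max_steps=8):
            super().__init__()
            self.block = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            self.halt = nn.Linear(dim, 1)  # learned "am I done?" signal
            self.max_steps = max_steps

        def forward(self, h):
            for _ in range(self.max_steps):
                h = h + self.block(h)  # feed the output back into the same layer
                if torch.sigmoid(self.halt(h)).mean() > 0.5:
                    break              # the model itself decided to stop iterating
            return h

    h = torch.randn(4, 64)             # pretend these are hidden states from an LLM layer
    refined = RecurrentRefiner()(h)
    print(refined.shape)               # torch.Size([4, 64])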


I programmed a sensor in my fleshlight to trigger an API call to ChatGPT asking for a single risque sentence to be generated (random). The response JSON payload is then carefully translated via an intricate wrapper class to Voicify, which I have playing on my Edifier bookshelf speakers. It's so good I am hoping ChatGPT is out of this world in terms of expressions of pleasure.


You should open source that.


I'm hoping/expecting a big improvement in long-context understanding. Currently GPT-4 has a "128K Context Window" but it does a very poor job at actually using all the information in that window.

Imagine you could load up that context window with information and it could use every bit of it with the same intelligence as if you only passed it a paragraph.

For example, function calling today is good for maybe ten different functions before the model starts forgetting them or mixing them up. Maybe in GPT-5 we can pass in thousands of functions and it will be able to find the relevant one every time.

Or maybe we can pass in a huge codebase with line numbers, and it can identify exactly which lines need to be changed or added to implement a feature.
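To make the function-calling point concrete, here is a rough sketch in the style of the OpenAI Python SDK's chat-completions tool calling (the get_weather schema and model name are illustrative, and today you would realistically pre-filter to a handful of tools rather than pass in thousands):

    # Sketch of passing many tool definitions and letting the model pick one.
    # Assumes the OpenAI Python SDK's chat-completions tool-calling interface;
    # get_weather and the model name are only illustrative.
    from openai import OpenAI

    client = OpenAI()

    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        },
        # ...imagine hundreds or thousands more entries here...
    ]

    resp = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": "Do I need an umbrella in Oslo today?"}],
        tools=tools,
    )
    print(resp.choices[0].message.tool_calls)  # which tool (if any) the model picked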


I hope it's at least smart enough to definitively prove Gary Marcus wrong on all his predictions.


I had to mute him on Twitter -- the _least_ impressive public AI "expert"


If I ask it to change the style of a long text, I would like ChatGPT to change the style of more than just the first three paragraphs.

I hope it will be better at programming too. It makes beginner mistakes that can be fixed with one snarky comment, but it would be better if it didn't make them at all.


My wish list would include rapid / instantaneous API response speed. I have no idea if this is technically possible.





