
> Everything else remains the same.

For now, where "now" may last as little as two years, given the release cadence between GPT 3 and GPT 4 combined with the increased funding levels.

As several research papers have pointed out, GPT 3 may be just a "stochastic parrot", but GPT 4 has "sparks of AGI". I have no idea what GPT 5 will be capable of, but it will be... more.

Next, many people have already done the obvious thing and wired up LLMs in a loop, where code output is automatically compiled and/or tested, with the errors fed back into the LLM to make it automatically fix up any small mistakes.
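
A minimal sketch of such a loop in Python, assuming a hypothetical `llm_complete(prompt)` helper that wraps whichever LLM API you have access to:

```
import subprocess
import tempfile

def llm_complete(prompt: str) -> str:
    """Hypothetical helper wrapping an LLM API call; returns generated code."""
    raise NotImplementedError  # swap in your provider's client here

def generate_until_it_compiles(task: str, max_rounds: int = 5) -> str:
    prompt = f"Write a C program that does the following:\n{task}"
    for _ in range(max_rounds):
        code = llm_complete(prompt)
        with tempfile.NamedTemporaryFile(suffix=".c", mode="w", delete=False) as f:
            f.write(code)
            path = f.name
        result = subprocess.run(["cc", path, "-o", path + ".out"],
                                capture_output=True, text=True)
        if result.returncode == 0:
            return code  # it compiles; hand it to the test suite next
        # Feed the compiler errors back so the model can fix its own mistakes.
        prompt = (f"This code:\n{code}\n"
                  f"failed to compile with these errors:\n{result.stderr}\n"
                  f"Fix the code and output only the corrected source.")
    raise RuntimeError("no compiling version after max_rounds attempts")
```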

This is nothing; it's just sticky-taping something on with external API calls. The real output of an LLM is a probability distribution over its output vocabulary, which can be filtered so that instead of naively picking the "most likely output", the LLM picks the "most likely syntactically valid output". For programming languages, this is relatively simple to add on; for non-programming output, it is a much harder problem. In other words, expect to see output quality in SWE fields improve faster than in any other field!
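
A sketch of that filtering idea in Python. Nothing here is a real API: `logits_for`, `tokens`, and `is_valid_prefix` are all stand-ins, the last one presumably backed by an incremental parser for the target language:

```
def constrained_decode(prompt, logits_for, tokens, is_valid_prefix, max_len=256):
    """Greedy decoding that only ever picks syntactically valid continuations.

    logits_for(text)   -> list of raw scores, one per entry in `tokens` (stand-in)
    tokens             -> the model's vocabulary as strings (stand-in)
    is_valid_prefix(s) -> True if `s` can still be completed into valid syntax
                          (e.g. via an incremental parser for the language)
    """
    out = prompt
    for _ in range(max_len):
        scores = logits_for(out)
        # Keep only tokens whose addition leaves the text completable.
        candidates = [(s, t) for s, t in zip(scores, tokens)
                      if is_valid_prefix(out + t)]
        if not candidates:
            break
        # Pick the most likely *valid* token, not the most likely token overall.
        _, best = max(candidates)
        out += best
    return out
```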

Even the existing GPT 4 has a (private-preview) ability to look back 32K tokens, which is on the order of 100KB of English text, or somewhat less for source code, since code tokenizes less efficiently. That is four times the 8K context of the standard model, and the budget is amenable to further optimisation. For example, the required token count can drop substantially (something like 40%) simply by converting each run of 4 space characters into 1 tab character, and eliminating the leading white space entirely cuts it roughly in half again.
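
The transformation itself is trivial; a minimal sketch follows (actual savings depend entirely on the tokenizer, so the figures above are estimates, not something this code verifies):

```
def compact_indentation(source: str, strip_all: bool = False) -> str:
    """Shrink the token budget spent on leading whitespace.

    Converts each run of 4 spaces at the start of a line into one tab,
    or drops leading whitespace entirely when strip_all=True (lossy for
    whitespace-sensitive languages like Python!).
    """
    out_lines = []
    for line in source.splitlines():
        stripped = line.lstrip(" ")
        indent = len(line) - len(stripped)
        if strip_all:
            out_lines.append(stripped)
        else:
            out_lines.append("\t" * (indent // 4) + " " * (indent % 4) + stripped)
    return "\n".join(out_lines)
```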

Tooling is being integrated into ChatGPT and similar LLM-based web apps. Right now, these are just doing simple Google or Bing searches, but as the Wolfram Alpha integration shows, much more powerful integrations are possible. Just wait and see what happens when ChatGPT gets hooked up to a short-term memory scratchpad and a programming playground. And then see what happens when it gets specialised on this automatically, feeding back compiler errors or failed tests as negative reinforcement.
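
A sketch of what such an integration loop might look like. Everything here is assumed rather than taken from any real product: the `RUN:`/`NOTE:`/`ANSWER:` reply format, the `llm_complete` helper, and the particular way the model is handed a Python playground plus a scratchpad:

```
import subprocess

def run_python(code: str) -> str:
    """'Programming playground' tool: execute a snippet, return its output."""
    result = subprocess.run(["python3", "-c", code],
                            capture_output=True, text=True, timeout=10)
    return result.stdout + result.stderr

scratchpad = []  # 'short-term memory' the model can write to and read back

def chat_with_tools(question: str, llm_complete, max_steps: int = 8) -> str:
    transcript = (f"You may reply with lines like 'RUN: <python>' or "
                  f"'NOTE: <text to remember>' or 'ANSWER: <final answer>'.\n"
                  f"Notes so far: {scratchpad}\nQuestion: {question}\n")
    for _ in range(max_steps):
        reply = llm_complete(transcript)
        if reply.startswith("ANSWER:"):
            return reply[len("ANSWER:"):].strip()
        if reply.startswith("RUN:"):
            # Execute the model's code and show it the result.
            transcript += f"{reply}\nOUTPUT: {run_python(reply[4:].strip())}\n"
        elif reply.startswith("NOTE:"):
            scratchpad.append(reply[5:].strip())
            transcript += reply + "\n"
    return "(gave up)"
```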

I can foresee a future LLM like GPT 5 replacing the need for junior developers almost entirely. A single senior developer can just give it instructions and have it bang out code at the same rate as 100 junior devs, but with the quality of average developers.

There's an industry metric that a typical developer in a large enterprise produces only 3-10 lines of code per day on average. A single senior developer driving LLMs at the throughput of 100 junior devs would then land at 300-1,000 lines per day; maybe we'll have one senior + LLMs banging out hundreds per day?

PS: Imagine even a simple IDE plugin where, for the line of code you're currently writing, ChatGPT goes off in the background, reads the entire source file, the associated documentation, KB articles, Stack Overflow answers, and GitHub issues, and makes recommendations on the fly as you type!

Picture a popup following your edit cursor around with warnings like "this is actually buggy because of Issue #3512, you'll need a retry loop to work around it", or "calling it that way is a security hole described in CVE-2023-####", etc...
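
A back-of-the-envelope sketch of that plugin's hot path; every helper named here (`fetch_docs`, `fetch_issues`, `llm_complete`) is a hypothetical placeholder:

```
def fetch_docs(source: str) -> str:
    """Hypothetical: retrieve docs for the libraries the file imports."""
    return ""

def fetch_issues(source: str) -> str:
    """Hypothetical: search GitHub issues / CVE feeds for the APIs used."""
    return ""

def annotate_cursor_line(source: str, cursor_line: int, llm_complete) -> str:
    """Bundle context around the line being edited and ask the model for a
    one-line warning, or an empty string if nothing stands out."""
    context = "\n".join([
        "=== current file ===", source,
        "=== library docs ===", fetch_docs(source),
        "=== open issues / CVEs ===", fetch_issues(source),
    ])
    line = source.splitlines()[cursor_line]
    prompt = (f"{context}\n\nThe developer is editing this line:\n{line}\n"
              f"If the material above reveals a bug, known issue, or security "
              f"problem with it, reply with a one-line warning citing the "
              f"issue or CVE. Otherwise reply with nothing.")
    return llm_complete(prompt).strip()
```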

PPS: LLMs like GPT are slow at producing output, but ridiculously fast at reading, able to process something like 10K-100K words per second.




> As several research papers have pointed out, GPT 3 may be just a "stochastic parrot", but GPT 4 has "sparks of AGI".

Conflict of interest: that paper comes from Microsoft, which is heavily invested in OpenAI. Show it to me or other skeptics, and if they say the same, then maybe we have something going on.

As for the looped LLM, now that is actually interesting. I wonder whether it leads to the model producing correct code the next time around?

> I can foresee a future LLM like GPT 5 replacing the need for junior developers almost entirely.

Even with a lot of development in the area, that's basically the maximum that will be achieved, IMO.

But who knows. I've been following AI news and have even read back as far as the original AI winter (when people spent hundreds of millions of dollars basically just because LISP existed).

To me it seems that AI development has historically been a self-limiting phenomenon. Why that is, I haven't figured out -- except maybe that the investors get tired of it and funding stops -- but it seems that people get hyped up about what is theoretically possible rather than what we can work with right now.

"Put random stuff in, magic happens" -- this seems to be the current belief system relating to AI, at least looking from the side. And it makes me an even bigger skeptic.

Would love to be wrong, I have to say.


Sparks of Artificial General Intelligence: Early experiments with GPT-4: https://arxiv.org/abs/2303.12712

I was like you: I thought GPT 3 was "a neat trick but mostly useless" because the error rate was too high and its limited reasoning ability kept the utility too low. I figured it would be a good basis for a next-gen image generator or voice transcriber, but that's about it.

I've been playing around with GPT 4 and now I'm actually paying for it. I'm still learning its strengths and weaknesses, but I've seen it do things that are simply mind-blowing.

The worst output I've seen has been for it to simply echo back the input with no or minimal changes.

Conversely, I've seen it do "creative writing" that I personally struggle with. When writing (fan) fiction I have a bad habit of "telling instead of showing" and using lots of words that end in -ly. This is considered poor style, but I'm not a professional author. I simply asked ChatGPT 4 to fix a paragraph at a time, and presto: my random ramblings suddenly read like a proper novel!


Can you give a few examples of the mind-blowing stuff?


A lot of people are unimpressed with ChatGPT or similar LLMs because they've moved the goalposts into another country. I'm perpetually amazed that any part of it works at all. It's literally a function with a definition of just ```string chat(string tell_me_what_to_do)```, where almost the entire content of the 'chat' function is just a handful of arrays of numbers. There wasn't an army of programmers programming specifics in there with if/else/do/while statements for different values of "what_to_do". You ask, it does it!

A random challenging-but-appropriate task I threw at it was to fix up and then summarize the text of a badly scanned book. The language is in a ~100-year-old style, every second or third word has a scanning error, there are gaps, etc...

It not only cleaned it up (in seconds!), it wrote a nice condensed summary, and a valid critique when prompted. E.g.: it picked up that the author admitted to using only official sources from a single government for his information, which may introduce bias. Most people I know would not be able to write an insightful critique like that!

Then, I asked it to replace the proper names and place names from English with the original native spelling, but only for one country mentioned in the text. It did that correctly too. It didn't change any other names.

This isn't like Google Translate, where there's a combo box to select "French" as the input language and "Japanese" or whatever as the output language. That function is "string translate(language from, language to, string text)". Its internals are complex code that took man-centuries of effort to write, and it is fixed-function: it does what it does. It can't fix up French place names in a Japanese text to use French spelling, or replace romaji in French text with Hiragana.

Here's an example of ChatGPT knocking this out of the park. Given the content from this page: https://fr.wikipedia.org/wiki/Tokyo

Prompt: "Replace Japanese place names such as Tokyo in this French text with the Kanji equivalent, as would be written by a japanese native speaker."

Snippet of output: "À l'origine, 東京 (Tokyo) était un petit village de pêcheurs nommé 江戸 (Edo, « l'estuaire »). Fortifié au XVe siècle, 江戸 (Edo) devient la base militaire du shogun Tokugawa Ieyasu à la fin du XVIe siècle, puis la capitale de son gouvernement féodal. Durant l’époque d'江戸 (Edo, 1603-1868), la ville se développe et devient l'une..."

Notice how Tokugawa wasn't replaced? Well, that's because it's not a place name. I asked ChatGPT to replace place names, not other Japanese words.

Having said that, it's definitely not perfect, it makes mistakes, and can take a bit of poking and prodding to do exactly what was asked of it, especially for obscure languages or scenarios. But... that's moving the goalposts.

Ask yourself how long it would take you to write a function that can replace "words of a certain class" in text written in one language with the spelling from another language? Make sure you don't confuse cross-language homonyms, or words that can be a place name or have a different meaning depending on context, etc...



