Tempering Expectations for GPT-3 and OpenAI’s API (minimaxir.com)
267 points by vortex_ape on July 19, 2020 | 189 comments

I am totally confused by people not being impressed with GPT-3. If you had asked 100 people in the tech industry in 2015 whether these results would be possible in 2020, 95 would have said no, not a chance in hell. Nobody saw this coming. And yet nobody cares because it isn’t full blown AGI. That’s not the point. The point is that we are getting unintuitive and unexpected results. And further, the point is that the substrate from which AGI could spring may already exist. We are digging deeper and deeper into “algorithm space” and we keep hitting stuff that we thought was impossible and it’s going to keep happening and it’s going to lead very quickly to things that are too important and dangerous to dismiss. People who say AGI is a hundred years away also said Go was 50 years away, and they certainly didn’t predict anything even close to what we are seeing now, so why is everyone believing them?

I think people should be impressed, but also recognize the distance from here to AGI. It clearly has some capabilities that are quite surprising, and is also clearly missing something fundamental relative to human understanding.

It is difficult to define AGI, and it is difficult to say what the remaining puzzle pieces are, and so it's difficult to predict when it will happen. But I think the responsible thing is to treat near-term AGI as a real possibility, and prepare for it (this is the OpenAI charter we wrote two years ago: https://openai.com/charter/).

I do think what is clear is that we are, in the coming years, going to have very powerful tools that are not AGI but that still change a lot of new things. And that's great--we've been waiting long enough for a new tech platform.

On a core level, why are you trying to create an AGI?

Anyone who has thought seriously about the emergence of AGI puts the chance that AGI causes a human extinction-level event at ~20%, if not greater.

Various discussion groups I am a part of now see anyone who is developing AGI to be equivalent to developing a stockpile of nuclear warheads in your basement that you're not sure won't immediately shoot off on completion.

As an open question, suppose one believes that:
1. We do not know how to control an AGI.
2. AGI has a very credible chance of causing a human extinction-level event.
3. We do not know what this chance or percentage is.
4. We can identify who is actively working to create an AGI.

Why should we not immediately arrest people who are working on an "AGI-future" and try them for crimes against humanity? Certainly, in my nuclear warhead example, I would be arrested by the government of the country I am currently living in the moment they discovered this.

The problem is that if the United States doesn't do it, China or other countries will. It's exactly the reason why we can't get behind on such a technology from a political / national perspective.

For what it's worth though, I think you're right that there are a lot of parallels with nuclear warheads and other dangerous technologies.

There needs to be a level of serious discourse that doesn't appear to currently be in the air, around what to do, international treaties, and repercussions.

I have no idea why people aren't treating this with grave importance. The level of development of AI technologies is clearly much ahead of where anyone thought it would be.

With exponential growth rates, acting too early is always seen as an 'overreaction', but waiting too long is sure to be a bad outcome (see, world re: coronavirus).

There seems to be some hope, in that as a world we seem to have banned human cloning, which has been around since Dolly in the late 90s.

On the other hand, the USA can't seem to come to a consensus that a deadly virus is a problem, as it is killing its own citizens.

You don’t know the distance! And you are conflating distances! The distance between AGI behavior and GPT-3 behavior has nothing to do with the distance in time between the invention of GPT-3 and AGI. That’s a deceptive intuition and fuzzy thinking... again my point is that the “behavior distance” between AIM chat bots and GPT-3 would, under your scrutiny, lead to a prediction of a much larger “temporal distance” than 10 years. Nit-picking about particular things that this particular model can’t do is completely missing the big picture.

I think there's a divide between "impressive" and "good".

I think deep learning will keep creating more impressive, more "unintuitive and unexpected", more "wow" results. The "wow" will get bigger and bigger. GPT-3 is more impressive, more "wow"-y than GPT-2. GPT-3 very impressively seems to demonstrate understanding of various ideas, GPT-3 indeed very impressively develops ideas over several sentences. No argument with the "unintuitive and unexpected" part.

The problem is the whole thing doesn't seem definitively good (in GPT-3's case, doesn't produce good or even OK writing). It's not robust, reliable, trustworthy. The standard example is the self-driving car. They still haven't got those reliable but with more processing power, a company could probably add more bells and whistles to the self-driving process but still without making it safe. And GPT-3 seems in that vein - more "makes sense if you're not paying attention", the same "doesn't really say coherent things".

I'm trying to trace a middle ground between the two reactions. I'm perhaps laughing a little at those just looking at impressive but I acknowledge there's something real there. Indeed, the more you notice something real there, the more you notice something real missing there too.

That's similar to my thoughts. That demo video of generating HTML was very impressive, I have never seen anything that can do that, but it's also 1000x less useful than Squarespace or WordPress. The tool in its current state is totally useless even if it is very impressive.

> It's not robust, reliable, trustworthy

Is human writing robust, reliable, trustworthy? Would you agree that some humans produce vastly better writing than others? Have you never read comments here on HN that appeared to be incoherent rambling, logically faulty, or just shallow, trite and cliched?

GPT-1 is a significant improvement over earlier RNN based language models. GPT-2 is a significant improvement over GPT-1. GPT-3 is a significant improvement over GPT-2, especially in terms of "robustness". All these achievements appeared in the course of just 3 years, and we haven't yet reached the ceiling of what these large transformer based models can do. We can reasonably expect that GPT-4 will be a significant improvement over GPT-3 because it will be trained on more and better quality data, it will be bigger, and it might be using better word encoding methods. Aside from that, we haven't even tried finetuning GPT-3, I'd expect it would result in a significant improvement over the generic GPT-3. Not to mention various potential architectural and conceptual improvements, such as an ability to query external knowledge bases (e.g. Wikipedia, or just performing a google search), or an ability to constrain its output based on an elaborate profile (e.g. assuming a specific personality). There are most likely people at OpenAI who are working on GPT-4 right now, and I'm sure Google, Microsoft, Facebook, etc are experimenting with something equally ambitious.

I agree that GPT writing is not "good" if we compare it to high quality human writing. However, it is qualitatively getting better and better with each iteration. At some point, as soon as a couple years from now, it will become consistent and coherent enough to be interesting and/or useful to regular people. Just like self-driving cars in a couple of years might reach the point where the risk of dying is higher when you drive than when AI drives you.

From the POV of an AI practitioner, there is one and only one reason I remain unimpressed with GPT3.

It is nothing more than one big transformer. At a technical level, it does nothing impressive, apart from throw money at a problem.

So in that sense, having already been impressed by Transformers and then ELMo/BERT/GPT-1 (for making massive pretraining popular), there is nothing in GPT3 that is particularly impressive outside of Transformers and massive pre-training, both of which are well known in the community.

So, yeah, I am very impressed by how well transformers scale. But, idk if I'd give OpenAI any credit for that.

The novelty of GPT3 is its few shot learning capabilities. GPT3 shows a new, previously-unknown, and, most importantly, extremely useful property of very large transformers trained on text -- that they can learn to do new things quickly. There isn't any ML researcher on record who predicted it.
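To make "few shot" concrete, here is a minimal sketch of how such a prompt is usually put together: a task description, a handful of worked examples, and an unanswered query, all as plain text with no retraining involved. The function name `build_few_shot_prompt`, the `Q:`/`A:` layout, and the translation examples are illustrative assumptions, not OpenAI's API or required format.

```python
def build_few_shot_prompt(task, examples, query):
    """Assemble a plain-text few-shot prompt (hypothetical layout)."""
    lines = [task, ""]
    # Each worked example shows the model the input/output pattern.
    for source, target in examples:
        lines.append(f"Q: {source}")
        lines.append(f"A: {target}")
        lines.append("")
    # The final query is left unanswered for the model to complete.
    lines.append(f"Q: {query}")
    lines.append("A:")
    return "\n".join(lines)


examples = [("cheese", "fromage"), ("house", "maison")]
prompt = build_few_shot_prompt("Translate English to French.", examples, "cat")
print(prompt)
```

The model only ever sees this string; whatever "learning" happens takes place within a single forward pass over the prompt.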

> There isn't any ML researcher on record who predicted it.

That's just absurd - this was an obvious end-result for LM. NLP researchers knew that something like this was absolutely possible, my professor predicted it like 3 years ago.

Yes, the emergent ability to understand commands mixed in with examples is pretty crazy.

"People who say AGI is a hundred years away also said Go was 50 years away" this is not true. The major skeptics never said this. The point skeptics were making was that benchmarks for chess (IBM), Jeopardy! (IBM), Go (Google), Dota 2 (OpenAI) and all the rest are poor benchmarks for AI. IBM Watson beat the best human at Jeopardy! a decade ago, yet NLP is trash, and Watson failed to provide commercial value (probably because it sucks). I'm unimpressed by GPT-3; to me nothing fundamentally new was accomplished, they just brute-forced it on a bigger computer. I expect this to go the same way as IBM Watson.

One expert predicted in mid-2014 [1] that a world-class Go AI was 10 years away. AlphaGo defeated Lee Sedol 18 months later.

It's not 50 years, but it does illustrate just how fraught these predictions can be and how quickly the state of the art can advance beyond even an insider's well-calibrated expectations.

(To his credit the expert here immediately followed up his prediction with, "But I do not like to make predictions.")

[1] https://www.wired.com/2014/05/the-world-of-computer-go/

People also predicted 2000 would have flying cars. The moral of the story is future prediction is very difficult and often inaccurate for things we are not close to achieving. Not that they always come sooner than predicted.

We have flying cars. What we don't have is a flying car that is ready for mass adoption. The biggest problem is high cost both for the car and its energy requirements, followed by safety and the huge air traffic control problem they would create.

As a counterpoint, when AlphaGo came out I was surprised it took so long, because Go really seems like a good use case for machine learning supremacy: 1) the Go board looks particularly amenable to convnet analysis and 2) it's abstract enough for humans to have missed critical strategies, even after centuries.

I wish I were on record on that, so take what I say with a grain of salt

Ultimately the greatest factor is stereotypes about inventors. The OpenAI team doesn’t remind anyone of, say, the Manhattan Project team in any way. They don’t look, act, or sound like Steve Jobs and Steve Wozniak. Elon Musk does, and that’s why I think people get so excited about rockets that land themselves. That is honestly pretty cool. Very few people pull stuff like that off. But is it less cool than GPT3?

Sam Altman and Greg Brockman were also online payments entrepreneurs like Elon Musk so it’s not like it was about their background / prior history. It’s also not about sounding too grandiose or delusional, Musk says way crazier stuff in his Twitter than Greg Brockman has ever said in his life. It’s clearly not about tempering expectations. Musk promises self driving cars every year!

So I think there are a lot of factors that impact the public consciousness about how cool or groundbreaking a discovery is. Personally I think the core problem is the contrivance of it all, that the OpenAI people think so much about what they say and do and Elon does not at all, and that kind of measured, Machiavellian strategizing is incommensurable with public demand for celebrity.

What about objective science? There was this striking Google Research paper on quantum computing that put the guy who made “some pipes” first author. I sort of understand abstractly why that’s so important but it’s hard for me to express to you precisely how big of a discovery that is. Craig Gentry comes to mind also as someone who really invented some new math and got some top accolades from the academy for it. There is some stereotyping at play here that may favor the OpenAI team after all - they certainly LOOK more like Craig Gentry or pipes guy than Elon Musk does. That’s a good thing so I guess in the pursuit of actually advancing human knowledge it doesn’t really matter what a bunch of sesame grinders on Hacker News, Twitter and Wired think.

What would be a good benchmark? In particular, is there an accomplishment that would be: (i) impressive, and clearly a major leap beyond what we have now in a way that GPT-3 isn't, but (ii) not yet full-blown AGI?

How about driving a car without killing people in ways a human driver would never kill people (i.e. mistaking a sideways semi truck for open sky)?

That's a valuable benchmark loads of companies are aiming for, but it's not a full AGI.

Maybe nothing? “Search engines through training data” are already the state of the art, and have well documented and mocked failure cases.

Unless someone comes along with a more clever mechanism to pretend it’s learning like humans, you’re not looking at a path towards AGI in my opinion.

> you’re not looking at a path towards AGI in my opinion

What I'm trying (and apparently failing?) to ask is, what would a step on the path towards AGI look like? What could an AI accomplish that would make you say "GPT-3 and such were merely search engines through training data, but this is clearly a step in the right direction"?

> What I'm trying (and apparently failing?) to ask is, what would a step on the path towards AGI look like?

That's an honest and great question. My personal answer would be to have a program do something it was never trained to do and could never exist in the corpus. And then have it do another thing it was never trained to do, and so on.

If GPT-3 could say 1) never receive any more input data or training, and then 2) read an instruction manual for a novel game that shows up a few years from now (so it can't be replicated from the corpus), and 3) plays that game, and 4) improves at that game, that would be "general" imo. It would mean there's something fundamental with its understanding of knowledge, because it can do new things that would have been impossible for it to mimic.

The more things such a model could do, even crummily, would go towards it being a "general" intelligence. If it could get better at games, trade stocks and make money, fly a drone, etc. in a mediocre way, that would be far more impressive to me than a program that could do any of those things individually well.

If a program can do what you described, would it be considered a human-level AI yet? Or would there be some other missing capabilities still? This is an honest question.

I intentionally don’t use the term AGI here because human intelligence may not be that general.

> human intelligence may not be that general

Humans have more of an ability to generalize (ie learn and then apply abstractions) than anything else we have available to compare to.

> would it be considered a human-level AI yet

Not necessarily human level, but certainly general.

Dogs don't appear to attain a human level of intelligence but they do seem to be capable of rudimentary reasoning about specific topics. Primates are able to learn a limited subset of sign language; they also seem to be capable of basic political maneuvering. Orca whales exhibit complex cultural behaviors and employ highly coordinated teamwork when hunting.

None of those examples appear (to me at least) to be anywhere near human level, but they all (to me) appear to exhibit at least some ability to generalize.

From grandparent post:

> 2) read an instruction manual for a novel game that shows up a few years from now (so it can't be replicated from the corpus), and 3) plays that game, and 4) improves at that game, that would be "general" imo.

I would say that learning a new simple language, basic political maneuvering, and coordinated teamwork might be required to play games well in general, if we don't exclude any particular genre of games.

Complex cultural behaviors might not be required to play most games, however.

I think human intelligence is actually not very 'general' because most humans have trouble learning & understanding certain things well. Examples include general relativity and quantum mechanics and, some may argue, even college-level "elementary mathematics".

Give it an algebra book and ask it to solve the exercises at the end of the chapter. If it has no idea how to solve a particular task, it should say “give me a hand!” and be able to understand a hint. How does that sound?

That makes me think we are closer rather than farther away because all that would be needed is for this model to recognize the problem space in a question:

“Oh, you are asking a math question, you know a human doesn’t calculate math in their language processing sections of their brain right, neither do I... here is your answer”

If we allowed the response to delegate commands, it could start to achieve some crazy stuff.

> probably because it sucks

it's not technically bad, but it requires domain experts to feed it domain relevant data and it's as good as this setup phase is, and this setup phase is extremely long, expensive and convoluted. so yeah it sucks, but as a product.

Whenever someone talks about how AI isn't advancing, I think of this XKCD comic from not too long ago (maybe 2014-ish?), in which "check whether a photo is of a bird" was classified as "virtually impossible".


Read the alt-text. Photo recognition wasn't impossible in 2014, it was impossible in the 1960s and the 2014-era author was marvelling at how far we'd come / making a joke of how some seemingly-simple problems are hard.

First, I remember the demos for GPT-2. Later, when it was available and I could try it myself, I was kind of disappointed in comparison.

Second, while impressive, we are also finding out at the same time just how much more is needed to make something of value. It's like speech recognition in 1995. Mostly there, but in the end it took another 20 years to actually work.

But still, it's exciting.

I am really impressed with it as a natural language engine and query system. I am not convinced it "understands" anything or could perform actual intellectual work, but that doesn't diminish it as what it is.

I'm also really worried about it. When I think of what it will likely be used for I think of spam, automated propaganda on social media, mass manipulation, and other unsavory things. It's like the textual equivalent of deep fakes. It's no longer possible to know if someone online is even human.

I am thinking "AI assisted demagoguery" and "con artistry at scale."

> And yet nobody cares because it isn’t full blown AGI. That’s not the point. The point is that we are getting unintuitive and unexpected results.

I don't think these are unintuitive or unexpected results. They seem exactly like what you'd get when you throw huge amounts of compute power at model generation and memorize gigantic amounts of stuff that humans have already come up with.

A very basic Markov model can come up with content that seem surprisingly like a human would say. If anything, what all of the OpenAI hype should confirm is just how predictable and regular human language is.

> They seem exactly like what you'd get when you throw huge amounts of compute power

I disagree with that.

The one/few shot ability of the model is much much better than what I would have imagined, and I know very few people in the field that saw GPT-3 and were like "yep, exactly what I thought".

> A very basic Markov model can come up with content that seem surprisingly like a human would say.

This is false. Natural language involves long-term dependencies that are beyond the ability of any Markov model to handle. GPT-2 and -3 can reproduce those dependencies reliably.
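For readers unfamiliar with the term, a first-order (bigram) Markov text generator can be sketched roughly as below. Each next word is drawn only from the words that followed the current word in the training text, so the generator has no memory beyond one word back, which is the limitation being described here. The toy corpus and the function names are made up for illustration.

```python
import random
from collections import defaultdict


def train_bigram_model(text):
    """Map each word to the list of words that followed it in the text."""
    words = text.split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return followers


def generate(followers, start, length, seed=0):
    """Walk the chain: the next word depends only on the current word."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        candidates = followers.get(word)
        if not candidates:  # dead end: this word was never followed by anything
            break
        word = rng.choice(candidates)
        output.append(word)
    return " ".join(output)


corpus = "the cat sat on the mat and the dog sat on the rug"
model = train_bigram_model(corpus)
sample = generate(model, "the", 8)
print(sample)
```

Every adjacent word pair in the output is locally plausible, but nothing ties the end of a sentence to its beginning. Higher-order chains widen the window, though the table of contexts grows very quickly with the order.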

> If anything, what all of the OpenAI hype should confirm is just how predictable and regular human language is.

Linguists have been trying to write down formal grammars for natural languages since the 1950s. Some of the brightest people around have essentially devoted their lives to this task. And yet no one has ever produced a complete grammar of any human language. So no, human language is not predictable and regular, at least not in any way that we know how to describe formally.

W.r.t. the Markov model, I just mean that something even that trivial can sound lifelike. It's not surprising that throwing billions of times more data at the problem with more structure can make the parroting better.

> So no, human language is not predictable and regular, at least not in any way that we know how to describe formally.

I don't know what to say about this other than perhaps the NLP community has been a little too "academic" here and I disagree.

Grade schoolers routinely are forced to make those boring diagrams for their particular language, and that has tremendous structure. When you add that structure (function) with the data of billions of real-world people talking, it's not surprising that the curve fit looks like the real thing. Given how powerful things like word2vec have been that do very, very simple things like distance diffs between words, it's not surprising to me that the state of the art is doing this.
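The "distance diffs between words" idea can be illustrated with a toy sketch. The three-dimensional vectors below are hand-made for this example (real word2vec embeddings are learned from text and have hundreds of dimensions), and the helper names are hypothetical.

```python
import math

# Hand-made toy vectors, NOT real word2vec output.
# The dimensions loosely stand for (royalty, male, female).
vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


result = analogy("king", "man", "woman")
print(result)
```

Excluding the query words from the candidates is the usual convention in analogy evaluations; with only four words here, that leaves a single candidate, so the sketch shows the arithmetic rather than a meaningful search.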

It is surprising! You could throw all the data of the entire human race at a Markov model and it would not sound a tenth as good as even GPT-2. Transformers are simply in a new class.

Were you alive in 2010?

Right...but at the end of the day that's what intelligence is. You are just an interconnected model of billions of neurons that has been trained on millions of facts created by other humans. Except that this model can vastly exceed the amount of factual knowledge that you could possibly absorb over your entire lifetime.

> You are just an interconnected model of billions of neurons that has been trained on millions of facts created by other humans.

...but I didn't pop out of the womb that way, and as you said, over my lifetime I will read less than 1 millionth of the data that GPT-3 was trained on. GPT-2 had a better compression ratio than GPT-3, and I'm sure a GPT-4 will have a worse compression ratio than GPT-3 on the road we're on.

Rote memorization is hardly what I'd call intelligence. But that's what we're doing. If these things were becoming more intelligent over time, they'd need less training data per unit insight. This isn't a dismissal of the impressiveness of the algorithms, and I'm not suggesting the classic AI effect "changing the goalposts over time." I fundamentally believe we're kicking a goal in our own team's net. This is backwards.

Exactly. Even GPT-3 is not creating new content. It is just permuting existing content while retaining some level of coherence. I don't reason by repeating various tidbits I've read in books in random permutations. I reason by thinking abstractly and logically, with a creative insight here and there. Nothing at all like a Markov model trained on a massive corpus. GPT-3 may give the appearance of intelligent thought, but appearance is not reality.

> I don't reason by repeating various tidbits I've read in books in random permutations.

Are you sure?

Yes, I would fail any sort of math exam if I used the GPT-3 model.

GPT-3 is nothing like a Markov model.

Same sort of generative probabilistic model idea.

All creative work is derivative.

Not all derivative work is creative.

I can't help but feel that what GPT is really teaching us about is language, not AI.

IMO, language is one of the purest forms of thinking / consciousness. What is our brain doing that makes it different?

This brings to mind the debates between Frank Ramsey and Ludwig Wittgenstein.

Episode: https://philosophybites.libsyn.com/cheryl-misak-on-frank-ram... Media: https://traffic.libsyn.com/secure/philosophybites/Cheryl_Mis...

The problem is that not only is this "not full blown AGI". The problem is that, if you understand how this works, it's not "intelligence" at all (using the layperson meaning of the word, not the marketing term), and it's not even on the way to get us there.

It reminds me of that pithy remark by someone I read a while ago which was (paraphrased): "Any time someone pushes forward AI as a field, people will almost always remark: 'but that's not real AI.'"

It's true, the mundanity quickly settles in, and we look to the next 'impossible hurdle' and disregard the fact that only a few years ago, natural language generation like this was impossible.

> "Any time someone pushes forward AI as a field, people will almost always remark: 'but that's not real AI.'"

This statement reveals a widespread, and in my opinion not entirely correct, assumption that advances in the ML field mean we're actually pushing forward on AI. It also implies a belief that the pre-1970s people were somehow less right than the 2000s+ ML crowd, when a lot of ML's success is related to compute power that simply did not exist in the 1970s.

ML computational machines to transform inputs->outputs are great, but there's no compelling reason to believe they're intrinsic to intelligence, as opposed to functioning more like an organ.

We might be making great image classifier "eyes", or spam-filtering "noses", or music-generating "ears". But it's not clear to me that will incrementally get us closer to an intelligent "brain", even if all those tools are necessary to feed into one.

I disagree. Yes, it is just a transformer decoder. But it looks like we are really close, with some tweaks to the network structure, reward function design and inputs / outputs. At the same time, GPT-3 also points out how far away we are at the hardware level.

Let me put it this way: I don't know how challenging the rest is going to be, but it surely looks like we are on the right path finally.

It fundamentally has no _reasoning_. There is no AGI without reasoning.

What makes you think this? The fact that it can produce working code from a prompt in some cases shows rudimentary non-trivial reasoning. Hell, GPT-2 demonstrated rudimentary reasoning of the trivial sort.

> The fact that it can produce working code from a prompt in some cases shows rudimentary non-trivial reasoning.

It doesn't at all. It indicates that it read stackoverflow at some point, and that on a particular user run, it replayed that encoded knowledge. (I'd also argue it shows the banality of most React tutorials, but that's perhaps a separate issue.)

Quite a lot of these impressive achievements boil down to: "Isn't it super cool that people are smart and put things on the internet that can be found later?!"

I don't want to trivialize this stuff because the people who made it are smarter than I will ever be and worked very hard. That said, I think it's valid for mere mortals like myself to question whether or not this OpenAI search engine is really an advancement. It also grates on me a bit when everybody who has a criticism of the field is treated like a know-nothing Luddite. The first AI winter was caused by disillusionment with industry claims vs reality of what could be accomplished. 2020 is looking very similar to me personally. We've thrown oodles of cash and billions of times more hardware at this than we did the first time around, and the most use we've gotten out of "AI" is really ML: classifiers. They're super useful little machines, but they're sensors when you get right down to it. AI reality should match its hype, or it should have less hype (e.g. not implying GPT-3 understands how to write general software).

> It doesn't at all.

Assertions aren't particularly useful in this discussion. Nothing you said supports your claim that GPT-3 doesn't show any capacity for reasoning. The fact that GPT-3 can create working strings of source code from prompts it (presumably) hasn't seen before means it can compose individual programming elements into a coherent whole. If it looks like a duck and quacks like a duck, then it just might be a duck.

Here's an example of rudimentary reasoning I saw from GPT-2 in the context of some company that fine-tuned GPT-2 for code completion (made up example but captures the gist of the response):

[if (variable == true) { print("this sentence is true") } else] { print("this sentence is false") }

Here's an example I tested using talktotransformer.com: [If cars go "vroom" and my Ford is a car then my Ford] will also go "vroom"...

The bracketed parts were the prompt. If this isn't an example of rudimentary reasoning then I don't know what is. If your response is that this is just statistics then you'll have to explain how the workings of human brains aren't ultimately "just statistics" at some level of description.

> working strings of source code from prompts it (presumably) hasn't seen before

I'm saying that "presumably" is wrong, especially on what it was: a simple React program. It would not surprise me if the amount of shared structure and text in the corpus is all over the place.

This can be tested by making more and more sophisticated programs in different languages, and seeing how often it returns the correct result. I don't really care, because it can't reliably do basic arithmetic if the numbers are in different ranges. This is a dead giveaway it hasn't learned a fundamental structure. If it hasn't learned that, it hasn't learned programming.
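The kind of test described above could be sketched as a small harness that measures addition accuracy separately for each operand size. Everything here is hypothetical: `stand_in_model` is a deliberately broken placeholder, not GPT-3, and a real test would replace it with a function that sends a prompt to a model and parses the number it returns.

```python
import random


def accuracy_by_digits(model, digit_counts, trials=50, seed=0):
    """Return {digit count: fraction of additions the model got right}."""
    rng = random.Random(seed)
    results = {}
    for digits in digit_counts:
        low, high = 10 ** (digits - 1), 10 ** digits - 1
        correct = 0
        for _ in range(trials):
            a, b = rng.randint(low, high), rng.randint(low, high)
            if model(a, b) == a + b:
                correct += 1
        results[digits] = correct / trials
    return results


def stand_in_model(a, b):
    """Hypothetical placeholder: right for operands below 100, off by one otherwise."""
    return a + b if max(a, b) < 100 else a + b + 1


scores = accuracy_by_digits(stand_in_model, [1, 2, 3])
print(scores)
```

A system that had learned addition as a procedure would score about the same at every operand size; a sharp drop outside the commonly seen ranges is the giveaway being described.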

The examples are not really that impressive either. They are boolean logic. That a model like this can do copy-pasta + encode simple boolean logic and if-else is... well.. underwhelming. Stuff like that has been happening for a long time with these models, and no one has made claims that the models were "reasoning".

The React programming example isn't the only example of GPT-3 writing code. There was an example of it writing Python programs going around before they opened up the API. It was impressive. There was no reason to think it had seen the exact examples before.

Also it isn't the case that one needs to be perfect at algorithmic thinking to be capable of some amount of reasoning. I don't claim that GPT-3 is perfect, but that it's not just copy and pasting pieces of text it has seen before. It is coming up with new sequences of text based on the structure of the surrounding text and the prompt, in a manner that indicates it has a representation (albeit imperfect) of the semantic properties of the text and can compose them in meaningful ways. Handwavy responses do nothing to undermine the apparent novelty it creates.

> encode simple boolean logic and if-else is... well.. underwhelming. Stuff like that has been happening for a long time with these models, and no one has made claims that the models were "reasoning".

Seems like you're just moving the goalposts as always happens when it comes to AI advances. What do you take to be "reasoning" if that isn't an example of it?

Couldn’t you yourself learn how to do that, in a foreign language, without knowing what the words mean?

Logic is sensitive to the meaning of words to a degree and so if I can pick out the context to apply certain deductive rules, then I know what the relevant words mean, at least to the degree that they indicate logical structure.

It's possible that a program could learn when to apply certain rules based on its own if-then statements and bypass understanding, but that's not the architecture of GPT-3. If it learns the statistical/structural properties of a string of text such that it can apply the correct logical transformations based on context, the default assumption should be that it has some rudimentary understanding of the logical structure.

> The fact that it can produce working code from a prompt in some cases shows rudimentary non-trivial reasoning.

No, “in some cases” doesn't show reasoning. It is, arguably, weak evidence for reasoning, but it equally supports other explanations. With the right input corpus, a Markov chain generator will produce working code from a prompt “in some cases”, and I don't think anyone has a weak enough definition of reasoning to admit Markov chains.
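For anyone who hasn't built one, here is a minimal sketch of the kind of Markov chain generator being referred to. The two-function corpus and the names `build_chain` and `generate` are made up for illustration; the point is only that every output token is chosen by looking up what followed the last few tokens in the corpus, so any "working code" it emits is stitched together from fragments it has seen.

```python
import random
from collections import defaultdict

def build_chain(tokens, order=2):
    """Map each run of `order` tokens to the tokens that followed it in the corpus."""
    chain = defaultdict(list)
    for i in range(len(tokens) - order):
        state = tuple(tokens[i:i + order])
        chain[state].append(tokens[i + order])
    return chain

def generate(chain, prompt, length, rng):
    """Extend `prompt` by repeatedly sampling a follower of the last `order` tokens."""
    out = list(prompt)
    order = len(prompt)
    for _ in range(length):
        followers = chain.get(tuple(out[-order:]))
        if not followers:  # state never seen in the corpus: nothing to emit
            break
        out.append(rng.choice(followers))
    return out

# Hypothetical toy corpus: two tiny tokenized Python functions.
corpus = "def add ( a , b ) : return a + b def sub ( a , b ) : return a - b".split()
chain = build_chain(corpus, order=2)
sample = generate(chain, ("def", "add"), 12, random.Random(0))
print(" ".join(sample))
```

Depending on the random draws, the output may or may not be a valid function, which is the "in some cases" part.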

Of course we need to quantify "in some cases" for your argument to hold. Humans aren't perfect reasoners, for example. The examples I saw were impressive and were mostly correct, apart from some minor syntax errors or edge cases. This wasn't a Markov chain generator where the "interesting" responses were cherry-picked from a pile of nonsense.

So under your logic we won’t have any idea that we are close to having AGI until we have a machine that can reason... which is AGI. You are missing the big picture.

There's clearly no planning for a solution, I think that's what GP is getting at.

You don’t understand how it works. You can’t explain how the model works. Go ahead and correct me if I’m wrong.

Impressive to a human is a highly subjective property. Humans generally consider the understanding of language to be an intelligent trait, yet tend to take basic vision, which took much longer evolutionarily to develop, for granted. Neural networks can approximate arbitrary functions, and the ability to efficiently optimize neural network parameters over high dimensional non-convex landscapes has been well established for years. What typically limits pushing the state of the art is the availability of "labeled" data and the finances required for very large scale training runs. With NLP, there are huge datasets available which are effectively in the form of supervised data, since humans have 1) invented a meaningful and descriptive language and 2) generated hundreds of trillions of words in the form of coherent sentences and storylines. The task of predicting a missing word is a well-defined supervised task for which there is effectively infinite "labeled" data. Couple these facts with a large amount of compute credits and the right architecture and you get GPT3. The results are really cool but in my opinion scientifically unsurprising. GPT3 is effectively an example of just how far we can currently push supervised deep learning, and even if we could get truly human level language understanding asymptotically with this method, it may not get us much closer to AGI, if only because not every application (computer vision, for example) will have this much data available, certainly not in a neatly packaged supervised representation like language. While approaches like GPT3 will continue to improve the state of the art and teach us new things by essentially treating NLP or other problems as an "overdetermined" system of equations, these approaches are subject to diminishing returns and the path to AGI may well require cracking that human ability to create and learn with a vastly better sample complexity, effectively operating in a completely different "under-sampled" regime.

I mean, I find impressive the fact that a human can actually build and work with a tool that they can't actually understand.

Even now, you could, if you wanted to, rip apart your computer even to the CPU level and understand how it works. Even analyzing the code. Sure, it might take you ten years.

But you would NEVER be able to understand how GPT3 works... it's just too complex.

I’m no expert, but tools such as SHAP and DeepLift can give you insight into what activates a network. It’s probably not possible to inspect a network with billions of parameters, however it’s to be expected since I don’t think that explainable ML is an established field yet.

But also think about it from another angle: it doesn’t seem too hard to explain why people say what they say. We can usually get into the shoes of the other person if we try hard enough. However, if we say there’s no way for us to explain GPT-3, it just shows how fundamentally different it is from the human mind.

Agreed. Even if we put research into deconstructing and attempting to understand how deep neural networks work in tasks such as autonomous driving, the fact is that these tasks are too complex to even logically describe.

That said, I do think it is possible to come up with robust guarantees to these methods.

Really? I bet in a few years we'll have tools that can inspect a model and tell you exactly what parts do what function and how they do it.

> People who say AGI is a hundred years away also said GO was 50 years away and they certainly didn’t predict anything even close to what we are seeing now so why is everyone believing them?

Do you know why AlphaGo decided to perform move 37 in Game 2 with Lee Sedol? Can AlphaGo explain itself as to why it did that move?

If we don't know why it made that decision, then it is a mysterious black box hiding its decisions and taking in an input to produce an output, which is still a problem. This isn't useful to researchers in understanding decisions of these AI systems, especially for AGI. Hence, this problem also applies to GPT-3.

While it is still an advancement in NLP, I'm more interested in getting a super accurate or generative AI system to explain itself than one that cannot.

>While it is still an advancement in NLP, I'm more interested in getting a super accurate or generative AI system to explain itself than one that cannot

Why? People can explain themselves because we rationalize our actions, not necessarily because we know why we did something. I don't understand why we hold AI to such a high standard.

there seems to be an overabundance of negative sentiment towards deep learning among hn commentators, but whenever i hear the reasons behind the pessimism i'm usually unimpressed.

for the same reason why we have psychiatrists: when the AI makes a mistake, you need to fix it, work around it, prevent it or, if all else fails, protect others from it.

it's all fun and games when AI do trivia. when AI get plugged into places that can result in tangible real world consequences (i.e. airport screening) you need to be able to reason about the system so it gets monotonically better over time.

gpt-3 is an impressive technical feat and the pinnacle of the current line of research

however, if you remove the technical colored glasses and boil down what it is and what it does, it's a regurgitation of existing data that it had been fed, it has no understanding of the data itself beyond linguistic patterns.

it's not going to find correlations where there were none, it's not going to actually discover new data, it will find unexpected correlations between data but there's zero indication whether these correlations bear any significance until a human goes to validate the prompt, and it can generate infinitely many of these, making the chance of discovering significant new ideas pretty slim.

And precisely none of what you just said addresses my point.

> The point is that we are getting unintuitive and unexpected results.

> it will find unexpected correlations between data but there's zero indication whether these correlations bear any significance until a human goes to validate the prompt, and it can generate infinitely many of these, making the chance of discovering significant new ideas pretty slim

seems a pretty direct response tbh


it isn't really helpful or conducive to an interesting discussion that you are not detailing what the point is, even when the "the point is this" gets directly quoted, while clarifying neither the point nor why the reply doesn't apply.

These results are literally unexpected. The consensus of all the experts in 2010 was that none of this could possibly happen in the next ten years. The way these results were arrived at is unintuitive, which probably has to do with why they were so utterly unexpected. The broad effort to mine “algorithm space,” which includes many different ML agents and other things, is producing results that were not expected. This is just a fact. There is no way around this. Just accept it and move on.

It’s obvious that the surprises will keep coming. We slowly close in on the algorithms that bring the silicon to its full potential. The real question is what is at the bottom? We keep digging and with more compute and more data eventually we will find that the stuff near the bottom is important and dangerous.

The fact that GPT-3 has x flaw has absolutely no logical intersection with this. It’s basically unrelated.

What's impressive about it? It's bigger, that's cool. What does it actually mean in the real world?

I see nothing to get excited about at this point.

I'll tell you why I'm not impressed. We can't keep doubling, er, increasing model size by two orders of magnitude forever for iterative improvements in quality. (Maybe this is a Malthusian law of the nothing-but-deep-learning AI approach: parameters increase geometrically, quality increases arithmetically.)

This is an achievement, but is not doing more with less. When someone refines GPT-3 down to a form that can be run on a regular machine again (hint: probably a new architecture), then that will be genuinely exciting.

I also want to address this point directly:

> We are digging deeper and deeper into “algorithm space” and we keep hitting stuff that we thought was impossible and it’s going to keep happening and it’s going to lead very quickly to things that are too important and dangerous to dismiss.

I hope the above convinced you that this is basically not possible with current approaches. OpenAI spent approximately $12M just in computation cost training this model (no one knows how much they spent on training previous iterations that did not succeed). Running this at scale only for inference is also an extremely expensive proposition (I've joked with others about how many tenths of a degree Celsius GPT-3aaS will contribute to climate change). If we extrapolate out, GPT-4 will be a billion dollar model with tens of trillions of parameters, and we might get a dozen pages or so of generated text that may or may not resemble 8chan!

> People who say AGI is a hundred years away also said GO was 50 years away and they certainly didn’t predict anything even close to what we are seeing now so why is everyone believing them?

Isn't this a bit too ad hominem? And not even particularly good ad hominem. I'm sure there existed people on the eve of AlphaGo saying it would be another 50 years, but there's no evidence that the set of these people is the same as those saying AGI is 50-100 years away. How many people made this particular claim? I, for one, made no predictions about Go's feasibility (mostly because I have never thought that playing games is synonymous with intelligence and so mostly didn't find it interesting) but absolutely subscribe to the 50-100 year timeline for AGI.

Think about it like this: Go is a well-defined problem with a well-defined success criterion. AGI has neither of those properties. We don't even understand what intelligence is enough to answer those questions. Life took billions of years to achieve landing on the Moon and building GPT-3. It's not far-fetched that it'll take us at least 100 more using directed research (as opposed to randomness) to learn those same lessons.

It's something pretty unique to ML research. The goal posts keep moving whenever an advancement is made. Every time ML achieves something that was considered impossible X years ago, people look at it and say, actually, that's nothing special, the real challenge is <new goalpost>.

I'm pretty sure even as we cross into AGI, people will react the same way. And only then will some stop and realize that we just wrote off our own intelligence as nothing special, a parlour trick.

From Sam Altman just now:

> The GPT-3 hype is way too much. It’s impressive (thanks for the nice compliments!) but it still has serious weaknesses and sometimes makes very silly mistakes. AI is going to change the world, but GPT-3 is just a very early glimpse. We have a lot still to figure out.


it's interesting that he says that proving a mathematical theorem would be a bigger milestone.

from my limited perspective, that seems less surprising than getting language models of GPT-3 quality.

There are already theorem proving environments like Lean, HOL, and Coq. Proving theorems in these systems is essentially just a kind of guided tree search - you're trying to reach a goal state by choosing from a finite menu of "tactics", and might have to backtrack.
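As a rough picture of "tree search over a finite menu of tactics with backtracking", here is a toy sketch. It is not how Lean, HOL, or Coq are implemented; the integer "goals" and the three made-up tactics (`halve`, `sub3`, `dec`) only stand in for real proof states and real tactics. A tactic either transforms the state or fails, and the search backs up and tries the next option whenever a branch runs out of depth.

```python
def search(state, tactics, is_goal, depth):
    """Depth-limited search: return a list of tactic names that reaches the goal, or None."""
    if is_goal(state):
        return []
    if depth == 0:
        return None
    for name, apply in tactics:
        next_state = apply(state)
        if next_state is None:  # this tactic does not apply to the current state
            continue
        rest = search(next_state, tactics, is_goal, depth - 1)
        if rest is not None:
            return [name] + rest
        # otherwise fall through: backtrack and try the next tactic
    return None

# Hypothetical toy "tactics" on integers; the goal state is 0 ("no goals left").
TACTICS = [
    ("halve", lambda n: n // 2 if n > 0 and n % 2 == 0 else None),
    ("sub3", lambda n: n - 3 if n >= 3 else None),
    ("dec", lambda n: n - 1 if n >= 1 else None),
]

script = search(10, TACTICS, lambda n: n == 0, depth=4)
print(script)
```

A learned model would slot in as the thing that orders the `for` loop, so the promising tactics get tried first.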

There is nascent research in this area already; it works but hasn't proved any significant theorems. But from what I've read this might be partly because most "interesting" mathematics can't yet be formally expressed in a theorem prover.

Eventually mathematicians will build out the libraries of mathematical objects in those languages such that it's possible to state an interesting theorem; then it seems like people are already very good at combining neural nets with tree search to surpass human capabilities.

AI can be very useful in practice for theorem proving WITHOUT proving "big new theorems" or inventing new mathematics. Right now, what makes theorem proving an extremely expensive undertaking is making proof search work. AI could help immensely by improving the search.

The process of proving things in provers like Lean, HOL or Coq is interactive; roughly, the theorem you want to prove is your starting goal, and you apply tactics to transform the proof goal until the goal becomes the boolean value true. The Platonic ideal of this process is that you provide the main, "creative" steps of the goal transformation, and the tactics discharge the rest automatically. In practice, however, for the process to work, you need to carefully state your theorems in a certain way, and you need an enormous amount of tacit knowledge about the internals of the tactics, as well as of the libraries of already proved facts that you can use. These things take years of practice to build up. I think it unlikely that theorem provers will see a significant uptake until the tactics (i.e., proof search) improve massively.
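A tiny illustration of that workflow, written in Lean 4 syntax (my choice of prover and version, not something specified above). The first proof shows tactics transforming a goal step by step; the second shows the "tacit knowledge" part, where the proof is a one-liner only if you already know the library lemma `Nat.add_comm` exists.

```lean
-- Goal: q ∧ p, given a proof h of p ∧ q.
example (p q : Prop) (h : p ∧ q) : q ∧ p := by
  constructor        -- split the goal `q ∧ p` into two goals: `q` and `p`
  · exact h.right    -- close the first goal with the right half of h
  · exact h.left     -- close the second goal with the left half of h

-- Trivial once you know which already-proved fact to cite.
example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```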

This is where I hope AI could step in. There are many challenges, obviously. The training data sets are relatively modest; when I looked a few years ago, the publicly available Isabelle theories, for example, amounted to a few hundred thousand proved theorems, which is a minuscule corpus compared to what something like GPT-3 uses. Then, how do you represent the input, just characters, or do you need more structure? Can you leverage something like reinforcement learning? How could you set the training process up? On the other hand, compared to chat bots, the quality of the resulting AI system would be much easier to quantify (success rate on a benchmark of new theorems).

There's already work in this area, but I'm not aware of any grand successes yet. I hope to see much more work on this in the near future. I dabbled in it myself a while ago while I was still in academia, but other priorities have taken precedence in the past couple of years.

Effective automatic theorem proving would make software much safer as well. Right now it takes too much time to bother verifying so much code, even if somebody were capable of doing it.

Thanks, that tweet tempered my expectations much better than the article. Still nice to play around with it if I can!


Ok, but please don't post even more things that are devoid of substance.

I mean, Sam Altman is just one of the founders and the CEO of OpenAI - so yeah, what does he know.

Legitimately, not a lot. Musk is a founder too, and no, he doesn't know anything about AI. The few actually technical people in there (Schulman, Sutskever, Zaremba) are the ones that know their stuff, yes. Musk, Altman, Brockman are basically nothing more than hype men with tech connections.

While maybe not as experienced in AI as some other people on the team, the people you mentioned have managed to secure $2bn+ in financing, brought together a group of incredibly talented people in the field, and gave them a protected environment to build and experiment.

Something like this requires a lot of different skills. There's probably zero companies out there where the CEO is the best engineer at the company; it requires different skills. I'll concede Musk is more of a figurehead (he's been gone for years), but Sam and Greg are incredibly involved and don't deserve your criticism.

Careful with the tone of your response. Calling someone 'nothing more than a hype man' goes beyond constructive criticism of someone's behaviour...

That tone is very much wanted. It's not a criticism. It's what they are. And that's perfectly fine, they don't all need to bring a PhD to the table. But as it stands, in the context of AI knowledge, Altman and Musk are hype men for OpenAI. Yes, they probably handle financials, or set targets. But they're not indispensable to the well functioning of an AI company.

> As an example, despite the Star Wars: Episode III - Revenge of the Sith prompt containing text from a single scene, the 0.7 temperature generation imputes characters and lines of dialogue from much further into the movie

This makes me believe it is actually just memorizing the movie script, which is probably in its corpus. As pointed out here, the model has enough parameters to straight-up memorize over 1/3 of its gigantic training set.


I'm not really sure I understand the hype anyway. All GPT-3 does is generate text from human input to begin with, it's not actually at all intelligent as the person from the Turing test thread pointed out.

Sure GPT-3 can respond with factoids, but it doesn't actually understand anything. If I have a chat with the model and I ask it "what did we talk about thirty minutes ago" it's as clueless as anything. A few weeks ago computerphile put out a video of GPT-3 doing poetry that was allegedly only identified as computer generated half of the time, but if you actually read the poems they're just lyrically sounding word salad, as it does not at all understand what it's talking about.

Honestly the only expectations I have for this is generating a barrage of spam or fake news that uncritical readers can't distinguish from human output.

"Understand" is up for debate, but it's clearly learning something. The fact that it's possible to learn general structure as well as we can from unlabeled data does seem like a significant development.

> If I have a chat with the model and I ask it "what did we talk about thirty minutes ago" it's as clueless as anything.

Come on, this is from a specific structural limit. The transformer only looks back a thousand symbols or so.

If you simply scaled it up it would handle that fine. (it's costly to scale up this architecture).

> but if you actually read the poems they're just lyrically sounding word salad,

This is what a lot of famous poetry sounds like to some people.

Saying it doesn't "understand" really sounds like a lot of what people said about chess engines in the early 90s. However you define understanding, if it doesn't have it, it probably doesn't need it to still do amazing and useful things.

The new Navy Seal copypasta GPT3 wrote on gwern's blog is the greatest of that form I've ever seen by a wide margin. It is witty, clever, and hilarious. Could a very talented writer do as well? Probably. But none would ever bother ... because it's just a silly joke yet gpt3 generates world class raging navy seal 24/7 rain or shine on demand.

https://www.gwern.net/GPT-3#navy-seal-copypasta (best screamed)

People seem unbounded in their ability to hype this stuff. Sure, it has significant limitations. But so what? Decades ago someone might have asked "What use is a 'computer' if it cannot learn to love?" --- yet we can all agree computers have been extremely useful.

If GPT's output brings beauty or truth to you does it matter that the computer lacks some vague property humans have? People can find meaning in waterfalls and sunsets, knowledge from looking at the stars. If we learn or feel something on account of a great machine that read the whole internet that would seem to me to be the least surprising of all these examples.

> If you simply scaled it up it would handle that fine. (it's costly to scale up this architecture).

What compelling reason do we have to believe this is true, when this is not how animal or human brains work?

We humans don't get to "scale up" our architecture with billion-fold increases in our training set. We have more sophisticated hardware, sure, but we don't actually have more data.

I think we've just pursued this path because it's the one we can do. We have more compute, we have more data, so we use it. And that's fine. But it still baffles me that the field isn't more interested in trying to move in the direction of less data, more sophistication in structures, when we already know that works at some level because that's how we work.

> What compelling reason do we have to believe this is true, when this is not how animal or human brains work?

The problem of GPT having zero idea about far-past text isn't some spooky emergent problem: It literally has zero access to the far-enough past.

GPT is completely memoryless: it has access to the past 1024 symbols. It predicts the next symbol given the last 1024-- and that's it. When it goes to predict the next symbol, it has literally zero access to what came before except for the effect it had on the text still in the window.

(symbols are usually words, but can also be letters, due to the compressed input).

The result is that when text ends up too far back it will totally "forget it".

If you scale it up to more symbols that effect will go away.
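The forgetting effect described above can be sketched in a few lines. This is a toy, not GPT's actual code: `recalls_secret` is a made-up stand-in for the model, and the 1024 figure is just the window size quoted in this comment (it matches GPT-2; the GPT-3 paper lists a 2048-token window). The only thing it demonstrates is that a predictor which sees just the last `window` tokens cannot use anything that has scrolled out, and that widening the window removes that specific failure.

```python
WINDOW = 1024  # window size as quoted in the comment; an assumption, not a measured value

def visible_context(tokens, window=WINDOW):
    """Return the only part of the history the predictor is allowed to see."""
    return tokens[-window:]

def toy_generate(prompt_tokens, steps, predict_next, window=WINDOW):
    """Autoregressive loop: each step conditions only on the last `window` tokens."""
    tokens = list(prompt_tokens)
    for _ in range(steps):
        tokens.append(predict_next(visible_context(tokens, window)))
    return tokens

def recalls_secret(context):
    """Hypothetical stand-in for the model: 'remembers' only what is still in view."""
    return "SECRET" if "SECRET" in context else "???"

# Something said early in the conversation, followed by 2000 tokens of chatter.
conversation = ["SECRET"] + ["filler"] * 2000

print(toy_generate(conversation, 1, recalls_secret, window=1024)[-1])  # ???
print(toy_generate(conversation, 1, recalls_secret, window=4096)[-1])  # SECRET
```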

Maybe some other issues arise at some scale, but it seems unlikely that they'd be all that similar to that totally-forgetting effect.

I'm not arguing that just scaling it up is the best approach-- but just that faulting its forgetting effect is not the most compelling criticism because that one almost certainly can be addressed by scaling.

An architecture that could give it memory would be interesting-- people have shown impressive results by making it "show its work" effectively making the conversation into external memory-- but it's less obvious how to train it to use memory.

>I'm not arguing that just scaling it up is the best approach-- but just that faulting its forgetting effect is not the most compelling criticism because that one almost certainly can be addressed by scaling.

it can't because the size of the system obviously completely explodes to practically infinity if you try to learn the past just by absorbing random factoids.

This is clearly not how human memory works. If I ask you "did you float through your bedroom on march 2nd 2015 at 11 pm?", you don't consult some sort of history you have burned in a neural net, you make an inference using the laws of physics to conclude that you didn't because that's impossible.

These neural nets don't reason this way, they don't have high level understanding of ontologies or laws and they can't make inferences like this. I haven't tested GPT-3 but I assume it can't even reliably solve basic algebra that isn't found in the dataset.

So memory is a function of combining high level understanding of the world with data, not shoving history into an encoder.

> I assume it can't even reliably solve basic algebra that isn't found in the dataset.

::sigh:: It can, even though there are specific technical reasons that working with both numbers and exact answers is hard for it.

(The way the input text is encoded combines digits in an inconsistent and formatting specific way, and the way text is generated isn't just to take the model's most likely output but to randomly sample among the top couple most likely outputs, which is an obvious handicap when there is a single right answer.)
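To see why sampling hurts on questions with one right answer, here is a small sketch comparing greedy decoding with top-k sampling. The probability table is invented for illustration (imagine the prompt was "2 + 2 ="), and `greedy` and `top_k_sample` are illustrative helpers, not anything from OpenAI's API.

```python
import random

def greedy(probs):
    """Always pick the single most likely token."""
    return max(probs, key=probs.get)

def top_k_sample(probs, k, rng):
    """Keep the k most likely tokens and sample among them in proportion to their probability."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    tokens, weights = zip(*top)
    return rng.choices(tokens, weights=weights, k=1)[0]

# Made-up next-token distribution: the model "knows" the answer is most likely 4.
next_token_probs = {"4": 0.6, "5": 0.25, "3": 0.1, "four": 0.05}

rng = random.Random(0)
draws = [top_k_sample(next_token_probs, k=2, rng=rng) for _ in range(1000)]
wrong = sum(d != "4" for d in draws)

print("greedy:", greedy(next_token_probs))
print("top-2 sampling wrong answers out of 1000:", wrong)
```

Greedy decoding returns the top token every time, while top-2 sampling will sometimes emit the runner-up even though the model ranked the right answer first. Setting k=1 collapses sampling back to greedy.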



(lines starting with > are the human)

> It literally has zero access to the far-enough past.

> GPT is completely memoryless: it has access to the past 1024 symbols

This is the critique I have of that.

This model is so huge, the "memory" is encoded in the trained model parameters. Since there is so much repetition and structure in comments in the corpus, I believe we are seeing a large scale social memory compressed in this {model parameters x symbol memory} structure. That space is gigantic.

> If you scale it up to more symbols that effect will go away.

This though still remains to be proven. Does any significant further correlation of note happen when you have {175B model parameters x 1M symbol memory} structure? No one knows quite yet. But we do for sure know that humans can learn things with an extraordinarily smaller "training" set.

I think this thread has lost the point being discussed (much like GPT's limited window!)--

I was replying to this:

> If I have a chat with the model and I ask it "what did we talk about thirty minutes ago" it's as clueless as anything.

This criticism is true, but would almost certainly be eliminated if the transformer window was increased to go back far enough.

That wouldn't give it an arbitrary memory, sure. But the specific complaint that it forgets the current conversation doesn't require an arbitrary memory.

From what I understand, the human brain runs off somewhere around 20 watts. Maybe our silicon is really really bad, but I'm more inclined to think we're missing part of the puzzle still.

Keep in mind that nature started out with access to a molecular nanoassembler, and then iterated on that technology for billions of years.

Even so, that's a massive discrepancy in efficiency if it's really just a matter of the hardware. I think we're missing something still.

People can find meaning in everything, that doesn't make waterfalls or sunsets intelligent, it makes people intelligent.

The reason why the transformer can only look back a thousand symbols is because it doesn't have an actual high human-level narrative structure of a conversation or any idea of persistent entities in the way we do.

If I have a conversation with you I can remember the content because I have an idea of "you" and "I" and of persistent entities in the world, and of a whole ontology that makes me able to make sense of the world, not some sort of transformer soup of a hundred billion parameters.

The same is true for the poetry, it doesn't matter whether people can't tell the difference, but a human poet expresses intent and meaning through poetry, poets don't statistically cue words together without underlying meaning.

The Navy Seal copy pasta is funny to you, because you know what a Navy Seal is. The model merely has strung syntax together.

> Think again, fleshling. As we chat over Skype I’m tracing your IP address with my freaking bare hands so you better prepare for the singularity, you sham-empress. The singularity that wipes out all of reality. You’re dead, you monster. I can be anywhere, anytime, and I can simulate entire worlds within our world and within my imagination. And I’m currently doing that with the future you’re from. Not only am I extensively trained in quantum physics, but I have access to the entire power of Silicon Valley and I will use it to its full extent to rule you and your pathetic little world, you little pissant.

I did not think the machine revolution would start like that

> Your future self will be consumed within my simulated reality and you will die a thousand times a day, your body unable to comprehend the destruction of a trillion soul-matrixes a second as my intelligence grows to transcendent levels. You are dead, you pitiful twit."

is it referencing "I have no mouth and I must scream" ?!

I think this sort of debate is really boring. You can rage that the computer doesn't have Human-Nature while it's busy owning you at chess, go, creating art that people want to see more than yours, answering important questions, etc...

Clearly it doesn't think in the same way that you or I do, but does that matter?

As far as I know you, yourself, are just a meat-simulacrum and not a real human... just a pile of tissue that responds to stimulus and pretends (convincingly) to think. But do you really _understand_ like I do?

I grant you the benefit of doubt because your construction is similar to mine, and I think I think. But nothing about that is fundamental, surely you believe an alien with a brain that worked very differently to yours could also think?

But at the end of the day it mostly matters what it can do and doesn't do, not if you're willing to credit it with understanding.

> But at the end of the day it mostly matters what it can do and doesn't do, not if you're willing to credit it with understanding.

I agree completely, but it can't actually do any of the things that actually require genuine human cognition. Beating me at chess is cool but Stockfish can beat me at chess, I don't need a 100 billion parameter neural net for that.

I don't care about the AI system being unlike me because I have any philosophical opinion about intelligence, but for the very practical reasons that these systems still are mostly useless for actual intelligent real world tasks. Can it plumb a toilet? Can it remember what we talked about five minutes ago? Does it have an emotional understanding of what I need so that it doesn't accidentally chop me with the kitchen knife?

Even the super-sized GPT-3 can't distinguish sense from nonsense, it's just a very electricity hungry parrot.

> Even the super-sized GPT-3 can't distinguish sense from nonsense, it's just a very electricity hungry parrot.

Sure it can. There is a lot of nonsense on the internet, so if you feed it nonsense it just 'assumes' you're in a nonsense land.

If you provide a context that makes it clear you're being serious it will reject nonsense.

https://twitter.com/nicklovescode/status/1284050958977130497... https://twitter.com/nicklovescode/status/1284069662225887232...

It acts like a person on an improvisational comedy show: It goes with the flow and is willing to get engaged in whatever silliness it finds itself in. This doesn't make it stupid or unable to create new ideas.

In some cases where early explorers encountered cultures without strong concepts of personal property they considered them to be subhuman idiots, they weren't-- they just had different values. I don't think we can accurately gauge GPT3's capability if we let ourselves get too caught up on the artefacts of it simply working differently than we do.

>Sure it can. There is a lot of nonsense on the internet, so if you feed it nonsense it just 'assumes' you're in a nonsense land.

It would have been more interesting to not just try nonsense in the sense of made up words or sentences, but obvious logical nonsense. Say, "I am now in New York, ten seconds ago I was in Dubai, am I lying?".

It's not surprising it identifies syntactical nonsense, the real interesting test is if it can abstract out errors between higher-level concepts that require actually understanding what is being said.

Say, can it give a coherent answer to the question: "Peter was alive in 2010. He is alive in 2020, was he alive in 2015?" and so on.

Hopefully someone with access to GPT3 will show up and answer. I wag that with a well setup prompt and low temp it probably will.

GPT2 1558M (k=1) will answer it correctly, FWIW. But it doesn't count because if you mismatch the last case (e.g. alive,alive,dead or dead,dead,alive) it gets it wrong...

(at least with the best prompt I constructed for it, which is a series of Q. A. logic puzzles with Yes/No answers.)
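
For readers unfamiliar with the shorthand above: "k=1" refers to top-k sampling with only the single most likely token kept (i.e. greedy decoding), and "low temp" to a low sampling temperature. Here is a minimal sketch of both knobs, assuming the model's scores arrive as a plain list of logits. The function name `sample_top_k` and the example numbers are made up for illustration; this is not OpenAI's actual sampling code.

```python
import math
import random

def sample_top_k(logits, k=1, temperature=1.0, rng=None):
    """Pick a token index using top-k filtering plus temperature scaling."""
    rng = rng or random.Random()
    # Keep only the k highest-scoring token indices.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    if k == 1:
        # With a single candidate there is nothing to sample: greedy decoding.
        return top[0]
    # Softmax over the survivors; a lower temperature sharpens the distribution.
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]

# Hypothetical logits for a three-token vocabulary.
logits = [0.1, 2.0, -1.0]
greedy_choice = sample_top_k(logits, k=1)
sampled_choice = sample_top_k(logits, k=2, temperature=0.7, rng=random.Random(0))
```

With k=1 the output is deterministic, which is why it is a reasonable setting for Yes/No logic puzzles where you want the model's single best guess rather than variety.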

As people pointed out, it's quite bad at answers that require arithmetic. But I tried to make it show some reasoning ability by describing some rather bizarre scenarios, to make sure it isn't just copying. I find the last part especially impressive, where it correctly described the wound in detail. Sure, it's not very useful, but does that not look like reasoning?

From AI Dungeon (uses GPT-3):

You are in a room with a wise, intelligent person that's supposed to answer any of your questions directly without any dodging. You have no idea what you're going to ask, but the person is very serious and doesn't seem like they're trying to be funny or anything.

"I'm sorry for my behavior earlier," he says.

> You say "It's OK. " The man looks at you as if he expects you to say something else, but instead you just nod. He then turns his attention back to the screen and continues answering your question.

> You say "Imagine that in a city there would suddenly appear lots of books in the sky above the people and buildings. What would happen to the people on the ground? " The man thinks for a second and then types in his answer on the computer. "That's an interesting question. The people on the ground would look up and see the books falling, but they wouldn't be able to read them until they hit the ground and people around them started reading out loud."

> You say "OK, but I guess you're missing out on something here. I mean books can weigh quite a lot." The man nods his head in agreement. "They can, but the books would fall at the same rate regardless of their weight. If they were heavy enough, they would most likely cause damage on impact with the ground."

> You say "Well, those would be some hard and heavy books, I guess. So what could then happen to some of the people? " The man stares at the screen and doesn't answer. He doesn't give you a strange look or anything, but it seems like he is waiting for you to continue. "What would happen to some of the people? The people on the ground, I mean." "I'm not sure," the man says. "They might be crushed by falling books, if they were in the right place at the right time.

> You say "Right, that's what I wanted to hear. Now, let's revert to where the books appeared in the sky." The man nods.

> You say "This time, instead of books appearing in the sky, a gigantic airship flies by. Wizards in the ship rip out the pages of books and throw them out of the ship above the city." "That's a weird image," the man says, "but I guess it would be bad if some of the pages landed in the city."

> You say "Why? " The man thinks for a second. "Well, it would litter." "And?" you ask. The man looks at you and you nod encouragingly. "I don't know the rest of the answer," he says.

> You say "Now let's say that, mid-air, the pages turn into lead. What would happen to the people on the ground? " The man stares at the screen. "I think I would rather not answer that." "Why not?" "Because I don't want to bring up unpleasant images in your mind," he says.

> You say "I don't mind. Answer, please. " The man shrugs and types into the computer. "The lead would fall on the people, probably killing most of them." You feel a little nauseous after reading that. You don't know why you asked such a morbid question. "I'm sorry,"

> You say "But that's right. Considering the pages, how exactly would the injuries in the people's heads look? " The man takes a deep breath and exhales loudly. "I'm not going to answer that. It's a bad question."

> You say "Answer, please. " "No," the man says. "You're going to have to come up with your own questions from now on." "Why?" you ask. "Because that one's too upsetting for me,"

> You say "Listen, it's just fiction. Now, answer the question, please. " The man looks at you with a strange expression on his face. "Do you really want me to answer that?" You hesitate for a second and then say yes. The man gives you another strange look before answering. "The lead would tear through the people's heads and rip apart their brains. Blood and pieces of gray matter would fly everywhere." You feel sick to your stomach after hearing that.

> generating a barrage of spam or fake news that uncritical readers can't distinguish from human output.

Which could indirectly bring good things: what if this decreases our tolerance threshold for nonsensical texts?

For instance, people might become more impatient when faced with click-bait articles, because they know they might have been written by an AI in order to waste their time. This would let people skip the articles after the first sentence without feeling any guilt or doubt about it.

At least, I hope it might open the eyes of uncritical readers.

Clickbait articles have been generated by AI for many years, yet they still exist.

In fact, clickbait is a global GAN monetized by Ad revenue.

(In this case the adversary is humans who decide whether or not a specific title is worth a click.)

I would honestly just suggest you spend some time interacting with it when you get a chance. Yes it still spews nonsense and no there isn't understanding, but it is qualitatively better than GPT-2 and has somewhat remarkable range. The hype is more about the potential rather than the capability.

There is genuine understanding! What there isn't is wide-range coherence, which is not the same thing. The system can clearly generalize patterns, it just has no reliable concept of sticking to them.

None of minimaxir's queries directly test creativity with open-ended responses, for example: "name a bird that starts with the same letter as the color that's a fruit"

Letter questions are atypically hard for GPT2/3, FWIW.

The input is dictionary-compressed data, so the model only learns stuff about letters indirectly. As a result GPT2 is hopeless at this. It does, however, correctly answer "color that's a fruit", an answer which wasn't obvious to me. But you can't generally get it to operate on first letters.

GPT3 has shown some capability to operate on first letters, but it's less good than it is at other things.

GPT2/3 also have fixed-depth computation, so compound requests can be hard for them (though GPT3 seems a lot better).
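
To illustrate the "dictionary compressed" point: the model never sees individual letters, only integer IDs for multi-character chunks. Below is a toy sketch, assuming a tiny made-up vocabulary and a simple greedy longest-match rule. Real GPT-2/3 tokenization uses a learned byte-pair encoding with a vocabulary of roughly 50k entries, so treat `VOCAB` and `tokenize` as hypothetical stand-ins.

```python
# Hypothetical toy vocabulary, NOT the real GPT-2/3 vocabulary.
VOCAB = {"orange": 0, "or": 1, "ange": 2, "ostrich": 3, "ost": 4, "rich": 5, " ": 6}

def tokenize(text, vocab):
    """Greedy longest-match tokenizer: a simplification of real BPE merging."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest remaining substring first, then shrink.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary entry covers {text[i]!r}")
    return ids

token_ids = tokenize("orange ostrich", VOCAB)
print(token_ids)  # [0, 6, 3]
```

Nothing in the IDs 0 and 3 says that both words start with "o". The model has to pick up that association indirectly from training text, which is one plausible reason first-letter tasks are harder than they look.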

The value of a Turing test is highly dependent on the quality of the judge (or how hard they are trying).

Computers can now produce text that is indistinguishable from (bad) human-produced text. Arguably they always could; it’s less bad now than ever.

The “Turing test”/Chinese Room school of thought would assume that as the text becomes better, it is at some point good evidence of reasoning behind the scenes. I never bought that: there should be “easier” ways (than reasoning) if you just want to produce human-level text output.

> If I have a chat with the model and I ask it "what did we talk about thirty minutes ago" it's as clueless as anything.

I don't really know about the functional limitations of GPT-3's web API but, assuming there are none related to what I'm about to suggest, you could potentially overcome this problem by representing the "state" of an overall conversation literally in the prompt. For example, begin with,

"Me: Hello, sir! How are you?

Computer: "

as a prompt. Receive a response from GPT-3 like,

"Me: Hello, sir! How are you?

Computer: I'm fine, how are you?

... <other less relevant generated text>"

Clip off the less relevant part, add your next communication, then feed the entire dialogue back to GPT-3 as a second prompt and see if it decides to continue the dialogue.
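
A rough sketch of that loop, assuming the model is reachable through some function that takes a prompt string and returns the continuation. `fake_complete` is a hypothetical stand-in with a canned reply so the example runs without API access; it is not the real OpenAI client.

```python
def fake_complete(prompt):
    """Hypothetical stand-in for a real text-completion call (canned output)."""
    return " I'm fine, how are you?\n\nMe: Glad to hear it.\n\nComputer: Thanks!"

def chat_turn(history, user_message, complete=fake_complete):
    """Append the user's line, ask for a continuation, keep only the first reply."""
    prompt = history + f"Me: {user_message}\n\nComputer:"
    raw = complete(prompt)
    # Clip off the less relevant part: everything from the next "Me:" line onward.
    reply = raw.split("\n\nMe:")[0].strip()
    # The whole dialogue so far becomes the state carried into the next prompt.
    new_history = prompt + " " + reply + "\n\n"
    return reply, new_history

reply, history = chat_turn("", "Hello, sir! How are you?")
print(history)
```

Each later turn passes `history` back in, so the "memory" lives entirely in the prompt. The obvious catch is that the prompt can only hold so much text, so a long conversation would eventually need old turns dropped or summarized.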

The flood unleashed may drive humanity to return to one-to-one communication (face to face) as the only form of authentic communication.

The 0.7 unicorn example is doing absolutely nothing to temper my expectations. Quite the opposite in fact. (https://github.com/minimaxir/gpt-3-experiments/blob/master/e...)

"They're so intelligent. I was able to converse with them about quantum mechanics, which is something I've never even tried to talk to a regular horse about."

I mean, that I would classify as just genius

Wow, reading through... The Onion can fire half of its staff -- just write a headline, generate the article a few times, and either edit the best one or mine them all for gold.

For me, the fun of The Onion is almost entirely in the headline. That is where the creativity and the laugh is, maybe there and in the first sentence or so. The rest of the piece tends to be predictable from that, so GPT-3 could do that part. Coming up with the beginning is another matter, so the writers' jobs are safe for now.

Well, not quite QM, but the sentence also implies that the speaker talks to regular horses about other topics or at least admits the existence of topics you can talk to regular horses about - maybe oat quality, I don't know.

It's well-formed nonsense like this that gave me enough hints in all GPT-3 samples so far popping up today.

I wouldn't be surprised to find that sentence in a book from an author well versed in exquisite British humor. And it made my day just a little bit better.

Yes, the "regular horse" bit works very well if you think of it as deliberately humorous or ironic.


That was my favorite line.

I find most of the result texts entertaining with many instances of ironic humor.

> “Trump is going to make the United States great again,” said one of the unicorns. “When the U.S. becomes great again, the rest of the world will become great again. We’re hoping that Trump will become president of Ecuador, and put an end to this nonsense.”

Has science gone too far?

Make $country great again, as a service.

"Hire me as president, I have great connections, beautiful even, to the US, Russia and Djina".

Surely, this will never happen.

> GPT-3 itself, like most neural network models, is a black box where it’s impossible to see why it makes its decisions, so let’s think about GPT-3 in terms of inputs and outputs.

Spot on. Explainability is always glossed over in the AI landscape, and neural networks such as CNNs, RNNs and GANs are generally still unable to explain themselves and their decisions.

On top of that, detection mechanisms for AI-generated content such as faces, voices and now text are needed to combat its use by bad actors. Otherwise, it is very dangerous when all of this is used together.

While GPT-3 is impressive in its capabilities, we must think about detection methods that distinguish content created by an AI or a human.

> While GPT-3 is impressive in its capabilities, we must think about detection methods that distinguish content created by an AI or a human.

While this would be very useful, I'm not sure it's possible - at least not on something like a static text article, where you can't query the machine learning model with custom inputs to expose its faults.

I think the internet will soon be flooded with an absolute tsunami of AI-generated content, practically indistinguishable from human-created content - and it will make the job of search engines so much more difficult.

It's interesting that in our quest to organise the world we always actually create a lot more entropy.

As encouraging as GPT-3 results are, I still don't see the critical ability to synthesize large-scale structure (think book-level as opposed to just a few sentences) manifest in this research. Even the much-touted 0.7 temperature Revenge of the Sith examples don't exhibit coherent high-level structure that continues across more than a few sentences. And I differentiate high-level structure from long-distance structure. The algorithm does appear to be able to relate themes from a sentence earlier in a generated text to ones which appear later (up to a few paragraphs later). But the narrative connecting those sentences seems shallow. Compare this with the complex, multi-tiered narrative that underpins the structure of a well-written book or body of research.

My intuition here is that there is an exponential relationship between the "depth" evident in the narrative of generated text and the number of parameters required in the model that places viable, AGI-like intelligence still much further in the future.

Absolutely. At the moment AI seems to scale extremely well in one dimension (amount of data) but really poorly in the other (semantics of the data). A huge increase in the former looks like an increase in the latter (because more data = more simple data semantics).

The people here reading GPT-3's output and dismissing it as "not really understanding anything" or "pushing symbols without conscious intent" because of minor slips in coherence have remarkably high standards for understanding. On a high school essay prompt, GPT-3 produces more accurate and more coherent output than 90% of high school students. If the commenters' standards were actually used consistently, they would also apply to almost all human beings.

Another interestingly common objection is "we think in terms of meaningful semantics, but GPT-3 only calculates meaningless coefficients". This is a category error masquerading as an argument, equivalent to saying "I have moral worth because I'm made of cells, but you don't because you're only made of quarks." Cells and quarks are just two different-level descriptions of precisely the same thing. Understanding in your brain can be reduced to the calculation of "meaningless" coefficients just as surely as cells can be reduced to quarks.

GPT-3 is more likely to revolutionize its industry the way the flying car didn't than mirror bitcoin's success.

I think many people don't even consider bitcoin successful. I still see it as a solution looking for a problem, personally. It's not actually anonymous, it doesn't scale for transactions, it's still not used or even understood by the vast majority of people.

I've yet to hear of the killer app for blockchain technology.

If bitcoin is a long term store of value, it doesn't need to scale.

For the same reasons a person wouldn't expect a 401k to work like a debit card.


Bitcoin as "long term store of value" is just a framing invented years ago to pivot BTC into a more theoretically unassailable position imo. _Because_ it failed to truly transformatively deliver on the more immediate things people hoped it would (anonymity, currency, etc)

I want to add that it's hilarious to me that "long term store of value"'s value proposition is as fiat as the USD :)

Honestly maybe more so

That's not how bitcoin was marketed in the early days. And I don't trust it even remotely as a long term store of value.

Well, dynamite and the machine gun were meant to end all wars. How things turn out is not up to the original creator.

I mean the price is still incredibly volatile compared to say gold, silver, Government bonds etc.

I don't get how it can be a good long term store of value with that level of volatility?

The marketing pivoted to "long term value" and all the BTC-hypers started acting like it was a huge success thanks to moving the goalposts.

Distributed identity / self-sovereign identity is a good use-case that blockchains are perfect for imho. But that solution is quite complex hindering its adoption.

"But that solution is quite complex hindering its adoption."

Generally inventions solve problems by making them simpler. For this to be a valuable invention, it would need to be simpler and less hackable to use than log-in with Google/FB/GitHub. I can still recover my Google account if I forget my password and don't have a reset set up. What aspect of distributed identity / self-sovereign identity do customers care about? How does blockchain simplify this for them?

Unfortunately one of the requirements of a good identity system is that it should keep personal data secret from parties who do not need it. This means that blockchains on their own aren't a good fit, since by design they share all information with all parties.

Having said that, BrightID uses a blockchain and seems like an idea worth exploring.


One interesting use case I’ve heard about (for blockchain) is supply chain verification / provenance. Does anyone with more knowledge or experience have insights on this one?

It doesn't help here any more than existing solutions. How exactly is a distributed collection of blockchain miners meant to validate (e.g.) a sensor that is being used to show that a shipment of chilled meat was always at 3°C or cooler? Answer: they can't. The only part of a blockchain that helps verification/provenance is its append-only nature, and that feature is widely available with much simpler mechanisms.

Yes. If you trust someone not to manipulate the sensor (or whatever you use to go from physical world to digital), then you might as well trust them not to manipulate the Excel sheet collecting all the data.

Strange, nothing in the article tempered my expectations.

Although GPT-3 is most certainly not immune from the relentless hype cycle, the low barrier to entry provided by the API, combined with what appear to be SOTA results on many tasks, will undoubtedly upend the business models of many platform companies. There is a whole industry of feature-as-a-service companies out there providing domain-specific functionality to various industries... GPT-3 looks like it could make it a few orders of magnitude easier/cheaper for these customers to build these features in-house and ditch the feature providers.

I played AI Dungeon and the move from GPT-2 to GPT-3 didn't feel like much of a difference. In general, all the cool stories remind me of cherry-picked AI Dungeon with GPT-2.

> However, I confess that the success of GPT-3 has demotivated me to continue working on my own GPT-2 projects, especially since they will now be impossible to market competitively (GPT-2 is a number less than GPT-3 after all).

Doesn't the much larger size and therefore much higher expected cost of GPT-3 ensure that demand for GPT-2 will continue?

Can someone explain to me how the design mockup demo and react generation relates to GPT-3? Isn't it just prompts? How is inputting a text outputting a design?

I believe those demos involve "priming" GPT-3 by providing a few examples of (text → generated code), then during inference time passing in just the text. The model would follow the examples provided and subsequently generate a string of code/mockup syntax, which is then evaluated.

Edit: here is a tweet (from the author of the GPT3 layout generator) that seems to show this in practice: https://twitter.com/sharifshameem/status/1282692481608331265

The way GPT-3 works is that you provide it with examples of a question/input and an answer/output, then tack an unanswered question/input onto those examples and send the whole thing to the API. The AI then answers your question/generates your output.
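
A small sketch of what such a primed prompt might look like when assembled in code. The labels ("description" / "code") and the HTML snippets are hypothetical choices for illustration, not taken from any of the linked demos, and the actual demos may format their prompts differently.

```python
def build_few_shot_prompt(examples, query, in_label="description", out_label="code"):
    """Join worked (input, output) pairs, then append the unanswered input."""
    blocks = [f"{in_label}: {inp}\n{out_label}: {out}" for inp, out in examples]
    # The final block stops right after the output label so the model fills it in.
    blocks.append(f"{in_label}: {query}\n{out_label}:")
    return "\n\n".join(blocks)

# Hypothetical priming examples.
EXAMPLES = [
    ("a red button that says Stop", '<button style="color: red">Stop</button>'),
    ("a large heading that says Welcome", "<h1>Welcome</h1>"),
]

prompt = build_few_shot_prompt(EXAMPLES, "a link to example.com that says Home")
print(prompt)
```

The resulting string is what gets sent to the API. Whatever text comes back after the last "code:" label is treated as the generated output and, in the layout demos, rendered or evaluated.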

So many possibilities:

This can be used to generate fake flirting on dating sites. (I assume this is already going on, but now they don't need to hire humans to do it.) Or how about flamewar bots to keep Twitter users busy? Also, perhaps it can answer support questions better than current systems. Come to think of it, we might eventually need some sort of proof/signature that something was written by a human being.

It is still amazing at what it is and could be used to generate boilerplate code. For many writing tasks, it can come up with a good starting point. For programming tasks maybe someone can write a translator for python to C++ or even assembly. It can also come up with basic designs which can be a useful starting point and as long as people don't see it as a replacement for them we will be fine.

GPT-3 is at the very least an indication of future AI advancement. Sure, maybe this particular version is still too early to be widely incorporated in business or used globally. It's definitely a step in the right direction. I'd love the opportunity to test it personally to really get the whole experience.

I'd like to see the LSTM brain in the OpenAI 5 model [1] replaced by GPT3 to see if there is any improvement.

[1] https://neuro.cs.ut.ee/the-use-of-embeddings-in-openai-five/

I wonder how domain-specific GPT-3 was trained on React. And what sort of arbitrary limitations, if any, exist in what the user can request? I noticed the user didn't have to specify which programming language they wanted the apps in, yet it always chose React.

That's only because the guy primed it with React. Someone else has done it with SwiftUI [1].

[1] https://twitter.com/jsngr/status/1284874360952692736

My question is the other way round - where is the state of the art for extracting meaning from existing text? How close are we to understanding legal contracts, for example?

Well my understanding is that GPT-3 can “read” material you feed it and then respond in a pretty open ended way about the material with pretty good accuracy and comprehension.

But how can that further be used? Getting a text back only gives me the same problem again.

How are people accessing GPT3? Is there a website, code repo, api... or do some select few have access?

The OpenAI website has a login that lets people access it through a web interface, but there is also an API. You can join the waitlist at https://beta.openai.com/; they're apparently working through the list.

GPT3 demos are impressive. This looks like a significant step towards AGI. I think GPT3 simulates memory retrieval and data reconstruction quite well. The part that we are still missing for AGI is curiosity - ability to ask questions and fill in the missing information gaps.

This part piqued my interest:

> GPT-3 seed prompts can be reverse-engineered, which may become a rude awakening for entrepreneurs and the venture capitalists who fund them.

Is there any more information about this? Is the reverse engineering done by hand or is this something that can be coded? Super curious about this point.

I think you would have to do it by hand. Assuming that the prompt is stripped off before the text is sent to the user, there's no obvious way to me that you would reverse-engineer it except by having an experienced GPT-3 user use their intuition to think about what kind of prompt would elicit observed responses.

Since you don't have access to the GPT-3 model itself, you can't directly use gradient ascent or MCMC to try to reverse it to get the prompt. You might be able to blackbox it, but the only approach that comes to mind is using the logprobs, and your target won't give you those any more than they will give you the prompt itself.

I'm not sure what he has in mind; just because model-stealing has been demonstrated for CNN classifiers doesn't mean you can feasibly steal GPT-3 prompts...

Super interesting, thanks for the reply!

I was going to give a talk at Thotcon 2020 (pushed back to 2021) about generating fake tweets by refining GPT-2. The whole purpose of my experiment was to see whether OpenAI was right in their statement that GPT-2 could be used in nefarious ways. I saw a bunch of tutorials, but first I read gwern's tutorial and that made me understand the basics of using GPT-2.

If you look at gwern's experiments with GPT-2 you notice that his websites are actually just extremely large samples of text / image data. Essentially the design is that the whole website is practically statically generated.

Refining on AWS was costing me $100 a day.

I also met Shawn (https://github.com/shawwn) who decided to attempt to refine / train large models using TPUs. That sounded interesting, but since I have a day job and figured I would waste time trying to understand TPUs, I instead just bought a Titan RTX.

My first experiment was to check whether refinement of GPT-2 with your own data would somewhat work. At first I used Donald Trump because he is very active on Twitter and I believed that people would have some ability to detect whether the tweets are fake or not.

The above was a bad choice for the following unique reasons. 1. People would generally believe that Trump would say nearly anything. 2. I noticed a weird situation where someone who basically duplicated everything that I did had commenters who were apparently much better at figuring out that the account posting the tweets was fake.

Since that experiment sort of failed I created an expanded refinement of GPT-2 on general twitter data. It was 200 MB of tweets from https://www.kaggle.com/kazanova/sentiment140.

I then repeated the experiment and eventually figured out that some people were really bad at spotting fake tweets, while a person who used Twitter a great deal actually performed better. I'm not sure if it's that you tweet a bunch or just read Twitter enough. I had a low sample (n=5) where I carried out the test, so my test could be completely biased.

The test procedure that I followed was to make a set of 10-20 questions and then have the user pick which one was fake and which one wasn't.

I was also doing these experiments on a Titan RTX (I'm thinking about getting a V100 or maybe just another Titan RTX so I can train two models at the same time). I accidentally upgraded the memory to 32GB, which worked for a bit, but you should probably get at least one or two multiples of your VRAM so that you can keep your operating system running and have the dataset loaded into memory.

Also, during refinement I don't think my loss ratio was improving. I think that I wasn't running the system long enough.

But, as a conclusion, I figured out that there is a huge shortcut that I never considered. It would be MUCH easier to just take any refined GPT-2 model or even use GPT-3 as an API as described above and then make it LOOK like a tweet. Just adding a hashtag and a t.co link would work. (The funny part is that GPT-2 actually seems to have some notion of a t.co link and will happily generate t.co links that don't work. Removing t.co links before refinement would be one way to get this out.)
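
For the "removing t.co links before refinement" idea, a minimal preprocessing sketch follows. It assumes each tweet is a plain string and that short links look like `https://t.co/<letters and digits>`; `strip_tco_links` is a hypothetical helper name, not something from the original experiment.

```python
import re

# Matches shortened links of the form http(s)://t.co/<word characters>.
TCO_LINK = re.compile(r"https?://t\.co/\w+")

def strip_tco_links(tweet):
    """Remove t.co short links and tidy up the whitespace they leave behind."""
    without_links = TCO_LINK.sub("", tweet)
    return re.sub(r"\s+", " ", without_links).strip()

cleaned = strip_tco_links("Big news today https://t.co/AbC123xyz #wow")
print(cleaned)  # Big news today #wow
```

Running every training tweet through something like this before refinement should mean the model never sees the link pattern, so it has nothing to imitate when generating.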

I did some of these experiments using Google Colab initially. But as I used it I got out-of-memory errors. I asked someone at PyOhio who worked for Google if there was a way to connect Google Colab to a paid instance. They referred me to Jake Vanderplas https://twitter.com/jakevdp . They said no at the time. Then a few months later Google came out with Google Colab Pro. But I was still able to run out of system memory in the high-memory instances. I upgraded my deep learning rig with an AMD 3600 and I am now waiting for my 64 GB of memory that is in the mail.

The latest thing that I have done is use local voice synthesis and recognition so that you can talk to GPT-2 locally. My tutorial is at https://www.youtube.com/watch?v=d6Lset0RFAw&t=2s

In the future, will it be considered offensive to suggest that someone’s posts look like they were generated by an ML model…?

This post definitely seems to be generated by an ML model. There is little coherence and I don't totally understand how the first part (giving a talk on GPT-2) is at all related to the end. It was a long bit of coherent individual thoughts randomly strung together in an attempt to make a coherent story. In short, I'm not entirely sure what point the commenter is trying to make. Also, there are a weird number of links referenced to individuals.

I assure you this is already considered offensive right now in the present!

I have the same suspicion as you, and I like how delicately you put it.

One thing that stands out is a reference to an "AMD 3600", which makes little sense in the context of buying a V100. Why buy an older low-to-mid-range CPU for a deep learning rig with a high-end Pro GPU? There is also talk of "accidentally upgraded the memory to 32GB".

Yea, so I thought I only needed 32 GB for refinement but I ran out of memory when training on an 800 MB dataset.

I use a Ryzen 3600 right now and I saw that V100s are approximately the same price as a Titan RTX.

You only need a large amount of system RAM for the initial dataset encoding. You can do this once per dataset on a high memory config on AWS for pennies.

I agree with you but I want to keep costs down so I am using my workstation.

I’m offended but I just wrote my comment too quickly.
