Why GPT-3 Matters (leogao.dev)
223 points by teruakohatu on July 20, 2020 | 163 comments



I really hope this tech will be miniaturized at some point.

I hate to be that guy, but when I saw the article the other day that ended with "plot twist, this article was autogenerated with GPT-3!", I was not impressed, mainly because it read like content-farm filler to me, conveying no information. It basically looked like an incredibly costly spam tool.

But then I thought of possible applications. Give humans a new tool and they will amaze you; I'm sure we'll see cool applications of this tech in the future (and probably horrible ones too). One area where it would totally rock: games. If games could use such tech, their worlds could be filled with casual conversation without the developers needing to write everything by hand. NPCs could have ever-changing discussions, and could even answer the player about the subject they're discussing. Their discussions could change based on what is happening in the world or what the players are doing, down to things with the smallest impact, or on whatever is happening directly around them at the moment, without devs needing to script it all. That would be awesome.
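To make that concrete, here's a rough sketch of how an NPC prompt could be assembled from the current world state (entirely hypothetical: the field names and the generate() call are made up for illustration, and the few-shot examples would be hand-written by the devs):

    # Hypothetical sketch: build an NPC dialogue prompt from the current world state.
    # generate() stands in for whatever language-model API the game would call.

    def npc_prompt(npc, world_state, player_action):
        # A few hand-written exchanges anchor the tone; the model continues in kind.
        examples = (
            "Guard: Cold night. The wolves have been bold since the granary burned.\n"
            "Player: Burned? When?\n"
            "Guard: Two nights past. Folk say it was no accident.\n\n"
        )
        context = (
            f"{npc['name']} is a {npc['role']} in {world_state['town']}. "
            f"Recent events: {', '.join(world_state['recent_events'])}. "
            f"The player just {player_action}.\n\n"
        )
        return context + examples + f"{npc['name']}:"

    prompt = npc_prompt(
        npc={"name": "Mira", "role": "innkeeper"},
        world_state={"town": "Duskvale",
                     "recent_events": ["the granary fire", "a dragon sighting"]},
        player_action="asked about rooms for the night",
    )
    # reply = generate(prompt)  # hypothetical model call
    print(prompt)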

But yeah, this won't happen easily unless the model can be embedded and shipped on local computers.


Hm. GPT-3 is trained on internet data from the real world. Your NPCs are in the gameworld. I guess you wouldn't want NPCs to ever reference anything 'real' for fear of breaking immersion (not to mention political backlash if the model grabs the wrong thing!). However, if you limit your corpus to the gameworld, it's nowhere near comparable to the real dataset, and someone would have to retrain the model.

There needs to be a way of absolutely limiting it, and from what I've seen the prompt can't do that with 100% success. A blacklist would be useful in some applications (people tolerate their search engine screwing up once in a while), but not in games. Still, maybe we can accept limited performance by restricting it to the gameworld plus some manually written texts. It's not like NPC dialog is literature-level anyway...


The main thing about GPT-3 is that they wanted to demonstrate one-shot fine-tuning and succeeded at it.

So the model can be prompted to output part-of-speech tags, dependency grammar trees or named entities in the input even if training data is sparse. Similarly, you could fine-tune it to produce game lore and then see how it works for that. The model easily switches to different modes of operation and achieves state-of-the-art or close to state-of-the-art performance.

It's quite funny how NLP folks tried to solve low-level tasks (POS tagging, NER, named entity relationship extraction, dependency parsing, sentiment classification, etc.) to get to higher-level tasks (good summarization, machine translation, text generation, question answering), and now a single model captures all the low-level stuff for free and does the high-level stuff so well that fine-tuning it to do the low-level stuff is unnecessary.


This, the difference between one-shot fine-tuning and fine-tuning for GPT-2, is one of the major breakthroughs. Since GPT-3 is so hot these past few days, people seem to forget, or not realize, that lots of the GPT-3 examples shown off today were possible with GPT-2, with the catch that you had to fine-tune your own GPT-2 model to fit your problem domain (game plots, poems, music, bots that chat like certain characters, etc.). GPT-3 makes that fine-tuning process unnecessary (although practically you probably can't, or can't afford to, fine-tune your own GPT-3 model).


Are you sure they set out to prove that one-shot works? Maybe they found fine-tuning performance disappointing and decided to publish this instead.


Seems like a minor technical challenge (at least in the case of games)

1. Set up a pseudo-adversary NN trained to recognize context-correct speech based on a small corpus.

2. Craft a GPT-3 prompt to get N 9s of accuracy.

3. Retry if the answer fails the test from the other NN.

4. Set a cap on retries based on how many 9s your prompt got.

5. If cap exceeded, return a context-free or limited-context response. (A rough sketch of this loop in code follows.)
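Something like this, where generate_reply() is the GPT-3 call and context_verifier() is the small pseudo-adversary network; both are made-up stand-ins, and the retry cap is a crude heuristic:

    import math

    def safe_npc_reply(prompt, generate_reply, context_verifier,
                       prompt_accuracy=0.999,
                       fallback="Hmm. Lovely weather we're having."):
        # More 9s of measured prompt accuracy -> fewer retries should be needed,
        # so cap retries based on how unreliable the prompt is (step 4).
        nines = -math.log10(1.0 - prompt_accuracy)     # e.g. 0.999 -> 3 nines
        max_retries = max(1, round(10 / nines))

        for _ in range(max_retries):
            candidate = generate_reply(prompt)
            if context_verifier(candidate):            # step 1's NN says it's in-world
                return candidate
        return fallback                                # step 5: limited-context response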


This solution would indeed work well enough for me (I'd be paranoid enough to add a tiny blacklist at the end just in case).


> There needs to be a way of absolutely limiting it,

Usually, it's fine-tuning (training a model starting from an earlier model). They can use a small amount of text to fine-tune GPT-3 to their liking.
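You can't (yet) fine-tune GPT-3 yourself through the API, but as an illustration of the idea, here's a minimal fine-tuning sketch using a GPT-2 checkpoint as a stand-in (Hugging Face transformers; the file name and hyperparameters are placeholders):

    from transformers import (GPT2LMHeadModel, GPT2TokenizerFast, TextDataset,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    tok = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    # A small file of in-world text (lore, sample dialogue) is enough to shift the style.
    dataset = TextDataset(tokenizer=tok, file_path="game_lore.txt", block_size=128)
    collator = DataCollatorForLanguageModeling(tokenizer=tok, mlm=False)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="gpt2-lore", num_train_epochs=3,
                               per_device_train_batch_size=2),
        data_collator=collator,
        train_dataset=dataset,
    )
    trainer.train()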


Well, I've seen examples where GPT-3 strayed quite amusingly. I'm hardly an NN expert, but my understanding is that one can't guarantee the model won't do this (besides retraining on the game-data corpus alone, which would obviously impair the NN). There are good reasons for game devs to want a measure of certainty here.

Someone else in the thread suggested using a verifier based on game data and maybe that would be fine. The key IMHO must be some kind of NN trained only on game data, either GPT-3 itself or a verifier of some sort.


In the AI dungeon app, regardless of the setting, you can ask the NPCs to sit down and play super smash brothers with you, and they will.


Why not? It would work for any game set in the real past or in a plausible future of the present day.


There are issues here beyond immersion. One of them is that so far, nobody can peek inside the GPT-3 NN and find out how the game devs trained it.

If a player got something extremely disagreeable from an NPC, was this a fluke, or did a dev intentionally add it in training to make it more likely? There's no way to prove innocence. Add in trigger-happy social media and governments, and the potential cost to devs and publishers could go all the way to bans, boycotts or legal threats. Most companies do not wish to risk this, so mitigations must be in place.


In case you didn't know, aidungeon.io uses GPT-3 in their "Dragon" model. It's only available for pro accounts though.

I think this might be the way to go forward regarding language models in games: Offer them through cloud computing for GaaS.


This is an excellent example of the potential.

It feels like what I imagined playing in a Star Trek Holodeck would be (well the dialog anyway).


"Give humans a new tool and they will amaze you." You are right. Already some special use cases of the API have come up. I have been exploring the API since yesterday and it is a big deal. But not for the reason of AI and AGI. That may happen later. But right now, the API is the game changer. Single API to apply for lots of different tasks. I have been updating my thread of demos which I am building at https://twitter.com/nutanc/status/1285128265519083520


How long did it take you to get access to the API? I filled out the form almost a week ago and haven't received anything back.


Around 2 weeks I think


That would be a great use. It really takes me out of a game when I talk to an NPC and they run out of dialog options. After that, every time you approach them, they repeat the last thing they said.


I predict it wouldn't be as fun as it might initially sound. NPC dialog is part of level design just like all the rest of the level it occurs on. With few exceptions (such as roguelikes), random level generation tends to produce bland cookie-cutter levels; I don't see why dialog would be any different. The reason it's fun to talk to NPCs in a game like Chrono Trigger or FF7 is because someone put the work in to make their dialog interesting and relevant and fun.

Where I could see it working better would be for e.g. newspaper headlines in a grand-strategy or SimCity-like game. When you do something crazy like have Liechtenstein conquer Western Europe, it could be funny to have some auto-generated commentary on said crazy state of affairs.


It offers you procedurally generated dialogue that can be relevant to any potential player action. You decide to kill the town chicken and now all the townspeople are talking about it. You've freed devs from having to hard-code all these possible interactions. It's obviously not perfect, but it allows for some interesting possibilities.


Just imagine Caves of Qud's weirdly coherent storytelling being the basis of new games. I LOVE THE FUTURE.


>But then I thought of possible applications

Indeed, some pretty impressive demos have been sprouting up on Twitter. Things like generating properly running code from English descriptions of the desired functionality are potentially game-changing.


GPT-3 doesn't know how to code. However, it has so many parameters that it was almost able to memorize its training data, which included people asking how to code something and other people answering.


Even if it were completely rote memorization, it would still be extremely valuable to be able to give a plain-text prompt and get relevant answers back. As SWEs, most of us probably have very good Google-fu, so we can find potentially obscure answers to our questions, but this could make that kind of skill redundant. Why search Stack Overflow when GPT could just generate exactly the code snippet you need?
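For what it's worth, the interface really is just text in, text out; with API access it's roughly this (the engine name and parameters are guesses based on the 2020-era openai Python client, and the key is a placeholder):

    import openai  # pip install openai; needs a (currently private-beta) API key

    openai.api_key = "YOUR_KEY_HERE"  # placeholder

    prompt = (
        "# Python\n"
        "# Return the n-th Fibonacci number, iteratively.\n"
        "def fibonacci(n):\n"
    )

    response = openai.Completion.create(
        engine="davinci",
        prompt=prompt,
        max_tokens=80,
        temperature=0.2,   # low temperature: more deterministic completions
        stop=["\n\n"],     # stop at the end of the function
    )
    print(prompt + response["choices"][0]["text"])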


I see people saying this over and over again, with nothing to back it up. I could say the same about the human mind.


And at least those can be run on prem. This thing we are talking about is already in walled gardens.


Another potential application/adaptation that would be useful is lossy text compression, but I'm not excited about using 300 GB of RAM or a web service to compress and decompress text.
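For anyone curious what that would even look like: a toy predictive-coding sketch with GPT-2 as a local stand-in (transformers 4.x assumed). As written it's actually lossless, since mismatched tokens are stored; a lossy variant would just drop them. It's also painfully slow, which is rather the point about the resource cost:

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tok = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    def compress(text):
        ids = tok.encode(text)
        kept, flags = [ids[0]], []          # always keep the first token
        with torch.no_grad():
            for i in range(1, len(ids)):
                logits = model(torch.tensor([ids[:i]])).logits[0, -1]
                if int(logits.argmax()) == ids[i]:
                    flags.append(True)      # model guessed it: store 1 bit, not a token
                else:
                    flags.append(False)
                    kept.append(ids[i])
        return kept, flags

    def decompress(kept, flags):
        ids, k = [kept[0]], 1
        with torch.no_grad():
            for was_predicted in flags:
                if was_predicted:
                    logits = model(torch.tensor([ids])).logits[0, -1]
                    ids.append(int(logits.argmax()))
                else:
                    ids.append(kept[k]); k += 1
        return tok.decode(ids)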


Especially since text is so small anyway, compared to pictures and videos or sound.


It would be great as a "lorem ipsum" generator ...


Why does GPT-3 matter? Well, the article starts with a plot that shows what looks like an exponential growth in the number of parameters, compared with previous models. So it matters because it's bigger.

Further down there's a plot titled "Aggregate Performance Across Benchmarks" where we can see that performance is on average about 50%. I don't know what the baseline for this plot should be (what is the expected average if the tasks are solved by a random classifier?) but comparing this plot with the plot at the top of the article, it doesn't really look like there's a huge improvement in accuracy with a huge increase in the number of parameters. In fact, it's quite the contrary: there's a small increase and a very smooth, almost linear curve. So that's an exponential increase in the use of resources for an almost linear increase in performance? That's not that impressive.
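To put a rough shape on that observation (an eyeballed fit, not a formula from the paper): if average benchmark accuracy grows roughly logarithmically in parameter count,

    \text{accuracy}(N) \approx a + b \log_{10} N
    \quad\Longrightarrow\quad
    \text{accuracy}(10N) - \text{accuracy}(N) \approx b

then each constant bump of b in accuracy costs a tenfold increase in parameters, which is exactly the exponential-resources-for-linear-returns pattern.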

So it appears that the big thing about GPT-3 is that it's big.

It should also be noted that the public interest in GPT-3 is mostly focused on its ability to generate text, for which there is no good metric. So basically, GPT-3 is big, but it's not that good in tasks for which there are formal benchmarks (such as they are, because natural language understanding benchmarks are often very poorly made and don't really measure what they claim to measure), and we can't really tell how good it is at the one task that interests people the most.


The significance of GPT-3 is that the scaling isn't slowing down. With every increase in the number of parameters the doubters say, "Oh, you'll hit diminishing returns," or "Oh, the curve will go sigmoid," but it hasn't happened.

If OpenAI develops GPT-4, with 1T parameters, I wouldn't be surprised to see a performance gain larger than the jump between GPT-2 and GPT-3.

What GPT-3 shows us is that we're going to have ML systems that can write at the level of an average human pretty soon now.


I have a much lower opinion of the average human's writing capabilities. Much of what we see online has been written by people who either love to write or are journalists. I think GPT-3 is already at the average human's writing level.


I'm not sure what's being discussed here exactly. If we're talking about vocabulary, spelling and grammar, I agree with you. On the other hand, humans are able to express opinions and ideas and come up with novel things to say, not merely mimic an input.

If you gave me a huge corpus of Chinese texts and a very long time, I might be able to figure out which characters go with which others, find the various structures in the text, and then generate a somewhat convincing made-up Chinese text while still not understanding a word of it.

These GPT-3 demos are impressive because they look like real text with proper syntax and grammar, but they still express absolutely nothing. It reads like a long series of rambling that goes nowhere. There's no intent behind it.

It reminds me of these videos of apes imitating humans by using their tools, banging hammers ineffectively. They are able to copy the appearance of the behavior, but not the reasoning behind it. They don't get why we bang hammers or what it achieves.


Have you read any business books? I used to read quite a few. For the most part, they take a central thesis and then repeat variations on the theme over and over again. Sometimes with anecdotes of questionable veracity. I venture that many of them could be generated with GPT-3.

My point is, GPT-3 is operating at human levels for certain contexts. I think it would get passing grades on essays in a lot of schools in the US, for instance, just based on syntax and grammar.


This stuff is so new that HN threads may be the first to mention realistic potential applications - congratulations, I think you just found one. Having GPT-3 render a first draft of books in the archetype you mention (one simple idea stretched out over many pages) seems like a very profitable endeavor.


> Having GPT-3 render a first draft of books in the archetype you mention (one simple idea stretched out over many pages).

Given what I've seen so far with GPT-3, that simple idea would have to have already been discussed at length on forums on the internet and in the corpus.

Usually books have facts and studies that they use as supporting points. Many of the connections they make between the subject material and their thesis are unique, and this forms their supporting argument. GPT-3 is rearranging words and sentences to resemble structures it's seen before, but it does not create novel facts.


So ideally it could work like a meta-study. Meta-studies combine results from multiple separate studies, making correlations and drawing more confident conclusions. Most 'original' human ideas are just reinventions of older ideas, too.

The interesting part is that GPT-3's leap in performance can be attributed to scaling. That's easier to do than inventing completely new approaches. Scale data, scale compute, scale money, then you have something you couldn't have invented directly.


A Chinese room can still be an interesting conversation participant


That's a good point, but I feel like there's still a long way to go before the model has enough data to actually output insightful content. Right now it seems to mostly output grammatically correct Lorem Ipsum.


I completely agree. People have a hugely inflated opinion of themselves. It's already better at writing articles than your average human.


>> The significance of GPT-3 is that the scaling isn't slowing down.

Can I ask- who says this? Is it your personal opinion? Is it the conclusion of the article above, as you understand it? Is it a commonly held opinion of some of the experts in language modelling?

I am asking because every time there is a claim like "X is important because Y" and someone points out that "Y" is not that interesting, if someone else then says "X is important because Z" and Z is not Y, it's very difficult to have a productive conversation, because it's very difficult to know what we are talking about. Of course, this is the internets and not scientific debate (typically carried out in peer-reviewed publications), but if the goalposts keep moving all the time, it's pointless to even try to have a conversation about the merits and flaws of such a complex system. That, with all due respect.

Now, regarding whether GPT-3 is slowing down: it isn't, but it's not going very fast either. Like I say, the curve in the middle of the article that shows accuracy as a function of parameters is quite flat. Depending on how you want to define diminishing returns, the picture painted by the accuracy plot is not that far from it, and in any case average accuracy is pretty disappointing.

>> What GPT-3 shows us is that we're going to have ML systems that can write at the level of an average human pretty soon now.

Like I say, there are no good metrics for this kind of task. We have no way to determine what counts as writing "at the level of an average human" (let alone what an "average human" is), except eyeballing output and expressing a subjective opinion. Anyone might claim that GPT-3 is already capable of writing "at the level of an average human". Anyone might claim that GPT-2 is. Or a hidden Markov model, or an n-gram model. Such claims really don't mean anything at all.

It is important to note that this is exactly the task that OpenAI has publicised the most with GPT-3: a poorly defined task with no good metrics. This insistence on promoting, as a strong point of the model, an ability that cannot be objectively evaluated is strong evidence that the model is not nearly as good as advertised.


But it is slowing down. In computer vision, we had MNIST solved and then waited through more than two decades of exponential growth in compute until ImageNet was solved. That 98% accuracy on ImageNet is nowhere near good enough for applications like self-driving cars. How many decades until we reach a 10^-6 error rate? Keeping in mind that exponential growth in compute is over.


> Keeping in mind that exponential growth in compute is over.

DL is getting increasingly specialized hardware, there's plenty of growth there. Plus what we're seeing here is GPT scaling up without algorithmic changes. Algorithms are advancing too.


GPT-3 or GPT-4 can give us "convincing liars", but we still need to figure out how to combine them with actual factual databases and do a quick fact-checking/validation/inference. GPT-3 is showing us a convincing human-like style, but no real substance. It's a massive step forward in any case.

I might try to generate soft-science essays with GPT-3 at one of my universities to see if they pass through the TA filters.


It’s this, and the fact that few-shot/one-shot learning seems to just emerge with models of this size.


Oh and note that this:

>> GPT-3 can also finally do arithmetic, something GPT-2 was unable to do well.

Is a preposterous claim that is very poorly supported by the data in the GPT-3 paper [1]. Figure 3.10 in the paper summarises the results. The authors tested addition and subtraction with two to five digits, and multiplication of two numbers drawn uniformly at random from [0,100]. There was also a composite task of addition, subtraction and multiplication with single-digit numbers (e.g. 6+(4*8), etc.).

On all tasks other than two- and three-digit addition and subtraction, accuracy was uniformly under 20%. Those four tasks did achieve high accuracy with more parameters.

Of course, this doesn't show that the larger models "learned arithmetic". Two- and three-digit addition and subtraction are likely to be much better represented in a natural language dataset than other operations (and note of course the conspicuous absence of division). So it's safe to assume that the model has seen all the operations it's asked to repeat and knows their results by heart. Remember that for two and three digit addition and subtraction one only needs a dataset with the numbers up to 999, which is really tiny and easy to memorise.

Edit: the authors note that they "spot checked" whether the model is simply memorising results by searching for three-digit addition examples in their dataset. Out of 2,000 three-digit addition problems they found no more than 17% in their dataset, which "suggests" that the model had never seen the problems before. Or, it "suggests" the search was not capable of finding many more existing matches. In any case, why only "spot-check" three-digit addition? Who knows. The paper doesn't say. Certainly, one- and two-digit addition and subtraction should be much more common in a natural language dataset. The authors also say that the model often makes mistakes such as not carrying a one, so it must actually be performing arithmetic! Or, it's simply reproducing common arithmetic mistakes found in its dataset. Overall, this sort of "testing" of arithmetic prowess simply doesn't cut the mustard.

Edit 2: Also, there is no information about how many arithmetic problems of each type were tried. One? Ten? One hundred? Were all arithmetic tasks tested with the same number of problems? Unknown.
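For anyone with API access who wants to go beyond spot checks, a systematic evaluation along these lines is easy to script (a sketch only: the completion call mirrors the 2020-era openai client, and the per-task problem count is arbitrary but at least it's the same for every task):

    import random, re
    import openai  # needs beta API access

    def ask(problem):
        resp = openai.Completion.create(
            engine="davinci",
            prompt=f"Q: What is {problem}?\nA:",
            max_tokens=8,
            temperature=0.0,
        )
        match = re.search(r"-?\d+", resp["choices"][0]["text"])
        return int(match.group()) if match else None

    def accuracy(op, digits, n_problems=200, seed=0):
        rng = random.Random(seed)
        lo, hi = 10 ** (digits - 1), 10 ** digits - 1
        correct = 0
        for _ in range(n_problems):
            a, b = rng.randint(lo, hi), rng.randint(lo, hi)
            truth = {"+": a + b, "-": a - b, "*": a * b}[op]
            if ask(f"{a} {op} {b}") == truth:
                correct += 1
        return correct / n_problems

    for digits in (2, 3, 4, 5):
        for op in ("+", "-", "*"):   # division could be added the same way
            print(digits, op, accuracy(op, digits))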

_____________

[1] https://arxiv.org/abs/2005.14165


It's possible the prompts used by the researchers to gauge arithmetic ability could be improved by changing to a more conversational style.

I found this series of tweets almost unbelievable in how well GPT-3 seems to reason about function composition, f(f(x)).

"I wonder if the AI would be better at math if you told it to show it's work":

https://twitter.com/kleptid/status/1284098635689611264?s=20


The important thing to understand is that there is no obvious reason why a language model should be good at arithmetic, rather than reproducing results in its training set. OpenAI is claiming that it is, which is tantamount to invoking magick. They need to back up their very strong claim with very strong evidence. They haven't, so it's nothing more than an absurd claim that follows in a long line of absurd claims about AI, since the early days of the field.


But the example the poster you are replying to gave shows that GPT-3 can take the square root of a user-defined function composed with itself. It clearly can perform arithmetic, and we don't need to trust OpenAI now that users can interact with the model.


And yet it can't always correctly subtract two two-digit numbers. Does that sound like a system that can perform arithmetic?

As I say in previous comments, no. It sounds much more like a system that can reproduce results it has seen in training, but has no general concept of arithmetic.

This would also explain the square root example easily. Also, the examples in the OP's linked tweet are very simple examples of square roots and function composition that are very likely to have been lifted verbatim from some textbook, or who knows what... and that's the problem, because who knows what the model has flat-out memorised and what it's composing from smaller components.

>> It clearly can perform arithmetic, and we don't need to trust OpenAI now that users can interact with the model.

The paper I link above performed a systematic evaluation of GPT-3's arithmetic ability. Playing around with the OpenAI API and eyeballing a few results is not going to give a clearer understanding of its abilities.

In general, hitting a language model with a few queries is never going to give any clear understanding of its capabilities. Systematic evaluation is always necessary and the average user (or the non-average user) is not going to be able to do that.


I think it's pretty clear that there is something more than memorising examples from the training set going on. Look at this: https://www.johnfaben.com/blog/gpt-3-arithmetic

In which GPT-3 answers the question "what is one hundred and five divided by three?" with "35.7". It also gave several other close-but-not-correct answers. It seems pretty unlikely these are all present in the training set, and surely can't all have been lifted verbatim.

I agree systematic testing is probably more useful, but find it really hard to believe this is all happening without any sort of model of arithmetic.


>The important thing to understand is that there is no obvious reason why a language model should be good at arithmetic

If there is enough arithmetical structure in the training corpus, eventually the best way to predict the training corpus is just to learn arithmetic rather than memorize every instance of arithmetical structure. Transformers have been shown to be equivalent to graph neural networks, so in some sense they have the power to self-discover novel architectures in service to learning a data set. So it is quite reasonable that it could have learned generic rules of arithmetic.


Sorry, but that doesn't sound very reasonable at all. I'm also not sure what you mean by "arithmetical structure" to be honest.

In any case, I think you're applying an overly permissive criterion for learning "generic rules of arithmetic". It's clear from the paper linked above that GPT-3 is extremely limited in its ability to return correct results given arithmetic operations as input. The only task that it performs with 100% accuracy is two-digit addition. It cannot even perform two-digit subtraction with perfect accuracy and it's all downhill from there.

Furthermore, like I say, division is conspicuously absent from the set of tested tasks reported in the paper, as are any operations with more than five digits. Going again by my heuristic from an earlier comment, that researchers publish positive results and avoid publishing negative results, this tells us that GPT-3 can't perform any division at all with any accuracy, and that it can't perform any arithmetic operations with more than five digits with any accuracy.

That is hardly the hallmark of a system that has "learned generic rules of arithmetic". It is far more likely that GPT-3 has learned to reproduce results that it has seen during training. Even more so since, as I say in my comment above, it is much better at operations that are likely to be found more often in a natural language corpus.


But why think performing arithmetic with 100% accuracy is required? Children learning arithmetic aren't perfectly accurate, but they're certainly learning arithmetic. The fact that there is a digit cutoff where the quality of its results drops off isn't all that surprising either. How much arithmetic can you do in your head? I'm likely to fail at some point with two-digit addition without using a pencil and paper; with three digits I would be significantly worse. Your criterion for what counts as "learning arithmetic" doesn't seem to be based on anything substantive.

The cliff for GPT-3's arithmetic ability is likely due to the fact that it can't do recursive/recurrent calculations. That is, it can't reprocess and refine a tentative answer to improve it. You can't do arbitrary arithmetic with a finite amount of substrate without this sort of recursion or recurrency. The fact that it can only do two digits with 100% accuracy could be a hardware or architecture limitation.


>> But why think performing arithmetic with 100% accuracy is required?

Because otherwise, how do you know that your system has learned the "rules of arithmetic", as per your comment, and not something completely different? And like I say in my other comments, there's a very obvious alternative about what that something completely different could be: a representation of already seen results.

Besides, GPT-3 is a piece of software, it's not a child or a grown up human, who can make mistakes because their memory fails or because they get overwhelmed by the complexity of executing a complex set of rules. If a piece of software implements a set of rules, it's usually able to execute them right every time, without failure, certainly so for relatively simple rules like arithmetic. Pocket calculators with tiny resources can do that and they can do it with very long sequences of numbers, so why would a huge language model, running on very expensive hardware, fail?

>> The cliff for GPT-3's arithmetic ability is likely due to the fact that it can't do recursive/recurrent calculations.

Well, yes, exactly that. If a system can't represent recursion then it can't represent arithmetic between arbitrary numbers. Hell, without recursion, a system can't even count to arbitrary numbers. So in what sense can GPT-3 be said to have "learned the rules of arithmetic"? Learned them, how, if it can't represent them?

Actually, your observation about recursion is the first thing I'd have normally said, but it doesn't seem to be commonly understood that neural networks (and propositional, attribute-value learners in general) cannot represent recursion. Similarly, such systems can't represent non-ground values, that is they can't represent the concept of a variable. But that's a big part of why they can't build general theories. In terms of arithmetic, it means they can't represent the relation x + y = z because they can't represent x, y and z as universally quantified variables. The only remaining alternative is to represent every ground expression, like 1 + 1 = 2, 1 + 2 = 3, etc. But that's not the rules of arithmetic! That's only some instances of specific operations. That is why GPT-3 hasn't learned arithmetic and can't learn arithmetic, no matter how much data it is fed. It's just not possible to represent the rules of arithmetic in a propositional language. A first-order language and the ability to define relations recursively are necessary.

Edit: OK, sorry, my claim about a first-order language being necessary is maybe hard to substantiate outside of Peano arithmetic. But recursion and the ability to represent variables are absolutely necessary. See primitive recursive functions: https://en.wikipedia.org/wiki/Primitive_recursive_function.
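For concreteness, this is the kind of definition meant: addition as a primitive recursive function, where x and y are genuine variables and the rule refers to itself (a toy Python transcription of the standard Peano-style definition):

    # add(x, 0)   = x
    # add(x, y+1) = succ(add(x, y))
    # The rule quantifies over variables x and y and recurses; a fixed table of
    # memorised facts like "2 + 3 = 5" has neither property.

    def succ(n):
        return n + 1

    def add(x, y):
        if y == 0:                       # base case
            return x
        return succ(add(x, y - 1))       # recursive case

    assert add(123, 456) == 579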


>Because otherwise, how do you know that your system has learned the "rules of arithmetic", as per your comment, and not something completely different?

Presumably because it answers correctly for examples it hasn't explicitly seen in training. While it's plausible that it has seen all two-digit sums during the course of training, it's not a given.

>Besides, GPT-3 is a piece of software, it's not a child or a grown up human, who can make mistakes because their memory fails or because they get overwhelmed by the complexity of executing a complex set of rules.

GPT-3 can become "overwhelmed" by the complexity of the problem extending beyond its feed-forward computation window.

>If a piece of software implements a set of rules, it's usually able to execute them right every time, without failure, certainly so for relatively simple rules like arithmetic.

But a computer system that "computes" through manipulations of language representations is fundamentally different than computer systems that came before. Carrying over the intuition from computers as bit-manipulators to manipulators of language representations is a mistake.

> so why would a huge language model, running on very expensive hardware, fail?

Impedance mismatch? It turns out performing tasks on a computational substrate not suited to those tasks comes with severe drawbacks. But we already knew that.

>So in what sense can GPT-3 be said to have "learned the rules of arithmetic"? Learned them, how, if it can't represent them?

It could know how to sum individual digits through memorization and learn the carry rule. It may be incapable of recursion and thus incapable of summing arbitrarily long digits. But learning the carry rule is most of the way there.

>Similarly, such systems can't represent non-ground values, that is they can't represent the concept of a variable.

I see no reason to accept this. Multi-layer networks seem to be well-suited to abstract representations and manipulations of non-ground values. Ground values are the input to the network, but higher layers represent the abstract properties of the ground values within their receptive field, rather than their particulars; for example, the location and direction of an edge rather than the particular form of the edge.


>> I see no reason to accept this.

Yes, I'm aware it's very difficult to get people to believe this outside of AI research. Of course, it is entirely uncontroversial and very well understood by researchers. For example, I was in a presentation by a gentleman who works at DeepMind last year and who works on neuro-symbolic integration and he was asked a question along the lines of "how can you model first order logic without variables?" and he pointed out that he had a footnote on one of his slides where he was noting this limitation and that work was underway to address it.

Regarding arithmetic, none of the points made in your comment are made in the GPT-3 paper. In fact, the paper makes no attempt to explain what makes GPT-3 capable of performing arithmetic, other than to say that the mistakes in carrying a one suggest that it's actually trying to perform computation and failing. So I have to ask, where do these points come from?

What I mean is, you seem to have a theory about how GPT-3 works. Where does it come from? I apologise if this comes across as personal or unfair, but many commenters in this thread and similar conversations express strong opinions and give detailed explanations about how GPT-3 and similar models work. I am always left wondering where all this information comes from, given that usually it can't be found in the sources I'd expect to find it, namely the work that is being discussed (namely, the GPT-3 paper, in this case).


>For example, I was in a presentation by a gentleman who works at DeepMind last year and who works on neuro-symbolic integration

Sure, neural networks don't operate on proper variables, and so in the context of neuro-symbolic processing I'm sure this is a significant hurdle. But in general, abstract representation is part and parcel of what makes deep learning powerful. And such an abstract representation is all that's needed for a neural arithmetic unit.

Here[1] is a study on GPT-2 that demonstrates its middle layers develop a representation of syntax and part-of-speech, the sorts of abstract representations that would be needed to develop a mechanism to do abstract arithmetic.

>What I mean is, you seem to have a theory about how GPT-3 works. Where does it come from?

Studies like the one mentioned, and reasonable extrapolation from knowledge of DL and other transformer architectures. We are not totally ignorant on how GPT-3 works.

[1] https://aletheap.github.io/posts/2020/07/looking-for-grammar...


My comment above discussed the inability of neural networks (and propositional, attribute-value learners in general) to represent variables. I'm sorry, but I can't see how your comment or the post you link to show that neural networks can represent variables.

I do not quite understand the relation between "abstract representations that would be needed to develop a mechanism to do abstract arithmetic" and variables. I'm also not sure what you mean by "abstract arithmetic", or what mechanisms you mean. Can you please explain?

Also, I had thought we shared an understanding that the ability to represent primitive recursive functions (which presupposes the ability to represent variables and recursion) is necessary to represent arithmetic. Your above comment now makes me doubt this, also. Can you clarify?

Finally, the link above is a blog post. I wouldn't call it a study. But, can you say where in that post I can find the theory about GPT-3's function that you express above?


As usual, YeGoblynQueene never has a good thing to say about deep learning, and isn't nearly as expert as they think they are. Unlike you, OP has actually been paying attention, and he knows that the GPT-3 paper seriously understates the arithmetic performance of GPT-3 because they failed to deal with the BPE issue. If you work around that by adding commas, you drastically improve the arithmetic. Matt Brockman has done some more systematic evaluation: http://gptprompts.wikidot.com/logic:math
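The BPE point is easy to see with the publicly available GPT-2 tokenizer (GPT-3 reportedly uses the same vocabulary); a quick check of how digit strings get chunked, with and without commas:

    from transformers import GPT2TokenizerFast

    tok = GPT2TokenizerFast.from_pretrained("gpt2")

    for s in ["1234 + 5678 =", "1,234 + 5,678 ="]:
        # Inspect how each expression is split into BPE pieces; the claim above is
        # that the comma-formatted version yields more regular chunks, which the
        # model handles noticeably better.
        print(s, "->", tok.tokenize(s))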


And did we mention you can't run it yourself?


The Aggregate performance graph shows how GPT-3 does translation and other tasks without ever learning to do that. Just by a simple example of translation, it understands the task and does translation. By another example, it can do math, and by another one it can do reasoning, or write react apps. All without having been explicitly trained in those tasks.

This means we could potentially dig out thousands of uses from it by carefully crafting triggers. It's a more general kind of tool than what we're accustomed to working with. It opens a new direction that might become a large field in five years, if they manage to make it run on a regular computer.

<rant>I imagine a GPT-3 like model coupled with search (having Google in an internal loop), trained with multimedia - images, videos, audio, papers, code so it has grounded concepts, and being able to generate text, images, video and code as output. Then I imagine having curated thousands of tasks from the community and added in the training set so it becomes much more efficient, and having all these capabilities exposed as a general AI library. It will be able to work with any modality and you will be able to describe your task in natural language. All of these are possible today. GPT-3 has shown the power of learning to predict on 500B tokens.</>


>All without having been explicitly trained in those tasks.

Can you elaborate on what kind of training data was used here? I'm curious.


500 billion word pieces (it splits words into few-character long pieces), which comes down to 100-200B words, scraped off the internet. It used mostly CommonCrawl as training data.

It's like the proverbial witch's pot where they put everything in and out comes the magic.


Yes, that's an apt metaphor. Magic, indeed.


>> The Aggregate performance graph shows how GPT-3 does translation and other tasks without ever learning to do that.

But it also shows it's not very good on any of those tasks. In any case, machine translation, despite its great popularity as a natural language processing task, is another AI task for which we do not have good metrics.

>> Just by a simple example of translation, it understands the task and does translation. By another example, it can do math, and by another one it can do reasoning, or write react apps.

I think in general, there's a tendency to overestimate the capabilities of GPT-3 for various reasons. Speaking of "understanding" and "reasoning" is really not justified.

On the one hand, it's a language model and most of the tasks it's applied to are tasks for which we don't have very good metrics or benchmarks. Like I say in my earliest comment above, natural language understanding metrics are very bad at measuring "understanding" and we don't even have a commonly agreed definition of what that means. Basically, many benchmarks are defined as classification tasks, e.g. with multiple choice questions supposedly testing a model's understanding, but without any way to ensure that a system is not overfitting to statistical regularities in the dataset - and, indeed, language models have often been shown to do exactly that (e.g. see [1]).

On the other hand, OpenAI very aggressively promotes its systems (not just GPT-3) to users outside of AI research, and those users have no way to perform a systematic evaluation of such claims, so they are left with good old eyeballing [2] of stuff like language generation or translation, etc. It's all too easy for such users to be impressed by a few hand-picked examples provided by OpenAI itself, or by other users who also don't have the capacity for systematic evaluation (and who hand-pick their results out of undue excitement, rather than for any other reason).

The result is that there is a public perception that OpenAI's language models are much better than they really are. If memory serves, OpenAI made a very big to-do about how GPT-2 was so good it was dangerous, etc. Well, now we have GPT-3, which is reportedly even more better, but it's served as an API. Doesn't sound that dangerous, and it all sounds a lot more like hype than actual progress.

____________

[1] Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference

https://arxiv.org/abs/1902.01007

[2] I keep saying that word, but it's actually semi-formal terminology. There was a paper about evaluating the results of grammar induction algorithms that used it. I'll see if I can find it.


We need not go so deep into the semantics of whether it is real understanding or not. It solves SuperGLUE and other 'reasoning' tasks without training on them, and yes, at a lower accuracy. The amazing part is that it can be prompted into various tasks like that.


But what does it mean, if it's beating an irrelevant benchmark for a task that is poorly defined? Is it really amazing if it's passing a test that doesn't mean anything at all, just because it wasn't trained to pass that test? So it happens to pass the test. So what? What did we learn from that?

I believe such a situation would generate much less debate in software engineering: "my program passes all my unit tests, but it still crashes". Well, yes. Your program passes all your unit tests, because your unit tests are missing the point, not because your code works.


I don't know that the benchmarks capture the subjective difference in GPT-2 and GPT-3. It's much better. I really feel like the lack of explainability is preventing us from understanding how to compose prompts to achieve the best outcome. There are subtle input differences (including whitespace) that result in large differences in output.


> So basically, GPT-3 is big, but it's not that good in tasks for which there are formal benchmarks

It is very good at all those tasks. In the paper, all the numbers comparing GPT-3 with other models on those benchmarks compare GPT-3 in a zero/one/few-shot setting (so no gradient steps) against the previous state of the art fine-tuned for hours or days on the specific task with millions of gradient steps. If you had time and money to fine-tune GPT-3 on the specific task there is every reason to believe the gap would be huge.

This is the big thing about GPT-3: the promise of not needing to fine-tune anymore. This is huge in terms of productivity, but also because it allows you to use the model in settings for which there are basically no datasets available to fine-tune on.
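To make the zero/one/few-shot distinction concrete: "few-shot" just means stuffing a handful of worked examples into the prompt itself, with no gradient updates at all. Something like this illustrative prompt (not one from the paper):

    # The "training" is entirely in-context; the model simply continues the text
    # and its completion is taken as the translation. Zero-shot drops the worked
    # examples; one-shot keeps exactly one.
    prompt = (
        "Translate English to French.\n\n"
        "English: Where is the library?\n"
        "French: Où est la bibliothèque ?\n\n"
        "English: The weather is nice today.\n"
        "French: Il fait beau aujourd'hui.\n\n"
        "English: I would like a cup of coffee.\n"
        "French:"
    )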


>> If you had time and money to fine-tune GPT-3 on the specific task there is every reason to believe the gap would be huge.

On the contrary, there is every reason to assume that GPT-3 cannot significantly improve its results on those tasks with extensive fine tuning. Because, if GPT-3 could significantly improve its performance on those tasks with extensive fine-tuning, the OpenAI paper would be reporting those results (edit: OpenAI sure has the time and money to finetune their model).

As we all know by now, there is strong bias in never reporting negative results in machine learning as in other research fields, so we can be reasonably certain that if there is an obvious experiment to perform and that experiment is missing from a paper, it was attemtped and the results were poor.


I entirely disagree with your interpretation:

> "OpenAI sure has the time and money to finetune their model"

This model is absolutely gigantic. In terms of training it's a nightmare. I am pretty sure they don't have the capacity to train 10 of them in parallel, so fine-tuning on all the downstream tasks needs to be done basically in serial and takes forever. They might have a lot of money, but time is as important for them as for anyone else; their lab isn't at the edge of a black hole.

If what you are saying were true, I am pretty sure they would have reported it, because it would be an extremely interesting and important result in my opinion. If fine-tuning gave no advantage over a few-shot setting, it would basically mean that the model already knows everything there is to know about the task just from its pretraining, and that any additional training is useless because the model is not learning anything.

Finally, given the pre-training curves with various model sizes, we clearly haven't reached saturation there, and there is no indication anywhere that we have reached saturation on downstream tasks.

So, for me the far more likely explanation is that fine-tuning on downstream tasks is indeed very costly (time and/or money) even by their standards, and simply isn't on topic for this paper.


An important thing that one learns in academia is that nobody gives anyone the benefit of the doubt. This is a lesson learned the hard way: you make an unsupported claim and a murder of angry reviewers pounce upon it like hyenas hungry for flesh.

Outside of academia I see this thing very often. "Oh, I'm sure if it was easy to do that, they'd have done it". No. This is not how a piece of research work is evaluated, not even a piece of work in deep learning, a field that has abandoned all pretensions to science in recent years.

My personal advice (speaking as someone who has been attacked by the hyenas and paid my pound of flesh) is that one should always demand the highest standard of proof for any claim in a research paper. That, if one really wishes to know what's going on. Intellectual curiosity and scientific wonderment should not result in gullibility.


> Outside of academia I see this thing very often. "Oh, I'm sure if it was easy to do that, they'd have done it".

Do you not see the cognitive dissonance? You were precisely claiming that because it should be easy to do they most likely have done it, and if they didn't report results it's because it failed.

You are making this unsupported claim with 0 evidence to back you up.


I did not say it's easy or hard. I said they can do it because they have the time and money.

In any case, it's an obvious thing to do and there's no obvious explanation of why they didn't, given that they could.


Not necessarily. There have been a number of papers trying to emphasize the generalization power of a network rather than "it got a SOTA number", and so leave off fine tuning results. Because that would distract from the point of the paper.


Moore's law is exponential. If parallel processing power continues to rise exponentially...


> GPT-3 shows that it’s possible for a model to someday reach human levels of generalization in NLP

This is a big, big claim tossed in as a throwaway line. GPT-3 shows that we haven't yet reached the limit of the "just throw more resources at it" school of AI development, but it doesn't automatically follow that it'll reach human levels of NLP if you give it enough resources.

By analogy, this is claiming "New, larger steam locomotives are strictly faster than older, smaller ones, so this shows with enough coal it's possible for steam-engines to someday drive interstellar transport at 0.5c"


> By analogy, this is claiming "New, larger steam locomotives are strictly faster than older, smaller ones, so this shows with enough coal it's possible for steam-engines to someday drive interstellar transport at 0.5c"

Making the argument that scaling up high-energy fuels and engines would make interstellar travel possible would be a pretty good hypothesis. It turns out you need rocket fuel and rocket engines, not coal and steam engines.

GPT-3 might not be the engine, but throwing insane amounts of electrical energy and computing power at the problem might just get us there.


>strictly faster

The question is how fast is fast enough.

At what point do you stop being able to distinguish an agenda-driven social engineering bot from a non-native-speaking teenage 4chan troll?

The bar is not much higher - much less than the gain of function already seen from v2. So will it be v4? v5?

For a lot of disruptive applications, just a little bit "faster" is all you need for the bad actors to act.


> At what point do you stop being able to distinguish an agenda-driven social engineering bot from a non-native-speaking teenage 4chan troll?

I'm...not sure why you would want to? I can't see either being particularly helpful regarding discourse on the Internet or in real life.


> that it’s possible for a model to someday reach human levels of generalization in NLP

Fully disagree. There is no evidence that we are now closer to human level text understanding than before GPT-3. Yes, GPT-3 produces grammatically correct sentences but it still can't form a coherent idea or meaning and express it in sentences afterwards - that's what humans would do. GPT-3 is just better at obfuscating that the model has no clue what it's talking about.

Nevertheless, compared with Eliza or other bots from 1960-2000 we made remarkable progress.


> Yes, GPT-3 produces grammatically correct sentences but it still can't form a coherent idea or meaning and express it in sentences afterwards - that's what humans would do.

There's considerable debate over whether humans can have a coherent idea before it is reduced into symbolic language, and it's not clear how you would distinguish this sequence of events, anyway.

It's pretty clear what GPT-3 does doesn't match the common rationalization of human subjective experience of cognition, but it's not at all clear, AFAICT, that what the human brain does matches that rationalization, either.

Which is not to say I think GPT-3 has anything like the kind, much less the level, of understanding humans have, I just think some of the common arguments arrayed in casually dismissing it are based on suppositions about human cognition that aren't sufficiently examined.


> There's considerable debate over whether humans can have a coherent idea before it is reduced into symbolic language, and it's not clear how you would distinguish this sequence of events, anyway.

This sounds like the thing that is so silly a person has to be very educated to believe it.

You know how I know that humans have coherent ideas before rendering them into symbolic language: because they do. The GPT-3 paper, itself, is a bunch of ideas that were formed and then rendered into symbolic language. Literally every new book/work/presentation that a person decided to write because they said to themselves "I have a great idea, I should share it with the world" comes from this.

GPT-3 doesn't even know when it thinks it has a new idea. Contrast this with humans, which have to go out of their way to communicate and promote their idea because they understand it's novel.


I think the idea here is that symbolic language is the tool with which we forge our ideas.

To continue with the GPT example, the ideas are not rendered into symbolic language only at the point of writing - the ideas are formed in the mind using symbols and then expressed afterwards

I see it like this: With no way to represent my thoughts and the context around them succinctly, I would not be able to string various complex ideas together coherently


> GPT-3 is just better at obfuscating that the model has no clue what it's talking about.

There's an interesting angle to this as well, which is that it makes the models "unfalsifiable" in a way. You can never prove whether the data is a straight compression lookup or whether the network has generated an insight, because the model can't tell you (to anthropomorphize).

This, more than anything else, would be the value of having explainable models. I don't blame the ML community for this gap, but it puts them in the unenviable position of not being scientific in the Popperian sense. There's a great element of "trust us, the intelligence is in there" or "the intelligence will get there", but when everything's a mashup of more hardware and data without a known structure, we ultimately have to take that on faith. We can do empirical measurements after the fact, but the guiding projections for how an experiment should behave are lacking. (I don't think anyone in any community has a satisfactory answer to this, btw.)


I wonder if 10 years ago one would believe that we’d have a model capable of generating an article that fools many given a complex prompt, yet is essentially incapable of any reasoning.


The Chinese room argument shows lots of people were thinking about it being a possibility but to have it happen so soon is something else.


I assume I wasn't the only early reader of blogs who thought they pointed at this being possible, but I admit I did not expect to see it so soon.


It is capable of reasoning. Good lord what does it have to do to prove it???!


There's also the fact that it's completely missing structures analogous to human memory/consciousness. I'm not talking about philosophical notions of consciousness and qualia here, but the difference, in neuroscience, between a subliminal stimulus and a supraliminal stimulus. Stimuli that aren't abstracted and moved to working memory leave no trace in the brain just a couple of seconds after they're removed, analogously to the 2048-token context window of GPT-3. That's something that's still conspicuously missing from GPT-3, and from AlphaStar if you've watched enough of its matches.


Hasn't GPT-3 been out for a while now? Why are there so many articles about it on the front page over the past few days?


They recently gave certain devs early access and a lot of demos are only now being shared by people with followings


I've noticed this too. Maybe it's anecdotal, but it feels like an attempt to astroturf the dev community or drive up buzz. I can't escape the superlative-laden Tweets and articles about GPT-3.


The API/Playground is in private beta; I guess more people got access to it and started showing results. And upon seeing these results, people get excited and talk about implications, etc.

Like, generating code and solving math is pretty damn good for a model which was not trained to generate code or solve math. A few weeks ago people didn't know it could do that.


It blew up on Twitter a few days ago, and it's still in closed beta so people are eager to see what it can do.


Marketing campaign.


What can GPT-3 do that is useful? I can understand that it outputs text based on a prompt and some input, but I don't understand how that can be leveraged to do useful things. I can ask it trivia questions, but is it better than doing a Wikipedia search? Is it possible to give it a prompt of, say, some scientific paper and have it write an article about it? Or will it just generate nonsense?


At the moment it feels like the only use-cases are:

- toys/games: AI Dungeon, fun chatbots, generally playing around with generating text in a certain style

- humour: generating jokes/memes

- deception: trolling/disinfo campaigns/spam/gaming advertisement market by creating garbage content.

I guess there will be use cases where a human operator can use it to simplify their job (or lower the skill level required to do a job) by curating and editing generated content instead of writing it themselves. I'm thinking things like simple writing jobs where quality isn't that important: social media posts, newsletters?


I like the idea of using it as a tool to mitigate writer's block. Say you're writing a school paper and you've done your research but you can't seem to keep your train of thought flowing. With GPT-3 you could have it generate a sentence prompt based on your previous writing, or maybe have it write a conclusion that summarizes your thoughts from the paper. There are lots of possibilities in that arena.


Fahrenheit 451 "wall" family and the movie Her are futures I see from this. The paper also lists stuff like better auto complete.


Even as far back as Eliza, there were reports of people finding that talking to the chatbot was therapeutic, and people would spill their guts out to it for hours.

With GPT-3's conversation having much more verisimilitude, the perceived therapeutic value should be much greater.

Many lonely, hurting people want someone to talk to, and for some knowing that they're talking to a machine increases their comfort in revealing personal details to it.

"Conversation as a service" could be a very desirable product for many.


Here is someone on Twitter who has been building some, albeit toys, using the tool. It seems pretty powerful: https://twitter.com/jsngr

I found this to be very impressive: https://twitter.com/jsngr/status/1284874360952692736

Not only did it eventually generate code based on his example for the new domain he specified, it was even able to generate new domains.


It's somewhat impressive, but it didn't do what he wanted it to: it didn't list temperatures; instead it listed weather descriptions. And to get it to do this he had to write a full template. We also don't get to see whether this was his first attempt, or whether the model can generate code like this reliably.


You'll be looking for reasons to downplay it while dozens of startups receive funding to build automatic app generation tools.


You are pushing this idea so hard and blasting this thread with copy/paste comments. It's obvious you're excited about it and that's great but there is a lot that isn't shown and that's where the real work is being done.

Just because it looks like magic, doesn't mean there isn't someone pulling some strings somewhere else to aid the illusion.


>You are pushing this idea so hard and blasting this thread with copy/paste comments.

I'm sorry, but this comment is absurd. I'm assuming you are insinuating that I am astroturfing for OpenAI, which is against the guidelines. Not only that, but in fact none of my comments are copy-pasted, so it's doubly ridiculous.

>Just because it looks like magic

No one is saying it's magic, but the thread is full of people saying: "Uhh, my random prompt got bad results, this is just hype, blah blah blah..." People are looking for excuses to trash the model instead of seeing what it could mean for the industry going forward. None of this is married to OpenAI either; there are plenty of groups replicating GPT, and they will likely have similar capabilities.


What if you ask it a serious question you are genuinely interested in getting an answer to, and it responds with a real clue? They say it seems to be approaching human level; doesn't this mean it will NOT "just generate nonsense"? Humans generate nonsense too, but it often makes at least some sense.


People need to look WAY beyond generating text here. There are already lots of demos on Twitter of people generating fully functioning code in various languages from English inputs.

https://twitter.com/sharifshameem


I built a recommendation engine on top of GPT-3: http://serendipityrecs.com/

I think other applications exist; you just have to figure out how to extract what you want from the prompt/response interface.


I tried the recommendation engine with some prompts that are fairly simple, but would require a deeper understanding than just a keyword search to get right (e.g. "books where magic is secretly technology"). The recommendation engine does not seem to be doing significantly better than a keyword search would. I would probably have more success with raw Google.

This is the sort of thing that dampens the hype for me a bit. I keep assuming that the demonstrations are not cherrypicking examples, but it's kind of hard to believe I'm just uniquely good at picking problem cases.


Actually, I like the results a lot. How does the logic behind it work? Is it just the output of GPT-3? Do you need to parse it somehow? If it is just GPT-3, it's kind of amazing how good the results are, since it was never trained for that. A bit scary, to be honest.


It's mostly just GPT-3. You do need to parse it; as far as GPT-3 is concerned, it's just doing text in/text out. I'm doing some ranking logic behind the scenes to make the results more consistently reliable, but you could directly ask the model for recommendations. GPT-3 has a lot of knowledge of the real world from having read so much of the internet.
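For anyone curious, here's roughly what that text-in/text-out pattern can look like in code. This is only a sketch against the 2020-era openai Python client's Completion endpoint; the prompt format and the parsing are my own illustrative guesses, not the actual serendipityrecs implementation:

    import openai

    openai.api_key = "YOUR_API_KEY"  # assumed to be configured by the caller

    def build_prompt(request):
        # Illustrative few-shot prompt (not the real service's prompt).
        return "\n".join([
            "The following are book recommendations for each request.",
            "",
            "Request: books about the history of cryptography",
            "Recommendations:",
            "1. The Code Book by Simon Singh",
            "2. The Codebreakers by David Kahn",
            "",
            "Request: " + request,
            "Recommendations:",
            "1.",
        ])

    def recommend(request):
        response = openai.Completion.create(
            engine="davinci",                # 2020-era completion API
            prompt=build_prompt(request),
            max_tokens=100,
            temperature=0.7,
            stop=["\n\n", "Request:"],       # stop before it invents another request
        )
        text = "1." + response["choices"][0]["text"]
        # Parse the numbered list back out of the raw completion.
        items = []
        for line in text.splitlines():
            line = line.strip()
            if line and line[0].isdigit() and "." in line:
                items.append(line.split(".", 1)[1].strip())
        return items

    print(recommend("books where magic is secretly technology"))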


Wow, kind of amazing. I don't think a human would give me much better results. It would be interesting to know how fast OpenAI is able to retrain/update the model based on new data.


If you give it some text and tack "TLDR" on the end then it will give you a summary of the text.
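In code, the trick is literally just appending the suffix to your text. A quick sketch against the 2020-era Completion API; the engine name and sampling settings are only plausible defaults, not anything official:

    import openai  # assumes openai.api_key is already configured

    article = open("article.txt").read()   # any longish text you want summarized

    response = openai.Completion.create(
        engine="davinci",
        prompt=article + "\n\ntl;dr:",     # the suffix that cues a summary
        max_tokens=60,
        temperature=0.3,
    )
    print(response["choices"][0]["text"].strip())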


https://twitter.com/sharifshameem

TLDR: Generate working apps in seconds from English text input.

If you can't see the power here you might not be paying attention fully.


As a programmer without much of a data science background, could someone explain the whole hype/breakthrough of GPT-3? I know it generates content that makes sense, HTML+CSS, and some other text-based stuff.

But in layman terms, what is the huge deal with it?


It's fascinating to think about. We put into writing almost everything we experience of the world. GPT-3 demonstrates that if the model is big enough, it can produce intelligent answers, much better than GPT-2. It can fail spectacularly, but humans can do that too :)

It was also shown that it's scalable, so there's no reason we couldn't make it an order of magnitude bigger if we wanted to. That future system might be a game changer for search and AI.

Then we just have to ask great questions like "what is the meaning of life the universe and everything" and watch the loading animation for 7.5 million years.

It's probably not that far-fetched to think that this is the Fat Man bomb of AI. China and Russia have probably already allocated resources to build their own models. The arms race is on.


> GPT-3 demonstrates that if the model is big enough, it can produce intelligent answers

You could make the same statement about a lookup table.


And it would be a fine statement :) That table would be huge in size compared to GPT and less general. GPT is way more compressed.


It's doing all these things (basic code generation, arithmetic, function composition) without specifically being trained for any of these tasks. It was basically just trained to predict the next most likely word from a prompt, and yet all those interesting things "emerge" from it. And giving it more compute power, it still keeps improving so far.
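"Predict the next most likely word" really is the whole training objective, and with a smaller open model you can poke at it directly. A sketch using GPT-2 via the transformers library (GPT-3 itself is API-only, so its smaller relative stands in):

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    input_ids = tokenizer.encode("The capital of France is", return_tensors="pt")
    with torch.no_grad():
        logits = model(input_ids).logits[0, -1]   # scores for the next token
    probs = torch.softmax(logits, dim=-1)
    top = torch.topk(probs, 5)                    # five most likely continuations
    for p, idx in zip(top.values, top.indices):
        print(f"{tokenizer.decode([int(idx)])!r}: {p.item():.3f}")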


> And giving it more compute power, it still keeps improving so far.

And giving it more data, it still keeps improving. The extra compute is necessary to compress and train the model.

At the end of the day, we've proven that if you have more human knowledge in your lookup table, you can generate more things convincingly. Search engines have been doing something similar for a long time, except that instead of generating things, they find them. Search engines also have the advantage that a lot of their knowledge actually does have real semantic models, which seems more intelligent to me.


The first two versions generated content based on a prompt that was coherent for a sentence or a paragraph.

GPT-3 generates an entire article that's largely coherent (though still lacking nuanced meaning).

Each version of GPT has an exponentially increasing parameter count. Raw compute seems to be winning out here. Basically, for this application they don't seem to be hitting diminishing returns from increasing parameters.

Now people are wondering what GPT-4 will be able to do. If it can write human-level coherent articles or pass the Turing test or something, it's gonna trigger an arms race for governments to obtain this capability.


I disagree with the "largely coherent" statement. The output I've read from GPT-3 is still quite confusing; for example, this[0] blog post demonstrates some of its capabilities. Reading through the article it generated is very confusing: statements are made and then later contradicted, and elements that do not actually exist on the page are referenced. There just doesn't seem to be an actual cohesive narrative. All the sentences are grammatically correct and make sense on their own, but the thought connecting the sentences into some sort of larger point just isn't there. And this is after the author generated 10 different articles and picked the most intelligible one.

[0] https://maraoz.com/2020/07/18/openai-gpt3/


Thanks for the explanation ;)

> Each version of GPT has an exponentially increasing parameter count

What are parameters in this context?

> Now people are wondering what GPT-4 will be able to do. If it can write human-level coherent articles or pass the Turing test or something [...]

Well, when I first heard about "text generation" last year, the first thing I raised with my friends was "surely this will make fake news even worse". But any technology comes with a bright side and a dark side. I try not to be pessimistic in these moments, hehe.


Parameters here are just the size of the model's neural net. Each unit of computation has its own parameters that need to be fit. Bigger model = more layers or units of computation = more parameters.
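To make "parameters" a bit more concrete, here's a tiny sketch that counts the learnable weights of GPT-2 small via the transformers library; GPT-3's weights aren't downloadable, so its smaller relative stands in:

    from transformers import GPT2LMHeadModel

    model = GPT2LMHeadModel.from_pretrained("gpt2")        # GPT-2 "small"
    n_params = sum(p.numel() for p in model.parameters())  # total learnable weights
    print(f"{n_params:,} parameters")  # roughly 124 million; GPT-3 has ~175 billion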


This is like watching parents talk about their kid developing over time and getting excited about their evolving writing ability.

That doesn't mean we get Turing or Gandhi at the end of the story, or that there is any control over what is produced. To produce those two, nature had to iterate over 100 billion humans.


Picture this grim future:

1. Companies will use models like these to generate automatic job listings

2. People will use models like these to generate automatic job applications

3. Companies from 1) will use automated tools to parse and analyse said applications

We're gonna end up with abysmal SNR, as the sheer number of applicants will explode.


It doesn’t sound like the worst way to hire model builders.


> GPT-3 shows that it’s possible for a model to someday reach human levels of generalization in NLP—and once the impossible becomes possible, it’s only a matter of time until it becomes practical.

The fact that it can [occasionally] generalize and participate in a verbal conversation at a human level means it could potentially organize and direct productive (at a human level, at least) action if assigned as a manager to a team of humans. Doesn't it?


Use it to post to Reddit and create a feedback loop. Help to hasten the AI takeover.

What I find fascinating is the careful craft of framing things out of context as a tool for politics.

https://en.wikipedia.org/wiki/AI_takeover

Narrative wins. This will be a great tool to misinform.


There is already a pretty good one using GPT-2

https://www.reddit.com/r/SubSimulatorGPT2/


If you look at the folks who are backing this, it is clear what their intent is.


We're incredibly excited about GPT-3. I think there is a fair bit of hype exhaustion, especially from the likes of OpenAI ("our AI is too dangerous to release"), so this is completely understandable.

However, I think what's missing here is that our benchmarks (a la Turing test) are about negation as opposed to affirmation. We tend to evaluate AI on whether or not we can discern the fact that it's AI. We seek to negate it as human, as opposed to affirming it as human (or close to it). And this is not the right mindset when it comes to AGI, because the gap between "obviously not human" and "human-like" is enormous. These are all definitely steps in the right direction, and the applications for even robotic process automation will be huge. But we're not even close to having nets that can reason about even the most basic things.


> However, I think what's missing here is that our benchmarks (a la Turing test) are about negation as opposed to affirmation.

I would question the value of the Turing test, and maybe think that's not a great example for AI.

There's always been this assumption that passing the Turing test would mean we had AI, but I think that was always predicated on the machine generating the outputs. With the GPT models, it's not clear that this isn't a form of compression over an immense data set, where we're sending pre-existing _human_ responses back to the user. It implies to me that we can pass the Turing test with a large enough data set and no (or very little) intelligence.

All of this makes me believe "These are all definitely steps in the right direction" is questionable.


Can someone answer a question for a layman:

Is the number of parameters to be read as the indicator of how "advanced" the training has gotten, or the accuracy of the output? As in, this dataset/training has gotten to the point that it understands the 175 billionth small exception to the general rules of how language should be interpreted, or constructed, to be considered believable?

Sometimes (as a layman) I look at this and think instead, wow, how slow these ML algorithms must be if they need 175 billion parameters to predict correctly.

Is it one of these statements?


> Is the number of parameters to be read as the indicator of how "advanced" the training has gotten, or the accuracy of the output?

Accuracy, of course.

> As in, this dataset/training has gotten to the point that it understands the 175 billionth small exception to the general rules of how language should be interpreted, or constructed, to be considered believable?

It memorized a lot of facts, but it is also better at figuring out rules than its predecessor.

> Sometimes (as a layman) I look at this and think instead, wow, how slow these ML algorithms must be if they need 175 billion parameters to predict correctly.

There are more specialized models which are trained on much smaller datasets. They are usually given a specific task, such as classification. GPT-3 is trained on a very large dataset in an unsupervised way, and as a result it is able to handle a very wide variety of tasks (without re-training). If you tell it to do math, it will do math. If you tell it to translate between different languages, it will do translation. If you tell it to write JS code, it will write JS code. If you ask it to write a Harry Potter parody as if it were written by Hemingway, it will do that.

So the whole point is that it can do pretty much any imaginable task involving text given only a few examples, with no task-specific training.
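For a concrete picture of "only a few examples, no task-specific training", here's what a few-shot prompt might look like through the completion API. Sketch only; the engine name, examples, and settings are my own choices:

    import openai  # assumes openai.api_key is configured

    few_shot_prompt = "\n".join([
        "English: Where is the library?",
        "French: Où est la bibliothèque ?",
        "",
        "English: I would like two coffees, please.",
        "French: Je voudrais deux cafés, s'il vous plaît.",
        "",
        "English: The train leaves at seven.",
        "French:",
    ])

    response = openai.Completion.create(
        engine="davinci",
        prompt=few_shot_prompt,
        max_tokens=40,
        temperature=0.0,   # keep the output as deterministic as possible
        stop=["\n"],       # stop at the end of the translated line
    )
    print(response["choices"][0]["text"].strip())

The task is defined entirely inside the prompt; swap the examples and you have effectively "re-programmed" the model without touching its weights.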


Thanks!


GPT-3 won't matter until all of the work put into it can be replicated by someone else, and as it stands right now it's just a toy for people with far too many resources to spare.


Can a model like this be realized as a chip?

I mean, 300GB of memory is crazy. But back in the day I ran full HD videos on a netbook, because it had special optimized chips.


Yeah but that chip is probably going to look a lot like a 300GB memory chip.


Like a big external power bank? Doesn't sound too bad.


What I would really like to see is an analysis of how much weight individual chunks of context contribute to the final probability of an output. That might allow for better prompting: when GPT-3 gets the wrong answer, it would be fairly obvious what was lacking in the prompt, given which parts of the context it thinks have the most influence. Also, good prompts would presumably have a higher weight.
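The API doesn't expose that directly, but a crude leave-one-out approximation is possible: score the desired output with and without each chunk of context and compare. Here's a sketch using GPT-2 from the transformers library as a stand-in scorer (GPT-3 is API-only); the chunks and target are purely illustrative:

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def target_logprob(prompt, target):
        # Summed log-probability of the target tokens given the prompt.
        prompt_ids = tokenizer.encode(prompt, return_tensors="pt")
        target_ids = tokenizer.encode(target, return_tensors="pt")
        input_ids = torch.cat([prompt_ids, target_ids], dim=1)
        labels = input_ids.clone()
        labels[:, : prompt_ids.shape[1]] = -100          # only score the target tokens
        with torch.no_grad():
            loss = model(input_ids, labels=labels).loss  # mean NLL over target tokens
        return -loss.item() * target_ids.shape[1]

    chunks = [
        "Q: What is the capital of France?\n",
        "Hint: it is famous for the Eiffel Tower.\n",
    ]
    target = "A: Paris"

    baseline = target_logprob("".join(chunks), target)
    for i, chunk in enumerate(chunks):
        ablated = "".join(c for j, c in enumerate(chunks) if j != i)
        contribution = baseline - target_logprob(ablated, target)
        print(f"chunk {i}: estimated contribution {contribution:.2f}")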


Once that's figured out, you could have a kind of outline builder where you provide all the right linking outline context and have it generate appropriate paragraphs around it.


For me, its performance passes a usability threshold for human + machine collaboration.

It is possible to use it to generate texts which can be quickly curated or edited by a user.

Specifically this could be useful in authoring fiction (sci-fi novels, game dialog, etc.).

Imagine the Star Trek holodeck characters. Its dialog quality is nearly good enough to make that level of interaction feasible.


I can see comment and review farms wanting their hands on GPT really badly. Imagine being able to generate thousands of human-like reviews with positive sentiment. Businesses pay real money for this.

Right now AI is only available to mega tech corps. Even OpenAI is a closed research lab. So one can infer that AI will always be the divider.


Speaking of unconscious bias, this quote from the original article made me raise my eyebrows: "We wanted to identify how good an average person on the internet is at detecting language model outputs, so we focused on participants drawn from the general US population."


Clearly the average USian is not the average person, but given that they need to speak English to have a shot (unless GPT-3 works for other languages?), it doesn't seem like a terrible approximation.


So can you feed a video to it and let it 'dream' of all possible outcomes? If this is achievable, then the internet will eventually become a loopback device for our senses, just like in the Matrix movies.


I don't feel like this article answered the question in its headline. I don't even know what GPT-3 is, so even the tiniest bit of background could have helped.


It is also a pretty good case study in the "bitter lesson", and it all but ensures that the future of AI will be driven by the companies with the deepest pockets.


OpenAI is far from having the deepest pockets.


We saw that GPT-3 doesn't matter that much, right?


For me, it has kind of broken HN’s comment sections. I find myself jumping to the bottom of longer comments to look for “btw, this comment was written by gpt3”. To me it seems like we are going to be entering a perpetual April fools day where we never really know what’s real.


This has always been a problem with HN due to the robotic nature of its audience.


Ouch!


Not personally, no.


Take GPT-2, BERT, or any other attention-based language model and apply few-shot learning to whatever domain you like. You will not see a meaningful difference that matters, even though GPT-3 is much larger than the other models. The hype exists because people can do that domain adaptation easily without serving huge ML models themselves; OpenAI already provides the serving. That makes it easy for developers who don't know much ML to explore and tune language models.
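To the parent's point, here's roughly what running a smaller model yourself looks like with the transformers library. Generation only; actual domain adaptation would add a fine-tuning pass over your own corpus first, and the prompt here is just an example:

    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    prompt = "The tavern keeper leaned over the counter and whispered,"
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    output_ids = model.generate(
        input_ids,
        max_length=80,
        do_sample=True,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,  # avoids the missing-pad-token warning
    )
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

The mechanics have been accessible for a while; what OpenAI is selling is scale plus hosted serving.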


There’s been a lot of discussion on HN lately about the implications of GPT-3: are we moving toward general AI or is this just a scaled up party trick?

I have no idea whether scaling up transformers another 100x will lead to something resembling real intelligence, but it certainly seems possible. In particular, I find the arguments against this possibility to be fairly silly. These are the three main arguments I have seen for why GPT type models will never approach AGI, and the reasons I don’t think they are valid:

1. GPT-3 requires vast amounts of training data (hundreds of billions of words from the internet), whereas a human can become fluent in natural language after “training on” much less data.

It’s not analogous to compare the GPT-3 training corpus to the education that one human receives before becoming fluent in natural language. We benefit from millions of years of evolution across billions of organisms. A massive amount of “training” is incorporated in the brain of an infant. This must be the case because even if you could somehow read all of the text on the internet to your dog, it would not approach intelligence.

2. There was no intellectual breakthrough in the development of GPT-3, just more "brute force" training on more data; therefore it or its successors can't achieve a breakthrough in intelligence.

We must remember that there was no intellectual breakthrough required for the development of human intelligence; it was just more of the same evolution. The core pattern of evolution is extremely simple: take an organism, generate random variants from it, see which ones do the best, and then create new variants from the good ones. This is perhaps the most basic scheme you could think of that might actually work. Evolution has produced amazing results in spite of its simplicity and inefficiency (random variations!) because it generalizes well to many environments and scales extremely well to millions of generations. These are exactly the strengths of gradient descent. In fact, gradient descent follows the same structure as evolution, except that at each iteration we don't generate random variations, but instead make an educated guess about what a fruitful variation would be based on available gradient information. This improves learning efficiency tremendously; imagine being able to say: "this Neanderthal died because he stepped into a fire, let's add some fire-avoidance to the next one" instead of waiting for this trait to be generated randomly. Speaking of brute force and amount of training, it would take 355 years to train GPT-3 on a single GPU. This strikes me as quite fast relative to evolutionary time scales. (For a toy illustration of the random-variation vs. gradient-descent contrast, see the sketch at the end of this comment.)

3. Machines lack capabilities fundamental to the human experience: in particular feeling pleasure, pain, and an internal drive toward a goal.

Indeed, if you turn a computer off in the middle of a computation, there is no evidence of suffering. And if the computer successfully writes a blog post of human quality, it feels no joy in the human sense. My claim is that these sensations are not core aspects of intelligence. In fact, pleasure and pain are very primitive developments that even cockroaches can claim. The most impressively human accomplishments (harnessing vast external energy sources, breaking out of bare subsistence, landing on the moon, etc.) were made in spite of the fact that we are messy bags of emotion that unpredictably feel anger, jealousy, despondence or elation. These emotional responses were selected for because they were useful as proximate goalposts orienting us toward reproduction—basically, to overcome forgetfulness in the pursuit of long-term goals. If in the future we can simply direct a computer to write a captivating novel without needing to program in lots of visceral intermediate stimuli to keep it on track, so much the better.
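As promised in point 2, here's a toy comparison of random variation versus a gradient step on the same one-dimensional objective. It's purely illustrative and not meant to model GPT-3 training:

    import random

    def loss(x):
        return (x - 3.0) ** 2          # toy objective, minimum at x = 3

    def grad(x):
        return 2.0 * (x - 3.0)         # analytic gradient of that objective

    # Evolution-style search: propose random mutations, keep whichever scores best.
    x = 0.0
    for _ in range(100):
        candidates = [x + random.gauss(0, 0.5) for _ in range(5)]
        x = min(candidates + [x], key=loss)
    print("random variation:", round(x, 3))

    # Gradient descent: take an "educated guess" step along the gradient instead.
    x, lr = 0.0, 0.1
    for _ in range(100):
        x -= lr * grad(x)
    print("gradient descent:", round(x, 3))

Both end up near 3, but the gradient version gets there without wasting samples on bad directions; that's the efficiency argument in miniature.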


A stronger contrast between human natural language learning and GPT-3 is that the human is an active participant, continually trying things out and getting feedback. GPT-3's training is entirely passive -- and when all humans have of a language is a corpus of fragments with unknown referents (Minoan Linear A), we don't do well.


Also, nothing prevents an AI model from having a goal like maximizing pleasure or uptime. Then we could see some "real" intelligence, in the sense that the AI will do things that help it 'survive', which might even include trying to upgrade itself.


It seems that this is the future.



