I hate to be that guy, but when I saw the article the other day that ended with "plot twist, this article was autogenerated with GPT-3!", I was not impressed, mainly because it looked like content-farm filler to me, conveying no information. It basically looked like an incredibly costly spam tool.
But then I thought of possible applications. Give humans a new tool and they will amaze you; I'm sure we'll see cool applications of this tech in the future (and probably horrible ones too). One thing I could think of where it would totally rock: games. If games could use such tech, their game worlds could be filled with casual discussions without the developers needing to write everything by hand. NPCs could have ever-changing discussions and even answer the player about the subject they're discussing. Their discussion could even change based on what is happening in the world or what the players are doing, down to things with the smallest impact, or on what is happening directly around them at the moment, without devs needing to script it all. That would be awesome.
But yeah, this won't happen easily unless the model can be embedded and shipped on local computers.
There needs to be a way of absolutely limiting it; from what I've seen, the prompt alone can't do that with 100% success. A blacklist would be useful in some applications (people tolerate their search engine screwing up once in a while), but not in games. Still, maybe we can accept limited performance by limiting it to the game world plus some manual texts. It's not like NPC dialog is literature-level anyway...
So the model can be adapted to output part-of-speech tags, dependency grammar trees or named entities in the input even if training data is sparse. Similarly, you could fine-tune it to produce game lore and then see how it works for that. The model easily switches to different modes of operation and achieves state-of-the-art or close to state-of-the-art performance.
It's quite funny how NLP folks tried to solve low-level tasks (POS tagging, NER, named-entity relation extraction, dependency parsing, sentiment classification, etc.) to get to higher-level tasks (good summarization, machine translation, text generation, question answering), and now a single model captures all the low-level stuff for free and does the high-level stuff so well that fine-tuning it to do the low-level stuff is unnecessary.
1. Set up a pseudo-adversary NN trained to recognize context-correct speech based on a small corpus.
2. Craft a GPT-3 prompt to get N 9s of accuracy
3. Retry if the answer fails the test from the other NN
4. Set a cap on retries based on how many 9s your prompt got
5. If cap exceeded, return a context-free or limited-context response (a rough sketch of the whole loop is below)
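A minimal Python sketch of that loop, where gpt3_generate and verifier are hypothetical placeholders for the GPT-3 wrapper and the small adversary NN (not real APIs):

    def generate_with_verifier(prompt, gpt3_generate, verifier, prompt_nines=2):
        # gpt3_generate: callable(prompt) -> str, hypothetical GPT-3 wrapper
        # verifier: callable(text) -> float in [0, 1], small NN trained on game text
        # prompt_nines: how many 9s of accuracy the prompt is believed to give,
        #               used to bound the number of retries
        max_retries = prompt_nines + 3
        for _ in range(max_retries):
            candidate = gpt3_generate(prompt)
            if verifier(candidate) >= 0.5:  # verifier accepts: context-correct enough
                return candidate
        # Cap exceeded: fall back to safe, hand-written filler dialog.
        return "Hm. Strange weather we're having."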
Usually, it's fine-tuning (training a model starting from an earlier model). They can use a small amount of text to fine-tune GPT-3 to their liking.
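GPT-3's weights aren't public, so as a stand-in here's a minimal fine-tuning loop for GPT-2 with Hugging Face transformers and PyTorch; the lore file, chunk size and hyperparameters are made up purely for illustration:

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

    # Small corpus of game lore / NPC dialog (hypothetical file).
    ids = tokenizer(open("game_lore.txt").read(), return_tensors="pt").input_ids

    block = 512  # train on fixed-size chunks of the corpus
    model.train()
    for epoch in range(3):
        for i in range(0, ids.size(1) - block, block):
            batch = ids[:, i:i + block]
            loss = model(batch, labels=batch).loss  # causal language-modelling loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

    model.save_pretrained("gpt2-game-lore")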
Someone else in the thread suggested using a verifier based on game data and maybe that would be fine. The key IMHO must be some kind of NN trained only on game data, either GPT-3 itself or a verifier of some sort.
If a player got something extremely disagreeable from an NPC, was this a fluke, or did a dev intentionally add it in training so as to make it more likely? There's no way to prove innocence. Add in trigger-happy social media and governments, and the potential cost to devs & publishers could go all the way to bans, boycotts, or legal threats. Most companies do not wish to risk this, so mitigations must be in place.
I think this might be the way to go forward regarding language models in games: Offer them through cloud computing for GaaS.
It feels like what I imagined playing in a Star Trek Holodeck would be (well the dialog anyway).
Where I could see it working better would be for e.g. newspaper headlines in a grand-strategy or SimCity-like game. When you do something crazy like have Liechtenstein conquer Western Europe, it could be funny to have some auto-generated commentary on said crazy state of affairs.
Indeed, some pretty impressive demos have been sprouting up on Twitter. Things like generating properly running code from English descriptions of the desired functionality are potentially game changing.
Further down there's a plot titled "Aggregate Performance Across Benchmarks" where we can see that performance is on average about 50%. I don't know what the baseline for this plot should be (what is the expected average if the tasks are solved by a random classifier?) but comparing this plot with the plot at the top of the article, it doesn't really look like there's a huge improvement in accuracy with a huge increase in the number of parameters. In fact, it's quite the contrary: there's a small increase and a very smooth, almost linear curve. So that's an exponential increase in the use of resources for an almost linear increase in performance? That's not that impressive.
So it appears that the big thing about GPT-3 is that it's big.
It should also be noted that the public interest in GPT-3 is mostly focused on its ability to generate text, for which there is no good metric. So basically, GPT-3 is big, but it's not that good at tasks for which there are formal benchmarks (such as they are, because Natural Language Understanding benchmarks are often very poorly made and don't really measure what they say they measure), and we can't really tell how good it is at the one task that interests people the most.
If OpenAI develops GPT-4, with 1T parameters, I wouldn't be surprised to see a performance gain larger than the jump between GPT-2 and GPT-3.
What GPT-3 shows us is that we're going to have ML systems that can write at the level of an average human pretty soon now.
If you give me a huge corpus of Chinese texts and a very long time, I might be able to figure out which character goes with which, find the various structures in the text and then be able to generate a somewhat convincing made-up Chinese text while still not understanding a word of it.
These GPT-3 demos are impressive because they look like real text with proper syntax and grammar, but they still express absolutely nothing. It reads like a long series of rambling that goes nowhere. There's no intent behind it.
It reminds me of these videos of apes imitating humans by using their tools, banging hammers ineffectively. They are able to copy the appearance of the behavior, but not the reasoning behind it. They don't get why we bang hammers or what it achieves.
My point is, GPT-3 is operating at human levels for certain contexts. I think it would get passing grades on essays in a lot of schools in the US, for instance, just based on syntax and grammar.
Given what I've seen so far with GPT-3, that simple idea would have to have already been discussed at length on forums on the internet and in the corpus.
Usually books have facts and studies that they use as supporting points. Many of the connections they make between the subject material and their thesis are unique, and this forms their supporting argument. GPT-3 is rearranging words and sentences to resemble structures it's seen before, but it does not create novel facts.
The interesting part is that GPT-3's leap in performance can be attributed to scaling. That's easier to do than inventing completely new approaches. Scale data, scale compute, scale money, then you have something you couldn't have invented directly.
Can I ask- who says this? Is it your personal opinion? Is it the conclusion of the article above, as you understand it? Is it a commonly held opinion of some of the experts in language modelling?
I am asking because every time there is a claim like "X is important because Y" and someone points out that Y is not that interesting, and someone else then says "X is important because Z" where Z is not Y, it becomes very difficult to have a productive conversation, because it's very difficult to know what we are talking about. Of course, this is the internets and not scientific debate (typically carried out in peer-reviewed publications), but if the goalposts keep moving all the time, it's pointless to even try to have a conversation about the merits and flaws of such a complex system. That, with all due respect.
Now, regarding whether GPT-3 is slowing down, it isn't, but it's not going very fast either. Like I say, the curve in the middle of the article that shows accuracy as a function of parameters is quite flat. Depending on how you want to define diminishing returns, the image painted by the accuracy plot is not that far from it, and in any case average accuracy is pretty disappointing.
>> What GPT-3 shows us is that we're going to have ML systems that can write at the level of an average human pretty soon now.
Like I say, there are no good metrics for this kind of task. We have no way to determine what writing "at the level of an average human" is (let alone what an "average human" is), except eyeballing output and expressing a subjective opinion. Anyone might claim that GPT-3 is already capable of writing "at the level of an average human". Anyone might claim that GPT-2 is. Or a Hidden Markov Model, or an n-gram model. Such claims really don't mean anything at all.
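For what it's worth, even a trivial bigram Markov chain produces locally fluent-looking text, which is exactly why "looks like human writing" is not a measurable claim. A minimal sketch (the corpus file name is hypothetical):

    import random
    from collections import defaultdict

    words = open("corpus.txt").read().split()

    # Record, for each word, the words observed to follow it in the corpus.
    successors = defaultdict(list)
    for cur, nxt in zip(words, words[1:]):
        successors[cur].append(nxt)

    # Sample a chain: each next word is drawn from the observed successors.
    word = random.choice(words)
    out = [word]
    for _ in range(50):
        word = random.choice(successors[word] or words)
        out.append(word)
    print(" ".join(out))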
It is important to note that this is exactly the task that OpenAI has publicised the most, with GPT-3: a poorly defined task with no good metrics. This insistence in promoting an ability that cannot be objectively evaluated as being a strong point of the model is strong evidence that the model is not nearly as good as advertised.
DL is getting increasingly specialized hardware, there's plenty of growth there. Plus what we're seeing here is GPT scaling up without algorithmic changes. Algorithms are advancing too.
I might try to generate soft-science essays with GPT-3 at one of my universities to see if it passes through TA filters.
>> GPT-3 can also finally do arithmetic, something GPT-2 was unable to do well.
That is a preposterous claim that is very poorly supported by the data in the GPT-3 paper. Figure 3.10 in the paper summarises the results. The authors tested addition between two to five digits, subtraction between two to five digits and multiplication between two digits drawn uniformly at random from [0,100]. There was also a composite task of addition, subtraction and multiplication with single-digit numbers (e.g. 6+(4*8), etc).
On all tasks other than two- and three-digit addition and subtraction, accuracy was uniformly under 20%. Those four tasks (two- and three-digit addition and subtraction) only achieved high accuracy at the larger model sizes.
Of course, this doesn't show that the larger models "learned arithmetic". Two- and three-digit addition and subtraction are likely to be much better represented in a natural language dataset than other operations (and note of course the conspicuous absence of division). So it's safe to assume that the model has seen all the operations it's asked to repeat and knows their results by heart. Remember that for two and three digit addition and subtraction one only needs a dataset with the numbers up to 999, which is really tiny and easy to memorise.
Edit: the authors note that they "spot checked" whether the model is simply memorising results, by searching for three-digit addition examples in their dataset. Out of 2000 three-digit addition problems they failed to find more than 17% in their dataset which "suggests" that the model had not ever seen the problems before. Or, it "suggests" the search was not capable of finding many more existing matches. In any case, why only "spot-check" three-digit addition? Who knows. The paper doesn't say. Certainly, one- and two-digit addition and subtraction should be much more common in a natural language dataset. The authors also say that the model often makes mistakes such as not carrying a one, so it must be actually performing arithmetic! Or, it's simply reproducing common arithmetic mistakes in its dataset. Overall, this sort of "testing" of arithmetic prowess simply doesn't cut the mustard.
Edit 2: Also, there is no information about how many arithmetic problems of each type were tried. One? Ten? One hundred? Were all arithmetic tasks tested with the same number of problems? Unknown.
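For comparison, a systematic check is easy to sketch. This assumes a hypothetical query_model(prompt) -> str wrapper around whatever interface is available, and mirrors the paper's setup of uniformly sampled operands:

    import random
    import re

    def eval_addition(query_model, digits=3, n_problems=200):
        # Estimate accuracy on d-digit addition with uniformly sampled operands.
        correct = 0
        for _ in range(n_problems):
            a = random.randint(10 ** (digits - 1), 10 ** digits - 1)
            b = random.randint(10 ** (digits - 1), 10 ** digits - 1)
            reply = query_model("Q: What is %d plus %d?\nA:" % (a, b))
            match = re.search(r"-?\d+", reply)  # first integer in the reply
            correct += bool(match and int(match.group()) == a + b)
        return correct / n_problems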
I found this series of tweets almost unbelievable in how well GPT-3 seems to reason about function composition f(f(x)).
"I wonder if the AI would be better at math if you told it to show it's work":
As I say in previous comments, no. It sounds much more like a system that can reproduce results it has seen in training, but has no general concept of arithmetic.
This would also explain the square root example easily. Also, the examples in the OP's linked tweet are very simple examples of square roots and function composition that are very likely to have been lifted verbatim from some textbook, or who knows what ...and that's the problem, because who knows what the model has flat memorised and what it's composing from smaller components.
>> It clearly can perform arithmetic, and we don't need to trust OpenAI now that users can interact with the model.
The paper I link above performed a systematic evaluation of GPT-3's arithmetic ability. Playing around with the OpenAI API and eyeballing a few results is not going to give a clearer understanding of its abilities.
In general, hitting a language model with a few queries is never going to give any clear understanding of its capabilities. Systematic evaluation is always necessary and the average user (or the non-average user) is not going to be able to do that.
In which GPT-3 answers the question "what is one hundred and five divided by three?" with "35.7". It also gave several other close-but-not-correct answers. It seems pretty unlikely these are all present in the training set, and surely can't all have been lifted verbatim.
I agree systematic testing is probably more useful, but find it really hard to believe this is all happening without any sort of model of arithmetic.
If there is enough arithmetical structure in the training corpus, eventually the best way to predict the training corpus is just to learn arithmetic rather than memorize every instance of arithmetical structure. Transformers have been shown to be equivalent to graph neural networks, so in some sense they have the power to self-discover novel architectures in service to learning a data set. So it is quite reasonable that it could have learned generic rules of arithmetic.
In any case, I think you're applying an overly permissive criterion for learning "generic rules of arithmetic". It's clear from the paper linked above that GPT-3 is extremely limited in its ability to return correct results given arithmetic operations as input. The only task that it performs with 100% accuracy is two-digit addition. It cannot even perform two-digit subtraction with perfect accuracy and it's all downhill from there.
Furthermore, like I say, division is conspicuously absent from the set of tested tasks reported in the paper, as are any operations with more than five digits. Going again by my heuristic from an earlier comment, that researchers publish positive results and avoid publishing negative results, this tells us that GPT-3 can't perform any division at all with any accuracy and it can't perform any arithmetic operations with more than five digits with any accuracy.
That is hardly the hallmark of a system that has "learned generic rules of arithmetic". It is far more likely that GPT-3 has learned to reproduce results that it has seen during training. Even more so since, as I say in my comment above, it is much better at operations that are likely to be found more often in a natural language corpus.
The cliff for GPT-3's arithmetic ability is likely due to the fact that it can't do recursive/recurrent calculations. That is, it can't reprocess and refine a tentative answer to improve it. You can't do arbitrary arithmetic with a finite amount of substrate without this sort of recursion or recurrency. The fact that it can only do two digits with 100% accuracy could be a hardware or architecture limitation.
Because otherwise, how do you know that your system has learned the "rules of arithmetic", as per your comment, and not something completely different? And like I say in my other comments, there's a very obvious alternative about what that something completely different could be: a representation of already seen results.
Besides, GPT-3 is a piece of software, it's not a child or a grown up human, who can make mistakes because their memory fails or because they get overwhelmed by the complexity of executing a complex set of rules. If a piece of software implements a set of rules, it's usually able to execute them right every time, without failure, certainly so for relatively simple rules like arithmetic. Pocket calculators with tiny resources can do that and they can do it with very long sequences of numbers, so why would a huge language model, running on very expensive hardware, fail?
>> The cliff for GPT-3's arithmetic ability is likely due to the fact that it can't do recursive/recurrent calculations.
Well, yes, exactly that. If a system can't represent recursion then it can't represent arithmetic between arbitrary numbers. Hell, without recursion, a system can't even count to arbitrary numbers. So in what sense can GPT-3 be said to have "learned the rules of arithmetic"? Learned them, how, if it can't represent them?
Actually, your observation about recursion is the first thing I'd have normally said, but it doesn't seem to be commonly understood that neural networks (and propositional, attribute-value learners in general) cannot represent recursion. Similarly, such systems can't represent non-ground values, that is, they can't represent the concept of a variable. But that's a big part of why they can't build general theories. In terms of arithmetic, it means they can't represent the relation x + y = z because they can't represent x, y and z as universally quantified variables. The only remaining alternative is to represent every ground expression, like 1 + 1 = 2, 1 + 2 = 3, etc. But that's not the rules of arithmetic! That's only some instances of specific operations. That is why GPT-3 hasn't learned arithmetic and can't learn arithmetic, no matter how much data it is fed. It's just not possible to represent the rules of arithmetic in a propositional language. A first-order language and the ability to define relations recursively are necessary.
Edit: OK, sorry, my claim about a first order language being necessary is maybe hard to substantiate outside of Peano arithmetic. But, recursion and the ability to represent variables are absolutely necessary. See primitive recursive functions: https://en.wikipedia.org/wiki/Primitive_recursive_function.
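For concreteness, the standard primitive recursive definition of addition already needs both a variable and recursion on that variable:

    add(x, 0)    = x
    add(x, S(y)) = S(add(x, y))

A system that can only store ground instances like 2 + 3 = 5 has no way to express the second equation at all.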
Presumably because it answers correctly for examples it hasn't explicitly seen in training. While it's plausible that it has seen all two-digit sums during the course of training, it's not a given.
>Besides, GPT-3 is a piece of software, it's not a child or a grown up human, who can make mistakes because their memory fails or because they get overwhelmed by the complexity of executing a complex set of rules.
GPT-3 can become "overwhelmed" by the complexity of the problem extending beyond its feed-forward computation window.
>If a piece of software implements a set of rules, it's usually able to execute them right every time, without failure, certainly so for relatively simple rules like arithmetic.
But a computer system that "computes" through manipulations of language representations is fundamentally different than computer systems that came before. Carrying over the intuition from computers as bit-manipulators to manipulators of language representations is a mistake.
> so why would a huge language model, running on very expensive hardware, fail?
Impedance mismatch? It turns out performing tasks on a computational substrate not suited to those tasks comes with severe drawbacks. But we already knew that.
>So in what sense can GPT-3 be said to have "learned the rules of arithmetic"? Learned them, how, if it can't represent them?
It could know how to sum individual digits through memorization and learn the carry rule. It may be incapable of recursion and thus incapable of summing arbitrarily long digits. But learning the carry rule is most of the way there.
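To illustrate the gap: memorised single-digit sums plus the carry rule do get you addition, but only if you can iterate over the digits, which is exactly the looping/recursion step in question. A small sketch:

    def add_by_digits(a: str, b: str) -> str:
        # Add two non-negative integers given as digit strings,
        # using only single-digit sums plus the carry rule.
        a, b = a.zfill(len(b)), b.zfill(len(a))  # pad to equal length
        carry, digits = 0, []
        for da, db in zip(reversed(a), reversed(b)):  # the step that must repeat arbitrarily
            s = int(da) + int(db) + carry             # "memorised" single-digit sum
            carry, d = divmod(s, 10)                  # the carry rule
            digits.append(str(d))
        if carry:
            digits.append("1")
        return "".join(reversed(digits))

    print(add_by_digits("487", "645"))  # 1132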
>Similarly, such systems can't represent non-ground values, that is they can't represent the concept of a variable.
I see no reason to accept this. Multi-layer networks seem to be well-suited for abstract representations and manipulations on non-ground values. Ground-values are the input into the network, but higher layers represent on the abstract properties of the ground-values within its receptive field, rather than the particulars of the ground-values. For example, the location and direction of an edge rather than the particular in the form of an edge.
Yes, I'm aware it's very difficult to get people to believe this outside of AI research. Of course, it is entirely uncontroversial and very well understood by researchers. For example, I was in a presentation last year by a gentleman who works at DeepMind on neuro-symbolic integration, and he was asked a question along the lines of "how can you model first order logic without variables?"; he pointed out that he had a footnote on one of his slides noting this limitation and that work was underway to address it.
Regarding arithmetic, none of the points made in your comment are made in the GPT-3 paper. In fact, the paper makes no attempt to explain what makes GPT-3 capable of performing arithmetic, other than to say that the mistakes in carrying a one suggest that it's actually trying to perform computation and failing. So I have to ask, where do these points come from?
What I mean is, you seem to have a theory about how GPT-3 works. Where does it come from? I apologise if this comes across as personal or unfair, but many commenters in this thread and similar conversations express strong opinions and give detailed explanations about how GPT-3 and similar models work. I am always left wondering where all this information comes from, given that usually it can't be found in the sources I'd expect to find it, namely the work being discussed (in this case, the GPT-3 paper).
Sure, neural networks don't operate on proper variables, and so in the context of neuro-symbolic processing I'm sure this is a significant hurdle. But in general, abstract representation is part and parcel of what makes deep learning powerful. And such an abstract representation is all that's needed for a neural arithmetic unit.
Here is a study on GPT-2 that demonstrates its middle layers develop a representation of syntax and part-of-speech, the sorts of abstract representations that would be needed to develop a mechanism to do abstract arithmetic.
>What I mean is, you seem to have a theory about how GPT-3 works. Where does it come from?
Studies like the one mentioned, and reasonable extrapolation from knowledge of DL and other transformer architectures. We are not totally ignorant on how GPT-3 works.
I do not quite understand the relation between "abstract representations that would be needed to develop a mechanism to do abstract arithmetic" and variables. I'm also not sure what you mean by "abstract arithmetic", or what mechanisms you mean. Can you please explain?
Also, I had thought we shared an understanding that the ability to represent primitive recursive functions (which presupposes the ability to represent variables and recursion) is necessary to represent arithmetic. Your above comment now makes me doubt this, also. Can you clarify?
Finally, the link above is a blog post. I wouldn't call it a study. But, can you say where in that post I can find the theory about GPT-3's function that you express above?
This means we could potentially dig out thousands of uses out of it by carefully crafting triggers. It's a more general kind of tool than what we're accustomed to work with. It opens a new direction that might become a large field in five years, if they manage to make it run on a regular computer.
<rant>I imagine a GPT-3 like model coupled with search (having Google in an internal loop), trained with multimedia - images, videos, audio, papers, code so it has grounded concepts, and being able to generate text, images, video and code as output. Then I imagine having curated thousands of tasks from the community and added in the training set so it becomes much more efficient, and having all these capabilities exposed as a general AI library. It will be able to work with any modality and you will be able to describe your task in natural language. All of these are possible today. GPT-3 has shown the power of learning to predict on 500B tokens.</>
Can you elaborate on what kind of training data was used here? I'm curious.
It's like the proverbial witch's pot where they put everything in and out comes the magic.
But it also shows it's not very good on any of those tasks. In any case, machine translation, despite its great popularity as a natural language processing task, is another AI task for which we do not have good metrics.
>> Just by a simple example of translation, it understands the task and does translation. By another example, it can do math, and by another one it can do reasoning, or write react apps.
I think in general, there's a tendency to overestimate the capabilities of GPT-3 for various reasons. Speaking of "understanding" and "reasoning" is really not justified.
On the one hand, it's a language model and most of the tasks it's applied to are tasks for which we don't have very good metrics or benchmarks. Like I say in my earliest comment above, natural language understanding metrics are very bad at measuring "understanding" and we don't even have a commonly agreed definition of what that means. Basically, many benchmarks are defined as classification tasks, e.g. with multiple choice questions supposedly testing a model's understanding, but without any way to ensure that a system is not overfitting to statistical regularities in the dataset - and, indeed, language models have often been shown to do exactly that (e.g. see ).
On the other hand, OpenAI very aggressively promotes its systems (not just GPT-3) to users outside of AI research, and those users have no way to perform a systematic evaluation of such claims, so they are left with good old eyeballing of stuff like language generation or translation, etc. It's all too easy for such users to be impressed by a few hand-picked examples provided by OpenAI itself, or by other users who also don't have the capacity for systematic evaluation (and who hand-pick their results out of undue excitement, rather than for any other reason).
The result is that there is a public perception that OpenAI's language models are much better than they really are. If memory serves, OpenAI made a very big to-do about how GPT-2 was so good it was dangerous, etc. Well, now we have GPT-3, which is reportedly even better, but it's served as an API. Doesn't sound that dangerous, and it all sounds a lot more like hype than actual progress.
 Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference
 I keep saying that word, but it's actually semi-formal terminology. There was a paper about evaluating the results of grammar induction algorithms that used it. I'll see if I can find it.
I believe such a situation would generate much less debate in software engineering: "my program passes all my unit tests, but it still crashes". Well, yes. Your program passes all your unit tests, because your unit tests are missing the point, not because your code works.
It is very good in all those tasks. In the paper, all the numbers comparing GPT-3 with other models on those benchmarks compare GPT-3 in a zero/one/few-shot setting (so no gradient steps) against the previous state of the art fine-tuned for hours or days on the specific task with millions of gradient steps. If you had the time and money to fine-tune GPT-3 on the specific task, there is every reason to believe the gap would be huge.
This is the big thing about GPT-3, the promise of not needing to fine-tune anymore. This is huge in terms of productivity, but also because it allows you to use the model in settings for which there are basically no datasets available to fine-tune on.
On the contrary, there is every reason to assume that GPT-3 cannot significantly improve its results on those tasks with extensive fine tuning. Because, if GPT-3 could significantly improve its performance on those tasks with extensive fine-tuning, the OpenAI paper would be reporting those results (edit: OpenAI sure has the time and money to finetune their model).
As we all know by now, there is a strong bias against reporting negative results in machine learning, as in other research fields, so we can be reasonably certain that if there is an obvious experiment to perform and that experiment is missing from a paper, it was attempted and the results were poor.
> "OpenAI sure has the time and money to finetune their model"
This model is absolutely gigantic. In terms of training, it's a nightmare. I am pretty sure they don't have the capacity to train 10 of them in parallel. So fine-tuning on all the downstream tasks needs to be done basically in serial and takes forever. They might have a lot of money, but time is as important for them as for anyone else; their lab isn't at the edge of a black hole.
If what you are saying is true, I am pretty sure they would have reported it, because it would be an extremely interesting and important result in my opinion. If fine-tuning gave you no advantage over a few-shot setting, it would basically mean that the model already knows everything there is to know about the task just from its pretraining, and any additional training is useless as the model is not learning anything.
Finally, given the pre-training curves with various model sizes, we clearly haven't reached saturation there, and there is no indication anywhere that we have reached saturation on downstream tasks.
So, for me the far more likely explanation is that fine-tuning on downstream tasks is indeed very costly (time and/or money) even by their standard and isn't even on topic for this paper.
Outside of academia I see this thing very often. "Oh, I'm sure if it was easy to do that, they'd have done it". No. This is not how a piece of research work is evaluated, not even a piece of work in deep learning, a field that has abandoned all pretensions to science in recent years.
My personal advice (speaking as someone who has been attacked by the hyenas and paid my pound of flesh) is that one should always demand the highest standard of proof for any claim in a research paper. That, if one really wishes to know what's going on. Intellectual curiosity and scientific wonderment should not result in gullibility.
Do you not see the cognitive dissonance? You were precisely claiming that because it should be easy to do they most likely have done it, and if they didn't report results it's because it failed.
You are making this unsupported claim with 0 evidence to back you up.
In any case, it's an obvious thing to do and there's no obvious explanation of why they didn't, given that they could.
This is a big, big claim tossed in as a throwaway line. GPT-3 shows that we haven't yet reached the limit of the "just throw more resources at it" school of AI development, but it doesn't automatically follow that it'll reach human levels of NLP if you give it enough resources.
By analogy, this is claiming "New, larger steam locomotives are strictly faster than older, smaller ones, so this shows with enough coal it's possible for steam-engines to someday drive interstellar transport at 0.5c"
Making the argument that scaling up high-energy fuels and engines would make interstellar travel possible would have been a pretty good hypothesis. Turns out you need rocket fuel and rocket engines, not coal and steam engines.
GPT-3 might not be the engine, but throwing insane amounts of electrical energy and computing power at the problem might just get us there.
The question is how fast is fast enough.
At what point do you stop being able to distinguish an agenda-determined social engineering bot from a non-native-speaking teenage 4chan troll?
The bar is not much higher - much less than the gain of function already seen from v2. So will it be v4? v5?
For a lot of disruptive applications, just a little bit "faster" is all you need for the bad actors to act.
I'm...not sure why you would want to? I can't see either being particularly helpful regarding discourse on the Internet or in real life.
Fully disagree. There is no evidence that we are now closer to human level text understanding than before GPT-3. Yes, GPT-3 produces grammatically correct sentences but it still can't form a coherent idea or meaning and express it in sentences afterwards - that's what humans would do. GPT-3 is just better at obfuscating that the model has no clue what it's talking about.
Nevertheless, compared with Eliza or other bots from 1960-2000 we made remarkable progress.
There's considerable debate over whether humans can have a coherent idea before it is reduced into symbolic language, and it's not clear how you would distinguish this sequence of events, anyway.
It's pretty clear what GPT-3 does doesn't match the common rationalization of human subjective experience of cognition, but it's not at all clear, AFAICT, that what the human brain does matches that rationalization, either.
Which is not to say I think GPT-3 has anything like the kind, much less the level, of understanding humans have, I just think some of the common arguments arrayed in casually dismissing it are based on suppositions about human cognition that aren't sufficiently examined.
This sounds like the thing that is so silly a person has to be very educated to believe it.
You know how I know that humans have coherent ideas before rendering it into symbolic language... because they do. The GPT-3 paper, itself, is a bunch of ideas that were formed and then rendered into symbolic language. Literally every new book/work/presentation that a person decided to write because they said to themselves "I have a great idea, I should share it with the world" comes from this.
GPT-3 doesn't even know when it thinks it has a new idea. Contrast this with humans, which have to go out of their way to communicate and promote their idea because they understand it's novel.
To continue with the GPT example, the ideas are not rendered into symbolic language only at the point of writing; the ideas are formed in the mind using symbols and then expressed afterwards.
I see it like this:
With no way to represent my thoughts and the context around them succinctly, I would not be able to string various complex ideas together coherently
There's an interesting angle to this as well, which is that it makes the models "unfalsifiable" in a way. You can never prove whether the data is a straight compression lookup or whether the network has generated an insight, because the model can't tell you (to anthropomorphize).
This, more than anything else, would be the value of having explainable models. I don't blame the ML community for this gap, but it puts them in the unenviable position of not being scientific in the Popperian sense. There's a great element of "trust us, the intelligence is in there" or "the intelligence will get there", but when everything's a mashup of more hardware and data without a known structure, we ultimately have to take that on faith. We can do empirical measurements after the fact, but the guiding projections for how an experiment should behave are lacking. (I don't think anyone in any community has a satisfactory answer to this btw.)
Like, generating code and solving math is pretty damn good for a model which is not trained for generating code and solving math. A few weeks ago people didn't know it could do that.
- toys/games: AI Dungeon, fun chatbots, generally playing around with generating text in a certain style
- humour: generating jokes/memes
- deception: trolling/disinfo campaigns/spam/gaming advertisement market by creating garbage content.
I guess there will be use cases where a human operator can use it to simplify their job (or lower the skill level required to do a job) by curating and editing generated content instead of writing it themselves. I'm thinking things like simple writing jobs where quality isn't that important: social media posts, newsletters?
With GPT-3's conversation having much more verisimilitude, the perceived therapeutic value should be much greater.
Many lonely, hurting people want someone to talk to, and for some knowing that they're talking to a machine increases their comfort in revealing personal details to it.
"Conversation as a service" could be a very desirable product for many.
I found this to be very impressive:
Not only did it eventually generate code based on his example for the new domain he specified, it was even able to generate new domains.
Just because it looks like magic, doesn't mean there isn't someone pulling some strings somewhere else to aid the illusion.
I'm sorry but this comment is absurd. I'm assuming you are insinuating that I am astroturfing for OpenAI, which is against the guidelines. Not only that, but in fact none of my comments are copy-pasted, so it's doubly ridiculous.
>Just because it looks like magic
No one is saying it's magic, but the thread is full of people saying: "Uhh my random prompt got bad results, this is just hype, blah blah blah..." People looking for excuses to trash the model instead of seeing what it could mean for the industry going forward. None of this is married to OpenAI either; there are plenty of groups replicating GPT and they will likely have similar capabilities.
I think other applications exist, you just have to figure out how to extract what you want from the prompt/response interface.
This is the sort of thing that dampens the hype for me a bit. I keep assuming that the demonstrations are not cherrypicking examples, but it's kind of hard to believe I'm just uniquely good at picking problem cases.
TLDR: Generate working apps in seconds from English text input.
If you can't see the power here you might not be paying attention fully.
But in layman terms, what is the huge deal with it?
It was also shown that it's scalable, so there's no reason we couldn't make it an order of magnitude bigger if we wanted to. That future system might be a game changer for search and AI.
Then we just have to ask great questions like "what is the meaning of life the universe and everything" and watch the loading animation for 7.5 million years.
It's probably not that far fetched to think that this is the Fat Man bomb of AI. China, Russia probably already allocated resources to build their own model. The arms race is on.
You could make the same statement about a lookup table.
And giving it more data, it still keeps improving. The extra compute is necessary to compress and train the model.
At the end of the day, we've proven if you have more human knowledge in your lookup table, you can generate more things convincingly. Search engines have been doing something similar for a long time, except instead of generating things, they find them. Search engines also have the advantage that a lot of their knowledge does actually have real semantic models, which seems more intelligent to me.
GPT-3 generates an entire article that's largely coherent (though still lacking nuanced meaning).
Each version of GPT has an exponentially increasing parameter count (roughly 117M for GPT-1, 1.5B for GPT-2, 175B for GPT-3). Raw compute seems to be winning out here. Basically, for this application they don't seem to be hitting diminishing returns from increasing parameters.
Now people are wondering what GPT-4 can do. If it can write human-level coherent articles, or passes the Turing test or something, it's gonna trigger an arms race for governments to obtain this capability.
> Each version of GPT has exponentially increasing parameter count
What are parameters in this context?
> Now people are wondering what GPT-4 can do. If it can write human-level coherent articles or passes the Turing test or something [...]
Well, when I first heard about "text generation" last year, the first thing I said to my friends was "surely this will make fake news even worse". But well, any technology comes with a bright side and a dark side. I try not to be pessimistic in these moments hehe
Doesn't mean we get Turing or Gandhi at the end of the story. Or that there is any control over what is produced. To produce those two, nature had to iterate over 100 billion humans.
1. Companies will use models like these to generate automatic job listings
2. People will use models like these to generate automatic job applications
3. Companies from 1) will use automated tools to parse and analyse said applications
We're gonna end up with abysmal SNR, as the sheer number of applicants will explode.
The fact it can [occasionally] generalize and participate in a verbal conversation on a human level means it can potentially organize and direct productive (on a human level at least) action if assigned as a manager to a team of humans. Doesn't it?
What I find fascinating is the careful craft of framing things out of context, as a tool for politics.
Narrative wins. This will be a great tool to misinform.
However I think what's missing here is our benchmarks (a la Turing test) are about negation as opposed to affirmation. We tend to evaluate AI on whether or not we can discern the fact that it's AI. We seek to negate it as human, as opposed to affirming it as human (or close to). And this is not the right mindset when it comes to AGI because the gap between "obviously not human" and "human-like" is enormous. These are all definitely steps in the right direction, and the applications for even robotic process automation will be huge. But we're not even close to having nets that can reason about even the most basic things.
I would question the value of the Turing test, and maybe think that's not a great example for AI.
There's always been this assumption that passing the Turing test would mean we had AI, but I think that was always predicated on the machine generating the outputs. With the GPT- models, it's not clear that this isn't a form of compression over an immense data set, and we're sending pre-existing _human_ responses back to the user. It implies to me that we can pass the Turing test with a large enough data set and no (or very little) intelligence.
All of this makes me believe "These are all definitely steps in the right direction" is questionable.
Is the number of parameters to be read as the indicator of how "advanced" the training has gotten, or the accuracy of the output? As in, this dataset/training has gotten to the point that it understands the 160 billionth small exception to the general rules of how language should be interpreted, or constructed, to be considered believable?
Sometimes (as a layman) I look at this and think instead, wow, how slow these ML algorithms must be that they need 160 billion parameters to predict correctly.
Is it one of these statements?
Accuracy, of course.
> As in, this dataset/training has gotten to the point that it understands the 160 billionth small exception to the general rules of how language should be interpreted, or constructed, to be considered believable?
It memorized a lot of facts, but it is also better at figuring out rules than its predecessor.
> Sometimes (as a layman) I look at this and think instead, wow, how slow these ML algorithms must be that they need 160 billion parameters to predict correctly.
There are more specialized models which are trained on much smaller datasets. They are usually given a specific task, such as classification. GPT-3 is trained on a very large dataset in an unsupervised way. And as a result, it is able to handle a very wide variety of tasks (without re-training). If you tell it to do math, it will do math. If you tell it to translate between different languages, it will do translation. If you tell it to write JS code, it will write JS code. If you ask it to write a Harry Potter parody as if it were written by Hemingway, it will do that.
So the whole point is that it can do pretty much any imaginable task involving text given only a few examples, with no specific training.
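A toy few-shot prompt makes the point concrete; the call below assumes the original openai Python client and the "davinci" engine name, both of which may have changed since:

    import openai  # assumes OPENAI_API_KEY is set in the environment

    prompt = (
        "English: Where is the library?\n"
        "French: Où est la bibliothèque ?\n\n"
        "English: I would like two coffees, please.\n"
        "French:"
    )

    # The model infers the task (translation) purely from the example above;
    # no translation-specific training or fine-tuning is involved.
    response = openai.Completion.create(engine="davinci", prompt=prompt,
                                        max_tokens=30, temperature=0, stop="\n")
    print(response.choices[0].text.strip())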
I mean, 300GB of memory is crazy. But back in the day I ran full HD videos on a netbook, because it had special optimized chips.
It is possible to use it to generate texts which can be quickly curated or edited by a user.
Specifically this could be useful in authoring fiction (sci-fi novels, game dialog, etc.).
Imagine the Star Trek holodeck characters. Its dialog quality is nearly good enough to make that level of interaction feasible.
Right now AI is only available to mega tech corps. Even OpenAI is a closed research lab. So one can infer that AI will always be the divider.
I have no idea whether scaling up transformers another 100x will lead to something resembling real intelligence, but it certainly seems possible. In particular, I find the arguments against this possibility to be fairly silly. These are the three main arguments I have seen for why GPT type models will never approach AGI, and the reasons I don’t think they are valid:
1. GPT-3 requires vast amounts of training data (hundreds of billions of words from the internet), whereas a human can become fluent in natural language after “training on” much less data.
It’s not analogous to compare the GPT-3 training corpus to the education that one human receives before becoming fluent in natural language. We benefit from millions of years of evolution across billions of organisms. A massive amount of “training” is incorporated in the brain of an infant. This must be the case because even if you could somehow read all of the text on the internet to your dog, it would not approach intelligence.
2. There was no intellectual breakthrough in the development of GPT-3 just more “brute force” training on more data, therefore it or its successors can’t achieve a breakthrough in intelligence.
We must remember that there was no intellectual breakthrough required for the development of human intelligence, it was just more of the same evolution. The core pattern of evolution is extremely simple: take an organism, generate random variants from it, see which ones do the best, and then create new variants from the good ones. This is perhaps the most basic scheme you could think of that might actually work. Evolution has produced amazing results in spite of its simplicity and inefficiency (random variations!) because it generalizes well to many environments and scales extremely well to millions of generations. These are exactly the strengths of gradient descent. In fact, gradient descent follows the same structure as evolution, except that at each iteration we don’t generate random variations, but instead make an educated guess about what a fruitful variation would be based on available gradient information. This improves learning efficiency tremendously; imagine being able to say: “this Neanderthal died because he stepped into a fire, let’s add some fire-avoidance to the next one” instead of waiting for this trait to be generated randomly. Speaking of brute force and amount of training, it would take 355 years to train GPT-3 on a single GPU. This strikes me as quite fast relative to evolutionary time scales.
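A toy comparison on a one-dimensional quadratic illustrates why the gradient shortcut matters; the loss, step sizes and iteration counts here are made up purely for illustration:

    import random

    def loss(x):
        return (x - 3.0) ** 2   # minimum at x = 3

    def grad(x):
        return 2 * (x - 3.0)    # analytic gradient of the loss

    # "Evolution": propose random mutations, keep only improvements.
    x = 0.0
    for _ in range(1000):
        mutant = x + random.gauss(0, 0.1)
        if loss(mutant) < loss(x):
            x = mutant
    print("random search:", round(x, 3))

    # Gradient descent: each step is an educated guess from gradient information.
    x = 0.0
    for _ in range(100):
        x -= 0.1 * grad(x)
    print("gradient descent:", round(x, 3))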
3. Machines lack capabilities fundamental to the human experience: in particular feeling pleasure, pain, and an internal drive toward a goal.
Indeed, if you turn a computer off in the middle of a computation, there is no evidence of suffering. And if the computer successfully writes a blog post of human quality, it feels no joy in the human sense. My claim is that these sensations are not core aspects of intelligence. In fact, pleasure and pain are very primitive developments that even cockroaches can claim. The most impressively human accomplishments (harnessing vast external energy sources, breaking out of bare subsistence, landing on the moon, etc.) were made in spite of the fact that we are messy bags of emotion that unpredictably feel anger, jealousy, despondence or elation. These emotional responses were selected for because they were useful as proximate goalposts orienting us toward reproduction—basically, to overcome forgetfulness in the pursuit of long-term goals. If in the future we can simply direct a computer to write a captivating novel without needing to program in lots of visceral intermediate stimuli to keep it on track, so much the better.