GPT-3 Creative Fiction (gwern.net)
234 points by ashin 35 days ago | 97 comments



It writes uncomfortably coherent and witty fatalist My Little Pony fanfiction:

> “I’m Princess Celestia, of Equestria.” the alicorn said, “It’s nice to meet you. And I’m very sorry for your untimely demise.”

> “Heh. You’re telling me.” the Professor laughed, shaking his head. “I’m dead. I’m… dead. Do you know what that means?”

> “I have an inkling.” Princess Celestia said, wincing.

> “It means I don’t have to worry about even numbers anymore.” he said, nodding. “I don’t have to worry about probabilities anymore! I don’t have to worry about the upcoming war anymore! I don’t have to worry about anypony anymore!” he spread his hooves out at his sides, “I don’t have to do anything anymore! It’s beautiful!”

> “Yes, yes, I’m sure it is.” Princess Celestia said, nodding along.

> “Nothing matters now that I’m dead!” he cheered, “It’s great! I’m free! This is what it feels like to be liberated!”

> “I can see that.” she said.

> “I’m free from death!” he continued, “I’m free from taxes! I’m free from dictatorships! I’m free from flirting!” he laughed, “I’m free from everything! Isn’t it great?”

> “Sure, I guess.” Princess Celestia shrugged.

> “What do you think about all of this, Princess?” he asked.

> “About what?” she replied.

> “Well, I just thought you might have something to say about it all.” he told her.

> “What would you like me to say?” she responded.

> “Do you think death is awesome?” he questioned.

> “Well… no.” she shook her head, “But I think the passage to it is.”

> “Oh.” he said, “I see what you mean.”


I'm not sure if I'm more impressed by the solid understanding of the concept of death, the coherent dialog with people interacting with each other, or the use of "anypony" instead of anyone.


What the hell? How much of a world model does it have, that it can do things like write coherently about death as liberation, and come up with examples of things a living thing might be relieved at not having to bother with anymore? How the hell did a computer write this? Are we sure it's not been overtrained and this is actually a verbatim story from somewhere?


Surely this kind of topic came up in its corpus. I would expect it to have sufficient generalization to "understand" and talk about the topic itself. If all instances of this topic in its training data were replaced with the emotions clowns undergo when throwing ripe tomatoes, it would've written about that.


On IRC, we loved that sample when I generated it. It's not just a hilariously coherent story that makes you want to know more (what is Starswirl up to? what were Professor Endagerment & Doctor Caballeron fighting over?); some of the lines are both alien & deeply funny: apparently in the afterlife, all numbers are odd, including the number '2'.


Now how do we get aspiring actors to take GPT3 seriously and act out a possible screenplay?

Hopefully one day it will generate startup business plans, and we'll get people to take those seriously; we may not know the difference.


We animate them from nothing


Wow. What prompt did you give it to obtain this?


See https://www.gwern.net/GPT-3#my-little-pony It was a loosely FiO-themed prompt.


Fine tuning GPT-3 is one of the biggest challenges, because it's behind an API. The weights aren't available to researchers, so we can't make it do anything it doesn't already do.

But, that's fair. It's OpenAI's weights; they can keep them locked up if they want to. What caught my attention, though, is that supposedly OpenAI is working on a way to support fine-tuning.

If you think about the logistics of that, it's a very interesting challenge. The situation is this: 240GB of weights, as a webservice. Each fine-tuning session results in another copy of the 240GB, so it clearly doesn't scale: 1TB per 4 users isn't exactly efficient.

Except, not quite. You can solve this by adding additional layers, which you then fine-tune. So the base model is 240GB or whatever, and the extra layers morph the output to do what you want. Think of it as a GPT-3 with a GPT-2 1.5B stuck on the end of it.

It's a neat idea, because theoretically you'd get two models out of it: you can "break off" the end of the fine-tuned model, and you end up with the original model. So it would be very modular.

Are there other models that you can "break apart" to get different sub-models? Sort of like adding slots that give a model different capabilities.


(I work at OpenAI.)

I am finishing up our fine-tuning API this weekend :).

If anyone on HN would like to try out the fine-tuning API (or want to build something on top of the base API), send me an email (gdb@openai.com) with your use-case and I can try to accelerate you in our invite queue.

PS: We're hiring — if you enjoy building APIs with Python/Go/Kubernetes/Kafka or building front-end interfaces in React, then please get in touch — gdb@openai.com.


Are there any products in the pipeline that you're planning to ship? Asking for prospective candidates.


There's just about infinite surface area with the API — we're trying to build a dead-simple API that developers can plug into any product in order to add intelligence features that would be otherwise impossible.

This requires a lot of traditional software work — API design, writing and maintaining a growing amount of business logic, providing great tools and interfaces to help our users work with the API, excellent documentation and tutorials, scaling and operating backend systems, etc — and machine learning systems work — building serving infrastructure for a great variety of giant neural networks while making the most efficient use of our hardware, allowing our users to interact with these neural networks in increasingly sophisticated ways, etc.

While we're just getting started and have a small team, we are already supporting customers across a wide variety of industries (see https://beta.openai.com/ for a sample) and serving millions of requests per day. We are busy trying to invite folks off a very long waitlist while building out the API to support everyone.

Would love more help :).


Emailed. I think I have an interesting perspective as a pro-hackathonner who regularly uses new technologies to build compelling demos. Haven’t heard back yet from my initial beta application, hope to be able to try it out and explore its potential.


Many ML models are like this (anything used in CV e.g. ResNet, VGG). For example, if you want to classify images as being hot dog or not hot dog (classes that do not exist in ResNet), you can take weights from a pretrained ResNet-50 and finetune the last layer based on a small training set of input images labeled hot dog and not hot dog. This lets you reuse the ResNet's feature detector layers, while plugging in specialized "is this a hot dog or not" fully connected layer.
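As a toy illustration of that last-layer finetuning, one can do it without any deep-learning framework: treat the pretrained network as a frozen feature extractor and train only a new linear head on top. (Nothing below is actual ResNet; a random projection stands in for the frozen backbone, and the dataset is synthetic.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pretrained backbone: a fixed random projection
# plus ReLU (a real ResNet-50 would output 2048-d features, not 8-d).
W_frozen = rng.normal(size=(16, 8))

def backbone(x):
    return np.maximum(x @ W_frozen, 0.0)  # weights never updated

# Tiny synthetic "hot dog / not hot dog" dataset, labelled so the classes
# are linearly separable in the frozen feature space.
X = rng.normal(size=(200, 16))
feats = backbone(X)                       # computed once; backbone stays fixed
scores = feats @ rng.normal(size=8)
y = (scores > np.median(scores)).astype(float)

# New trainable head: plain logistic regression on the frozen features.
w, b = np.zeros(8), 0.0
for _ in range(3000):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
    grad = p - y
    w -= 0.1 * feats.T @ grad / len(y)
    b -= 0.1 * grad.mean()

acc = (((feats @ w + b) > 0) == (y > 0.5)).mean()
```

Only `w` and `b` are ever updated; the backbone is reused as-is, which is why this style of finetuning needs far less data and compute than training from scratch.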


If I understand correctly, I think you and the other poster are describing transfer learning.


same thing


You could encode the deltas cleverly and likely use much less than 240GB.
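Sketching that idea: if fine-tuning only moves a small fraction of weights appreciably, each user's state can be stored as a sparse delta over the shared base weights (a toy illustration; a real system might prefer low-rank or quantized deltas instead):

```python
import numpy as np

rng = np.random.default_rng(1)

base = rng.normal(size=1000)                           # shared pretrained weights
finetuned = base.copy()
touched = rng.choice(1000, size=50, replace=False)
finetuned[touched] += rng.normal(scale=0.1, size=50)   # fine-tuning nudges a few weights

# Store only the changed entries: indices plus values.
delta = finetuned - base
idx = np.flatnonzero(delta)
sparse_delta = (idx, delta[idx])                       # ~50 entries instead of 1000

# Serving time: reconstruct a user's model from the shared base + their delta.
rebuilt = base.copy()
rebuilt[sparse_delta[0]] += sparse_delta[1]
```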


Yes, I guess you can have the API provide you the intermediate layer outputs instead of the predictions. However, if you then want to finetune your own extra layers using these intermediate outputs as inputs, would the API be able to produce them fast enough for you to do the finetuning in reasonable time? That's assuming the extra layers are located on your own servers. Or would OpenAI be willing to actually create the extra layers on their own machines and let you finetune those? In the second scenario, you would need to move your dataset to their servers.

Actually, since they used Azure cloud to train GPT-3, I don't see why they wouldn't just let you pay for spinning up Azure instances to train your extra layers, and connect those to the model.


They can keep their weights secret, sure, but then they should change their name.


Carmack posted (yesterday) an interesting thought on models like GPT-3:

"Big AI models like GPT-3 train on massive internet text dumps, but the data is assumed to be independent and identically distributed. Incorporating time information for a decade of data might allow them to start writing tomorrow's reddit or twitter trends."

https://twitter.com/ID_AA_Carmack/status/1278840413919551488


I saw that yesterday and it spawned a thought process for me. It seems the current approach is very effective at developing a language model, but not always effective at developing an interaction model. I wonder if it would be possible to build a graph of interactions between users/personas on various social media platforms and forums, and use that to help develop a more effective communicator.

Of course you could add things like date, community (e.g. avforums, /r/blacksmithing, etc) to the graph to help with the contextual cues.

After you have all of that, I then wondered whether we could visualize the latent space of human personas and see what that looks like. Does it map to the four quadrants of that political-spectrum survey: left and right, old and young, etc.?


That's what CTRL does:

Given a URL (or other prompt), generate some language.

From [1]:

> With CTRL, we can test which domain best explains a sequence. Note that this procedure is sensitive to subtle nuances in the query prompt. In the example below, "Global warming is a lie" differs from "Global warming is a lie." The latter is a simple declarative sentence as opposed to an open start to a sentence which may continue. Source attribution cannot be considered a measure of veracity, but only a measure of how much each domain token explains a given sequence.

Query Prompt → Attributed Sources

"Global warming is a lie." → r/unpopularopinion, r/conspiracy, r/science
"Global warming is a lie" → r/eli5, r/science, r/unpopularopinion
"Global warming is a real phenomenon" → r/eli5, r/science, r/changemyview
"Global warming is a real phenomenon." → OpenWebText, r/changemyview, r/science

https://blog.einstein.ai/introducing-a-conditional-transform...


Whoa, cool, will check this out. Thanks!


In that same vein, but along the code generation axis, I wonder if something on the scale of GPT-3 would be capable of generating commits directly. I think generating commits would be much more useful than generating programs whole hog.


That's a great idea. Particularly with bugfixes.


Straight up plagiarizes The Beatles here: https://www.gwern.net/GPT-3#dr.-seuss-oh-the-places-youll-go

" There’s nothing you can know that isn’t known. Nothing you can see that isn’t shown. Nowhere you can be that isn’t where you’re meant to be. "


"Good artists copy; Great artists steal."

I don't think it's some absolutely amazing creation, but we probably shouldn't pretend the plagiarism is anything unrealistic: humans repeat that line constantly too. How many MySpace pages had that line in them? Reddit has hundreds of instances of it based on a quick search of comments, and there are enough occurrences on Twitter that I scrolled for ten minutes without running dry of comments containing it.


Well, it's easy.


After spending a lot of time working with GPT-3/the OpenAI API (https://github.com/minimaxir/gpt-3-experiments ), one notable part of GPT-3 is the high signal-to-noise ratio in generated output.

When finetuning GPT-2, only about 5-10% of the generated output is usable/coherent. But with GPT-3, easily 30%-40% of the generated text is usable/coherent, which is a big boost in quality.


GPT's take on the navy seal copypasta, in the style of a KGB spy:

"I have over 300 confirmed red scares."

Haha, that is genuinely one of the funniest versions of that I've ever seen, human-generated or otherwise. That level of inference is really amazing.


The whole section on navy seal copypasta generation is amazing.


I love that the Englishman one gets really confused about the meaning of the word “mate” and ends up as a Royal Navy pirate.


I must agree; reading those copypastas, "300 red scares" and "300 confirmed scoots" had me close to tears.


Idk, Donald Trump's "I have over 300 confirmed bankruptcies" did it for me.


I'm gonna put forward the very view that gwern repeatedly argues against: "but... it's not understanding."

So far I see no evidence that this thing or anything else like it has any actual understanding, any model of the world. Indeed it can't as it possesses no sensory apparatus. It's not embodied. It doesn't experience anything.

I'm not sure the OpenAI folks would argue with me, but it seems Gwern asserts that this sort of thing indicates that general AI or even sentient AI is on the doorstep. I don't think it does, and I still maintain as I always have that CS people systematically underestimate and trivialize biology.


How could we accurately measure understanding? Honest question, because I am curious.


Well, there is no formal definition of "understanding" in the context of CS, AI, or machine learning so anyone can claim anything they like, with respect to the term.

For example, I have a thermos that keeps my coffee cold in the summer and hot in the winter. It u n d e r s t a n d s.


There are a number of NLP-tasks that aim to quantify understanding, e.g. textual entailment. No currently published model is even remotely close to human-level performance on all of these tasks.

As long as there are no ways to properly query models, it's hard to qualify their level of understanding. It would help immensely if we could ask models for rules as in "why was the object labelled 'a car'" (in case of image recognition) or directly query any grammatical rules discovered during the processing of language.

Especially in classification tasks, knowledge extraction (e.g. by outputting rules) would be so much more helpful than simply having an AI looking at a CT image and spit out "yep - that's a tumour, alright", while having radiologists scratch their heads as to why...


I had to look up textual entailment (on Wikipedia) because I wasn't sure of its formal definition. It turns out it doesn't have one:

>> "t entails h" (t ⇒ h) if, typically, a human reading t would infer that h is most likely true"

So in other words it's down to good old eyeballing. I'm not impressed, but not surprised either; it's just one of the many poorly defined tasks in machine learning, particularly in NLP, which has turned into a quagmire of shoddy work ever since people started firing linguists to improve their systems' performance.

Anyway, since logical entailment is central to my field of study, I can tell that if textual entailment is less strictly defined than logical entailment (as per the Wikipedia article), then it doesn't require anything that we could recognise as "understanding". Logical entailment certainly doesn't require understanding, and its definition is as strict as a very strict thing [1]. I mean, I can see how loosening the requirement for a precise justification of a decision that "A means B" can improve performance, but I can't see how it can improve understanding.

Edit: I'm not sure we disagree, btw, sorry for the grumpy tone. I fully agree with your gist about explainability etc.

______________

[1] Roughly, "A |= B iff for each model M, of A, M is a model of B", where A and B are sets of first order logic formulae and a "model" in this context is a logical interpretation under which a set of formulae is true. A "logical interpretation" is a partition of a predicate's atoms to true and false.
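For propositional logic, the definition in [1] can be checked mechanically by brute force over truth assignments (a minimal sketch; formulas are represented as Python predicates over an assignment, and the names are mine):

```python
from itertools import product

def entails(A, B, atoms):
    """A |= B iff every assignment satisfying all of A satisfies all of B."""
    for values in product([False, True], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(f(v) for f in A) and not all(g(v) for g in B):
            return False
    return True

# Modus ponens: {p, p -> q} |= {q}, but {p -> q} alone does not entail {q}.
p = lambda v: v["p"]
q = lambda v: v["q"]
p_implies_q = lambda v: (not v["p"]) or v["q"]
```

Note that nothing here "understands" anything: entailment is a purely mechanical check, which is rather the point.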



Both papers provide promising first steps in the right direction but are by no means solutions to the problem at hand. I mean, the second paper is even based on the premise that classification has already been done by human experts as a preparation step...


What makes you confident that you aren't overestimating the importance of the fact that we "experience anything"?


When I say "I have a laptop in front of me," I am describing an understanding of something that is being experienced (sensed). If a Markov text generator outputs this text, it's just rearranging bits. I don't see any evidence that GPT-3 is doing anything more than rearranging bits in a much more elaborate way than a Markov text generator. The results kind of dazzle us, but being dazzled doesn't indicate anything in particular. I see something akin to a textual kaleidoscope toy, a generator of novel text that is syntactically valid and that produces odd cognitive sensations when read.

I maybe should have said sensed, not experienced, since experience also leads into much deeper philosophical discussions around the nature of mind and consciousness. I wasn't really going there, since I don't see anything in GPT-3 or any similar system that merits going there.

I also don't see any evidence that it is drawing any new conclusions or constructing any novel thoughts about anything. It's regurgitating results similar to pre-existing textual examples, rearranging existing ideas in new ways. If you don't think actual new ideas exist then this may be compelling, but in that case I have to ask: where did all the existing ideas come from? Some creative mechanism must exist or nothing would exist, including this text.

The fact that the output often resembles pop Internet discourse says more about the mindlessness of "meme-think" than the GPT-3 model.

As for real world uses, social media spam and mass propaganda seems like the most obvious one. This thing seems like it would be a fantastic automated "meme warrior." Train it on a corpus of Qanon and set it to work "pilling" people.


> When I say "I have a laptop in front of me," I am describing an understanding of something that is being experienced (sensed).

I would ascribe that to two factors: a) you have a more immediate, interactive interface to the physical world than GPT does, which is limited to a textual proxy, and b) GPT is naturally not a human-level intelligence; it is still of very limited complexity, so its understanding is more akin to that of a parrot trying to understand its owner's speech patterns. It can infer a tiny bit of semantics and mimic the rest. The ratio is a continuum.

> As for real world uses, social media spam and mass propaganda seems like the most obvious one.

fragments full sentence completion useful maybe.


Take active learning versus usual learning. Often with active learning you can learn much faster. That's a kind of "experience." Out of distribution problems where it fails to generalize could be dealt with much more efficiently when a model can ask "hey what's f(x=something really weird and specific that would never come up in an entire internet's worth of training data)?" Experience isn't passive, and that makes a whole world of difference. And that's not even touching on the difficulty of "tell me all about elephants" versus "let me interact with an elephant and see it and touch it and physically study it."


Yesterday I watched a YouTube video about GPT-3 (https://www.youtube.com/watch?v=_8yVOC4ciXc), and it showed two poems. One was human-made; the other was from an AI trained on that human's poems.

Both poems were pretty good. But one of them had a metaphor about the moon reflecting in ocean waves, being distorted and taking on monstrous forms.

I figured this had to be the human one, it was a novel description (because metaphor) of a very real experience (how the moon appears in reflection on the ocean).


First off, gwern, lovely blog. The table of contents is incredibly helpful, especially with those little pop-up previews.

I would've loved to have GPT-3 available to me two weeks ago. I was building a personal escape room for my wife as a gift, and used huggingface's GPT-2 website to help write some of the world building content. I'm not a particularly good writer, let alone creative, but wanted a few journal pages/notes to build the atmosphere and story of the escape room. I was able to write the rough skeleton of those notes and then use GPT-2 to help fill them out. Ended up working okay, definitely better than nothing, but GPT-2 is temperamental and lacks the "prompting" that GPT-3 has.

For example, I needed to come up with the name of the journal's author. So I fed the journal text to GPT-2 and put "Sincerely," at the bottom, to try and prompt it to complete a name. That didn't work. Ultimately what worked was putting "My name is" at the end. I still had to grind through 20 or so completions before I got a name I liked.

(Yes, I could have just picked a name at random. Did I mention I'm bad at creativity? My thinking was that the AI could more intelligently pick a name that fit the story and writing style of the journal. And honestly the name it came up with, Mabel, fit the character well: a librarian dabbling in magic.)

I feel like GPT-3 would have done a lot better. Not to mention the ability to describe my world to it and then just straight up ask it for ideas.


FWIW, if I was trying to generate an escape room, I think I would probably try to use AI Dungeon. It seems like a natural fit. You could easily describe the room and edit text as you go to make it a transcript of an escape room. "I pick up the journal to read the author's name" etc.


Good call; AI Dungeon hadn't occurred to me!


I like providing some context that prompts the model for a list if I think I might need a few candidates. "We discovered a series of journals by several authors in the cabin. The authors' names were:" worked pretty well.


Great article. Well worth the read!

I enjoyed the part about sampling, which is a big unsolved problem. To me, techniques like nucleus sampling and temperature sampling feel like hacks to make up for the fact that maximizing likelihood maybe isn't the goal!? Maybe repetitive gibberish has a higher likelihood than prose written by humans? That best-of sampling decreased text quality indicates that it does. Researchers have assumed that the problem would go away with ever-growing models. But maybe it won't?

I don't agree that generating (symbolic) music would be less sensitive to sampling issues. On the contrary, in my opinion. In text you can often get away with grammatical errors or missing punctuation. But if the pitch or timing of one chord is wrong it's over. The audience instantly hears that it is garbage. Thus, you have to lower the temperature (or probability threshold or what have you) to make the sampling more conservative exacerbating the problem with repeated sequences.

Of course, in music you want repetitions. But not too much. The magic number (in Western music) is 4. Fewer repeats makes it feel as if the music jumps around. More repeats makes it feel as if the music is stuck or "looping."
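For reference, the temperature knob discussed above is just a rescaling of the model's logits before the softmax: lowering it concentrates probability mass on the top tokens (more conservative, more repetition), raising it flattens the distribution (a minimal sketch):

```python
import numpy as np

def token_probs(logits, temperature=1.0):
    """Softmax over logits / temperature, computed stably."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()                      # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = [2.0, 1.0, 0.1]
cold = token_probs(logits, temperature=0.1)    # nearly all mass on the top logit
hot = token_probs(logits, temperature=10.0)    # close to uniform
```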


Could someone with GPT-3 beta access try whether it can better solve 3 digit addition when it is allowed/encouraged/forced to make intermediate results explicit? E.g. instead of

21 + 110 = 131

150 + 12 =

condition it on

21 + 110 = 100 + 10 + 20 + 1 = 100 + 30 + 1 = 131

150 + 12 =

or similar. Given that humans make these intermediate steps in their heads, GPT may perform better when it is encouraged to make them as well.

This may in fact apply to all sorts of reasoning, but in many cases it may be difficult to make these steps explicit in text form. Humans seem to mainly use some prediction layer or scratchpad which contains not only the inner monologue but also motor primitives, smells, images, everything. Humans can decide to think a bit longer before producing an output, which appears to require an RNN.
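If anyone wants to generate such prompts in bulk, here's one way to expand a sum into explicit place-value steps (a sketch only; the merge order is an arbitrary choice of mine, and the function names are made up):

```python
def place_values(n):
    """Break an integer into its nonzero place-value parts: 231 -> [200, 30, 1]."""
    parts, place = [], 1
    while n:
        n, d = divmod(n, 10)
        if d:
            parts.append(d * place)
        place *= 10
    return parts[::-1]

def decomposed_sum(a, b):
    """Render 'a + b' with explicit intermediate steps for use in a prompt."""
    parts = sorted(place_values(a) + place_values(b), reverse=True)
    steps = [" + ".join(str(p) for p in parts)]
    while len(parts) > 1:
        # Merge the two largest remaining terms, one pair per step.
        parts = sorted([parts[0] + parts[1]] + parts[2:], reverse=True)
        steps.append(" + ".join(str(p) for p in parts))
    return f"{a} + {b} = " + " = ".join(steps)
```

For example, `decomposed_sum(150, 12)` yields "150 + 12 = 100 + 50 + 10 + 2 = 150 + 10 + 2 = 160 + 2 = 162", which could be used as few-shot material.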


I took a shot at this, didn't have much luck though:

PROMPT
======

Input: 21 + 110
Output: 100 + 20 + 10 + 1 = 100 + 30 + 1 = 130 + 1 = 131

Input: 89 + 78
Output: 80 + 70 + 9 + 8 = 150 + 9 + 8 = 150 + 17 = 150 + 10 + 7 = 160 + 7 = 167

OUTPUT
======

Input: 37 + 112
Output: 30 + 100 + 10 + 2 = 110 + 1 = 111

Input: 91 + 11
Output: 100 + 90 + 1 = 190 + 1 = 191


This article is fantastic in both shape and content, and I got lost with all the examples because there is so much to wonder at.

What hits me most profoundly is that there are so many witty and interesting prompts yet the purely logical statements fall apart (with the black ravens, or male sister).

This is something that probably does not jump to one's mind as significant, because "technicalities", but to me this is where logic lets us take a step back from our human projection for a second; it's my own anthropomorphism that becomes more obvious to me. I find that ironic, considering a lot of human beings also DO fail such tests, but something strikes me about how a supposedly completely logical entity fails at logic more than at poetry. It kind of shakes my own (supposed) humanity.


The logic is weird. Another example is factual question answering: Janelle Shane tried asking basic questions like how many eyes a horse has, and GPT-3 insisted on 4; I retried with somewhat different prompting and sampling settings more finetuned to Q&A (...sampling can reveal the presence of knowledge but not its absence...), and I got perfectly straightforward correct answers: https://twitter.com/gwern/status/1278798196555296771/photo/1 So GPT-3 does know how many eyes a horse has; but why was it also happy to answer '4'?


Something about the logic being so off is what I intuitively find logical: we're making these AIs "in our image" in a sense (we think of "neural networks", train them with mostly human-generated datasets), and there's a lot of evidence that pure logic evades us without the use of some heavy artillery to address it (cognitive biases, illusions, optimizations for goals that do not necessarily align with "objectively observable" reality). So in a way, I wonder if we'll have to "teach AI logic" at some point too. In this quest of running logic software on logic hardware with.. steps.. in between, I can't help but think about us humans on our parallel quest when it comes to our brains.


I tend to write it off as less any kind of deep truth about humans (well, maybe the "bachelors can be married" one given that 9/10 students agreed with GPT-3 that bachelors can be married) than just the current weaknesses of how we train NNs like GPT-3 (small, unidirectional, unimodal, not to convergence, missing most of science in PDFs, etc).

In particular, I bet the "how many eyes does a horse have" example would be much less likely with a multimodal model which has actually seen photographs or videos of what the word "horse" describes and can see that, like most mammals, they only have 2 eyes. Think of it as like layers of Swiss cheese: every modality's datasets has its own weird idiosyncrasies and holes where the data is silent & the model learns little, but another modality will have different ones, and the final model trained on them all simultaneously will avoid the flaws of each one in favor of a more correct universal understanding.

I'm very keen to see how much multimodal models can improve over current unimodal models over the next few years.


> fails at logic more than at poetry

This is purely subjective.

Your expectations of poetry might be different from those of other people, or even specialists. I am not particularly good in that domain, but I sometimes don't really like the results shown.


I agree with you that it's subjective. Testing logic vs. art is going to bring this kind of problem to the surface (how do you test art in a way comparable to how you test logic?). This is why I wrote that noticing these thoughts made me take a step back from my own projections (my subjectivity). That's the whole point.


Holy crapoly.

The quality (in both senses) of the output given an appropriately constructed prompt is incredible.

I wonder if it's possible to get it to do the opposite of summarizing, ie. give it a plot summary and have it expand it into a fleshed out story that conforms to the summary...


It doesn't seem to work quite as well. For single sentences, GPT-3's output usually hangs together pretty well. For longer stretches of text, there are often internal inconsistencies that are jarring, or parts that don't quite make sense when you read the text as a whole.


Thanks for the reply. Aside from the coherence and consistency issues you've noted, does the output actually conform to the specified plot summary, or does it deviate from it?


> I wonder if it's possible to get it to do the opposite of summarizing

That's pretty much its main job: you provide a prompt and it writes a story about it.


A story prompt resulting in a story is looser than what I meant. Can it elaborate on a plot summary (one that includes spoilers) and remain faithful to it? E.g. given an outline of O. Henry's The Gift of the Magi, could it write an equivalent story that still concludes with the protagonists each having sold their prized possession to buy a now-useless gift for the other? Or will the plot get lost in the weeds?


Some of these - particularly the Zizek Navy Seals Copypasta - are absolutely incredible. Great work, Gwern.


This is fascinating to read through. It’s so hard to avoid a variant of the Forer effect, though - where we unconsciously discount the errors, and selectively focus on subsets of the output and impute meaning to them.

Designing objective quality tests must be an active area of research, I wonder what the best approaches are?


Forer effect material was written by humans, one might note... But there was a paper just the other day on more rigorous evaluation: https://arxiv.org/abs/2006.14799


Thanks for the link. You’re right that Forer effect material was written by humans, but the point is more that there is a failure mode in our thinking that can be exploited - mostly intentionally by “psychics”, unintentionally by automated text generators.

Just something I was mulling over, though, not to take away from the obvious progress here.


Gwern is a prodigious blogger. Absolutely prodigious.


Thorough is the better word.


I want someone to use GPT-3 to be my D&D GM, or at least an assistant.


You can try AI Dungeon. It is somewhat close, and it uses the GPT-3 API.


I still find it odd that we call this "artificial intelligence" when it's advanced mimicry at best. There's no "intelligence" in the strict definition of the word, it's just elaborate pattern matching.

But I get it, it's exciting, and it's an easy way to get VC money. Perhaps one day we'll get something useful aside from the various pattern matching applications (image recognition, speech to text, etc). I'm skeptical but willing to be surprised.


I'm sorry, your comment explaining why deep learning & GPT-3 do not truly understand anything is more poorly reasoned and explained than GPT-3's explanation why GPT-3 does not truly understand anything: https://www.gwern.net/GPT-3#why-deep-learning-will-never-tru...

While it's true that recent natural neural net models like ixvvqktiwl may sound superficially coherent and like they 'understand' things, we can see by comparison with artificial neural net models that they aren't really doing anything we'd call "natural intelligence"; it's advanced mimicry at best, just elaborate pattern matching.

I get that it's very easy to create these natural neural net models and be carried away by excitement, and it can even be profitable (witness the many VC-funded startups which use natural neural nets as a core technology), but we should remain skeptical of any claims by those natural neural net models, much less their promoters online, that they are 'intelligent' in the strict definition of the word.


I'm as impressed as anyone with GPT-3 samples, but you're sort of ignoring the symbol grounding elephant in the room regarding language models (https://openreview.net/pdf?id=GKTvAcb12b).

Language models are not grounded learners. The language produced does not really correspond meaningfully to our world except in superficial (albeit complex) ways.

Do you have thoughts on how to move forward on this problem? Maybe ask GPT-3 and see what it thinks :P


The problem, if I understand correctly, is that we're feeding enormous amounts of text to language models hoping that they might contain, hidden in their patterns, enough information about the real world to allow prodigiously complex NNs to extract it and create their own representation of reality.

And while this is possible, it feels there should be more effective ways to impart a knowledge of reality- if only we had huge databases of usable data to feed to these NNs instead of dumps of text. At the moment it feels like we're trying to teach advanced physics to a subject with no previous knowledge of physics or math by just feeding it with everything on arXiv and physics textbooks in random order. What you get is someone who can produce text that mimics the superficial style of scientific articles, but with an extremely confused understanding of the subject, if any at all.


I would be more impressed by that paper if they didn't make trivially falsifiable claims: https://twitter.com/gwern/status/1280204127876808705

I am happy to take them at their word that their theory about symbol grounding proves that no LM will ever be able to solve "Three plus five equals" (appendix B); and thus, by modus tollens, GPT-3's ability to (already) solve "Three plus five equals" means their theory is wrong and I need not consider it any further.


Symbol grounding is as much a problem in AI as whether or not our use of language is meaningful. Does our language encode particular models of the world? Yes? Good. Then AI models also encode models of the world.


I understand you're trying to be funny, but I think insults are against the HN site guidelines.


> I still find it odd that we call this "artificial intelligence" when it's advanced mimicry at best. There's no "intelligence" in the strict definition of the word, it's just elaborate pattern matching.

"Advanced Mimicry" is in fact an entirely apt description of a lot of human activity that falls under the heading of "intelligence", though not necessarily particularly "smart". So, we could call it "Artificial Stupidity" instead, if you like.


> I still find it odd that we call this "artificial intelligence" when it's advanced mimicry at best.

GPT-3's amazing ability to pick up what we want from prompts lifts it above mimicry. Intelligence is about quickly solving novel tasks (with little supervision). GPT-3 does this more than any other language model.


One thing that I find interesting is how children also essentially do advanced mimicry to fill in gaps in their knowledge. It would be interesting to see a more traditional inference-engine approach that fell back on a GPT-3-style model when its knowledge of a situation dipped below a certain level, and then took the GPT-3-style response as a datapoint for inference-engine-style reasoning.
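A rough sketch of that hybrid loop (everything here is invented for illustration: the knowledge-base shape, the confidence threshold, and a stub `query_language_model` standing in for a GPT-3 API call):

```python
# Hybrid reasoner sketch: consult a structured knowledge base first, and
# fall back to a language model only when confidence dips below a threshold.
# The LM's answer is then stored as a low-confidence fact so the symbolic
# side can reason over it later.

CONFIDENCE_THRESHOLD = 0.8

# fact -> (answer, confidence); contents are made up for the example
knowledge_base = {
    "capital of France": ("Paris", 0.99),
}

def query_language_model(question):
    # Stand-in for a real GPT-3 call; returns a guess with low confidence.
    return ("some plausible completion", 0.5)

def answer(question):
    fact = knowledge_base.get(question)
    if fact and fact[1] >= CONFIDENCE_THRESHOLD:
        return fact[0]
    # Knowledge dipped below the threshold: fall back to the LM...
    guess, confidence = query_language_model(question)
    # ...and feed the response back in as a datapoint for later reasoning.
    knowledge_base[question] = (guess, confidence)
    return guess
```

The interesting design question is the last line but one: whether an LM guess should ever be promoted to a high-confidence fact, or stay quarantined at the confidence the fallback assigned it.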


I'm not disagreeing with you but where's the frontier between "pattern matching" and "intelligence"?

I also think human intelligence and creativity will always be judged by other humans as _better_, more genuine, as long as the judges can trust that a given creation was led by a human.

Among creative types, "derivative" is a derogatory label used frequently. "Lesser artists borrow, great artists steal" also implies that, regardless of what else an artist/creator does, pattern matching is a big part of the process.


That's a good point. I think we need a better understanding of how the brain works before we can properly answer that question. At this point there's a lot we don't understand about the brain, and psychology for that matter; it's basically a black box that we can observe through a microscope or other imaging system without really understanding what's happening inside.

An ant only has around 250,000 neurons, yet it's still more intelligent than the most advanced "AI" we've managed to produce.

An ant may not be able to paint a painting or write a novel, but I think most people agree it qualifies as an intelligent being.


Can we really call this ANTNN "intelligent" when it falls victim to trivial adversarial inputs? Ones that arise in the real world, not just in artificial inputs mind you. Clearly those ANTNNs are still far from anything that could rival natural intelligences.

https://www.youtube.com/embed/mA37cb10WMU


The current leading theories on how the brain works hold that yes, it is essentially a very impressive prediction machine, which obviously relies on all sorts of pattern matching to make those predictions. You can look up Karl Friston's free energy principle, or check this book for more details:

https://www.amazon.com/Surfing-Uncertainty-Prediction-Action...


> where's the frontier between "pattern matching" and "intelligence"?

I recently saw this great video related to your very question.

Yannic Kilcher: Paper review "On the Measure of Intelligence by François Chollet" https://www.youtube.com/watch?v=cuyM63ugsxI


What test would you perform to distinguish whether a ML model is pattern matching or 'intelligent'?


Do you consider TabNine to be a "pattern matching application"?


Is this using OpenAI's API or are you running GPT-3 on your own machine?


You can only use GPT-3 via OpenAI's API currently.


Got it, that's what I thought!


again, gwern writes the longest guide in the world about something


Gwern should train it on his own corpus and see if he can automate himself. :)


It apparently doesn't need any further training: https://www.gwern.net/GPT-3#gwern-branwen


Or in general: GPT-3 gen using random input -> curation by some GAN -> GPT-3' train -> GPT-3' gen -> ... How many generations until War and Peace?

