> “I’m Princess Celestia, of Equestria.” the alicorn said, “It’s nice to meet you. And I’m very sorry for your untimely demise.”
> “Heh. You’re telling me.” the Professor laughed, shaking his head. “I’m dead. I’m… dead. Do you know what that means?”
> “I have an inkling.” Princess Celestia said, wincing.
> “It means I don’t have to worry about even numbers anymore.” he said, nodding. “I don’t have to worry about probabilities anymore! I don’t have to worry about the upcoming war anymore! I don’t have to worry about anypony anymore!” he spread his hooves out at his sides, “I don’t have to do anything anymore! It’s beautiful!”
> “Yes, yes, I’m sure it is.” Princess Celestia said, nodding along.
> “Nothing matters now that I’m dead!” he cheered, “It’s great! I’m free! This is what it feels like to be liberated!”
> “I can see that.” she said.
> “I’m free from death!” he continued, “I’m free from taxes! I’m free from dictatorships! I’m free from flirting!” he laughed, “I’m free from everything! Isn’t it great?”
> “Sure, I guess.” Princess Celestia shrugged.
> “What do you think about all of this, Princess?” he asked.
> “About what?” she replied.
> “Well, I just thought you might have something to say about it all.” he told her.
> “What would you like me to say?” she responded.
> “Do you think death is awesome?” he questioned.
> “Well… no.” she shook her head, “But I think the passage to it is.”
> “Oh.” he said, “I see what you mean.”
Hopefully one day it will generate startup business plans, and if we can get people to take them seriously, we may not know the difference.
But, that's fair. It's OpenAI's weights; they can keep them locked up if they want to. What caught my attention, though, is that supposedly OpenAI is working on a way to support fine-tuning.
If you think about the logistics of that, it's a very interesting challenge. The situation is this: 240GB of weights, as a webservice. Each fine-tuning session results in another copy of 240GB. So it clearly doesn't scale -- 1TB per 4 users isn't exactly efficient.
Except, not quite. You can solve this by adding additional layers, which you then fine-tune. So the base model is 240GB or whatever, and the extra layers morph the output to do what you want. Think of it as a GPT-3 with a GPT-2 1.5B stuck on the end of it.
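A minimal sketch of what that could look like, to make the idea concrete (this is my guess at the mechanics, not anything OpenAI has confirmed): freeze the shared base model and train only a small per-user head stacked on top.

```python
import torch
import torch.nn as nn

class AdapterHead(nn.Module):
    """Small trainable stack appended to a frozen base model.

    A per-user fine-tune is then just these weights (megabytes,
    not another 240GB copy); serving keeps one shared base model.
    """

    def __init__(self, hidden_size: int = 768, vocab_size: int = 50257):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=12, batch_first=True)
        self.lm_head = nn.Linear(hidden_size, vocab_size)

    def forward(self, base_hidden_states: torch.Tensor) -> torch.Tensor:
        # base_hidden_states: [batch, seq_len, hidden] from the frozen base.
        return self.lm_head(self.block(base_hidden_states))

# base_model is a stand-in for the shared frozen model:
# for p in base_model.parameters():
#     p.requires_grad = False
head = AdapterHead()
optimizer = torch.optim.Adam(head.parameters(), lr=1e-4)  # trains only the head
```

"Breaking off" the head then recovers the untouched base model, which is exactly the modularity described below.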
It's a neat idea, because theoretically you'd get two models out of it: you can "break off" the end of the fine-tuned model, and you end up with the original model. So it would be very modular.
Are there other models that you can "break apart" to get different sub-models? Sort of like adding slots that give a model different capabilities.
I am finishing up our fine-tuning API this weekend :).
If anyone on HN would like to try out the fine-tuning API (or want to build something on top of the base API), send me an email (firstname.lastname@example.org) with your use-case and I can try to accelerate you in our invite queue.
PS: We're hiring — if you enjoy building APIs with Python/Go/Kubernetes/Kafka or building front-end interfaces in React, then please get in touch — email@example.com.
This requires a lot of traditional software work — API design, writing and maintaining a growing amount of business logic, providing great tools and interfaces to help our users work with the API, excellent documentation and tutorials, scaling and operating backend systems, etc — and machine learning systems work — building serving infrastructure for a great variety of giant neural networks while making the most efficient use of our hardware, allowing our users to interact with these neural networks in increasingly sophisticated ways, etc.
While we're just getting started and have a small team, we are already supporting customers across a wide variety of industries (see https://beta.openai.com/ for a sample) and serving millions of requests per day. We are busy trying to invite folks off a very long waitlist while building out the API to support everyone.
Would love more help :).
Actually, since they used Azure cloud to train GPT-3, I don't see why they wouldn't just let you pay for spinning up Azure instances to train your extra layers, and connect those to the model.
"Big AI models like GPT-3 train on massive internet text dumps, but the data is assumed to be independent and identically distributed. Incorporating time information for a decade of data might allow them to start writing tomorrow's reddit or twitter trends."
Of course you could add things like date and community (e.g. avforums, /r/blacksmithing, etc.) to the graph to help with the contextual cues.
After you have all of that, I wonder if we could visualize the latent space of human personas and see what it looks like. Does it map to the four quadrants of that political-spectrum survey: left and right, old and young, etc.?
Given a URL (or other prompt), generate some language.
> With CTRL, we can test which domain best explains a sequence. Note that this procedure is sensitive to subtle nuances in the query prompt. In the example below, "Global warming is a lie" differs from "Global warming is a lie." The latter is a simple declarative sentence as opposed to an open start to a sentence which may continue. Source attribution cannot be considered a measure of veracity, but only a measure of how much each domain token explains a given sequence.
| Query prompt | Attributed sources |
| --- | --- |
| Global warming is a lie. | r/unpopularopinion, r/conspiracy, r/science |
| Global warming is a lie | r/eli5, r/science, r/unpopularopinion |
| Global warming is a real phenomenon | r/eli5, r/science, r/changemyview |
| Global warming is a real phenomenon. | OpenWebText, r/changemyview, r/science |
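Out of curiosity, this kind of attribution can be approximated with the released CTRL model by scoring the same query under different control codes and ranking by likelihood (the paper additionally weights by a prior over domains, which this sketch omits). A rough sketch with Hugging Face's transformers; the three control codes are just real examples from the paper, not the full list:

```python
import torch
from transformers import CTRLTokenizer, CTRLLMHeadModel

tokenizer = CTRLTokenizer.from_pretrained("ctrl")
model = CTRLLMHeadModel.from_pretrained("ctrl").eval()

query = "Global warming is a lie."
domains = ["Wikipedia", "Books", "Horror"]  # illustrative control codes

scores = {}
with torch.no_grad():
    for code in domains:
        ids = tokenizer(f"{code} {query}", return_tensors="pt").input_ids
        # Mean per-token negative log-likelihood of the query given the code.
        scores[code] = model(ids, labels=ids).loss.item()

# Lower NLL = that domain token better "explains" the sequence.
for code, nll in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{code}: {nll:.3f}")
```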
There’s nothing you can know that isn’t known.
Nothing you can see that isn’t shown.
Nowhere you can be that isn’t where you’re meant to be.
I don't think it's some absolutely amazing creation, but we probably shouldn't pretend that plagiarism isn't a realistic concern here. How many MySpace pages had that line on them? Reddit has hundreds of instances of it based on a quick search of comments. There are enough occurrences of it on Twitter that I scrolled for ten minutes and didn't run dry of comments containing it.
When finetuning GPT-2, only about 5-10% of the generated output is usable/coherent. But with GPT-3, easily 30%-40% of the generated text is usable/coherent, which is a big boost in quality.
"I have over 300 confirmed red scares."
Haha, that is genuinely one of the funniest versions of that I've ever seen, human-generated or otherwise. That level of inference is really amazing.
So far I see no evidence that this thing, or anything else like it, has any actual understanding, any model of the world. Indeed it can't, as it possesses no sensory apparatus. It's not embodied. It doesn't experience anything.
I'm not sure the OpenAI folks would argue with me, but it seems Gwern asserts that this sort of thing indicates that general AI or even sentient AI is on the doorstep. I don't think it does, and I still maintain as I always have that CS people systematically underestimate and trivialize biology.
For example, I have a thermos that keeps my coffee cold in the summer and hot in the winter. It u n d e r s t a n d s.
As long as there are no ways to properly query models, it's hard to gauge their level of understanding. It would help immensely if we could ask models for rules, as in "why was the object labelled 'a car'?" (in the case of image recognition), or directly query any grammatical rules discovered during the processing of language.
Especially in classification tasks, knowledge extraction (e.g. by outputting rules) would be so much more helpful than simply having an AI look at a CT image and spit out "yep - that's a tumour, alright" while radiologists scratch their heads as to why...
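There is some work in this direction; the crudest version for image classifiers is gradient saliency, which at least answers "which pixels drove the decision". A minimal PyTorch sketch, with an ImageNet resnet18 standing in for the hypothetical radiology model:

```python
import torch
from torchvision import models

# Any differentiable classifier works; resnet18 is just a stand-in.
model = models.resnet18(pretrained=True).eval()

def saliency_map(image: torch.Tensor, class_idx: int) -> torch.Tensor:
    """Gradient of one class logit w.r.t. the input pixels: a crude
    answer to "why was the object labelled 'a car'?"."""
    x = image.unsqueeze(0).requires_grad_(True)  # add batch dimension
    logit = model(x)[0, class_idx]
    logit.backward()
    # Per-pixel importance: max absolute gradient across colour channels.
    return x.grad[0].abs().max(dim=0).values
```

It only hints at where the model looked, not what rule it applied, which is exactly the gap being pointed out.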
>> "t entails h" (t ⇒ h) if, typically, a human reading t would infer that h is most likely true"
So in other words it's down to good old eyeballing. I'm not impressed, but not surprised either; it's just one of the many poorly defined tasks in machine learning, particularly in NLP, which has turned into a quagmire of shoddy work ever since people started firing linguists to improve their systems' performance.
Anyway, since logical entailment is central to my field of study, I can tell that if textual entailment is less strictly defined than logical entailment (as per the Wikipedia article), then it doesn't require anything that we could recognise as "understanding". Because logical entailment certainly doesn't require understanding, and its definition is as strict as a very strict thing. I mean, I can see how loosening the requirement for a precise justification of a decision that "A means B" can improve performance, but I can't see how it can improve understanding.
Edit: I'm not sure we disagree, btw, sorry for the grumpy tone. I fully agree with your gist about explainability etc.
Roughly, "A |= B iff every model M of A is a model of B", where A and B are sets of first-order logic formulae and a "model" in this context is a logical interpretation under which a set of formulae is true. A "logical interpretation" is a partition of a predicate's atoms into true and false. For example, {P, P → Q} |= Q, because every interpretation that makes both P and P → Q true must also make Q true.
I maybe should have said sensed, not experienced, since experience also leads into much deeper philosophical discussions around the nature of mind and consciousness. I wasn't really going there, since I don't see anything in GPT-3 or any similar system that merits going there.
I also don't see any evidence that it is drawing any new conclusions or constructing any novel thoughts about anything. It's regurgitating results similar to pre-existing textual examples, rearranging existing ideas in new ways. If you don't think actual new ideas exist then this may be compelling, but if that's the case I have to ask: where did all the existing ideas come from? Some creative mechanism must exist, or nothing would exist, including this text.
The fact that the output often resembles pop Internet discourse says more about the mindlessness of "meme-think" than the GPT-3 model.
As for real world uses, social media spam and mass propaganda seems like the most obvious one. This thing seems like it would be a fantastic automated "meme warrior." Train it on a corpus of Qanon and set it to work "pilling" people.
I would ascribe that to two factors: a) you have a more immediate, interactive interface to the physical world than GPT does, which is limited to a textual proxy, and b) GPT is naturally not a human-level intelligence; it is still of very limited complexity, so its understanding is more akin to that of a parrot trying to understand its owner's speech patterns. It can infer a tiny bit of semantics and mimic the rest. The ratio is a continuum.
> As for real world uses, social media spam and mass propaganda seems like the most obvious one.
Maybe completing fragments into full sentences would be useful.
Both poems were pretty good. But one of them had a metaphor about the moon reflecting in ocean waves, being distorted and taking on monstrous forms.
I figured this had to be the human one: it was a novel description (because metaphor) of a very real experience (how the moon appears reflected on the ocean).
I would've loved to have GPT-3 available to me two weeks ago. I was building a personal escape room for my wife as a gift, and used huggingface's GPT-2 website to help write some of the world building content. I'm not a particularly good writer, let alone creative, but wanted a few journal pages/notes to build the atmosphere and story of the escape room. I was able to write the rough skeleton of those notes and then use GPT-2 to help fill them out. Ended up working okay, definitely better than nothing, but GPT-2 is temperamental and lacks the "prompting" that GPT-3 has.
For example, I needed to come up with the name of the journal's author. So I fed the journal text to GPT-2 and put "Sincerely," at the bottom, to try and prompt it to complete a name. That didn't work. Ultimately what worked was putting "My name is" at the end. I still had to grind through 20 or so completions before I got a name I liked.
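For anyone who wants to reproduce that trick locally, it is a few lines with Hugging Face's transformers (GPT-2 small here; the sampling parameters are just illustrative):

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# End the prompt mid-sentence so the model is forced to complete a name.
prompt = "...journal text here... My name is"
candidates = generator(prompt, max_new_tokens=5, num_return_sequences=20,
                       do_sample=True, temperature=0.9)
for c in candidates:
    print(c["generated_text"][len(prompt):].strip())
```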
(Yes, I could have just picked a name at random. Did I mention I'm bad at creativity? My thinking was that the AI could more intelligently pick a name that fit the story and writing style of the journal. And honestly the name it came up with, Mabel, fit the character well: a librarian dabbling in magic.)
I feel like GPT-3 would have done a lot better. Not to mention the ability to describe my world to it and then just straight up ask it for ideas.
I enjoyed the part about sampling, which is a big unsolved problem. To me, techniques like nucleus sampling and temperature sampling feel like hacks to make up for the fact that maximizing likelihood maybe isn't the right goal. Maybe repetitive gibberish has a higher likelihood than prose written by humans? That best-of sampling decreased text quality indicates that it does. Researchers have assumed that the problem would go away with ever-growing models. But maybe it won't?
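For reference, both tricks are only a few lines applied to the model's next-token distribution. A minimal PyTorch sketch (`logits` is the model's output for the next token; the default parameter values are just illustrative):

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 0.8,
                      top_p: float = 0.9) -> int:
    """Temperature + nucleus (top-p) sampling over one logits vector."""
    # Temperature < 1 sharpens the distribution, > 1 flattens it.
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Zero out a token once the mass *before* it already exceeds top_p,
    # so the nucleus always keeps at least one token.
    sorted_probs[cumulative - sorted_probs > top_p] = 0.0
    sorted_probs /= sorted_probs.sum()  # renormalise the truncated nucleus
    choice = torch.multinomial(sorted_probs, num_samples=1)
    return int(sorted_idx[choice])
```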
I don't agree that generating (symbolic) music would be less sensitive to sampling issues. On the contrary, in my opinion. In text you can often get away with grammatical errors or missing punctuation, but if the pitch or timing of one chord is wrong, it's over. The audience instantly hears that it is garbage. Thus, you have to lower the temperature (or probability threshold, or what have you) to make the sampling more conservative, which exacerbates the problem of repeated sequences.
Of course, in music you want repetition. But not too much. The magic number (in Western music) is 4. Fewer repeats make it feel as if the music jumps around; more repeats make it feel as if the music is stuck or "looping."
Instead of prompting it with

21 + 110 = 131
150 + 12 =

condition it on

21 + 110 = 100 + 10 + 20 + 1 = 100 + 30 + 1 = 131
or similar. Given that humans make these intermediate steps in their heads, GPT may perform better when it is encouraged to make them as well.
This may in fact apply to all sorts of reasoning, but in many cases it may be difficult to make these steps explicit in text form. Humans seem to mainly use some prediction layer or scratchpad which contains not only the inner monologue but also motor primitives, smells, images, everything. Humans can also decide to think a bit longer before producing an output, which appears to require an RNN.
Input: 21 + 110
Output: 100 + 20 + 10 + 1 = 100 + 30 + 1 = 130 + 1 = 131
Output: 80 + 70 + 9 + 8 = 150 + 9 + 8 = 150 + 17 = 150 + 10 + 7 = 160 + 7 = 167
Input: 37 + 112
Output: 30 + 100 + 10 + 2 = 110 + 1 = 111
Output: 100 + 90 + 1 = 190 + 1 = 191
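For anyone who wants to experiment with this, here is a sketch against the 2020-era OpenAI Python bindings (assuming API access; the engine name and the few-shot prompt are purely illustrative):

```python
import openai  # pip install openai; assumes your API key is configured

# Few-shot prompt demonstrating the digit decomposition we want imitated.
prompt = """\
Input: 21 + 110
Output: 20 + 1 + 100 + 10 = 100 + 30 + 1 = 131
Input: 37 + 112
Output: 30 + 7 + 100 + 10 + 2 = 100 + 40 + 9 = 149
Input: 150 + 12
Output:"""

completion = openai.Completion.create(
    engine="davinci",
    prompt=prompt,
    max_tokens=32,
    temperature=0.0,  # near-greedy decoding; arithmetic has one right answer
    stop="\n",
)
print(completion.choices[0].text.strip())
```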
What hits me most profoundly is that there are so many witty and interesting prompts, yet the purely logical statements fall apart (the black ravens, or the male sister).
This is something that probably does not jump to one's mind as significant, because "technicalities", but to me this is where logic lets us take a step back from our human projection for a second: it's my own anthropomorphism that becomes more obvious to me. I find that ironic, considering a lot of human beings also DO fail such tests, but something hits me about how a supposedly completely logical entity fails at logic more than at poetry. It kind of shakes my own (supposed) humanity.
In particular, I bet the "how many eyes does a horse have" example would be much less likely with a multimodal model which has actually seen photographs or videos of what the word "horse" describes and can see that, like most mammals, horses have only 2 eyes. Think of it as layers of Swiss cheese: every modality's dataset has its own weird idiosyncrasies and holes where the data is silent and the model learns little, but another modality will have different ones, and the final model trained on them all simultaneously will avoid the flaws of each one in favor of a more correct universal understanding.
I'm very keen to see how much multimodal models can improve over current unimodal models over the next few years.
This is purely subjective.
Your expectations of poetry might be different from those of other people, or even of specialists. I am not particularly good in that domain, but sometimes I don't really like the results shown.
The quality (in both senses) of the output given an appropriately constructed prompt is incredible.
I wonder if it's possible to get it to do the opposite of summarizing, i.e. give it a plot summary and have it expand it into a fleshed-out story that conforms to the summary...
That's pretty much its main job: you provide a prompt and it writes a story about it.
Designing objective quality tests must be an active area of research; I wonder what the best approaches are.
Just something I was mulling over, though, not to take away from the obvious progress here.
But I get it: it's exciting, and it's an easy way to get VC money. Perhaps one day we'll get something useful aside from the various pattern-matching applications (image recognition, speech-to-text, etc.). I'm skeptical but willing to be surprised.
While it's true that recent natural neural net models like ixvvqktiwl may sound superficially coherent and like they 'understand' things, we can see by comparison with artificial neural net models that they aren't really doing anything we'd call "natural intelligence"; it's advanced mimicry at best, just elaborate pattern matching.
I get that it's very easy to create these natural neural net models and be carried away by excitement, and it can even be profitable (witness the many VC-funded startups which use natural neural nets as a core technology), but we should remain skeptical of any claims by those natural neural net models, much less their promoters online, that they are 'intelligent' in the strict definition of the word.
Language models are not grounded learners. The language produced does not really correspond meaningfully to our world except in superficial (albeit complex) ways.
Do you have thoughts on how to move forward on this problem? Maybe ask GPT-3 and see what it thinks :P
And while this is possible, it feels like there should be more effective ways to impart a knowledge of reality, if only we had huge databases of usable data to feed these NNs instead of dumps of text. At the moment it feels like we're trying to teach advanced physics to a subject with no previous knowledge of physics or math by feeding them everything on arXiv plus physics textbooks in random order. What you get is someone who can produce text that mimics the superficial style of scientific articles, but with an extremely confused understanding of the subject, if any at all.
I am happy to take them at their word that their theory about symbol grounding proves that no LM will ever be able to solve "Three plus five equals" (appendix B); and thus, by modus tollens, GPT-3's ability to (already) solve "Three plus five equals" means their theory is wrong and I need not consider it any further.
"Advanced Mimicry" is in fact an entirely apt description of a lot of human activity that falls under the heading of "intelligence", though not necessarily particularly "smart". So, we could call it "Artificial Stupidity" instead, if you like.
GPT-3's amazing ability to pick up what we want from prompts lifts it above mimicry. Intelligence is about quickly solving novel tasks (with little supervision), and GPT-3 does this more than any other language model.
I also think human intelligence and creativity will always be judged by other humans as _better_, more genuine, as long as the judges can trust that a given creation was led by a human.
Among creative types, "derivative" is a frequently used derogatory label. "Lesser artists borrow, great artists steal" also implies that, regardless of what else an artist/creator does, pattern matching is a big part of the process.
An ant has only around 250,000 neurons, yet ants are still more intelligent than the most advanced "AI" we've managed to produce.
An ant may not be able to paint a painting or write a novel, but I think most people would agree that it qualifies as an intelligent being.
I recently saw this great video related to your very question.
Yannic Kilcher: Paper review "On the Measure of Intelligence by François Chollet"