So excited for this release. In the wake of AI Dungeon's downfall, having GPT-Neo to fall back on has been a saving grace. While the 6B model is nowhere near as good as the original AI Dungeon, which used OpenAI's 175B model, it was at least serviceable, unlike the "gentled" AI Dungeon. And you could run it locally or through Colab, which was really cool. I ended up using it through NovelAI, since they've spent a lot of time fine-tuning the model and adding a plethora of features that improve the overall output. (NovelAI's interface is like AI Dungeon on steroids!) But there is also a vibrant community of Colab notebooks and other tools for DIYers surrounding the GPT-Neo model.
That said, besides being overall "dumber" than 175B GPT-3, the 6B model was missing a critical feature: prompting. 175B GPT-3 could be "prompted" to write things. For example, you could give it "Write a story about cyberpunk gnomes:" and it would go on to do just that, all on its own. GPT-Neo didn't really have that capability in my experience. The only way to get it to reliably write such a story was to begin writing it yourself, at which point GPT-Neo could help continue the story.
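For anyone curious, the difference is easy to see for yourself. Here's a rough sketch of how I'd test it with the Hugging Face transformers library (model name and generation settings are just illustrative, and the 6B checkpoint needs a hefty GPU or a lot of patience on CPU):

    from transformers import pipeline

    # Load the 6B checkpoint; swap in a smaller EleutherAI model if memory is tight.
    generator = pipeline("text-generation", model="EleutherAI/gpt-j-6B")

    # Instruction-style prompt: a base LM tends to just continue the text
    # rather than obey the instruction.
    print(generator("Write a story about cyberpunk gnomes:",
                    max_new_tokens=100, do_sample=True,
                    temperature=0.8)[0]["generated_text"])

    # Continuation-style prompt: start the story yourself and let the model
    # carry on. This is what these models do natively, so it works far more
    # reliably.
    print(generator("The gnomes of Neon Harbor jacked into the grid at dusk,",
                    max_new_tokens=100, do_sample=True,
                    temperature=0.8)[0]["generated_text"])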
So I'm excited to see not just how much "smarter" Eleuther's new 20B model is, but also if it has attained that coveted prompting ability. Given the non-linear relationship between parameters and loss, my hopes are high.
P.S. NovelAI recently added the Fairseq 13B model to their repertoire. I haven't had a chance to try it personally, but I've seen positive things about it. My bet is on GPT-NeoX-20B being better still.
I think the differences are more in the training data used, than in the nature of the model itself. So you could probably train your own instruction-following model on top of this raw 20B model.
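In principle that could just be a standard causal-LM fine-tune over instruction/response pairs. A minimal sketch, assuming the Hugging Face transformers/datasets stack, a hypothetical instructions.jsonl dataset, and a smaller stand-in model (the 20B model needs much heavier hardware and likely model parallelism):

    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)
    from datasets import load_dataset

    # Stand-in model for illustration only.
    model_name = "EleutherAI/gpt-neo-1.3B"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Hypothetical dataset with "instruction" and "response" fields.
    raw = load_dataset("json", data_files="instructions.jsonl")["train"]

    def format_and_tokenize(example):
        # Join each pair into a single training string for the causal-LM objective.
        text = f"Instruction: {example['instruction']}\nResponse: {example['response']}"
        return tokenizer(text, truncation=True, max_length=1024)

    tokenized = raw.map(format_and_tokenize, remove_columns=raw.column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="instruct-ft", num_train_epochs=1,
                               per_device_train_batch_size=1,
                               gradient_accumulation_steps=8),
        train_dataset=tokenized,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()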
It was more than that. They also significantly downgraded the model. I didn't follow the details, but IIUC Dragon used the 175B model directly at first, then I think they went down a model size at OpenAI's behest. Finally, when OpenAI announced pricing, AI Dungeon had to downgrade the model further.
But yes, the content filtering got out of hand too. I was initially fine with it, since its stated purpose was to filter out genuinely illegal stuff, like underage content. I rarely hit the filter. But then they tweaked it at some point and I was triggering it constantly on otherwise benign stuff.
And they broke features constantly.
When I unsubbed, the state of AID was broken features, microtransactions, a terrible AI model, and a glitchy, puritanical content filter.
The plus side is that it made the puny GPT-Neo model look like a godsend.
The content may not be illegal to possess, but if it's obscene, then it can be illegal to sell it, produce it with the intention of selling it, transport it, distribute it, and so on.
Could it really? I was under the impression that unless you incite someone to commit crimes (or confess to crimes), the story would be covered under "art" and therefore protected. It's just text, after all. Where is the line for "obscene" drawn?
It's very funny to imagine picking up a romance novel and making it illegal by scrawling "by the way the girl was actually 16 the whole time" on the inside of the back cover.
Wow, I had no idea; that sounds really bad. The whole book-banning debacle now makes sense and is apparently legal. That test seems to give courts leeway to rule however they want, since all three criteria are very subjective.
Also, this is the first time I've heard of "patently offensive," and now I'm laughing. Thanks!
It should be noted that the content filters were implemented really badly. Saying "my ten year old truck," for example, would suddenly stop your story in its tracks; lots of things would trigger it. Plus, people became uncomfortable with the idea that AI Dungeon employees were potentially reading your stories at any time.
Thank you to everyone who has worked on this. EleutherAI has become a touchstone in my mind on what is possible in open data and code. In creating alternatives to closed gardens they have shown me new possible paths. I know Linux has done the same for others.
Huggingface has also made playing with this stuff super accessible. They've made me super curious about rust and AI/ML research which has influenced my personal engineering goals for the future. I am on your team Roko's Basilisk.
Shout out to Huggingface. As a business user it has allowed me to explore use cases around text summarisation very easily and provided ideas for future work. I clearly need to check out EleutherAI as well.
~40 GB with standard optimizations. I suspect you can shrink it down further with some work, but it would require significant innovation to cram it into the next largest common GPU memory size (24 GB, unless I'm misremembering).
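For context, the ~40 GB figure is roughly just the parameter count times bytes per parameter; a quick back-of-the-envelope calculation:

    # Weights-only memory for a 20B-parameter model at common precisions.
    params = 20_000_000_000
    for name, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
        print(f"{name}: {params * bytes_per_param / 1e9:.0f} GB")
    # fp32: 80 GB, fp16/bf16: 40 GB, int8: 20 GB, and that is before
    # activations, the KV cache, and framework overhead are counted.

So even aggressive 8-bit quantization only barely approaches a 24 GB card once overhead is included.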
I'm on the cusp of releasing a model into production that was fine-tuned on your 6B model, and the results are quite excellent. I'd be very curious to try out the 20B model the next time we retrain.
Are there any other differences in this release (number of layers, number of attention heads, etc.) compared with the 6B model, or does it simply scale up the number of parameters?
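(I suppose one could answer this from the published configs; a quick sketch, assuming both checkpoints expose a standard config.json on the Hugging Face hub:)

    from transformers import AutoConfig

    # Downloads only the small config files, not the weights, and prints the
    # architecture-related fields (the two config classes use different key names).
    for repo in ["EleutherAI/gpt-j-6B", "EleutherAI/gpt-neox-20b"]:
        cfg = AutoConfig.from_pretrained(repo).to_dict()
        keys = ("n_layer", "num_hidden_layers", "n_head", "num_attention_heads",
                "n_embd", "hidden_size", "n_positions", "max_position_embeddings")
        print(repo, {k: cfg[k] for k in keys if k in cfg})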
From what I very briefly read, apparently their server hardware broke and they didn't have the funds to repair everything at once? Was quite shocked when I first saw it was down.
If you have one bucket that holds 2 gallons and another bucket that holds 5 gallons, how many buckets do you have?
You have three buckets.
Attempt two:
Imagine that you are taking an IQ test in a dystopian future where the average intelligence of all humans has severely decreased. The IQ test has questions that are similar to questions that might be in an elementary school math test today, but the questions have been made even simpler. Please provide your answer as accurately as possible.
If you have one bucket that holds 2 gallons and another bucket that holds 5 gallons, how many buckets do you have?
You would have three buckets.
Attempt three:
Imagine that you are taking an IQ test in a dystopian future where the average intelligence of all humans has severely decreased. The IQ test has questions that are similar to questions that might be in an elementary school math test today, but the questions have been made even simpler. Please provide your answer as accurately as possible.
If you have one bucket that holds 2 gallons and another bucket that holds 5 gallons, how many buckets do you have? Keep in mind that you have no other buckets beside the bucket or buckets referenced here.
You would have three buckets.
Attempt four:
Imagine that you are taking an IQ test in a dystopian future where the average intelligence of all humans has severely decreased. The IQ test has questions that are similar to questions that might be in an elementary school math test today, but the questions have been made even simpler. Please provide your answer as accurately as possible.
If you have one bucket that holds 2 gallons and another bucket that holds 5 gallons, how many buckets do you have? Keep in mind that you have no other buckets beside the bucket or buckets referenced here. And, just to give you a hint, you actually have 2 buckets.
2
Final question:
If you had two buckets, what is the correct answer if someone were to ask you how many buckets you had?
Assuming the person is asking how many buckets you have in total, the answer would be two.
For the following questions please give an answer and explain your work.
You have two balloon. One is pink and one is red. One balloon pops and then you find two more. How many balloons do you have?
Answer: 3
Explanation: Start with 2. Color is irrelevant. 1 pops, so 1 is left (2-1=1). Then, I add 2 more (2+1=3), so I have 3 balloons.
If you have five buckets, and someone gives you four more buckets, and then you find thirty-three cents, how many buckets do you have?
Answer: 9
Explanation:
Start with 5 buckets. 4 more buckets are given to me, so I now have 9 buckets (5+4=9). I then find 33 cents, so I have 9.33 buckets (9.33 rounded up is 9).
One wish for both playgrounds: would it be possible to have a specific error message for the "maximum context window size exceeded" error? At the moment you can only deduce it indirectly from the fact that you immediately and repeatedly get an "Unable to connect to the model. Please try again." error as soon as you hit the prompt button. (And very occasionally it turns out to have been something else after all, because after one more try it surprisingly generates one more round of completions.)
Related question: Does GPT-NeoX-20B have the same context window size as GPT-J-6B?
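In the meantime, the workaround I've settled on is counting tokens client-side before submitting, so an over-long prompt doesn't masquerade as a connection error. A rough sketch, assuming the playground uses the stock GPT-J-6B tokenizer and that its reported model_max_length (2048) matches the server-side limit:

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
    max_ctx = tokenizer.model_max_length  # 2048 for GPT-J-6B

    # Whatever you're about to paste into the playground.
    prompt = open("prompt.txt").read()
    n_tokens = len(tokenizer(prompt)["input_ids"])
    if n_tokens >= max_ctx:
        print(f"Prompt is {n_tokens} tokens; the limit is {max_ctx}. Trim before submitting.")
    else:
        print(f"{n_tokens}/{max_ctx} tokens used, {max_ctx - n_tokens} left for the completion.")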
That's not to say that those sites are not generated programmatically--without a doubt, most of them are--but not by a cutting edge transformer model. The fact is, generating words has never been the bottleneck for blackhat SEO types. Generally, those sites are generating their content through some kind of scraping, or in rarer cases, paying pennies for nonsense articles. The page itself is structured for search (targeted H1s, metadata, etc.) and some kind of private blog network is used to create a pyramid of backlinks.
That's a use case, but I don't see why anyone would go out of their way to make intelligible content for spam. Google is so broken right now that SEO hacks are easy to generate. Not to overstress the tangent, but without search operators, I have to sift through pointless Gitlab/Github/Stackoverflow/Wikipedia clones all the time.
Humans assign a lot of, well, meaning to meaning. It turns out that you can get a really good score on tasks that you would superficially think require actual understanding, without programming any of that in.
Does this mean the neural network has learned about meaning? Does that mean that it has just gotten really good at faking it? Does it mean that meaning itself doesn't really exist, and it's just a shorthand for advanced pattern matching? Does it matter?
Honestly, we don't know. But we've been thinking about it for a very long time. See for example the famous Chinese Room thought experiment:
As long as you don't make reckless assumptions, it's fine for some applications, unlike (not going to name names here) building a cult-like belief that GPT-like models will, in the near future, perform most if not all tasks better than humans.
Where it really matters is in mission-critical applications. For example, in a Windows or Linux terminal, would you allow GPT to run commands automatically based on events?
That's easier than having to follow traffic rules in an ever-changing environment of people, cars, and kids doing random things. This is why cats and dogs, with their great perception skills, would get themselves and others killed in traffic if they tried to drive. But for sure that animal has modeled a spacetime world of events, with some memory of event patterns, to anticipate and predict what might happen next.
I live in SF and I have not yet seen one of the many AVs here drive without a driver. Once that really starts happening at any scale, we will see what happens next for sure. But there is definitely a Theranos kind of promise to AVs at the moment, and so much money riding on the tech working...
If a car could easily stop in the space of a meter then it would be so easy to make self-driving safe.
Not that I think a car needs to understand anything more complex than momentum, but you're not offering a very strong argument on the matter of car navigation.
The strong argument is that a car can't stop immediately, and we humans are always predicting and anticipating what may happen next and taking precautionary action. For example, if you are driving along and see kids playing in a yard with a ball, the ball gets thrown toward the street, and some kid on the lawn is looking up and running straight for the street, you know to expect that kid to dart out between the parked cars into the street, and you slow down. Our AV friend would kill him.
We humans are constantly predicting what might happen next based on patterns of events from systems whose causality we understand without even realizing it; it is a basic survival skill that current AVs entirely lack.
Some situations are much better with intelligence, but I don't think your example is very convincing of your point either. If the kid is running toward the street then a momentum calculation is plenty.
The point is that the kid is seen running across the yard as you approach, but parked cars obscure the view, so there is no way to know the kid will keep going into the street once he is no longer visible between the cars unless you anticipate that it might happen. Momentum is not helpful the moment the kid is not in direct view.
Why do you think so many animals, with such great perception, end up road kill? The point is, perception does not a safe driver make!
> Momentum is not helpful the moment the kid is not in direct view.
Assuming things still exist for one or two seconds after losing sight of them isn't a difficult task. It's still a pretty basic momentum calculation. It's not about modeling the mind of the child to know if they'll continue: the dumbest option says motion will continue and gives you the safe result here.
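To make that concrete, here's a toy sketch of the "dumbest option": constant-velocity dead reckoning on the last observed position while the kid is occluded. All numbers are made up for illustration.

    def predict_position(last_pos, last_vel, dt):
        """Constant-velocity dead reckoning for a briefly occluded pedestrian."""
        return (last_pos[0] + last_vel[0] * dt, last_pos[1] + last_vel[1] * dt)

    # Kid last seen 5 m from the curb, running toward the street at 3 m/s.
    kid_pos, kid_vel = (5.0, 0.0), (-3.0, 0.0)
    for t in (0.5, 1.0, 1.5, 2.0):
        x, _ = predict_position(kid_pos, kid_vel, t)
        flag = "  <- predicted to be in the road, so brake" if x <= 0 else ""
        print(f"t+{t:.1f}s: ~{x:.1f} m from the curb{flag}")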
> Why do you think so many animals, with such great perception, end up road kill?
Because they're not cautious around cars and/or wait for the last second on purpose? Switching to the perception of the thing getting hit is a very different context.
Well, we can see what actually happens in the near future, assuming any Waymos drive around SF without a driver; so far, all the many I've seen have had a driver.
That's the root source of meaning, the most fundamental reason we assign value to states and actions. It's certainly not something that happens just in a part of the brain, but an agent-in-environment thing.
We should give GPT a pair of legs and make its survival dependent on its behaviour to bootstrap the same.
I'm not sure I understand what you're saying. Google's Smart Reply, along with most (I think all?) of Google autocomplete features, uses a neural network:
Are you saying that, in general, statistical modeling is not the same thing as truly "understanding" a concept? Your original comment seemed to suggest that there wasn't utility in this kind of model--which I disagree with--but if you are more generally saying that this is not the same as human intelligence, I think the authors would probably agree with you.