Hacker News
GPT-3 is no longer the only game in town (lastweekin.ai)
342 points by sebg 78 days ago | 210 comments

The future is not as dark as it seems because of the rat race of megacorps.

You can use reduced versions of language models with extremely good results.

I was involved in training the first-ever GPT-2 for the Bengali language, albeit one with only 117 million parameters.

It took a month's effort (training + writing code + setup) and about $6k in TPU cost, but Google Cloud covered it.

Anyway, it is surprisingly good. We fine-tuned the model for several downstream tasks and we were shocked when we saw the quality of generated text.

I fine-tuned this model to write Bengali poems using a dataset of only about 2k poems, and ran the training for 20 minutes on a Colab Pro GPU instance.

I was really blown away by the quality.

The main training was done in JAX, which is much faster and more seamless than PyTorch XLA, and much better than TensorFlow in every way.

So, my point is: although everyone is talking about hundreds of billions of parameters and millions in training costs, you can still derive practical value from language models, and at a low cost, too.

Good to know. We're attempting something similar [1], but for Tamil. I'm also surprised by how well AI4Bharat's [2] open-source language model and library perform on NLP tasks against SoTA systems. Is there a way to contact you?

[1] https://vpt.ai/posts/about-us/

[2] https://ai4bharat.org/projects/

Between a master's degree, a consultancy gig, personal research and study, and looking for unis abroad, I am living a hectic life.

I don't see how I can be of help.

But I can talk. Leave me some way to reach you, and I will get in touch within a week.

> The future is not as dark as it seems because of the rat race of megacorps.

Just wait until NVIDIA comes out with a "Neural AppStore" and corresponding restrictions. Then wait until the other GPU manufacturers follow suit.

Much of the work is fully open source and liberally licensed.

DeepMind and OpenAI have a bad rep in this regard.

But a lot is available for free (as in beer and speech).

And most of the research papers are released on arXiv. It's very refreshing.

The bottleneck is not the knowledge or code, but the compute. People are fighting this in innovative ways.

I have been an inactive member of Neuropark, which first demoed collaborative training. A bunch of folks (some of them close to laypeople) ran their free Colab instances and trained a huge model together. You can even use a swarm of GT 1030s or something like that.
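
The basic idea behind this kind of collaborative training can be sketched as data-parallel gradient averaging: each volunteer computes a gradient on its own shard of data, the gradients are averaged, and every peer applies the same update. This is a minimal pure-Python sketch assuming a one-parameter linear model y = w*x and perfectly synchronous peers; the function names are hypothetical, and this is not the protocol Neuropark or any real system uses (real volunteer setups also have to cope with peers that drop out or lag).

```python
# Hypothetical sketch: synchronous data-parallel training across "peers".
# Model: y = w * x, loss = mean squared error. Not any real library's protocol.

def local_gradient(w, shard):
    """Gradient of mean squared error w.r.t. w on one peer's data shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def collaborative_step(w, shards, lr=0.01):
    """Each peer computes a gradient; the averaged gradient updates the shared weight."""
    grads = [local_gradient(w, shard) for shard in shards]
    avg_grad = sum(grads) / len(grads)
    return w - lr * avg_grad

# Three peers, each holding a slice of data generated from y = 3x.
shards = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0), (4.0, 12.0)],
    [(5.0, 15.0)],
]

w = 0.0
for _ in range(200):
    w = collaborative_step(w, shards)
print(round(w, 3))
```

No single peer needs to see all the data; only gradients (or, in practice, compressed versions of them) are exchanged.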

Also, once you have shown signs of success, you are very likely to find people willing to sponsor your compute needs; case in point: EleutherAI.

The situation is far from ideal with this megacorps rat race [0], and NLP research being more and more inaccessible, but it is not completely dark.

[0]: I, along with many respected figures, tend to think that this scaling-up approach is not even useful. GPT-3 can write good prose nowadays that is, for all intents and purposes, indistinguishable from text written by humans. But we are far, far away from true understanding. These models don't really understand anything and are not even "AI", so to speak.

The Transformer architecture, the backbone of all these approaches, is too brute-force-y for my taste to be considered something that can mimic intelligence or, further, be intelligent.

> about $6k in TPU cost, but Google Cloud covered it.

I'm glad this all worked out for you. This is unrelated, but I just want to say that I hate how many people Google managed to convert to TPU with their research program and that their managed TPU/GPU offerings are absolutely horrible and infuriating to work with unless you somehow get on their radar.

I have never used TRC, but I heard from many that if you reach out to them, they are really helpful.

I have not converted to TPU, because it is literally offered by only one company. It will be the height of "vendor lock-in".

But, I must say that JAX on TPU is faster than anything and everything that I have ever seen.

That is a fantastic result. A nagging question: these work best on predictable things. How much of Bengali poetry is predictable?

> these work best on predictable things

Umm, not really.

You are talking about single and multi-label classification tasks, maybe?

Bengali poetry is just like poetry in any other rich language, like English, French, etc.

At a high level, what these models do is learn the distribution of the data. In this case, they learn the style of the poets.

Writing poetry in specific styles was done long before Transformer architectures came along, e.g. with RNNs and LSTMs:


- https://www.tensorflow.org/text/tutorials/text_generation

- https://machinelearningmastery.com/text-generation-lstm-recu...

The goal of poetry generation is to generate something that is unique, in the poetic style, coherent, and grammatically correct; ideally, indistinguishable from human-written verse in the eyes of a human.

Ah, I see the distinction now, thanks.

It's an interesting subject, I think, since poetry has so many forms it can take and you need the output to capture the idiosyncratic aesthetics and "inner world" of a piece of verse.

I've actually tried using GPT-3 to generate poetry and the naive approach of just sending prompts and text snippets through the API had wildly varying results. Some pretty good, some that were basically word salad and nonsense.

But I also run a poetry journal on the side, so maybe that skews my understanding of it!

Start with LSTM or a distilled version of a language model that is trained for generation task, e.g. GPT2.

There are a lot of pretrained language models (preferably on the HuggingFace Hub). Take one and fine-tune it on a much smaller dataset.

You can then pass prompts to let this model write poetry.

I would also suggest learning the fundamentals of deep learning, RNNs, LSTMs, Transformers, etc., because even if you know all of these, this kind of task is not trivial.

I would use HuggingFace + PyTorch for this kind of task.

Besides Deep Learning knowledge, you will also need to know your way around tokenizers, controlling parameters, evaluation metrics, etc.
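
To illustrate what "controlling parameters" can mean at generation time, here is a minimal pure-Python sketch of two common decoding knobs, top-k filtering and temperature, applied to a toy next-word distribution. The logits are made-up numbers standing in for a real model's output; HuggingFace's `generate()` exposes similarly named options (`top_k`, `temperature`), but this is only an illustration, not their implementation.

```python
import math
import random

def top_k_temperature_probs(logits, k, temperature):
    """Keep the k highest-scoring tokens, divide scores by temperature, softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    top = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:k]
    scaled = [(tok, score / temperature) for tok, score in top]
    m = max(s for _, s in scaled)  # subtract the max for numerical stability
    exps = [(tok, math.exp(s - m)) for tok, s in scaled]
    total = sum(e for _, e in exps)
    return {tok: e / total for tok, e in exps}

def sample(probs, rng):
    """Draw one token according to its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Toy logits for the next word of a poem (made-up numbers).
logits = {"moon": 2.0, "river": 1.5, "rain": 1.0, "tax": -1.0}

cold = top_k_temperature_probs(logits, k=3, temperature=0.5)
hot = top_k_temperature_probs(logits, k=3, temperature=2.0)
print(cold)  # sharper: "moon" dominates, "tax" is filtered out by top-k
print(hot)   # flatter: more variety among the top 3
print(sample(hot, random.Random(0)))
```

Low temperature makes output safer but more repetitive; high temperature adds variety at the risk of word salad, which is one reason results from the same prompt can vary so widely.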

Having some basic understanding of poetry is helpful, but to be able to truly apply your knowledge of running your poetry journal, you will really need to be on the proverbial edge. I hope you get there, and maybe get some papers out!

Although I worked on a language model because none existed in my language, my expertise (and employment) lies in Computer Vision. So, take what I said with a grain of salt.

I must tell you that your intuitions are not wrong.

Many language models do predict.

In this case, they try to predict either the next word (or character, or sub-character in the case of Chinese, Japanese, etc.; this is entirely the DS's decision) or some "masked" words.

    P(w_i | w_1, ..., w_(i-1))
    once w_i is generated, it joins the context used to predict w_(i+1)

The models that are trained to predict the next word are the ones that make good generators.
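
A minimal sketch of that next-word loop, using a toy bigram count table in place of a neural network (an assumption made only to keep the example self-contained): at each step the model picks a likely next word given the context, the chosen word is appended, and it becomes the context for the following step. A GPT-style model conditions on the whole prefix rather than just the last word.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count how often each word follows another (a stand-in for a language model)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def generate_greedy(counts, max_words=10):
    """Autoregressive decoding: each generated word becomes context for the next."""
    context = ["<s>"]
    for _ in range(max_words):
        followers = counts.get(context[-1])
        if not followers:
            break
        next_word = followers.most_common(1)[0][0]  # greedy: pick the most frequent
        if next_word == "</s>":
            break
        context.append(next_word)
    return " ".join(context[1:])

corpus = [
    "the moon rises slowly",
    "the moon sleeps",
    "a moon rises",
    "the sun rises slowly",
]
model = train_bigram(corpus)
print(generate_greedy(model))  # the moon rises slowly
```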

What library did you use for JAX on TPU? Also curious how much data you had?

We used Flax, of course.

We thought Haiku was cool, but none of us knew it, and learning resources were scarce. But we are very happy with Flax.

We used the Bengali subset of the mC4 dataset [0] for training the GPT-2 model.

[0]: https://huggingface.co/datasets/mc4

The GPT-3 family is still too expensive to use and too big to fit in memory on a single machine. Prices need to come down before large-scale adoption, or someone needs to invent a chip that can hold it (cheaply).

The most exciting part about it is showing us there is a path forward by scaling and prompting, but you can still do much better with a smaller model and a bit of training data (which can come from the expensive GPT-3 as well).

What I expected from the next generation: multi-modality, larger context, using retrieval to augment input data with fresh information, tuned to solve thousands of tasks with supervised data so it can generalize on new tasks better, and some efficient way to keep it up to date and fine-tune it. On the data part - more data, more languages - a lot of work.

Neat to see more models getting closer, though it appears only one so far has exceeded GPT-3's 175B parameters.

That said, what I'm really curious about is how those other models stack up against GPT-3 in terms of performance. Does anyone know of any comparisons?

I’m surprised that no one has answered for three hours!

The answer is at https://github.com/kingoflolz/mesh-transformer-jax

It has detailed comparisons and a full breakdown of the performance, courtesy of Eleuther.

I was so frustrated when that was first announced because it didn't include those metrics, and everyone ate it up like the models were equivalent.

+1. Does the new generation match or exceed GPT-3 in terms of relevance? Is there a way for a non-AI-researcher to understand how the benchmarks measure this? Bigger does not mean better.

Whenever I've seen language modeling metrics, GPT-3's largest model has been at the top. If you see a writeup that doesn't include accuracy-type metrics, you're reading a sales pitch, not an honest comparison.

There's Wu Dao 2.0 and Google has 2 models with 1T+ params.

For clarity, I believe these are all mixture-of-experts models, where each input only sparsely activates some subset of the full model. This is why they were able to make such a big jump over the “dense” GPT-3. Not really an apples-to-apples comparison.
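
A minimal sketch of what "sparsely activates" means, assuming a toy layer where each expert is a tiny scalar function and a gating function scores the experts for each input. Only the top-k experts are actually run, so total parameters grow with the number of experts while per-input compute grows only with k. All names, weights, and experts here are hypothetical and not taken from Wu Dao 2.0 or Google's models.

```python
import math

# Hypothetical toy experts: each is just a small function of the input.
EXPERTS = [
    lambda x: 2.0 * x,
    lambda x: x + 10.0,
    lambda x: -x,
    lambda x: x * x,
]

def gate_scores(x):
    """Toy gating network: one score per expert (made-up weights)."""
    weights = [0.5, -1.0, 0.1, 1.5]
    return [w * x for w in weights]

def moe_forward(x, k=2):
    """Run only the top-k experts and mix their outputs with softmax weights."""
    scores = gate_scores(x)
    top = sorted(range(len(EXPERTS)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - m) for i in top}
    total = sum(exps.values())
    output = sum(exps[i] / total * EXPERTS[i](x) for i in top)
    return output, top

out, used = moe_forward(2.0)
print(used)  # [3, 0]: only 2 of the 4 experts ran for this input
```

A different input can route to different experts, which is why a "1T parameter" sparse model is not directly comparable to a dense 175B one.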

> most recently NVIDIA and Microsoft teamed up to create the 530 billion parameter model Megatron-Turing NLG

Get it, cause it's a generative transformer? Hah

GPT-3 is the most overrated game in town. And Microsoft spending $1 Billion for an exclusive license will seem really foolish a few years from now.

> And Microsoft spending $1 Billion for an exclusive license will seem really foolish a few years from now.

No it won't. Capital was (and still is) cheap when they made that purchase. A billion used to really mean something, but it's chump change for companies with valuations over $1 trillion.

It was a defensive move: keep any other MEGA-CORP from getting access.

Clearly this is not very defensible. More data, more GPUs, more time, and you have similar or better results. There's a reason why Google, FB, Amazon haven't bitten the same bullet for an exclusive license.

I never got why GPT-3 was so closed off, like you needed permission to use it. If it’s so good then why not just make it available?

One of the reasons is ensuring that unsafe generated text will not make it back to the internet.

OpenAI has strict requirements for the usage of GPT-3. For instance, you cannot automate posting to social media without a human in the middle.

How else would you be able to sell a $1 Billion exclusive license to Microsoft?

The concern, nominally, is that it’s too good. They’re worried that it'll lead to a huge influx of things like spam comments and fake news articles.

That's the first time I've heard of a company being worried that their product is too good.

This is legitimately dangerous technology. It probably(?) wouldn't pass the Turing Test, but it would fool a lot of people.

I wish Microsoft would have worried that Windows was too good, and never have released it for this reason. Then we wouldn't have had to go through the whole Windows thing.

Because of vast piles of bad science fiction being taken as fact and outright hysteria.

> Because of vast piles of bad science fiction being taken as fact

None of these systems could work if only facts were used as input. Maybe some "factual" score could be presented with the data, but wow, that's a philosophical problem there.

> If it’s so good then why not just make it available?

OpenAI is pivoting to corporate evil, and to do that properly they need proprietary assets to rent out.

I think the reasoning was that GPT-3 could easily be used to fill the world with realistic bullshit that would take ages to debunk.

As if this stopped anyone ever! This logic is immediately rationalized with the famous "Someone will do it anyhow!" statement.

I think other comments are more valid. Sale potential. Buyouts.

surely the creators know that sooner or later the cat will escape the bag...

They bet with Microsoft on automating programmers (or at least coders)

Are we heading to the (distant) future where to make progress in any field you have to spend big $$$ to train a model?

Fortunately, costs for training superlarge models are coming down rapidly thanks to TPUs (which was the approach used to train GPT-J 6B) and DeepSpeed improvements.

Are there any TPUs that can be purchased off-the-shelf and then owned, like you can do with a CPU or GPU? Or are you just limited to paying rent to cloud providers and ultimately being at their mercy when it comes to pricing, ToS, etc?

No, but you probably aren't going to buy an A100 either, so it's a moot point.

An A100 looks to be about $12k or so. That's a bit out of reach for individuals but not so bad as a business expense, and maybe you can use it to mine Bitcoin when you're not training models, to help it pay for itself or something.

I don't think these are good for training though, unfortunately.

That’s not even distant - most of the self-supervised vision and language models at the bleeding edge of the field require huge compute budgets to train.

We are already there. Machine learning is the flavor of A.I. that keeps business barriers of entry high. If we had invested in symbolic A.I., things would be different. A similar thing happens with programming language flavors. PHP lowers barriers of entry so it is discredited by the incumbents.

The difference between ML and symbolic AI is that ML works and symbolic AI doesn't. At my job, dropping the computational load of our ML models is heavily invested in, and every success is celebrated. Everybody wants it to be easier and cheaper to train high quality models, but some things are still intrinsically hard.

> The difference between ML and symbolic AI is that ML works and symbolic AI doesn't.

IBM managed to beat Garry Kasparov using symbolic AI, did they not? So in what way does it not work?

> IBM managed to beat Garry Kasparov using symbolic AI, did they not? So in what way does it not work?

Ok, I should be clearer. ML approaches are way way better than symbolic approaches. Given almost any problem, it is much much easier to make an ML approach work than any symbolic approach.

Yes, chess was first solved symbolically, but it has since been solved by ML better and more easily, to the point that Stockfish now incorporates neural nets [1]. ML has also given extremely high levels of performance on Go, StarCraft, Dota, and on protein folding, image recognition, text processing, speech recognition, and pretty much everything else.

I would challenge you to name any (non-simple) problem where traditional AI methods are still state of the art.

[1] https://stockfishchess.org/blog/2020/stockfish-12/

>> I would challenge you to name any (non-simple) problem where traditional AI methods are still state of the art.

Theorem proving, classical planning, SAT solving, robotics, search, in particular adversarial search, program induction, knowledge representation.

Plus all the stuff that used to be considered "AI" but isn't anymore, like rule-based systems (e.g. for fraud detection), etc.

Sorry, I know you asked for only one.

Great answers. If we check back in a couple years, I'm sure we'll see good learned approaches to most of these problems.

Robotics is already on its way there. Is this program induction https://paperswithcode.com/task/program-induction ? Looks like it's headed towards ML too. I suspect you could stick ML into SAT solving and get yourself a system that worked pretty decently.

Program induction (or inductive programming, program synthesis from incomplete specifications etc, the task of learning programs from examples) is not "headed towards ML", it is a branch of machine learning research - only, one that is dominated by symbolic learning approaches such as Inductive Logic Programming and Inductive Functional Programming.
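
To make "learning programs from examples" concrete, here is a brute-force enumerative synthesizer over a tiny, made-up DSL of integer operations: it returns the shortest program consistent with every input/output example. This only illustrates the task itself; it is neither Inductive Logic Programming nor any of the systems mentioned in this thread, all of which use far smarter search and richer program languages.

```python
from itertools import product

# Hypothetical tiny DSL: a program is a sequence of primitive integer operations.
PRIMITIVES = {
    "inc": lambda n: n + 1,
    "dec": lambda n: n - 1,
    "double": lambda n: n * 2,
    "square": lambda n: n * n,
}

def run(program, n):
    """Apply each primitive in order to the input."""
    for op in program:
        n = PRIMITIVES[op](n)
    return n

def synthesize(examples, max_len=3):
    """Return the shortest program consistent with all (input, output) examples."""
    for length in range(1, max_len + 1):
        for program in product(PRIMITIVES, repeat=length):
            if all(run(program, x) == y for x, y in examples):
                return list(program)
    return None

# Three examples of the unknown function f(n) = (n + 1) * 2
print(synthesize([(1, 4), (3, 8), (5, 12)]))  # ['inc', 'double']
```

The learned artifact is a readable program that generalizes to unseen inputs (here, 10 maps to 22), which is the property that makes symbolic approaches attractive for this task.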

When you say "ML" you probably mean the deep neural networks approaches that are currently state of the art for machine vision etc. Deep neural network approaches have been proposed for the task of program induction but they generally lag well behind symbolic machine learning approaches.

The most coherent efforts to tackle program induction with neural networks that I am aware of are the work of Dawn Song's group at Berkeley [1] and of Joshua Tenenbaum's group at MIT. I can't find a handy link to a compilation of the latter group's work, but the DreamCoder paper in the paperswithcode search you linked to was an interesting milestone [2].

There is a lot of work on neuro-symbolic approaches to program induction, for example see the recent (two weeks ago) NeSy workshop [3], part of the first International Joint Conference on Learning and Reasoning for some new work in that burgeoning field. Statistical Relational AI combines symbolic with probabilistic learning; see the STAR-AI workshop [4] also at IJCLR.

If you're interested in recent developments in program induction (again, learning programs from examples), then IJCLR is the conference to keep an eye on.


[1] https://sunblaze-ucb.github.io/program-synthesis/index.html

[2] https://arxiv.org/abs/2006.08381

[3] https://sites.google.com/view/nesy20/home

[4] https://starai.cs.kuleuven.be/2021/

I've never come across a coherent definition of symbolic AI, so I'll just focus on what it contained: some type of search or inference algorithm (iterative deepening and depth-first search being major ones), often combined with heuristics written in a programming language like Prolog or Lisp. Specification + Inference ≊ Specification + Control flow. That means probabilistic models written in a language like Stan, which are Inference + Specification, fit neatly into so-called symbolic AI (which is basically just programming with search/inference).
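
A minimal sketch of that "specification + control flow" split, assuming a made-up toy problem (reach a target number from a start number using the moves +3 and *2). The problem-specific specification is just a successor function; the generic control flow is iterative deepening depth-first search, which returns a shortest sequence of moves.

```python
def depth_limited(state, goal, successors, limit, path):
    """Depth-first search that gives up below a fixed depth."""
    if state == goal:
        return path
    if limit == 0:
        return None
    for action, nxt in successors(state):
        result = depth_limited(nxt, goal, successors, limit - 1, path + [action])
        if result is not None:
            return result
    return None

def iterative_deepening(start, goal, successors, max_depth=20):
    """Generic control flow: retry DFS with increasing depth limits."""
    for limit in range(max_depth + 1):
        result = depth_limited(start, goal, successors, limit, [])
        if result is not None:
            return result
    return None

# Specification (the problem-specific knowledge): moves available from a state.
def successors(n):
    return [("+3", n + 3), ("*2", n * 2)]

print(iterative_deepening(1, 11, successors))  # ['+3', '*2', '+3']
```

Swapping in a different `successors` function (or adding a heuristic to order the moves) changes the problem without touching the search, which is the division of labor described above.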

These search and sampling algorithms still play key roles in game playing AI (chess, poker, Go) and natural language generation. It is the human knowledge, specification and heuristics, part that tends to be more readily replaceable. A lot of control flow and data-structures that powered old AI approaches can be found in databases, compilers, type inference, computer algebra and even the autodiff libraries neural nets are written in.

Video game AI, constraint solving and business rules engines are probably closest to still using the full symbolic approach rather than merely extracting the control flow and structures portion.

We can therefore make a compact prediction: learned approaches replace human written computer programs (specifications, rules systems or heuristics) whenever human contribution is not valuable or is somehow harmful to robustness/generalization.

>> Great answers. If we check back in a couple years, I'm sure we'll see good learned approaches to most of these problems.

Around November 2023? I'll try to remember.

“I would challenge you to name any (non-simple) problem where traditional AI methods are still state of the art.”

Lossless file compression. As far as I know none of the algorithms in widespread use are neural-based, despite the fact that compression is clearly a rich statistical modeling problem, at least on par with GPT-3-style language understanding in difficulty. There are published attempts to solve the problem with neural networks, but they simply don’t work well enough to date. Modern solutions also still use old-fashioned AI ingredients like compiled dictionaries of common natural-language words — any other domain where nat-lang dictionaries are useful has been conquered by neural solutions, e.g. spelling and grammar checkers.
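
The link between compression and statistical modeling can be made concrete: an entropy coder (such as arithmetic coding) driven by a model that assigns probability p to a symbol needs about -log2(p) bits for it, so a better predictor means a smaller file. A minimal sketch, assuming an adaptive order-0 character model with add-one smoothing over a fixed alphabet; it computes the ideal code length only and does not implement an actual coder.

```python
import math
from collections import Counter

def ideal_bits_adaptive(text, alphabet):
    """Ideal code length (bits) under an adaptive order-0 model with add-one smoothing."""
    counts = Counter()
    bits = 0.0
    for i, ch in enumerate(text):
        p = (counts[ch] + 1) / (i + len(alphabet))  # i = symbols seen so far
        bits += -math.log2(p)
        counts[ch] += 1  # update the model only after "coding" the symbol
    return bits

def ideal_bits_uniform(text, alphabet):
    """Baseline: every symbol equally likely, log2(|alphabet|) bits each."""
    return len(text) * math.log2(len(alphabet))

alphabet = "ab"
text = "a" * 30 + "b" * 2
print(round(ideal_bits_uniform(text, alphabet), 1))   # 32.0
print(round(ideal_bits_adaptive(text, alphabet), 1))  # fewer bits: skewed data is predictable
```

Because the decoder can rebuild the same model from the symbols it has already decoded, nothing about the model needs to be transmitted; a neural compressor replaces this counting model with a learned predictor.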

I'm far from an expert in this subject, but doesn't this ranking of large text compression algorithms, with NNCP coming first, suggest that neural nets are pretty great at compression?



I don't see examples of high performing symbolic AI based compression algorithms anywhere, but again I am very ignorant, do you have examples?

The ranking criteria of this list make it very unrepresentative of compressors used in the real world. The benchmark they’re using for example is the sum of the compressed file plus the compressor binary: this penalizes memorization of the evaluation text in the compressor binary itself. But in the real world, you would have no concerns at all that your compressor is “cheating” by working too well only for your particular data — having useful priors that model real-world data for more compact representations is the whole point. Many of these algorithms are also impractical due to speed or memory use. Ask yourself: How many of the top-10 algorithms do you have installed right now, or even recognize? The winners aren’t dominating outside the arena of this list.

I’m also not an expert in symbolic AI — my comment above is more about neural vs. pre-neural NLP methods, rather than symbolic AI, which I admit drifts a bit from the parent. A compressor replacing word tokens with dictionary indices is definitely symbolic but it’s not especially “AI”.
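
A minimal sketch of the dictionary-substitution idea, assuming a tiny hypothetical static dictionary of common words that encoder and decoder share: known words become short index tokens and everything else passes through, with an escape so literal `#` words decode unambiguously. Real compressors pair this kind of step with entropy coding (Brotli, for example, ships a built-in static dictionary); this shows only the symbolic substitution.

```python
# Hypothetical shared dictionary of common words (both sides must agree on it).
DICTIONARY = ["the", "of", "and", "to", "in", "is", "that", "it"]
INDEX = {word: i for i, word in enumerate(DICTIONARY)}

def encode(text):
    """Replace dictionary words with '#<index>' tokens; leave other words alone."""
    out = []
    for word in text.split(" "):
        if word in INDEX:
            out.append(f"#{INDEX[word]}")
        elif word.startswith("#"):
            out.append("#" + word)  # escape a literal leading '#'
        else:
            out.append(word)
    return " ".join(out)

def decode(encoded):
    """Invert encode(): '#<n>' becomes a word, '##...' loses one escape '#'."""
    out = []
    for token in encoded.split(" "):
        if token.startswith("##"):
            out.append(token[1:])
        elif token.startswith("#"):
            out.append(DICTIONARY[int(token[1:])])
        else:
            out.append(token)
    return " ".join(out)

text = "the cat is in the hat"
packed = encode(text)
print(packed)  # -> #0 cat #5 #4 #0 hat
assert decode(packed) == text
```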

> I would challenge you to name any (non-simple) problem where traditional AI methods are still state of the art.

Did ML methods beat classic AI in dialog comprehension, say to the level of SHRDLU? I'm curious: can an ML system do what SHRDLU does? https://en.wikipedia.org/wiki/SHRDLU

Short answer: no. The capabilities of SHRDLU remain unsurpassed by modern systems.

A simulated robot hand controlled by natural language to move blocks inside a virtual world. It's not terribly useful but nothing that was created since can do any better and the state-of-the-art NLP approach of large language models is completely incapable of anything like it.

Which is a bit sad, really, if you think that SHRDLU was created by one graduate student, fifty years ago.

How about LOReL, where a robot manipulates objects and opens drawers when given text instructions: https://sites.google.com/view/robotlorl

Thanks for clearing that up, I do agree that ML-based AI has surpassed symbolic approaches in every field.

1. There's a world of problems (such as "perception-related" e.g. vision and NLP) which we tried to solve for decades with symbolic AI and got worse results than what nowadays first-year students can do as a homework with ML;

2. For your example of chess, for some time now ML engines are pretty much untouchable by engines based on pre-ML methods.

>> 1. There's a world of problems (such as "perception-related" e.g. vision and NLP) which we tried to solve for decades with symbolic AI and got worse results than what nowadays first-year students can do as a homework with ML;

Perception tasks were traditionally attempted with statistical machine learning approaches rather than symbolic AI, for example the Perceptron was a very early neural network that was used in machine vision, created by Frank Rosenblatt in 1958.

A lot of that research was carried out under the rubric of "pattern recognition" rather than machine learning. In any case, no, "we" did not try "to solve [those problems] for decades with symbolic AI". Symbolic AI has traditionally focused on reasoning, which is generally considered to be on some kind of separate level from perception.

As to chess engines, they're still symbolic-statistical hybrids. E.g. the Alpha-x family combines Monte Carlo Tree Search with neural nets that learn an evaluation function etc.

Various NLP tasks used to be very heavy on symbolic approaches (with some of them still being used), I myself worked on them for some years until the statistical approaches started to work better. For computer vision, I would probably consider the work on edge detectors, HOG and SIFT algorithms as the "symbolic" heritage for object detection which has now been replaced with pure ML.

NLP has been dominated by statistical learning approaches for some time, that's true, although when I did my Master's in 2016, I seem to remember the Brill tagger was still considered state-of-the-art and there was still a lot of work on learning PCFGs or dependency grammars. Perhaps that just happened to be what the tutors at my course were working on, though.

In any case, it seems to me that while real progress has been achieved in language modelling, the same cannot be said for language understanding. That's a bigger conversation but anyway, modelling is still what statistical learning techniques do best, whereas anything to do with semantics, you still need some kind of symbolic approach.

I never thought of HOG and SIFT as "symbolic". If I remember correctly, they were just sets of hand-crafted features? But, features for classifiers, like SVMs and so on.
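
For reference, the core of a HOG-style descriptor is a hand-crafted computation with no learned parameters: image gradients, binned by orientation and weighted by magnitude. A minimal pure-Python sketch for a single cell, assuming 9 unsigned orientation bins spanning 0 to 180 degrees and simple central differences; full HOG adds bin interpolation and block normalization before the features reach a classifier such as an SVM.

```python
import math

def cell_orientation_histogram(img, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0 to 180 deg)."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]  # central difference, horizontal
            gy = img[y + 1][x] - img[y - 1][x]  # central difference, vertical
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle // bin_width) % bins] += magnitude
    return hist

# A 5x5 patch with a vertical edge: dark on the left, bright on the right.
patch = [[0, 0, 10, 10, 10] for _ in range(5)]
print(cell_orientation_histogram(patch))  # all gradient energy lands in the 0-degree bin
```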

Yes I agree with all your points - I was however responding to the point being made that symbolic AI "wasn't useful"...which in the past it was. Perhaps in the future some new method or breakthrough will mean it becomes useful once again?

this is a great point.

much like deep learning was invented decades ago but didn't become feasible until technology caught up, could the same be true for symbolic AI?

i.e., is the ceiling for symbolic AI technical and transient or fundamental and permanent?

My feeling is that even in our own thinking symbols are used mostly to communicate our (inherently non-symbolic) thoughts to others or record them; i.e. they are a solution to a bandwidth-limited transfer of information while the actual thinking process happens with concepts that have more similarity to collections of vague parameters and associations which can be compressed to symbols only imperfectly with losses.

From that perspective, I don't see how symbolic AI would be competitive but there would be a role for symbolic AI in designing systems that can be comprehensible for humans, but perhaps just as a distillation/compression output from a non-symbolic system. I.e. have a strong "black box" ML system that learns to solve a task, and then have it construct a symbolic system that solves that task worse, but in an explainable way.

They didn't. That was just alpha-beta search with some custom hardware to speed it up. Also, at this point both of the strongest chess AIs (Stockfish and Lc0) use neural networks and are roughly 1000 Elo above where Deep Blue was (and most of that gain is from software, not hardware).
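
For readers unfamiliar with it, alpha-beta is minimax search that skips branches which provably cannot change the result. A minimal sketch over a hard-coded toy game tree whose leaves are static evaluation scores; Deep Blue paired this kind of search with a hand-tuned evaluation function and custom chess chips, none of which is modeled here.

```python
import math

def alphabeta(node, alpha, beta, maximizing, visited):
    """Minimax with alpha-beta pruning over a nested-list game tree."""
    if not isinstance(node, list):
        visited.append(node)  # leaf: record that its score was actually evaluated
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # cutoff: the minimizing player would never allow this line
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, value)
        if alpha >= beta:
            break  # cutoff: the maximizing player already has a better option
    return value

# Toy tree: max to move at the root, min at the next level, leaves are scores.
tree = [[3, 5], [2, 9], [0, 1]]
visited = []
best = alphabeta(tree, -math.inf, math.inf, True, visited)
print(best, visited)  # 3 [3, 5, 2, 0]: leaves 9 and 1 were never evaluated
```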

> just alpha beta search

I will cling to these goal posts every time. Search was and still is AI, unless you think Russell and Norvig should have named the field's foundational textbook something other than "Artificial Intelligence: A Modern Approach"

I was arguing more against the "symbolic" part than the "AI" part.

>The difference between ML and symbolic AI is that ML works and symbolic AI doesn't.

There was a point when it was the other way around; this is not static but the result of where resources have been poured. The data-heavy, compute-heavy, black-box style of ML gives power to large businesses over small ones, so it's seen as a safer bet than symbolic A.I. This in turn makes it work better, which makes it an even safer bet. Notice that startups dream of being big businesses, so they still pick ML.

Also notice that in some domains ML is still behind symbolic A.I., for instance a lot of robotics and autonomous vehicles.

> Also notice that in some domains ML is still behind symbolic A.I., for instance a lot of robotics and autonomous vehicles.

Classical AI has failed in robotics. There's practically no field where it does worse than in robotics. It's getting cut out of every part of every system one piece at a time, and being replaced by ML methods that really work. Even Kalman Filters aren't safe.

It's been a few years since I investigated, but Boston Dynamics was using mostly old, non-ML methods. See [1].

IIRC, Waymo vehicles used ML for some of their perception but also depended heavily on lidar and a rules-based approach. Sadly, I can't find a link at the moment.

What examples are you thinking of?

[1] https://www.quora.com/How-does-Boston-Dynamics-use-AI-machin...

Here's Tesla's AI day presentation on how they are going to use AI in their planning system because traditional methods aren't scalable enough: https://www.youtube.com/watch?v=j0z4FweCy4M&t=4392s (1:20:15). You can see hints of this in presentations from the other self-driving companies.

Boston Dynamics was definitely an exception for a long time. Certainly the perception systems they are adding use ML, but you are right that they expressly use classical methods for their control system.

PHP wasn't discredited by the incumbents. It was discredited by its creator.

"I'm not a real programmer. I throw together things until it works then I move on. The real programmers will say Yeah it works but you're leaking memory everywhere. Perhaps we should fix that. I'll just restart Apache every 10 requests." -Rasmus Lerdorf

"I was really, really bad at writing parsers. I still am really bad at writing parsers." -Rasmus Lerdorf

"We have things like protected properties. We have abstract methods. We have all this stuff that your computer science teacher told you you should be using. I don't care about this crap at all." -Rasmus Lerdorf

To most programmers that doesn't discredit PHP at all. He cares about a working product, much like 90% of programmers, who don't have the privilege of worrying about theory. They just need an e-commerce site, a blog, or whatever running ASAP. To use pg's analogy, they are there to paint, not to worry about paint chemistry.

The incumbents do discredit PHP, though. For instance, Facebook was built on PHP and still runs on it. They used the language of personal home pages to give every person on the planet a personal home page. Nevertheless, once they succeeded, they forked PHP under a new name and isolated its devs culturally.

It’s not about practice versus theory. It’s about the actual costs of writing fast PHP versus the cost of writing good secure code (which is also possible in variants of PHP).

Terrible code in PHP is possible, therefore likely. I say this having spent over a decade writing it and half of that time fixing OWASP bugs created in it.

The incumbents hate it because their vendors use it and everyone is worse off for having their vendors use it. And Facebook did use PHP in the first few years, but they quickly started compiling it (HipHop) and later converted their code base to a different strongly-typed language based on PHP (Hack). They stopped using PHP because it is a starter language.

PHP is also discredited by its apologists.

If you're just there to paint, but painting with mashed potatoes, you SHOULD have been more worried about your paint chemistry.

"Using these toolkits is like trying to make a bookshelf out of mashed potatoes." -JWZ

Your point about incumbents not wanting it to be easy for beginners to pick up a language or technology deserves far more emphasis.

Excluding those who lack the time and resources to overcome the initial inertia required to become productive is a form of opportunity and earnings segregation.

Despite having a background in tech, I find few things more satisfying than seeing people put tech to work for them, rather than the other way around or being dependent on others.

And Facebook's success would like a word with them. Even if the company magically disappeared tomorrow, the millionaires who've already cashed out built atop some shitty PHP app that a college kid wrote don't care one whit over that discreditation.

Training models is getting cheaper. GPT-3 is one of the very few examples where it is this expensive. In the end, it all depends on how much data you have, since that determines how far you can scale up the model without overfitting. And internet text is one of the only data sources this big.

For most purposes you don’t need to train from scratch. Instead you fine-tune an existing model, on smaller amounts of data and for a fraction of the time.

This is akin to teaching an adult human about a specific domain. Better to just do that than make a whole new human from scratch!

>However, the ability of people to build upon GPT-3 was hampered by one major factor: it was not publicly released. Instead, OpenAI opted to commercialize it and only provide access to it via a paid API. This made sense given OpenAI’s for profit nature, but went against the common practice of AI researchers releasing AI models for others to build upon. So, since last year multiple organizations have worked towards creating their own version of GPT-3, and as I’ll go over in this article at this point roughly half a dozen such gigantic GPT-3 esque models have been developed.

Seems like aside from Eleuther.ai you can’t use the models freely either, correct me if I’m wrong.

I believe you are correct, at least for GPT-3 scaled things. Hopefully that'll change with time, though.

I have supported my beliefs on this topic in these threads to the point of exhausting myself. The tools that we use to find these agents are the underpinning of AGI. It’s coming way faster than even most people here appreciate, and this development is intrinsically against the interest of human beings. Please stop and think, please.

I argue it's very much in the interest of human beings. It has been since we first picked up a rock and used it as a hammer. It's the ultimate tool and has the potential to bring unprecedented prosperity.

Since you two have expressed two very extreme opinions, I would like to express my very bland opinion that this will effectively be the equivalent of when human computers of the early 1900s were replaced with microchips. We adapted, and in this instance the human computers are just propagandists.

I believe this will displace the workers of places like the Internet Research Agency and some marketing specialists. That accounts for a large percentage of propaganda jobs, but it is easy enough to adapt from (nor do I really care, given who is generally doing it). The same goes for marketing people, whom I generally don't respect either (I mean, their founding document is literally titled "Propaganda"). The real marketing pros are good at managing hard-to-handle talent or bad PR disasters, which are actually meaningful skills that this won't affect.

So basically no moral or real consequential impact on society whatsoever.

It won’t. You’re wrong. This is the perfect illustration. You think a rock is good, therefore AI is good. You’re just unbelievably wrong.

So is there any one of them that I could play around with?

AI21 Labs' 178B-parameter model


If I were to run GPT-3 on my 70000 browser bookmarks, what kind of insights could I get from that?

Only by analyzing the page title (from the bookmark, not by re-fetching the URL) and possibly also the domain name.

GPT-3 is a text generator, so I doubt you would get anything of use. You can't even supply such a large input to GPT-3.

GPT-3 is also a classifier and data-extractor.

You could give it a couple dozen bookmarks with example classifications and then feed it a new bookmark and ask GPT-3 what category the page belongs in. Repeat for the entire data set.
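The few-shot approach described above can be sketched in Python as a prompt builder. The example bookmarks, the category names and the "Title:/Category:" layout are all hypothetical, and no API call is made here; the function only assembles the text you would send to the model for one bookmark.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], new_title: str) -> str:
    """Build a few-shot classification prompt from hand-labelled bookmark titles.

    The "Title:/Category:" layout is an illustrative assumption, not a
    format GPT-3 requires.
    """
    lines = ["Classify each bookmark title into a category.", ""]
    for title, category in examples:
        lines.append(f"Title: {title}")
        lines.append(f"Category: {category}")
        lines.append("")
    # Leave the last category blank so the model's completion is the label.
    lines.append(f"Title: {new_title}")
    lines.append("Category:")
    return "\n".join(lines)


# Hypothetical labelled examples; in practice you would label a couple dozen.
examples = [
    ("Attention Is All You Need", "machine learning"),
    ("Sourdough starter troubleshooting", "cooking"),
    ("Rust ownership explained", "programming"),
]
prompt = build_few_shot_prompt(examples, "JAX vs PyTorch benchmarks")
print(prompt)
```

You would send `prompt` to the model, take the first line of its completion as the label, and repeat for each bookmark in the set.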

For data extraction you could ask questions about the titles. Maybe have it list all machine-learning model names that appear in the set of bookmark titles.

It's pretty good at extending lists. That might require some sorting first.

Gosh, anyone else remember Google Sets? You typed in a few things, e.g. colors, and Google would add some more things of the same type.

print gpt.submit_request("Give me insights")

>>> You are spending way too much time browsing.

Number of parameters aside, I am really surprised that we haven't yet reached hundreds of TB of training data. In particular, the Chinese model used less than 10 TB of data.

Is there any similar version to GPT-3 available for free? Or usable online via web interface?

Is there any estimate out there of how many joules of energy it took to train GPT-3?

Can anyone tell me what the value of GPT-3 actually is, other than generating meaningless prose? What would a business use it for?

Actually, using this class of models (larger transformer-based language models) to generate text is, to me, the least interesting use case.

They can also all be adapted and fine-tuned for other tasks in content classification, search, discovery, etc. Think facial recognition for topics. Want to mine a whole social network for anywhere people are talking about _______, even indirectly, with a very low false-negative rate? You want to fine-tune a transformer model.

BERT tends to get used for this more because it is freely available, established, and not too expensive to fine-tune, but I suspect this is what Microsoft licensing GPT-3 is all about.

Have you heard of GitHub Copilot? It’s based on GPT-3, and I can tell you one thing: it does not generate meaningless prose (90% of the time).

The fact that GPT3 works at all for coding indicates that our programming languages are too low level and force a lot of redundancy (low entropy). From a programmer productivity optimization perspective it should be impossible to reliably predict the next statement. Of course there might be trade offs. Some of that redundancy might be helping maintenance programmers to understand the code.
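One crude way to put a number on "redundancy" is compressibility: the more redundant a text, the better it compresses. A minimal Python sketch using the standard `zlib` module; both snippets are made up, a general-purpose compressor is only a rough stand-in for what a language model can predict, and very short inputs compress poorly anyway because of fixed overhead.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Compressed size divided by raw size; lower means more redundant."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw, 9)) / len(raw)


# Hypothetical boilerplate-heavy code: one pattern repeated with new names.
boilerplate = "\n".join(
    f"result_{i} = compute(value_{i})\nif result_{i} is None:\n    raise ValueError('value_{i}')"
    for i in range(20)
)

# Hypothetical code with less repetition.
varied = """def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def zscores(xs):
    m, s = mean(xs), variance(xs) ** 0.5
    return [(x - m) / s for x in xs]
"""

print(f"boilerplate: {compression_ratio(boilerplate):.2f}")
print(f"varied:      {compression_ratio(varied):.2f}")
```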

I think this is kind of true, but also kind of not true. Programming, like all writing, is the physical manifestation of a much more complex mental process. By the time I know what I want to write, the hard work is done. In that way, you can think of Copilot as a way of increasing the WPM of an average coder. But the WPM isn't the bit that matters. In fact, almost the only things that matter are the bits you won't predict.

Indeed, in the limit of maximal abstraction, i.e. semantic compression, code becomes unreadable by humans in practice. We can see this in code golf competitions.

Let me rephrase:

> The fact that GPT3 works at all for English indicates that English is too low level and forces a lot of redundancy (low entropy).

I don't think the goal is to compress information/language and maximize "surprise".

That would only follow if we were trying to optimize code for brevity, and I have no clue why that would be your top priority.

I have indeed seen codebases where it seems like the programmer was being charged per source code byte. Full of single letter methods and such - it takes a large confusion of ideas to motivate such a philosophy.

Not at all. Brevity (or verbosity) is largely orthogonal to level of entropy or redundancy. In principle it ought to be possible to code at a higher level of abstraction while still using understandable names and control flow constructs.

I’m really not sure about that. By definition, high entropy means high information density (information per character). So with the same amount of information you would have fewer characters.
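To make "information per character" concrete, here is an order-0 estimate of Shannon entropy in Python. It treats characters as independent, which real code is not, so it is only a rough illustration. Under such a model, total information is entropy per character times length, so for a fixed amount of information a higher per-character entropy does mean fewer characters.

```python
import math
from collections import Counter


def entropy_per_char(text: str) -> float:
    """Empirical Shannon entropy in bits per character (order-0: ignores context)."""
    if not text:
        return 0.0
    n = len(text)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(text).values())


def total_bits(text: str) -> float:
    """Total information under the same order-0 model: entropy times length."""
    return entropy_per_char(text) * len(text)


print(entropy_per_char("abab"))  # 1.0
print(entropy_per_char("abcd"))  # 2.0
print(total_bits("abcd"))        # 8.0
```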

Think in terms of information density per symbol on the abstract syntax tree, not per character in the source code.
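A small Python illustration of that distinction, using the standard `ast` module: the two hypothetical statements below have identical syntax trees, so the same node count, while differing roughly fourfold in character length. Counting AST nodes is just one possible proxy for "symbols", chosen here for simplicity.

```python
import ast


def ast_node_count(source: str) -> int:
    """Count every node (including context and operator nodes) in the parsed AST."""
    return sum(1 for _ in ast.walk(ast.parse(source)))


terse = "t = a + b"
descriptive = "total_price = base_price + shipping_cost"

for src in (terse, descriptive):
    print(f"{len(src):3d} chars, {ast_node_count(src)} AST nodes: {src}")
```

Descriptive names cost characters but add no tree structure, which is why brevity and abstraction level can vary independently.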

>From a programmer productivity optimization perspective it should be impossible to reliably predict the next statement

Why? 99.9% of programming being done is composition of trivial logical propositions, in some semantic context. The things we implement are trivial, unless you're thinking about symbolic proofs etc

I think that’s precisely the problem the parent commenter is describing.

I write code directly as gzipped text. Saves lots of keystrokes, and you can use off-the-shelf compilers by piping through zcat first.

Yes, I'm used to it now, but the first time it started doing its thing, I wanted to stop and clap for how jaw-dropping and amazing this technology is.

I was a JetBrains fan, but this thing takes productivity to a whole new level. I really don't think I can go back to my normal programming without it anymore.

Someone at work showed me today that Copilot works in WebStorm (I also use VSCode).

Luckily, there’s a JetBrains add-on for it.

This - it is tremendously valuable to me and I use it all the time at work.

But what about the potential for intellectual property problems?

That's beside the point, which is that the output Copilot produces is useful.

I don't see how that's beside the point. How can it be that useful if the output is such a legal mystery?

I'd love to use it but not when there's such a risk of compromising the code base.

What do you use it for?

Coding. I actually had to forbid it today in a course I teach because it solves all the exercises :) (students were given unit tests with only titles and needed to fill those tests in)

Isn't that just because others have stored solutions to these problems in GitHub?

Probably, also I’m sure 99%+ of the code I author isn’t groundbreaking and someone did it before.

That is my question too. Is it a fancier autocomplete? Or does it reason about code?

It reasons over the semantic network between tokens, in a feedforward inference pass over the 2k(ish) words or tokens of the prompt. Sometimes that reasoning is superficial and amounts to probabilistic linear relationships, but it can go deeply abstract depending on training material, runtime/inference parameters, and context of the prompt.

In some sense, you could think of it as a fancy autocomplete which uses not only code but also comments as input, looks up previous solutions for the same problem, and (mostly) appropriately replaces the variable names with those that you are using.

Code is easier to write than read and maintain, so how useful is something that generates pages of 90% correct code?

It's not useful if you use it to auto complete pages of code. It is useful to see it propose lines, read, and accept its proposals. Sometimes it just saves you a second of typing. Sometimes it makes a suggestion that causes you to update what you wanted to do. Sometimes it proposes useless stuff. On the whole, I really like it and think it's a boon to productivity.

What percentage of the time does it produce code that compiles?

I was exceptionally skeptical about it, but it's been very useful for me and I'm only using it for minor tasks, like automatically writing loops to pull out data from arrays, merge them, sort information, make cURL calls and process data, etc.

Simply leading the horse to water is enough in something like PHP:

// instantiate cURL event from API URL, POST vars to it using key as variable name, store output in JSON array and pretty print to screen

Usually results in code that is 95-100% of the way done.

From my anecdotal experience, the vast majority of the time (90+%).

Interesting. Is there any constraint built into the model that makes this possible? E.g. grammar, or semantics of the language? Or is it all based on deep learning only?

Deep learning only, I believe. But a really good one.

In my experience, 95% of the time. And 80% of the time it outputs code that is better than what I would have written myself on a first attempt (it thinks of corner cases, adds meaningful comments, etc.). It’s impressive.

The overwhelming majority. Whatever used to take me an hour or two is now a 10-minute task.

I am so confused. Is there a tutorial explaining how you are using it in the IDE? I use VSCode and am curious whether it can be applied there. Thanks

It works very well with VSCode; it has an integration. Its suggestions show up differently from normal autocomplete, just like Gmail autocomplete (a grayed-out text suggestion; press Tab to accept it). Sometimes the suggestion is just a couple of tokens long, sometimes it’s an entire page of correct code.

Nice trick: write a comment quickly describing what your code will do (“// order an item on click”) and enjoy the complete suggested implementation!

Other nice trick: write the code yourself, and then just before your code, start a comment saying “// this code” and let Copilot finish the sentence with a judgement about your code, like “// this code does not work in case x is negative”. Pretty fun!

Interesting second use case; I use comments like this already as typical practice and I agree Copilot fills in the gaps quite well - never thought to do it in reverse... will give that a shot today.

I also like to do synthesis from example code (@example doc comment) and synthesis from tests.

GPT-3 is fairly effective at summarization, so that's one potential business use case:


At https://reviewr.ai we're using GPT-3 to summarize product reviews into simple bullet-point lists. Here's an example with backpack reviews: https://baqpa.com

Did you test it against extractive summarizers?

We experimented with BERT summarization, but the results weren't too good. Do you have any resources or experiences in this area?

That sounds like BERT alright.

How do you avoid libel?

Are you confusing libel with something else? Can you elaborate on what you mean here? Are you saying that they will be liable for libel (!) if they publish a negative summary of a product?

If they mischaracterize a positive review into a negative summary based on factual mistakes they know the system makes at a high rate, I would think they would be liable for libel right?

Maybe under the UK's libel laws, whose judgements are widely barred from enforcement elsewhere, but it would be basically impossible for ML output to fall afoul of libel law in the US. It could not even be knowingly false. Reckless disregard for the truth would also be hard to argue, as the summary is meant to be a best effort at accuracy.

Hey, for a long time I was also very sceptical. However, I can refer you to this paper for a really cool application: https://www.youtube.com/watch?v=kP-dXK9JEhY. They basically use clever GPT-3 prompting to create a dataset that you then train another model on. Besides that, you can prompt these models to get really good few-shot performance (depending on the use case). And finally, GitHub Copilot is another pretty neat application.

It's good for the university-industrial-business complex: people writing papers about a model they can't even run themselves. It practically prints money in journal articles, travel per diem, and conference honoraria, not even counting the per-API-call rates.

I hope that one day it will allow me to write down my thoughts in bullet-list form, and it will then produce beautiful prose from it.

Of course this will be another blow for journalists, who rely on this skill for their income.

I played with GPT-3 giving it long news stories. It actually replied with more meaningful titles than the journalists themselves used.

Perhaps GPT-3 was optimizing to deliver information while news sites these days optimize titles to get clicks.

Automatic generation of positive fake customer reviews on Amazon, landing pages about topics that redirect to attack and ad sites, fake "journalism" with auto-generated articles mixed with genuine press releases and viral marketing content, generating fake user profiles and automated karma farming on social media sites, etc. etc.

> fake "journalism" with auto-generated articles mixed with genuine press releases and viral marketing content

How would you tell the difference from the real thing these days?

The state of journalism is so poor, I'd rather take some AI-generated articles instead.

You can prompt GPT-3 in ways that make it perform various tasks such as text classification, information extraction, etc... Basically you can force that "meaningless prose" into answers to your questions.

You can use this instead of having to train a custom model for every specific task.

While the generation is fun and even suitable for some use cases, I'm particularly interested in its ability to take in language and use it for downstream tasks.

A good example is DALL-E[0]. Now, what's interesting to me is the emerging idea of "prompt engineering" where once you spend long enough with a model, you're able to ask it for some pretty specific results.

This gives us a foothold in creating interfaces whereby you can query things using natural language. It's not going to replace things like SQL tomorrow (or maybe ever?) but it certainly is promising.

[0] https://openai.com/blog/dall-e/

You can try it yourself - apply for a free API license from OpenAI. If you like to use Common Lisp or Clojure then I have examples in two of my books (you can download for free by setting the price to zero): https://leanpub.com/u/markwatson

I know of some credible developers who were struggling to get access, so YMMV

It took me over a month, so put in your request. Worth the effort!

I put in a request months ago, I think they're not approving people anymore.

You might try signing up directly for a paid (non-free) account, if that is possible to do. I was using a free account, then switched to paying them. Individual API calls are very inexpensive.

GPT-3 and similar ML/AI projects may have many interesting and valuable commercial applications, not all of which are readily apparent at this stage of the game. For instance, it could be used to insert advertisements for herbal Viagra at https://www.geth3r3a1N0W.com into otherwise-apropos comments on message boards, preferably near the end once it's too late to stop reading.

Life online is about to become very annoying.

I work for a company that re-sells GPT-3 to small business owners. We help them generate product descriptions in bulk, Google ads, Facebook ads, Instagram captions, etc.

Chatbots are one use. I think you might use it for customer support.

One example of GPT-3 powered chat bot: https://www.quickchat.ai/emerson

That’s like if an alien took Mozart as a specimen and then disregarded the human race because this human, while making interesting sounds, does nothing of value. You have to look at the bigger picture.

>What would a business use it for

If you think about business uses, you can actually get advice from Jerome Powell, simulated by GPT-3.

If someone uses GPT-3 to simulate Warren Buffett, they can extract even more value.


> Interviewer: Are you in favor of a carbon tax?

> Mr. Powell: I don’t want to get into the details of taxes.

> Interviewer: Are you in favor of a cap and trade system?

> Mr. Powell: I don’t want to get into the details of a cap and trade system.

> Interviewer: How do you think a global pandemic would impact the financial system?

> Mr. Powell: A pandemic will have impact on the economy. It will have an impact on the financial system.

> Interviewer: What would the impact be?

> Mr. Powell: If a major pandemic occurred, the economy would be affected, and the financial system would be affected.

Maybe I'm a bit harsh on GPT-3, but I'm not nearly as fascinated by this kind of output as the author.

I, too, thought that sounded like ELIZA. Anyway, it looks like that’s only a small excerpt from the conversation.

It looks like the dialogue is only on the human end. The chatbot is treating each question as the first. I think it sounds a lot like Biden. I prefer that to Trump, but don't like either sort of conversation!

It does pretty well at transforming text into a person's style of talking. So you could have it re-write any sentence to sound like a Trump tweet.

Somehow I don't think that this is quite how Jerome Powell would respond in an interview:

> Interviewer: How do you think a global pandemic would impact the financial system?

> Mr. Powell: A pandemic will have impact on the economy. It will have an impact on the financial system.

> Interviewer: What would the impact be?

> Mr. Powell: If a major pandemic occurred, the economy would be affected, and the financial system would be affected.

Yes, GPT-3 can provide a convincing chatbot, but it shouldn't be confused for domain expertise.

This is a bad example. Here's an output from GPT-J-6B by EleutherAI.

Prompt:

> Jerome Powell was interviewed about the impact of a pandemic on the economy.

> Interviewer: What would the impact be?

> Mr. Powell: If a major pandemic occurred

Output:

> it could be very disruptive.

> Interviewer: How disruptive?

> Mr. Powell: Well, it could be disruptive in a number of ways.

> Interviewer: How so?

> Mr. Powell: Well, one of the first things that would be disrupted would be the supply chain.

Using prompts well makes a huge difference.

If you parse the generated output, classify it, then develop a decision tree that uses further prompts to refine the response, you can get more sophisticated, valuable responses.
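A minimal Python sketch of that parse, classify, and re-prompt loop. `generate` stands in for whatever model call you use (GPT-3, GPT-J, ...); the keyword-based `classify`, the single follow-up prompt, and `fake_model` are hypothetical placeholders for a real classifier, a larger tree of prompts, and a real API client.

```python
from typing import Callable

# Any text-completion call: prompt in, completion text out.
Generate = Callable[[str], str]

# Hypothetical follow-up used when an answer looks too vague.
PUSH_FOR_DETAIL = "Be specific. What is the first thing that would be disrupted, and why?"


def classify(answer: str) -> str:
    """Toy classifier (hypothetical): short answers without causal keywords are 'vague'."""
    keywords = ("because", "supply", "demand", "credit", "rates")
    if len(answer.split()) < 25 and not any(k in answer.lower() for k in keywords):
        return "vague"
    return "specific"


def refine(question: str, generate: Generate, max_rounds: int = 3) -> str:
    """Ask a question, classify each answer, and branch to a follow-up prompt if needed."""
    transcript = f"Interviewer: {question}\nMr. Powell:"
    for round_no in range(max_rounds):
        answer = generate(transcript).strip()
        transcript += f" {answer}\n"
        if classify(answer) == "specific" or round_no == max_rounds - 1:
            break  # good enough, or out of budget
        transcript += f"Interviewer: {PUSH_FOR_DETAIL}\nMr. Powell:"
    return transcript


def fake_model(prompt: str) -> str:
    """Canned replies so the sketch runs without any API access."""
    if "Be specific" in prompt:
        return "Supply chains would be hit first, because factories and shipping would shut down."
    return "The economy would be affected."


result = refine("How would a pandemic affect the financial system?", fake_model)
print(result)
```

With a real model, each branch of the tree would typically carry its own prompt tuned to the domain you care about.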

The output in the parent is comparable to an off-the-cuff interview response. If you emulate a deeper thought process, you can get more meaningful output, and if you use the right prompts, you can access the semantic networks in the model related to your domain of interest.

I think the "bad example" is actually the good one, because it's a reminder that you're not actually getting business advice from someone with Warren Buffett's or Jerome Powell's understanding of the economy; you're getting text generated by analysing patterns in other, not-necessarily-applicable text. If you start forcing it in very specific directions, you start getting text that summarises the commentary in the corpus, but most of that commentary doesn't come from Warren Buffett or Jerome Powell and isn't applicable to the future you're asking it about...

I can't tell whether or not this article is parody... is this a new kind of Turing test?

Hey, I work there! To be honest it's still very much a prototype. We have big plans for the next few months.

People were blaming cryptocurrency miners for the prices of GPUs, when in fact it was the AI researchers who bought all the GPUs. :D

I wonder: what if somebody designed an electronic currency awarded as payment for general GPU computations instead of just computing hashes? You pay some $ to train your model, and the miner gets some coins.

Everyone is happy, electricity is not wasted, and the GPUs get used for a reasonable purpose.

The current way of training is efficient when compute is located in a single place and is colocated with large quantities of training data. Distributing small parts of computation to remote computers is theoretically possible (and an active direction of research) but currently not preferable nor widely used; you really need very high bandwidth between all the nodes to constantly synchronize the hundreds-of-gigabytes sized weights they're iterating on and the resulting gradients.
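A back-of-envelope sketch in Python of why bandwidth dominates. It assumes a GPT-3-sized model of 175 billion parameters, 2 bytes per value (fp16), so about 350 GB per full copy, and one full copy of the gradients exchanged per step. The two link speeds are illustrative, and real systems shard and compress this traffic, so treat the results as orders of magnitude only.

```python
PARAMS = 175e9       # GPT-3-sized model (assumption for illustration)
BYTES_PER_VALUE = 2  # fp16 gradients (assumption)


def seconds_per_sync(link_gbit_per_s: float) -> float:
    """Time to ship one full gradient copy over a link of the given speed."""
    payload_bits = PARAMS * BYTES_PER_VALUE * 8
    return payload_bits / (link_gbit_per_s * 1e9)


for name, gbit in [("home broadband, 100 Mbit/s", 0.1),
                   ("datacenter interconnect, 400 Gbit/s", 400.0)]:
    print(f"{name}: {seconds_per_sync(gbit):,.0f} s per step")
```

That works out to about 28,000 seconds (nearly eight hours) per step over home broadband versus about 7 seconds over a fast interconnect.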

This may not be true in the future. There is some work being done on distributed neural net training. I can't recall the name of the technique at the moment, but a paper came out in the last year showing results comparable with backprop that only required local communication of information (whatever this technique's alternative to gradients is).

My understanding is that proof-of-work is intentionally wasteful; the objective is to make 51% attacks (where a single entity controls at least 51% of the global hashrate) infeasible by attaching a cost to the mining process.

Making the mining process produce useful output that can be resold nullifies the purpose as it means an attacker can now mine "for free" as a byproduct of doing general-purpose computations (as opposed to tying up dedicated hardware), lowering the barrier for a 51% attack dramatically.
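To make the "costly to produce, cheap to check" asymmetry concrete, here is a toy hash-based proof of work in Python using only `hashlib`. It is a simplified illustration (SHA-256 over the data plus a nonce, with difficulty expressed as leading zero bits), not Bitcoin's actual block format. The miner's only output is a nonce that is worthless outside the protocol, which is the property described above: the cost cannot be recouped elsewhere.

```python
import hashlib


def _hash_value(block_data: bytes, nonce: int) -> int:
    """SHA-256 of the data plus an 8-byte nonce, read as a 256-bit integer."""
    digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def mine(block_data: bytes, difficulty_bits: int) -> int:
    """Brute-force the first nonce whose hash has `difficulty_bits` leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while _hash_value(block_data, nonce) >= target:
        nonce += 1
    return nonce


def verify(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checking a claimed nonce costs a single hash."""
    return _hash_value(block_data, nonce) < (1 << (256 - difficulty_bits))


nonce = mine(b"example block", 16)  # about 65,536 hashes on average
print(nonce, verify(b"example block", nonce, 16))
```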

Yes, this is an old idea (which I really like), but it hasn't really taken off yet. GridCoin was one example, where you solved BOINC problems, or RLC, which is for more general computation.

The problem is that, currently, large ML models need to be trained on clusters of tightly-connected GPUs/accelerators. So it's kinda useless having a bunch of GPUs spread all over the world with huge latency and low bandwidth between them. That may change though - there are people working on it: https://github.com/learning-at-home/hivemind

It hasn't taken off because it doesn't work. PoW only works for things that are hard to calculate but easy to verify. Any meaningful result is equally hard to verify.

> Any meaningful result is equally hard to verify.

This is very much not true. A central class in complexity is NP whose problems are hard to answer but easy to verify if the answer is yes.

E.g. is there a path visiting all nodes in this graph of length less than 243000? Hard to answer but easy to check any proposed answer.
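A Python sketch of the "easy to check" half of that example, which is the decision version of the shortest Hamiltonian path problem: given a claimed path, verification is a single pass over it, even though finding such a path is believed to require far more work in general. The graph representation (a dict from undirected edges to lengths) and the tiny example graph are assumptions for illustration.

```python
from typing import Dict, FrozenSet, List, Set

Graph = Dict[FrozenSet[str], float]  # undirected edge {u, v} -> length


def verify_path(graph: Graph, nodes: Set[str], path: List[str], bound: float) -> bool:
    """Check a claimed path: visits every node exactly once and is shorter than `bound`."""
    if len(path) != len(nodes) or set(path) != nodes:
        return False  # misses a node or repeats one
    total = 0.0
    for u, v in zip(path, path[1:]):
        edge = frozenset((u, v))
        if edge not in graph:
            return False  # consecutive nodes are not connected
        total += graph[edge]
    return total < bound


# Hypothetical 4-node graph.
nodes = {"A", "B", "C", "D"}
graph: Graph = {
    frozenset(("A", "B")): 1, frozenset(("B", "C")): 2, frozenset(("C", "D")): 3,
    frozenset(("A", "C")): 10, frozenset(("B", "D")): 10,
}
print(verify_path(graph, nodes, ["A", "B", "C", "D"], 7))  # length 6 < 7: True
```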

It's easy to verify ML training - inference on a test set has lower error than it did before.

Training neural-network models is much slower than inference (1000x at least) because it has to calculate all of the gradients.

It'd be awesome if instead of the proof of work algorithms being based on generating useless hashes, they were instead based on computing protein folding simulations in a way that actually benefits everyone.

If everyone offers GPUs, it's the same game. If I buy more GPUs, I get more money, so the average payment for a person with a single GPU or a small bunch of them will be low.

And second, the principles of electronic currency are different from those of gold/money. That's why crypto uses GPUs ;)

I think companies should be banned from having "Open" in their names.

OpenAI takes the Orwellian cake.

I hear a lot of low effort takes about OpenAI but how exactly is providing your service via a paid API the "Orwellian cake"? Is this really the most (or even at all) Orwellian practice for you?

I think it's more the contrast where they claim, via their name, to be open, but actually aren't.

If their name was ProfitableAI, there'd probably be fewer complaints.

"Open"AI but you can only use it how we want you to and no, you can't run it yourself.

The only thing open in OpenAI is your wallet.
