How much of this is just "AI is bad at everything", but in the math case, it's easier for the lay person to tell?
It's all just passable garbled nonsense that the reader goes to great lengths to interpret based on their prior knowledge, which is not expressed in the syntax of what these systems output.
In the case of mathematics, we're far less willing to "BS away" the interpretive failures. But if we were equally demanding, then likewise, none of the prose generated by these systems is AI "getting" anything either.
Pass a film reel through a shredder and an art student would still call it a film. Pass math through and a mathematician won't. This says more about our ability and inclination to make sense out of nonsense when in apparently communicative situations (since, when speaking to a person, this actually improves our mutual understanding).
So, how much of AI is just hacking people's cognitive failures: (1) people's willingness to attribute intention; (2) people's willingness to impart sense "at all costs" to apparent communication; and (3) "hopium"?
Have you ever used Github CoPilot? It does a lot of useful work, automating away rote typing in programming. Have you tried Dall-E or Stable Diffusion? They make good looking images. This comment seems completely unmoored from where the state of the art is right now.
Math works in a completely different way from how machine-learning AIs do their thing.
Reason derives its strength from having a few primitives and creating new assertions through the transformation of symbols by following precise rules (which is how algorithms work).
In ML-based AIs, everything is imprecise and probabilistic, and this kind of generation gets its strength from building recognizable output from utterly imprecise inputs and training - quite the opposite of how logic and reason evolve. Now, "classic" AI was a powerful way to derive new knowledge, and automatic theorem proving is a strong discipline; but the recent breakthroughs in AI are not directly applicable to classic techniques.
Do you know what machine-learning AIs could be good for? Generating "insight" in problem solvers for guiding theorem provers through the proof search space, trying to find the best sub-spaces to explore. If there's a way to create human-like general AI, it will likely combine both kinds of generation - the rational methods of symbolic logic and the "irrational" statistical methods of ML.
For a type theory we might want to reason about, there’s a diagram (in category theory) which represents the same semantic content. These diagrams turn out to have recurring and common structures.
You can represent those diagrams as adjacency matrices, where those structures have a particular "shape" in the entries. Which, if you squint, looks like an image-completion problem, i.e., finding the missing part of the matrix which represents a proof.
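A toy sketch of that framing, with made-up data (nothing here comes from the comment beyond the adjacency-matrix idea): encode a small diagram as a 0/1 matrix, hide some entries, and treat filling them in as a completion problem analogous to image inpainting.

```python
import numpy as np

# Toy "diagram": 4 objects; a 1 at (i, j) means there is an arrow i -> j.
adjacency = np.array([
    [0, 1, 1, 0],   # object 0 has arrows to objects 1 and 2
    [0, 0, 0, 1],   # object 1 has an arrow to object 3
    [0, 0, 0, 1],   # object 2 has an arrow to object 3
    [0, 0, 0, 0],
])

# Hide the column of arrows into object 3: these are the "missing pixels"
# a trained completion model would be asked to predict.
observed = adjacency.astype(float)
observed[:, 3] = np.nan

print(observed)
# A completion model would fill the NaN entries; here the ground truth
# is the commuting-square structure encoded above.
```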
Who's using that topos theory, and is it well known in automated theorem proving?
I hadn't heard about it before, although I have some notions of both category theory and theorem proving.
And is your description of “complete and label the diagram” using image generation a thing that actually exists or something that has potential to be created? That could be a breakthrough in applying formal methods to real-world problems.
Topos theory is used by people researching foundations, eg Michael Shulman.
> And is your description of “complete and label the diagram” using image generation a thing that actually exists or something that has potential to be created?
Somewhere between — it’s a topic being researched, but results are very early (basically, just shapes and groups).
7 kittens, none well defined. The loss function it is optimising taps out once it has got "close", and "close" for a multiple-object prompt is _a lot_ further away than for a single-subject prompt (here is the 1-kitten version https://labs.openai.com/s/1aCOUxNT19kbMZZtEBG7CoFY - it is so good it's basically witchcraft; the group shot is a joke).
I suppose you could massively reduce the loss amount you are willing to accept, but that doesn't guarantee Dall-E will optimise the correct part of the picture - maybe I'd have just ended up with really, really good floors.
The other thing Dall-E is bad at is backgrounds, and once again this is due to "optimising an error score". https://labs.openai.com/s/U1Vo2fxThuXmQZzIwLQ4g9Ai nothing about this is right. At a superficial glance it looks like the view over a city, but it's a random splattering of building cutouts, and when you look at the detail of the buildings they are a blur of pixels that kind of approximate doors and windows but are nothing of the sort. They are super fuzzy and dream-like. Because it's trying to generate an image that looks like a cityscape from its memory of Glasgow cityscapes. There's no coherence because it's trying to convert random pixels into a cityscape, not form buildings out of the components that humans know go into making up buildings.
I'm not an expert on AI, but your complaints sound like minor versions of the major problems that these image-generation AIs had a couple of years ago. It used to be that they could only create a mishmash of textures reminiscent of the subject and style, and struggled to create distinct objects at all.
Now, your examples simply show some slight artifacts and lack of details on specific things. You're presenting remaining shortcomings on these metrics as "fundamental to how it works not a flaw that can be iterated away", when in fact they have mostly been iterated away over the past few years.
I haven't used copilot because I'm not sure I'm allowed, but I'll try it on a personal project eventually.
I'm hoping it's not as bad as Dall-E and Stable Diffusion - I've tried to use those to generate some generic product-looking stock photos for a demo and they're spectacularly bad. The only context in which I see them praised is fantasy-style art - and that is visually appealing nonsense by definition.
If the code generated by copilot has the same "looks convincing but is fundamentally flawed" quality then it sounds like an insidious bug generator.
Sure, but Copilot is mostly just copying code (see, for example, the issue with it producing Quake source code).
If you think of AI as a dial from sample(data) to mean(data), then as the dial is turned towards the mean() you get more "generic" results, but also more garbled ones.
Copilot is more like a search engine, having turned the dial more towards sample().
The real invention of the NN is simply to provide that dial in a trainable way.
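A toy numeric illustration of that dial (my own reading of the metaphor, not anything from an actual model): interpolate between a verbatim sample from the data and the dataset mean, and outputs get more generic as the dial moves toward mean().

```python
import random

def generate(data, dial):
    """dial = 0.0 -> return a verbatim sample (search-engine-like);
    dial = 1.0 -> return the dataset mean (generic and 'garbled')."""
    sample = random.choice(data)
    mean = sum(data) / len(data)
    return (1 - dial) * sample + dial * mean

points = [1.0, 2.0, 4.0, 8.0, 16.0]
for dial in (0.0, 0.5, 1.0):
    print(dial, generate(points, dial))
```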
The only change to the "state of the art" is the size of the weights, and how long they take to train. This "advancement" is no more impressive than google indexing more webpages.
There has been no step-change advancement in AI in, perhaps, 50 years. All we see today is a product of hardware, in GPU/CPUs able to compress TBs of data into c. 300GB of weights. And likewise, the internet to provide it and SSDs to hold it.
The "magic" of AI is no more the magic of wikipida, here: copilot is good only because million+ programmers made github good.
> It's all just passable garbled nonsense that the reader goes to great lengths to interpret based on their prior knowledge, which is not expressed in the syntax of what these systems output.
> It's still little more than a fancy search.
I feel like the goalposts have been moved between your two comments. CoPilot is obviously not producing garbled nonsense, and it's also not just printing the top result from StackOverflow. It is producing code that references my variables, does the right thing 50% of the time, and usually compiles.
One of the nice little things is error messages: when I type `if (!foo) { throw ...` CoPilot is able to complete a nicely formatted and descriptive error message from its understanding of my code. It's not garbled nonsense, and it's not just a search engine.
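For illustration, the kind of completion being described looks something like this (a hypothetical example in Python rather than the commenter's actual code; only the guard clause is typed, the message is the suggestion):

```python
import json

def load_config(path: str) -> dict:
    with open(path) as f:
        config = json.load(f)
    if not config.get("api_key"):
        # A Copilot-style suggestion tends to name the missing field and
        # the file it came from, without being told to:
        raise ValueError(f"Missing 'api_key' in config file: {path}")
    return config
```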
Does AI deserve the hype it sometimes gets? Not yet. But I think you're going to have to start digging a little deeper for your commentary.
Even if AI got to the point of perfectly passing every expert-level Turing test your degree of rigor as to what "thinking" is would never truly permit any belief of AI having struck the golden nugget of intelligence.
Imagine if we were all self-replicating computers, and certain members of this silicon race began experimenting with making creatures with carbon macro-molecules to create organic intelligence, you could make the same claim in the other direction:
"There has been no step-change advancement in Organic Intelligence in, perhaps, 50 years. All we see today is a product of cell count, in neurotransmitter chemistry able to compress TBs of experiences into c. 300B neurons."
I think you are missing the conditional, contextual nature of language models. They mix things in coherent ways, they adapt to the request. Google doesn't create new things when they don't exist, and the pre-written code examples on the internet will never adapt to your needs.
But I agree with you that everything they do seems intelligent because 'intelligence' was in the training data. Not much different from us: if you raise a human removed from society (take their intelligence training data away), they will accomplish almost nothing on their own.
I agree. It's possible to point out the clear limitations of current AI without being oblivious to the huge, indisputable advances that have occurred.
People thought it might take centuries for a computer to defeat a top human in Go. Then deep learning showed up and a few years later it's the opposite.
A lot of the things deep learning methods are doing now are things no one had any idea how long research would take to achieve, or if they were even possible.
Personally, I think we are currently hitting some walls that might take a while to climb before we get to AGI, but I am very impressed at the recent progress.
> How much of this is just "AI is bad at everything"
"AI Language Models" are not touted as some general AI that is smart at everything, like a clever person with multiple intellectual skills integrated into one.
AI language models are for modeling language, not for math problem solving, or anything else. People good at language aren't always good at math.
DeepL produces very good, correct translations for "Alice has five more balls than Bob, who has two balls after he gives four to Charlie. How many balls does Alice have?" into numerous languages, even though it doesn't offer a solution.
I have little doubt that an AI system could be trained to translate word problems like this into systems of equations, which could be dumped into some decades-old CAS to obtain a solution, which the AI could map back into the verbal domain through the identities between the math variables like x and Alice's balls.
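As a minimal sketch of the back half of that pipeline, assuming the translation step has already produced the equations (SymPy standing in for the "decades-old CAS"):

```python
from sympy import Eq, solve, symbols

# Equations a translation step might emit for the word problem above:
# Bob ends up with 2 balls after giving 4 away; Alice has 5 more than Bob.
alice, bob = symbols("alice bob")
equations = [
    Eq(bob, 2),           # "who has two balls after he gives four to Charlie"
    Eq(alice, bob + 5),   # "Alice has five more balls than Bob"
]

solution = solve(equations, [alice, bob])
print(f"Alice has {solution[alice]} balls.")   # Alice has 7 balls.
```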
"Hey look, that human who is supposedly good at math can't produce a painting of the Grand Canyon in the style of Monet, even if given eight months to do it, and is easily defeated in chess."
> How much of this is just "AI is bad at everything", but in the math case, it's easier for the lay person to tell
Honestly, even as someone generally pretty dismissive of the AI hype, I'm not sure you can go that far. The whole reason we have specific mathematical notation is that human languages often are not super great at dealing with it, and English in particular is pretty abysmal for being both unambiguous and precise (and I'd be surprised if language models didn't end up suffering from biases analogous to how many image recognition AI models have been found to not deal well with a diverse set of human appearances). We don't teach math the same way we teach English, and we certainly don't expect people to be experts at teaching both, so why would we expect an AI model designed for language to be able to do math?
Because there is an algorithm for it. Convert the strings into floating-point numbers, add them, convert them back to strings. It's a LeetCode-medium question. It should be learnable.
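A sketch of both versions for integer strings: the naive parse-add-format one described above, and the digit-by-digit one the LeetCode problem actually wants (which also sidesteps float precision):

```python
def add_naive(a: str, b: str) -> str:
    # The version described above: parse as floats, add, format.
    # (Precision breaks down once the numbers exceed float precision.)
    return f"{float(a) + float(b):g}"

def add_digit_by_digit(a: str, b: str) -> str:
    # LeetCode-style: walk both strings from the right, carrying as you go.
    i, j, carry, out = len(a) - 1, len(b) - 1, 0, []
    while i >= 0 or j >= 0 or carry:
        total = carry
        if i >= 0:
            total += ord(a[i]) - ord("0")
            i -= 1
        if j >= 0:
            total += ord(b[j]) - ord("0")
            j -= 1
        out.append(str(total % 10))
        carry = total // 10
    return "".join(reversed(out))

print(add_naive("999", "27"), add_digit_by_digit("999", "27"))   # 1026 1026
```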
The article talks about abstract math questions, but even arithmetic is hard for language models.
Language models aren't built for math. Their improvement/training cycles aren't sensitive to the exactness and rule-based nature of mathematical language, plus there are probably a lot of bad/misleading examples of math in the source data.
You'd have to be unrealistically pessimistic to call what GPT-3 and other huge language models produce "nonsense".
It's not that they were not built for math, but more like verification is hard. But it's hard for humans as well. A large generative model + a fast verifier could do wonders.
AlphaGo was built on that - the model can propose moves, but you can verify who won in the end. There are some code generation models that write their own tests as well, or use externally provided tests to verify their solutions. The DeepMind matrix multiplication algorithm was also "learning from verification" of generated solutions, because it's trivial to do that. In general verification remains an open problem.
I disagree. It is that they were not built for math. While brain analogies are shittier than most people assume, this is like trying to do math in your head without being allowed to think through calculations.
Brains weren't built for math either, just for surviving. And "trying to do math in your head" is true if you use naive question answering, but asking for "step by step", "chain of thought", or supporting questions allows for a flexible number of reasoning steps. There are some solutions called "Language Model Cascades" that compose language model calls to simulate arbitrarily complex reasoning chains, including recursion. There is no reason to think language models are unfit for math; they are fit for generating possible solutions that need to be verified somehow.
> Brains weren't built for math either, just for surviving.
Brains were, however, built for language processing, in addition to many other tasks.
> There is no reason to think language models are unfit for math, they are fit for generating possible solutions that need to be verified somehow.
This is just a dumb idea though. Guessing and checking based on semantically well-positioned answers in the ambiguity of the embedding space until you find something that's not wrong is not the same thing as defining an algorithm and then executing it, which is how people do math.
Sure, you could probably get it to a pretty good working state, but it seems pretty dumb to me.
> There are some solutions called "Language Model Cascades" that compose language model calls to simulate arbitrarily complex reasoning chains, including recursion.
If you're creating the reasoning chains yourself, you're arguably doing the hard part for the model and giving credit to the language part. If you're able to get the model to define the chains, then you've already solved the hard part of the problem and could likely use something very different from language models altogether to greater effect.
True, but that's how "inspiration" works in humans as well: generate stupid ideas until you stumble upon a great one.
That's why I said we only need verification. It's the artist-critic model, we got the artist we need the critic. Sometimes it's easy (in games, code, math) and other times we don't have a good way to verify.
> True, but that's how "inspiration" works in humans as well: generate stupid ideas until you stumble upon a great one.
I don't agree with this, but even if it were true, "inspiration" is not how we do math.
> That's why I said we only need verification. It's the artist-critic model, we got the artist we need the critic. Sometimes it's easy (in games, code, math) and other times we don't have a good way to verify.
That's not even how the typical language model works.
I think if we replaced "AI" with "taking averages over subsets of historical examples", then there'd be no mystery about when "AI" will be good or bad at anything.
Would we expect a discrete melodic structure to be expressible as averages of prior music? No.
Pretty sure the first continuation is a famous piece with a few notes messed up. Can't remember the name. Honestly it only sounds marginally better than the old markov chain continuations.
Isn’t that as good as it gets? The whole point of the continuations is that given a short leading prompt from a real piece that it should continue it realistically.
It didn’t get to train on the test set, if that’s what you’re implying, and I find it hard to believe the assertion that continuations are copies of the train set (if that’s your claim).
Wow, good find! They definitely sound similar but it’s not a facsimile. I wonder if this holds for the other samples.
I guess in retrospect we asked it to continue the music in a likely way, not be novel. And it definitely convinced me enough to be impressive. An NN that composes completely fresh music, whatever that means (I’m sure most modern human music has a hefty dose of cross song sampling), would certainly be a good next goal post.
Indeed, there is lots of denial or ignorance in this thread (ignorance in the technical sense). AudioLM already produced impressive results and it's a tiny fraction of what is already possible because performance simply improves with scale. One can probably solve music generation today with a ~$1B budget for most purposes like film or game music, or personalized soundtracks. This is not science fiction.
What's more interesting and concerning - listen carefully to the first piano continuation example from AudioLM, notice the similarity of the last 7 seconds to Moonlight sonata: https://youtu.be/4Tr0otuiQuU?t=516
I'm afraid we will see a lot of this with music generation models in the near future.
There are quite simple tricks to avoid repetition/copying in NNs, e.g. by (1) training a model to predict the "popularity" of the main model's outputs and penalizing popular/copied productions by backpropping through that model so as to decrease the predicted popularity, or (2) by conditioning on random inputs (LLMs can be prompted with imaginary "ID XXX" prefixes before each example to mitigate repetitions), or (3) by increasing temperature or optimizing for higher entropy. LLM outputs are already extremely diverse and verbatim copying is not a huge issue at all. The point being, all evidence points to this not being a show stopper if you massage these evolutionary methods for long enough in one or more of the various right ways.
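A decode-time sketch in that spirit (this is the standard temperature-plus-repetition-penalty recipe used when sampling from language models, not the training-time penalty described in (1); toy logits, no real model):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_next(logits, generated, temperature=1.0, repeat_penalty=1.3):
    """One decoding step over a toy vocabulary. Higher temperature flattens
    the distribution (trick 3); already-generated tokens have their logits
    penalized, discouraging verbatim repetition."""
    logits = np.array(logits, dtype=float)
    for tok in set(generated):
        # CTRL-style repetition penalty: shrink positive logits,
        # push negative ones further down.
        if logits[tok] > 0:
            logits[tok] /= repeat_penalty
        else:
            logits[tok] *= repeat_penalty
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))

toy_logits = [2.0, 1.5, 0.5, 0.1, -1.0]   # 5 fake "tokens"
generated = []
for _ in range(10):
    generated.append(sample_next(toy_logits, generated, temperature=0.8))
print(generated)
```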
I'm not sure what you mean by "backpropping through that model so as to decrease the predicted popularity". During training, we train a model to literally reproduce famous chunks of music exactly as they are in the training set. We can also learn to predict popularity at the same time, but we can't backpropagate anything that will reduce popularity, because this would directly contradict the main loss objective of exact reproduction.
Having said that, I think the idea of predicting popularity is good - we can use it for filtering already-generated chunks during the post-training evaluation phase.
I don't think the other two methods you suggest would help here, we want to generate while conditioning on famous pieces, and we don't want to increase temperature if we want to generate conservative, but still high quality pieces.
It's true that we (humans) are less sensitive to plagiarism in text output, but even for LLMs it is a problem when they try to generate something highly creative, such as poetry. I have personally noticed, multiple times, particularly beautiful poetry phrases generated by GPT-2, only to google them and find out they were copied verbatim from a human poem.
What I had in mind was kind of like a reward model that is trained on longer outputs that have a very high similarity to training examples. Something similar has been done to prevent LLMs from using toxic language. You'd simply backprop through that model like in GANs. And no, it does not completely contradict the overall training objective, because the criterion would be long verbatim copies, and it would not affect shorter copies of sound fragments and the like, which you would want a music model to produce in order for it to sound realistic and natural.
Oh OK, so you mean training the model after it has already been trained on the main task, right? Like finetuning. Yes, I think the GAN-like finetuning is a good idea. Though it's less clear where the labels would come from, it seems like some sort of fingerprint would need to be computed for each generated sequence, and this fingerprint would need to be compared against a database of fingerprints for every sequence in the training set. This could be a huge database.
It doesn't surprise me that an AI model for language can't grok maths or music. I can't see how a language model can map to maths. Hell, I don't even know how to describe music in words. It's possible to articulate some maths in words, but that often involves using words with unexpected definitions.
MIDI is extraordinarily expressive and is likely used to sequence a large majority of music produced within the last three decades. A lot of the instruments you hear are synthesizers or samplers running directly from MIDI. There is a lot more to what MIDI can do, and is used for, than the conception most people have from "canyon.mid" or old website background music. If an AI can do MIDI just fine then it's an extremely small leap to doing audio just fine.
> If an AI can do MIDI just fine then it's an extremely small leap to doing audio just fine.
Unfortunately this is not true. It takes a huge amount of human effort to make MIDI encoded music sound good. The difference between MIDI and raw audio music generation is the same as the difference between drawing a cartoon and producing a photograph.
To clarify, yes MIDI can be expressive, but what's being generated when people say "AI generates MIDI music" is basically a piano roll.
I'm not familiar enough with existing implementations of such systems to dispute it, but there's no fundamental reason algorithmic composition systems could not include modulation parameters of all kinds (pitch/breath/effects/synthesizer controls/etc) in their output. I am envisioning a DAW set up with several VST's and samplers with routing and effects in place, then using some combination of genetic algorithms and other methods to "tweak the knobs" in the search for something pleasing.
The search space is absolutely enormous, though, so I don't dispute that it's very difficult, but I wouldn't go so far as to say that it can't be done. In such a space there are "no wrong answers" so to speak. I have a python script which creates randomized sequences of notes/rhythm and gives each one a different combination of LP/HP filters and random envelopes - it's not music but it takes on a much less mechanical quality by emulating different attacks and timbres over time, even though it's completely random.
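A stripped-down sketch of the kind of script described (my own reconstruction with numpy/scipy, not the commenter's actual code): random pitches and rhythms, each note given its own random envelope and a random low- or high-pass filter.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, lfilter

SR = 22050
rng = np.random.default_rng(42)

def note(freq, dur):
    """A plain sine tone with a random attack/decay envelope."""
    t = np.linspace(0, dur, int(SR * dur), endpoint=False)
    tone = np.sin(2 * np.pi * freq * t)
    attack = rng.uniform(0.01, 0.3) * dur
    env = np.minimum(t / attack, 1.0) * np.exp(-rng.uniform(0.5, 4.0) * t)
    return tone * env

def random_filter(signal):
    """Randomly low-pass or high-pass the note (2nd-order Butterworth)."""
    kind = rng.choice(["low", "high"])
    cutoff = rng.uniform(300, 4000)
    b, a = butter(2, cutoff / (SR / 2), btype=kind)
    return lfilter(b, a, signal)

# Random pitches (MIDI 48-84) and rhythms, each with its own envelope/filter.
midi_notes = rng.integers(48, 84, size=16)
durations = rng.choice([0.125, 0.25, 0.5], size=16)
audio = np.concatenate([
    random_filter(note(440.0 * 2 ** ((m - 69) / 12), d))
    for m, d in zip(midi_notes, durations)
])
audio = (audio / np.max(np.abs(audio)) * 32767).astype(np.int16)
wavfile.write("random_sequence.wav", SR, audio)
```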
I would go so far as to say I'd be genuinely surprised if algorithmic composition and production hasn't been used to some extent significantly greater than "basically a piano roll" in at least some of the past decade's top 40 music on the radio.
> there's no fundamental reason algorithmic composition systems could not include modulation parameters of all kinds (pitch/breath/effects/synthesizer controls/etc) in their output
There is such a reason - lack of training data. Very few high quality detailed MIDI samples exist to train machine learning models like AudioLM.
For state of the art in MIDI generation, take a look at what https://aiva.ai/ produces (it's free for personal use). There you can compare raw MIDI output to an automatically generated mp3 output (using "VST's and samplers with routing and effects in place, then using some combination of genetic algorithms and other methods to "tweak the knobs" in the search for something pleasing.")
The mp3 version will sound much better than raw MIDI, but (usually) significantly worse than music recorded in a studio and arranged/processed by a human.
As a classically-trained pianist who then got into electronica and synthesis, it was mind-blowing to me that people could wrangle expression and phrasing from a MIDI sequencer.
That particular niche has had some pretty amazing successes already. It's coming.
We can't produce arbitrary media streams with many "stack layers" of meaning and detail yet, but we can do a lot of specific instrumental transformations...
That’s what a musician does. They make short loops and loop them.
This reads like someone who knows sheet music and theory but does not listen to music. It’s repetition of short phrases over and over.
I’m not really sure what people expect of general AI trained on human generated outputs. It can’t make up anything anything “net new” only compose based upon what we feed it.
I like to think AI is just showing us how simple minded we really are and how our habit of sharing vain fairy tales about history makes us believe we’re masters of the universe.
Those models are not trained on short loops. They are trained on whole songs just like image generation models are trained on whole images. And yet they struggle to repeat sections, modulate to a different key, create bridges, intros and outros. After a few seconds of hallucinating a melodic line they simply abandon the idea and migrate to another one. There is no global structure whatsoever.
We're trying to train a full composer AI without allowing it to learn about different instrument sections independently at first. A human composer will have a good idea of the different parts and know how to merge them in harmony.
I think we might get better results training separate AI systems on percussions, strings, vocals etc. then somehow create connections between them so they learn together. A band AI if you will.
We could try a BERT for each, with the generator learning to output logical sequences of sounds instead of words.
Musicians don’t spit out an album in one sitting and they’re highly trained in theory. They get bored and tired of a process and take breaks. They come up with an album of loops composed together over time.
AI's state will forever be constrained to the limits of human cognition and behavior, as that's what it's trained on.
I read published research all year. Circular reasoning. Tautology. It's all over PhD theses.
There’s no “global structure” to humanity. Relativity is a bitch.
Seeing the world through the vacuum of embedded inner monologue ignores the constraints of the physical one. It’s exhausting dealing with the mentality some clean room idea we imagine in a hammock can actually exist in a universe being ripped asunder by entropy.
It’s living in memory of what we were sold; some ideal state. Very akin to religious and nation state idealism.
I think it's deeply depressing that AI has been sold as something even capable of modelling anything humans do; and quite depressing that this comment exists.
"AI" is just taking `mean()` over our choice of encodings of our choice of measurements of our selection of things we've created.
There is as much "alike humans" in patterns in tree bark.
AI is an embarrassingly dumb procedure, incapable of the most basic homology with anything any animal has ever done; us especially.
We are embedded in our environments, on which we act, and which act on us. In doing so we physically grow, mould our structure and that of our environment, and develop sensory-motor conceptualisations of the world. Everything we do, every act of the imagination or of movement of our limbs, is preconditioned-on and symptomatic-of our profound understanding of the world and how we are in it.
The idea that `mean(424,34324,223123,3424,....)` even has any relevance to us at all is quite absurd. The idea that such a thing might sound pleasant through a speaker, irrelevant.
This is a product of I don't know what. On the optimist side, a cultish desire to see Science produce a new utopia. On the pessimist side, a likewise delusional desire to see Humans as dumb machines.
I lack your confidence, and find it a bit religious.
> The idea that `mean(424,34324,223123,3424,....)` even has any revelance to us at all is quite absurd.
Most of what I say to anyone is exactly this.
When I'm about to give anyone any information, I look back at all of the relevant past information that I can recall (through word and sensory association, not by logic, unless I have a recollection of an associated internal or external dialog that also used logical rules.) I multiply those by strength of recollection and similarity of situation (e.g. can I create a metaphor for the current situation from the recalled one?). I take the mean, then I share it, along with caveats about the aforementioned strength of recollection and similarity of situation.
This is what it feels like I actually do. Any of these steps can be either taken consciously or by reflex. It's not hidden.
> I think it's deeply depressing that AI has been sold as something even capable of modelling anything humans do
This is a bizarre position. All computers ever do is model things that humans do. All a computer consists of is a receptacle for placing human will that will continue to apply that will after the human is removed. They are a way of crystallizing will in a way that you can sustain it with things (like electricity) other than the particular combination of air, water, food, space, pressure, temperature, etc. that is a person. An overflow drain is a computer that models the human will. An automatic switch/regulator is the basic electrical model of human will, and a computer is just a bunch of those stitched together in a complementary way.
You're an animal. You've no idea what you do, and you're using machines as a model. Likewise, in the 16th C. it was brass cogs; and in ancient Greece, air/fire/etc.
You're no more made of clay & god's breath than you are of sand and electricity.
You're an oozing, growing, malleable organic organism being physiologically and dynamically shaped by your sensory-motor oozing. You're a mystery to yourself, and these self-reports, heavily coloured by the in-vogue tech, are not science; they're pseudoscience.
If you want to study how animals work, you'd need to study that. Not these impoverished metaphors that mystify both machines and men. No machine has ever acquired a concept through sensory-motor action, nor used one to imagine, nor thereby planned its actions. No machine is ever at play, nor has grown its muscles to be better at-play. No machine has, therefore, learned to play the piano. No machine has thought about food, because no machine has been hungry; no machine has cared, nor been motivated to care by a harsh environment.
An inorganic mechanism is nothing at all like an animal, and an algorithm over a discrete sequence of numbers with electronic semantics, is nothing like tissue development.
What you are doing is not something you can introspect. And you aren't really doing that. Rather, you've learned a "way of speaking" about machine action and are back-projecting that onto yourself. In this way, you're obliterating 95% of the things you are.
This isn't really responsive. Not only am I not using machines as any sort of model for human behavior, I'm trying to think about weird things you could do to a machine to make it ape a human.
> these self-reports, heavily coloured by the in-vogue tech are not science, they're pseudoscience.
I simply don't know what you're referring to. If you're referring to retrieving memories through associations, there's mountains of empirical evidence for that. If you're referring to wondering if I remember things, and being unsure of the information I'm recalling when I have less recall of that, or wondering if past situations compare well to current situations, well you got me. It's my personal belief that conscious thought is an epiphenomenon that is a rationalization of decisions already made.
But the rest of this is nonsense. Vivid imagery is not an argument for exceptionalism, no matter how much I say things drip or ooze. This is just association in action. You're trying to create a distinction for life (or rather, what you recognize as life): life oozes and has viscera, so using a bunch of words that feel wet and organ-y can substitute for reasoning contra the robots.
That solution has a compressed representation of half the internet.
NNs are "garabled nonesense" insofar as they try to generalise; insofar as they are search engines, they provide apparent sense by just repeating something in their database (= weights).
Oo this reminds me. One of my favourite sci-fi novels is The Moon is a Harsh Mistress.
In it, it depicts the growth of a nascent AI from its attempts at understanding humor. The AI befriends a technician and gets the human to rate its own crafted jokes.
Eventually the AI gets really good at telling jokes, and becomes sentient as a result.
It was a very fun take on AI gaining sentience, highly recommended!
This might make sense as a response solely to the title of the article, but I have to admit I find it puzzling as a reaction to its content. Notwithstanding the title, the article mentions a model called Minerva that scored fully 50% on the MATH dataset of high-school/undergrad mathematical problems. For comparison, a human computer science PhD student scored 40%. [1]
For context, Minerva came out this July. When it was tested on a national math exam, it scored higher than that year's class of graduating high school seniors. [2] A mere eight months (!) earlier, OpenAI had announced [3] they'd trained a language model that solved math word problems almost as well as an average middle-schooler. So even if you believe — rightly or wrongly — that current capabilities aren't very impressive, it's worth remembering that your understanding of current capabilities might not be entirely accurate, even if it's only a few months out of date.
Incidentally, it may be worth looking at some examples of these models' outputs before deciding what they can or can't do. Here's Minerva solving some math problems, for example:
I'll admit I find it challenging to interpret these results as "passable garbled nonsense", though perhaps I'm not being demanding enough. At any rate, when these models go from beating 10-year-olds at math to beating 18-year-olds at math in the span of 8 months, one does start to wonder how much of the hype is really due to over-interpretation - and what the next 8 months have in store.
Current sequence models don't have the right structures to represent math. Even if they use floating point internally, they can't really float the point because the nonlinearity in the model has a certain scale.
A system that processes language can take advantage of the human desire for closure
The problem is that human language is approximate and correct math is not, so pattern matching on prose text is doomed. AI trained on exact math does a lot better. But that's not fully generic so fails the weird GPT goal of modeling all of human intelligence through prose. That's not how people solve math at all.
GPT's "Superficially plausible but wrong" math is actually pretty good match for non-expert bad-at-math average human behavior.
Mwell, the article claims, and points to work that also claims, that large language models can actually be made to perform arithmetic well. They need fine-tuning, verification, chain of thought prompting and majority voting to be combined but the linked Google blog says that Minerva hit 78.5% accuracy (on the GSM8K benchmark).
For me the problem is that we can look at the output and say if it's right or wrong, but we know what language models do, internally: they predict the next token in a sequence. And we know that this is no way to do arithmetic, in the long run, even though it might well work over finite domains.
Which is to say, I'm just as skeptical as you are, and probably even more, but I think it's useful to separate the claim from what has actually been demonstrated. Google claims its Minerva model is "solving maths problems" but what it's really doing is predicting solutions to problems like the ones it's been fine-tuned on, and those problems are problems stated at least partly in natural language, not "naked" arithmetic operations. In the latter, language models are still crap because they can't use the context of the natural language problem statement to help them predict the solution.
Btw, "chain of thought prompting" if I remember correctly is a process by which an experimenter prompts the language model with a sequence of intermediary problems. So it's not so much the model's chain of thought, as the experimenter's chain of thought and the experimenter is asking the model to help him or her complete their chain of thought. I have a fuzzy recollection of that though.
That's interesting, I hadn't made the connection between executive function and intelligence.
I went through a burnout in 2019 that felt like having a stroke. My brain finally reached such a level of negative reinforcement after years of failure that it wouldn't let me work anymore. I'd go to do very simple tasks, everything from brushing my teeth to writing a TODO list, and it was like the part of my brain that performed those tasks wasn't there anymore. Or at least, it no longer obeyed if it perceived a potential reward involved. It was like my motivation got reversed. I had to relearn how to do everything, despite knowing that no reward might come for a very long time, which took at least 6 months before I began recovering. The closest answer I have is that my brain healed through faith.
I only bring it up because executive function may be associated with a subjective experience of meaning. If there's truly no point to anything, then it's hard to summon the motivation to string together a sequence of AI tasks into something more like AGI.
I guess that's another way of saying that nihilism could be the final hurdle for AGI to overcome. It's like the human philosophical question of why there's something instead of nothing. Or why angels would choose to be incarnate on Earth to experience a life of suffering when it's so much easier to remain dissociated.
Translate to what? The next likely string of characters? How would this executive even interact with it? A sibling comment of yours mentioned extracting low-level steps from high-level tasks, but it needed another language model (no kidding!) to map to the «most likely» of the admissible actions. I mean, this shit is half-baked even in theory.
that solves word problems using the methods of the old AI. The point is that it is efficient and effective to use real math operators and not expect to fit numbers through the mysterious bottleneck of neural encoding.
> language models just need to translate problems into code of some kind that can be run to get the answer
A huge "just"! Isn't this the magic step? Translating ambiguous symbols to meaning and combining them in meaningful ways is a big deal which, apparently, these AI models cannot do. They can just parrot things.
> Translating ambiguous symbols to meaning and combining them in meaningful ways is a big deal which, apparently, these AI models cannot do.
Plenty of AI models do exactly this. Very clear examples include question answering models and code generation. In both cases novel, meaningful responses are generated.
> They can just parrot things.
That isn't true. While language models can parrot things, it is generally special conditions that make them do it. Specifically, the conditional probability of the next character (or BPE or word, depending on the model) has to be much higher than anything else, which happens when the thing being parroted is unique text.
If you ask most Americans or a language model what word comes next in "Four score and seven years...", they'll give the same answer, for the same reason.
So is in your opinion General AI solved? Because reliably turning symbols into meaning, outside narrow or special cases, is General AI.
In my opinion, it's not solved. GPT-3 is not General AI, it's a more clever mechanism for parroting back text it cannot truly understand. Comparisons to ways humans confuse themselves are a red herring in my opinion: the old ELIZA program could reply like a very confused or trollish human would, but nobody would argue ELIZA was a general AI.
It's just that GPT is a fascinating and more convincing illusion than ELIZA. Unlike ELIZA, it can also be used for meaningful purposes.
I don't think "general intelligence" is a bright-line, but instead is a continuum, and I don't agree with your definition (although I appreciate you do at least give a definition).
I think that in general most human decision making is just pattern matching (there's plenty of evidence for this - read "Thinking Fast and Slow" for an overview).
I think the extrapolation that ML models can do is a form of intelligence. I also think that the compression and encoding of inputs into a lower dimensional space is exactly the "turning symbols into meaning" that you call for in your definition.
We obviously disagree. I don't think we are near general AI (and yes, I know the objection that everyone who says this is simply moving the goalposts-- regardless, I'm unconvinced). I think GPT et al are very interesting tricks, but still not general AI; and that the path to it doesn't lie in this direction.
I subscribe to the view that we think of the human mind as a pattern-matching computer simply because this is the current major tech, much like people in the past thought of "humors" or "steam machines". I think some of the analogies are useful, to a point, but I don't think there's hard evidence the mind is like a neural net (irony notwithstanding) or a pattern-matching GPT-like algorithm.
Re: Thinking Fast and Slow, I see there are serious doubts about the validity of the book's foundations and conclusions, and that it's been challenged.
Nor do I. But I don't agree with your definition of general AI at all.
I think by your definition we are well on the way towards it. I'd note that you didn't address the idea that compression of input concepts into a lower dimensional space is exactly the "reliably turning symbols into meaning" idea you suggest.
The fact the latent representations of concepts can be manipulated in ways that make logical sense is a good indication that the symbols have meaning. The classic Word2Vec experiments showing how the relationships Paris->France ~= London->England and King - Man + Woman = Queen show this well. Modern large language models are much more complicated of course, but the principles remain.
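For concreteness, the classic analogy tests can be reproduced with pretrained vectors (this sketch assumes gensim and its downloadable word2vec-google-news-300 model; the exact nearest neighbours depend on the vectors used):

```python
import gensim.downloader as api

# Pretrained Google News word2vec vectors (large download on first use).
vectors = api.load("word2vec-google-news-300")

# king - man + woman ~= queen
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# Paris - France + England ~= London (i.e. Paris->France ~= London->England)
print(vectors.most_similar(positive=["Paris", "England"], negative=["France"], topn=3))
```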
> I subscribe to the view that we think of the human mind as a pattern-matching computer simply because this is the current major tech, much like people in the past thought of "humors" or "steam machines". I think some of the analogies are useful, to a point, but I don't think there's hard evidence the mind is like a neural net (irony notwithstanding) or a pattern-matching GPT-like algorithm.
I don't think the computational approach is an interesting question. No one thinks brains operate like software neural networks (not sure what "pattern-matching GPT-like algorithm" means - it is just a neural network). That doesn't matter, because all computational methods are ultimately equivalent.
I think outcomes on metrics like benchmarks is important, and while some benchmarks have issues that some approaches exploit I think things like Chollet's "On the Measure of Intelligence" (https://arxiv.org/abs/1911.01547) are reasonable frameworks for discussion of progress.
> Re: Thinking Fast and Slow, I see there are serious doubts about the validity of the book's foundations and conclusions, and that it's been challenged.
These issues don't detract from the overall theme of the book about the two systems of decision making (rational and reflex) and how often we use the reflex decision system but convince ourselves we are using the rational system.
If you can point at any additional doubts about the book's conclusions I'd appreciate a reference.
>> I don't think the computational approach is an interesting question.
> Great, so we agree then!
Maybe? I think all forms of computation are equivalent, so whether the brain is implemented in the same way as a neural network is uninteresting. They can do the same thing (which is interesting).
> yet we both agree we are not near general AI.
Sure. My disagreement is with your original statement:
"Translating ambiguous symbols to meaning and combining them in meaningful ways is a big deal which, apparently, these AI models cannot do. They can just parrot things."
Modern AI systems can do this, and do not just parrot things.
They are capable of novel outputs.
This is because the models have sufficient "understanding" to manipulate "things" (latent vector representations, which you call symbols) to output novel but meaningful outputs.
It's unclear what you mean by "solved". Even a human can't turn every arbitrary problem into code to solve, but we still consider humans "generally intelligent".
GPT3 can't turn as many problems into code as I can, but it can do some, and GPT4 (or whatever) will be able to do more, etc.
I'm not so sure about that. Of course computers can do arithmetic operations, but this is not the same as solving math problems, proving theorems, etc.
Even mathematical objects are approximated up to an approximation error in a computer (like a differentiable manifold or a real number).
> Of course computers can do arithmetic operations, but this is not the same as solving math problems, proving theorems, etc.
Computers can solve math problems and prove theorems; this remains a significant subfield of Computer Science with lots of industrial use cases. However, pure machine learning based approaches toward these problems remain subpar.
> Even mathematical objects are approximated up to an approximation error in a computer (like a differentiable manifold or a real number).
Only because it caught on (and in the case of non-computationally-intensive applications, for purely historical reasons). For example, Mathematica has Reals and even functionality for Reals that is literally impossible to implement for integers [1,2]. There are also precise characterizations of objects in differential geometry [3]. You could imagine applying LLMs to these types of programs a la Copilot, but when you do this you will find yourself agreeing with Paul Houle's observation that math is harder to fake than eg art, language, or even glue code for web apps.
> Computers can solve math problems and prove theorems
But the specification of the problem must be done by a human, translating to a formalized system that the software can understand. And if there's a problem in the formal specification, it's mostly up to the human to notice and fix; the computer will happily output garbage or crash or enter an infinite loop.
So it seems this translation, going from an exploration of the problem statement, usually in ambiguous terms, to a formal specification, and the awareness to possibly detect whether the answers make sense and the specs were right, is uniquely human.
You just don't hear about it much because the technology is not so fashionable today. Also, it is more clear what the limits are; I mean, Turing, Gödel, Tarski and all of those apply to neural networks as well as to any other formal system, but people mostly forget it.
Knuth wrote a really fun volume of The Art of Computer Programming about advances in SAT solvers which are the foundation for theorem provers
Everybody is aware that neural network techniques have improved drastically in performance, it's much more obscure that the toolbox of symbolic A.I. has improved greatly. Back in the 1980s production rules engines struggled to handle 10,000 rules, now Drools can handle 1,000,000+ rules with no problems.
The wiki article on automated theorem proving is quite bad as an overview of the active field; it's more a historical article about the mid-to-late 20th century. Most of the interesting things in automated reasoning have happened since the aughts, and that article kind of stops in the 90s.
SMT solvers have gotten quite good over the past couple decades, there are tons of domain-specific tools (eg in software and hardware verification), tons of niche applied decidable or semi-decidable theories (eg various modal and description logics), a lot of progress on the proof assistant ("non-fully-automated theorem proving") paradigm, and so on.
It's clear that commonsense reasoning needs to deal with modals, counterfactuals, defaults, temporal logic, etc.
It's not hard to add some extensions to logic for a particular application but a very hard problem to develop a general purpose extended logic.
I look at the logic-adjacent production rules systems which never really standardized some of the commonly necessary things such as agendas, priorities, defaults, etc.
Computers are much much better at all that stuff than almost everyone too. Try asking Wolfram Alpha to solve something. Computers have gotten really good at proving things in the last couple of decades and formal verification methods are becoming increasingly popular.
I think sharemywin is probably on to something. It's going to be really hard for an AI to prove that e.g. x>0 && x+y <= 1 && y>1 is unsatisfiable, but it's trivial for an SMT solver. On the other hand it probably isn't that much of a leap to make an AI that can feed that problem into an SMT solver.
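For what it's worth, that exact constraint handed to an SMT solver via Z3's Python bindings (assumes the z3-solver package):

```python
from z3 import Reals, Solver, unsat

x, y = Reals("x y")
s = Solver()
s.add(x > 0, x + y <= 1, y > 1)

# x > 0 and y > 1 force x + y > 1, contradicting x + y <= 1.
print(s.check() == unsat)   # True
```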
Well, you don't need anything else than basic arithmetic to encode the entirety of, say, ZFC, enumerate every proposition in it, and halt iff you find a proof of whatever theorem you're after. It just might take a while…
That's because they're not modelling anything. The shocking thing about current AI models is that just sort of repeating and copying from memory what you've heard and seen gets you 97% of the way to imitating a person.* They still need to generate actual models somewhere to create consistency; hence so many generated images with one eye completely different from the other, or three arms, or fingers that grow into their cellphones.
If you solve this, you've probably solved almost anything in the simulation field. I have no confidence that the solution will even be complicated. Information consumed needs to be used to add to some sort of model, and that model always needs to be used as part of input. The complicated part would be to make that base model able to modify itself reasonably based on input, to tolerate constant inconsistency, and to constantly refine itself towards consistency i.e. ruminate.
I think a huge difference (which I think was approached through theories of embodied cognition) is that people start with a model (or the ability to create a model) of themselves. We can apply that model to other things and use it both to change how we ourselves behave, and how we speculate about the invisible states of other things. It's not for nothing that we can (and must) anthropomorphize anything.
-----
* Which was huge towards the confirmation of my belief that this is all people do 97% of the time.
This is factually wrong, both in terms of quantity and quality.
Current AI models are not "just sort of repeating and copying from memory". This is just an incorrect characterization of how they work and how they perform.
AI skeptics often say things like this then backpedal with something like "Well they aren't really repeating what they heard, but their generative model is just a slightly more sophisticated version of repeating what they've heard." But this weaker claim is also true of humans. It's certainly the case that >97% percent of what humans say is "just repeating and copying" in the same sense.
> Current AI models are not "just sort of repeating and copying from memory". This is just an incorrect characterization of how they work and how they perform.
You say this, but don't explain how. Because this is exactly what they are doing.
> AI skeptics often say things like this
I'm not really an AI skeptic. I think that we're very close to AI being indistinguishable from people. There are clearly problems that need to be solved, but I think the hardest problem was accepting the fact that humans are largely just copying and realizing that would be enough to get you 97% of the way there, especially if you gave a machine far more to copy than a human could consume.
> then backpedal with something like "Well they aren't really repeating what they heard, but their generative model is just a slightly more sophisticated version of repeating what they've heard." But this weaker claim is also true of humans. It's certainly the case that >97% percent of what humans say is "just repeating and copying" in the same sense.
Maybe I'm not expressing myself clearly, but it seems that you're just repeating my comment with a sneer. Agreeing angrily?
I'm disagreeing with the language you are using to characterize models. "copying from memory" implies that there is something being copied, and a memory that you are copying it from. I am pointing out that LLMs do not do this. It's not how they work.
If you polled 1M English speakers at random and asked them whether or not a system that is "just sort of repeating and copying from memory" could produce completely novel answers in response to completely novel questions, I suspect that the overwhelming majority would respond by saying no.
Similarly if you asked 1000 people working on LLMs whether they work by "copying from memory", I suspect nearly all would say no. It would be accurate to say they are "generating text via a probabilistic model of language, which is encoded in the weights of a neural network", but there really is just no sense in which the models are "copying" anything.
That being said, these models do "copy" some text in the sense that they can reconstruct some strings from their training input. For example every LLM I have played with can recite the first few paragraphs of A Tale of Two Cities verbatim. But that's a capability they have _in spite of_ their actual design, not because of it.
> I'm disagreeing with the language you are using to characterize models. "copying from memory" implies that there is something being copied, and a memory that you are copying it from. I am pointing out that LLMs do not do this. It's not how they work.
Then we're arguing about the semantics of the word "copy." That is not an interesting argument when you know exactly what I mean and can express it clearly.
edit: If it helps, either substitute your description in whenever I say 'pretty much copy' or change the word "copy" to whatever word you want to use. But even though I can't reproduce the opening paragraph to A Tale of Two Cities verbatim, I can certainly write something that is "copying" it without doing that, and anyone who was familiar with the book and read my paragraph would agree with me.
It is semantics, but that was your whole point no?
> That's because they're not modelling anything
If we agree on "how LLMs work", then how can you claim that they aren't modeling anything? They are modeling language, and while it's unlikely current paradigms will be proving new mathematical truths, it's completely plausible to me that bigger models will be able to handle simple math word problems like those in the article, precisely because LLMs can model the "Alice", "Apple", and "Bob" entities.
I disagree that they are modeling language.* I think that not only bigger models but same-sized or much smaller models will be able to handle arbitrarily complicated word problems if they're eventually supplemented with some explicit model-building process.
-----
* ...and that would be a completely semantic argument to have. I don't care whether it's called modeling, other than the fact that when I'm talking about modeling, I'm not talking about language probability, I'm talking about categories. But discussing what current AI is (a language model, copying?) is a waste of time, because I absolutely agree with your description of how it works, so we're talking about exactly the same thing.
What is the difference between this and describing a human brain the same way? The brain is the model, you are "just" copying things from the memory of your brain to words that you speak or write?
I don't think it's pedantic to say that an argument is wrong because it's making an incorrect claim. The claim here is that there is something different or missing between a true "model" and LLMs, and that missing thing has something to do with "copying". But that's not true, the missing thing is the complexity of the table, or the size of the table. The fact that it's copying in some incredibly abstract sense doesn't matter.
I think humans are just copying every time we give an answer without much thought. We're either regurgitating something we previously thought/solved or something we heard somewhere.
When you have to put your head down and actually think for a while, then maybe you're doing something new, or at least something not within your brain's training data. I don't think AI can do this yet. It can only copy pieces of its training data out to look like something new, but it isn't really new. Like when I was describing a video game idea I had to a friend and he called me out on just stealing bits of other games and mashing them together. He was right, it wasn't original. And this is all the AIs can do right now.
Current LLMs are "modeling" something according to pretty much any sense of the word "model".
In the technical, computational linguistics sense, LLMs are language models that give a conditional posterior distribution over sentences. Given some (constrained) context, the model tells you the posterior distribution over sentences in or around that context.
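In symbols (this is just the standard autoregressive factorization, not anything specific to one system), the distribution such a model represents is

    P(w_1, \dots, w_T) \;=\; \prod_{t=1}^{T} P(w_t \mid w_1, \dots, w_{t-1})

and sampling "sentences in or around a context" amounts to repeatedly drawing the next token from the conditional factor.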
In the nontechnical, layman sense of the word, they are a system that is used as an example of language. LLMs imitate language by generating new sentences. They are a "model" in the same way that an architectural model is a model, or in the same way that a statue is a model of a human.
The other point I disagreed with is the characterization that LLMs "just sort of repeat and copy from memory". I went into more detail about that in other replies.
A more layman-friendly way to describe it, one that avoids too much oversimplification, is that these learning models try to group things and apply probabilities to sequences of groupings.
E.g., a word is a grouping of letters; try to find the sequences of letters with the highest probabilities.
A phrase is a grouping of words, with punctuation marks.
Try to find the sequences of words with the highest probabilities.
A sentence is a grouping of phrases. Try to find the highest probability sequences.
A paragraph is a sequence of sentences. And so on and so on.
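As a rough sketch of "find the highest-probability next grouping", here is a deliberately tiny bigram-style toy in Python. Real models use neural networks over subword tokens rather than word-pair counts, and the corpus string is made up purely for illustration:

    from collections import Counter, defaultdict

    # Toy corpus standing in for training text (made up for illustration).
    corpus = "the cat sat on the mat . the cat ate the fish .".split()

    # Count which word follows which: a crude stand-in for learning
    # probabilities over sequences of groupings.
    following = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        following[prev][nxt] += 1

    def most_likely_next(word):
        # Pick the highest-probability continuation seen so far.
        counts = following[word]
        return counts.most_common(1)[0][0] if counts else None

    print(most_likely_next("the"))  # -> "cat" (seen twice, beats "mat" and "fish")

Scaling the same idea up from word pairs to long contexts, with learned rather than counted probabilities, is essentially what the large models do.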
Within very narrow domains (specific writing styles, say technical or legal writing), these models can be very accurate, since the sequencing of words into phrases, phrases into sentences, and sentences into paragraphs is very predictable. People call this kind of predictable sequencing a 'style', and it aids us in understanding text more quickly. Across all domains more generally, it's much harder to predict these sequences accurately, because the AI has to identify the 'style' of a text purely from the text itself. No context surrounding the text is given to the AI, so it guesses.
For example:
A political press release will be written in one style of writing, and a company marketing press release in a slightly different one. As humans, we can easily distinguish commercial marketing from political messaging because we are given that information up front; in Latin (a language of choice for some mathematicians and logicians, for historical reasons), we would say we have the information 'a priori'. A learning algorithm isn't given that information up front, and must determine from the text alone whether it is more likely a marketing release selling some product, and therefore should adopt a certain language style, or a political release selling an ideology, and therefore should adopt a slightly different language style.
When we don't know the right answer, and have no way to determine it, the solution that most computers are programmed to adopt is a minimax solution, i.e., minimise the maximum possible error. It does this by sort of mixing and matching both marketing and political styles.
When a human reads it, sometimes it looks very strange and funny. Usually this is because it has some distinguishing feature, that we can immediately recognise as placing it as either a political or marketing document, i.e., a company name, a political party, a corporate or political letterhead, a famous person's name etc. The computer naturally doesn't know who Donald Trump is, since we haven't taught it who or what a Trump is, so it doesn't give it any precedence over any other word on the page. Actually, in the case of Donald Trump, I bet if you took the dates off of all of his tweets, even humans would have a hard time distinguishing if they were political or commercial in nature.
> “I think there’s this notion that humans doing math have some rigid reasoning system—that there’s a sharp distinction between knowing something and not knowing something,” says Ethan Dyer, a machine-learning expert at Google. But humans give inconsistent answers, make errors, and fail to apply core concepts, too. The borders, at this frontier of machine learning, are blurred.
This part resonates with me. There was a time when I could solve congruence-modulo problems with exponents, but I couldn't do it step by step; I could only "hallucinate" my way to the solution in a fuzzy fashion, somewhat like recalling it from memory.
When we have to explain our reasoning we can’t think the same way. It’s like thinking with a debugger attached.
Language models can generate a Python function that does the math perfectly.
I bet you would get better results if you tweaked the prompt to say "Generate a Python program that solves X math problem" and then just ran the resulting Python script.
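A minimal sketch of that loop, assuming a hypothetical query_llm() helper in place of a real model API (it returns a canned completion here so the example runs end to end), and ignoring sandboxing concerns:

    import subprocess, sys, tempfile

    def query_llm(prompt: str) -> str:
        # Hypothetical stand-in for a real model call; canned output so the
        # sketch is self-contained.
        return "print(7 - 3)"

    problem = "Alice has 7 apples and gives Bob 3. How many apples does Alice have left?"
    prompt = "Generate a Python program that prints the answer to this problem:\n" + problem

    generated = query_llm(prompt)

    # Write the generated program to a file and run it in a separate process,
    # using the script's output as the answer.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(generated)
        path = f.name

    result = subprocess.run([sys.executable, path], capture_output=True, text=True, timeout=10)
    print(result.stdout.strip())  # -> 4

The model only has to get the translation into code right; the arithmetic itself is done by the interpreter.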
That could only generate constructive [0] proofs, and many things done in modern maths are not constructive. Maybe a better approach would be to use the Curry-Howard [1] correspondence to get proofs directly from generated programs.
Exactly, we need computer-equipped neural nets. Models need to use traditional UIs (including programming languages) and then we can talk about how to stop them. :)
It’s wishful thinking that I myself have once fallen for. I don’t trust our society to transition to a world with powerful machine intelligence safely, so would prefer a world in which ML progresses at a glacier’s pace.
Are there any general purpose models that are good at learning math? I mainly know basic feed-forward neural nets, but I don't think they do well outside their training region. Math, of course, has an infinite training region.
From my (limited) experience with the advanced ML models, they can "do basic math", but they make amateur mistakes with basic things - which indicates they don't actually know addition, but they are good at looking at patterns in existing language.
I would assume that state-of-the-art ML models could "convert a word problem into an equation", then feed that equation into a 30-year-old graphing calculator to "do the math".
The fact that no one has done this is an indicator that "there are more important things to work on", and it's just a matter of time before someone connects the two.
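One plausible shape for that pipeline, with a stubbed-out model step (the equation string is hard-coded here; in a real system it would come from the LLM) and SymPy standing in for the graphing calculator:

    from sympy import Eq, solve, symbols, sympify

    def word_problem_to_equation(problem: str) -> str:
        # Stand-in for the ML step: a real system would ask the model to
        # translate the problem into an equation. Hard-coded for the sketch.
        return "2*x + 3 = 11"

    problem = "Twice a number plus three is eleven. What is the number?"
    lhs, rhs = word_problem_to_equation(problem).split("=")

    x = symbols("x")
    print(solve(Eq(sympify(lhs), sympify(rhs)), x))  # -> [4]

The hard part, of course, is the translation step, which is exactly where current models are unreliable.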
This seems so much like humans that it makes me think lots of people are learning math with an ML-like approach instead of… whatever the heck people like engineers and mathematicians are doing.
Anyone can do higher level math, the problem is that math education is generally done by people who see math as a tool for computation, rather than a study of deep connections bordering on philosophy, and beautiful insights resembling poetry. I've been in arguments before where someone didn't believe me that the underpinnings of modern philosophy are essentially the same as math!
If the teachers don't love math, how can we expect students to?
I wonder how these language models would do if we tried to teach them maths the way schools do: Feed them explanations first, then endless sequences of toy problems, see which they got wrong and feed them corrected examples back in.
I'm not at all surprised they don't do well at maths, because while there are maths texts online, I doubt there is enough material to give these models the same experience of repetition and reinforcement to help sufficiently generalise an understanding of the underlying rules.
I don't think it's so much a refusal as that it hasn't been a sufficient priority for anyone before. As the article points out, there are now a few training sets which include math problems, and models which do well on them. But the remaining problems seem to be with basics which humans tend to learn to do consistently through a lot of repetition, and it'd be interesting to see those datasets extended to the very simple.
I attempted to create a general-purpose model for an exact version of the "what comes next" problem. It enumerated primitive recursive functions, trying them out as it went. The restriction to primitive recursive functions was convenient because they always terminate, so I didn't have to filter out functions that ran for too long. (Or did I?)
The enumeration inherently includes functions of several variables, so I wasn't restricted to examples such as 1->1, 2->4, 3->9, 4->16 etc.
I could try it out on examples such as (1,2)->3, (2,1)->3, (0,2)->2, etc. Perhaps with enough examples it would "learn to add", i.e. find a primitive recursive function that did addition.
I got as far as finding the first problem. The enumeration technique that I used was effectively doing a tree recursion, like that function for computing Fibonacci numbers that bogs down because Fib(10) is computing Fib(5) lots of times. I had a lot of numbers that coded for the identity function, lots of numbers that coded for the first few functions, making the whole thing bog down, trying the same few functions over and over under different numerical disguises.
I thought I could see my way to fixing this first problem: have some way of recognizing numbers that code for forms that give the same function. I guessed I could approximate this by saying that if two functions give the same value on a variety of arguments, they are probably the same function. Then I'd parameterise this criterion and tune it. That opens the way to creating a consolidated enumeration, analogous to fixing the tree-recursive Fibonacci function by memoization, except trickier.
But my health is poor and I ran out of energy.
Also, I have a guess about the second problem. What happens if I fix the first problem and my enumeration reaches decently complicated primitive recursive functions? While they will all terminate, some might run for far too long, causing the process to bog down. Rejecting them by limiting the run time might work well; we are happy to learn only reasonably efficient functions for doing maths.
It is a fun idea and I encourage others to have a go.
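For anyone who does have a go, here is a toy version of the consolidation idea: deduplicate candidates by their outputs on a few probe inputs, keeping one representative per behaviour. It uses small arithmetic lambdas rather than a real enumeration of primitive recursive functions, so it's only the shape of the fix, not the fix itself:

    import itertools

    # Tiny stand-in for an enumeration of candidate functions.
    candidates = [
        ("x + 0", lambda x, y: x + 0),
        ("x",     lambda x, y: x),          # same function, different form
        ("x + y", lambda x, y: x + y),
        ("y + x", lambda x, y: y + x),      # same function, different form
        ("x * y", lambda x, y: x * y),
    ]

    probes = list(itertools.product(range(4), repeat=2))

    # Fingerprint each candidate by its outputs on the probe inputs and keep
    # only the first representative of each fingerprint (memoization-flavoured).
    seen = {}
    for name, fn in candidates:
        fingerprint = tuple(fn(a, b) for a, b in probes)
        seen.setdefault(fingerprint, name)

    print(list(seen.values()))  # -> ['x + 0', 'x + y', 'x * y']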
There is "LODA", which uses genetic algorithms, that continuously mutates existing math programs until discovering something new. It uses OEIS as training data, around 350k known integer sequences, such as primes/fibonacci. Around 100k programs have been mined so far.
> “When multiplying really large numbers together … they’ll forget to carry somewhere and be off by one,” says Vineet Kosaraju, a machine learning expert at OpenAI. Other mistakes made by language models are less human, such as misinterpreting 10 as 1 and 0, not ten.
So the expert has never seen a seven-year-old struggling to add two single-digit numbers together? Did the expert learn that 10 is a 1 and a 0 first, and learn to speak second?
> The MATH group found just how challenging quantitative reasoning is for top-of-the-line language models, which scored less than 7 percent. (A human grad student scored 40 percent, while a math olympiad champ scored 90 percent.)
Is this that surprising? How would our IEEE editor score on the same problem set?
The situation is actually much worse for science, or any moving field. These models are, by design and by necessity, historical. So if, for example, the FDA issues a drug approval overnight, the model cannot follow sudden changes in a "reasoned" way.
This is incorrect, and it's unclear why people think this.
The whole point of a good ML system is that it doesn't parrot training data. A good system can extrapolate novel answers from things it has seen. That is very far from "parroting".
Why instead of expecting a language to get math, don't we use the language model to generate code, run the code and use the result?
If a language model is basically the equivalent of a dumb human, how can we expect it to be better than us at math? Most humans use calculators even for simple equations.
I've been thinking about giving access to a search engine and a command line to a GPT-3 based AI, so that it can choose to run code it wrote or to expand its knowledge, I think that's a good way to expand its capabilities, even if that's probably how we're going to get skynet in the end.
More generally, they struggle to get things right. They're great at grammatical confabulation, but when you need a correct answer, or a correct drug recommendation, ask an expert.
It is a great sign that we are building AI in the right direction. Before building artificial human intelligence, it makes sense to get to the intelligence level of a mosquito or fly, then go to more intelligent animals in later iterations.
As most human knowledge is encoded in video, getting better at understanding and generating videos will clearly get us closer to making computers understand the world.
I genuinely wonder if we will find there are some inherent tradeoffs to knowledge and understanding, such that if we ever have machines that can "think like humans", they would in practice run into human-like cognition limits: i.e., such machines would be "bad at math" in the same way humans are "bad at math" compared to conventional computers.
Indeed. I posit that as we get closer and closer to simulating how the human brain works in the pursuit of artificial intelligence, we're going to start seeing more and more of the same "bugs" that humans have (logical fallacies, susceptibility to illusions, mental illness, etc.)
You think your job sucks now, just wait until you're dealing with the general AI over on the UX team that's trying to get your ass fired because it's fostering a 3 year old grudge over that time you said Chappie was stupid.
At first, I thought it was surprising that a language model with a restricted vocabulary (e.g. banning the letter "E") acts significantly more "mentally ill", and then I thought about how I would come across if forced to use that constraint all the time, and I realized that maybe I'd appear mentally ill too!
That's an interesting thought. However it's not cognitive limits that make humans bad at math, it's just a "hardware" issue: a human with a piece of paper is much better at math.
Even if neural networks were fundamentally incompatible with conventional computation, I don't see why you couldn't augment a neural network with a conventional ALU to do the numerical computations. This is exactly what humans do with pencil and paper - it's just a bit too slow.
Either the language model would need to know what it's doing or the host program would have to know what the AI is doing. Both seem out of reach. The latter seems more doable since you could hack something up for simple scenarios, but you'd effectively have to match the capabilities of the neural network in a classical way to handle every case (which would render using a neural net moot).
Btw, here's an example of how even a very simple zero-shot/prompt-engineering attempt to introduce a bit of system 2 reasoning into a language model can improve results.
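The best-known version of this is simply appending a reasoning cue to the prompt. A sketch, again assuming a hypothetical query_llm() helper rather than any particular API (it returns a placeholder string so the snippet runs):

    def query_llm(prompt: str) -> str:
        # Hypothetical stand-in for a real model call.
        return "<model completion>"

    problem = ("Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
               "Each can has 3 tennis balls. How many tennis balls does he have now?")

    # Plain zero-shot prompt.
    baseline = query_llm("Q: " + problem + "\nA:")

    # Zero-shot chain-of-thought: the identical prompt plus a reasoning cue,
    # which reported experiments found noticeably improves arithmetic word problems.
    with_reasoning = query_llm("Q: " + problem + "\nA: Let's think step by step.")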
It's a language model; why would we expect it to do math, or try to somehow shoehorn math into the model? Do the language centers of our brain do math?
If something approximating AGI is going to happen, it's going to be a lot of models tied together with an executive function to recognize and send things to the area that's good at working with them.
> It's a language model; why would we expect it to do math or try to somehow shoehorn math into the model?
Language models can do math, or anyway arithmetic. That's because language models are trained to predict the next token in a sequence and an arithmetic operation can be represented as a sequence of tokens.
The only problem is that language models are crap at arithmetic because they can only predict the next token in a sequence. That's enough to guess at the answer of an arithmetic problem some of the time but not enough to solve any arithmetic problem all of the time.
More generally, the answer to your question is in the same Figure 3.10 I've referenced above. OpenAI (and others) have claimed that their large language models can do arithmetic. So then people tested the claim and found it to be a bag of old cobblers.
Hence the article above. Nobody's trying to "shoehorn" anything anywhere. It's just something that language models can do, albeit badly.
Right, but what you're describing is 'not being able to do math'. Like, if I've memorized a multiplication table and can give you any result that's on the table but can't multiply anything that wasn't on the table, I can't do multiplication.
It depends on how you see it. I agree with you, generally, but in the limit, if you memorised all possible instances of multiplication, then yes, you could certainly be said to know multiplication.
I've not just come up with that off the top of my head, either. In PAC-Learning (what we have in terms of theory in machine learning), a "concept" (e.g. multiplication) is a set of instances, and a learning system is said to learn a concept if it can correctly label each of a set of testing instances by membership in the target concept, with an arbitrarily small probability of error. Trivially, a learner that has memorised every instance of a target concept can be said to have learned the concept. All this is playing fast and loose with PAC-Learning terminology for the sake of simplification.
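For reference, the criterion being paraphrased is roughly this: for every target concept c, distribution D over instances, and parameters \varepsilon, \delta \in (0, 1), the learner, given enough labelled samples, must output a hypothesis h with

    \Pr\big[\, \mathrm{err}_D(h) \le \varepsilon \,\big] \;\ge\; 1 - \delta,
    \qquad\text{using } m \ge \mathrm{poly}(1/\varepsilon,\ 1/\delta,\ \mathrm{size}(c)) \text{ samples,}

and a lookup table that has memorised every instance satisfies this trivially whenever the instance space is finite.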
The problem of course is that some concepts have infinite sets of instances, and that is the case with arithmetic. On the other hand, it's maybe a little disingenuous to require a machine learning system to be able to represent infinite arithmetic since there is no physical computer that can do that, either.
Anyway that's how the debate goes on these things. I'm on the side that says that if you want to claim your system can do arithmetic, you have to demonstrate that it has something that we can all agree is a recognisable representation of the rules of arithmetic, as we understand them. For instance, the axioms of Peano arithmetic. Which though is a bit unfair for deep learning systems that can't "show their work" in this way.
What are some (non-nefarious) applications of generative language models that produce language which isn't constrained by some sort of rationality or directed by some sort of high-level goal?
The point isn't the math. The point is that, in math and similar disciplines, it's harder to get away with producing mostly undirected gibberish that happens to have some imputed meaning. The point is "use language to do something where it's easy to verify correctness and generating infinite amounts of synthetic data is trivial"
If a language model can't even do high school algebra, then I have a lot less confidence that it will ever be useful for customer service applications or any other number of potential applications outside of propaganda, advertising, and spam.
But if it's rational and has a sense of truth, then it's AGI. Which I don't think is impossible or even unattainable within a reasonable amount of time, but we're .001% of the way there, not 50% or 75%.
These models are fascinating, but the problem 'a lot of the things this model generates lack any semantic meaning' is inherent and likely insurmountable without connecting the model to other, far more complex models that haven't been built yet.
We are at the level where our models can consistently generate blocks of text with full sentences in them that make grammatical sense. Which is pretty cool.
But the next step is being able to consistently generate full sentences that make grammatical sense and usefully convey information. And while the current models do that a lot of the time, they don't do that all of the time because they don't and can't know the difference without essentially being a different thing. Because to do that consistently, we need an "understanding what things mean" model. Which is many orders of magnitude larger and more difficult than a text generator.
Language models aren't actually terrible at math: the Minerva paper provides a devastating counterexample to that claim. It will soon be replaced by more powerful linguistic-mathematical systems. Within the next twelve months we may well see 100% performance on all major benchmarks.