People will keep improving LLMs, and by the time they are AGI (less than 30 years), you will say, "Well, these are no longer LLMs."

Will LLMs approach something that appears to be AGI? Maybe. Probably. They're already "better" than humans in many use cases.

LLMs/GPTs are essentially "just" statistical models. At this point the argument becomes more about philosophy than science. What is "intelligence"?

If an LLM can do something truly novel with no human prompting, with no directive other than something it has created for itself - then I guess we can call that intelligence.


How many people do you know who are capable of doing something truly novel? Definitely not me; I'm just an average PhD doing average research.

Literally every single person I know that is capable of holding a pen or typing on a keyboard can create something new.

Something new != truly novel. ChatGPT creates something new every time I ask it a question.

adjective: novel

definition: new or unusual in an interesting way.

ChatGPT can create new things, sure, but it does so at your directive. It doesn't do that because it wants to, which gets back to the other part of my answer.

When an LLM can create something without human prompting or directive, then we can call that intelligence.


I did a really fun experiment the other night. You should try it.

I was a little bored of the novel I had been reading, so I sat down with Gemini and we collaboratively wrote a terrible novel together.

At the start I was prompting it a lot about the characters and the plot, but eventually it started writing longer and longer chapters by itself. Characters were being killed off left, right, and center.

It was hilariously bad, but it was creative and it was fun.


What does intelligence have to do with having desires or goals? An amoeba can do stuff on its own but it's not intelligent. I can imagine a god-like intelligence that is a billion times smarter and more capable than any human in every way, and it could just sit idle forever without any motivation to do anything.

Does the amoeba make choices?

Do you make that choice? Did I?

I'm a lowly high school diploma holder. I thought getting a PhD meant you had done something novel (your thesis).

Is that wrong?


My PhD thesis, just like 99% of other PhD theses, does not have any “truly novel” ideas.

Just because it's something that no one has done yet, doesn't mean that it's not the obvious-to-everyone next step in a long, slow march.

AI manufacturers aren't comparing their models against most people; they now say it's "smarter than 99% of people" or "performs tasks at a PhD level".

Look, your argument ultimately reduces down to goalpost-moving what "novel" means, and you can position those goalposts anywhere you want depending on whether you want to push a pro-AI or anti-AI narrative. Is writing a paragraph that no one has ever written before "truly novel"? I can do that. AI can do that. Is inventing a new atomic element "truly novel"? I can't do that. Humans have done that. AI can't do that. See?


Isn't the human brain also "just" a big statistical model as far as we know? (very loosely speaking)

What the hell is general intelligence anyway? People seem to think it means human-like intelligence, but I can't imagine we have any good reason to believe that our kinds of intelligence constitute all possible kinds of intelligence--which, from the words, must be what "general" intelligence means.

It seems like even if it's possible to achieve GI, artificial or otherwise, you'd never be able to know for sure that that's what you've done. It's not exactly "useful benchmark" material.


> What the hell is general intelligence anyway?

OpenAI used to define it as "a highly autonomous system that outperforms humans at most economically valuable work."

Now they use a Level 1-5 scale: https://briansolis.com/2024/08/ainsights-openai-defines-five...

So we can say AGI is "AI that can do the work of Organizations":

> These “Organizations” can manage and execute all functions of a business, surpassing traditional human-based operations in terms of efficiency and productivity. This stage represents the pinnacle of AI development, where AI can autonomously run complex organizational structures.


There's nothing general about AI-as-CEO.

That's the opposite of generality. It may well be the opposite of intelligence.

An intelligent system/individual reliably and efficiently produces competent, desirable, novel outcomes in some domain, avoiding failures that are incompetent, non-novel, and self-harming.

Traditional computing is very good at this for a tiny range of problems. You get efficient, very fast, accurate, repeatable automation for a certain small set of operation types. You don't get invention or novelty.

AGI will scale this reliably across all domains - business, law, politics, the arts, philosophy, economics, all kinds of engineering, human relationships. And others. With novelty.

LLMs are clearly a long way from this. They're unreliable, they're not good at novelty, and a lot of what they do isn't desirable.

They're barely in sight of human levels of achievement - not a high bar.

The current state of LLMs tells us more about how little we expect from human intelligence than about what AGI could be capable of.


Apparently OpenAI now just defines it monetarily as "when we can make $100 billion from it." [0]

[0] https://gizmodo.com/leaked-documents-show-openai-has-a-very-...


That's what "economically valuable work" means.

That's a silly definition, even if somebody with a lot of money wrote it. Organizations can do more than individuals for the same reason that an M4 can do more than a Pentium 4--it's a difference in degree.

Generality is about differences in kind. Like how my drill press can do things that an M4 can't. How could you ever know that your kinds of intelligence are all of them?


The way some people confidently assert that we will never create AGI, I am convinced the term essentially means "machine with a soul" to them. It reeks of religiosity.

I guess if we exclude those, then it just means the computer is really good at doing the kind of things which humans do by thinking. Or maybe it's when the computer is better at it than humans and merely being as good as the average human isn't enough (implying that average humans don't have natural general intelligence? Seems weird.)


When we say "the kind of things which humans do by thinking", we should really consider that in the long arc of history. We've bootstrapped ourselves from figuring out that flint is sharp when it breaks, to being able to do all of the things we do today. There was no external help, no pre-existing dataset trained into our brains, we just observed, experimented, learned and communicated.

That's general intelligence - the ability to explore a system you know nothing about (in our case, physics, chemistry and biology) and then interrogate and exploit it for your own purposes.

LLMs are an incredible human invention, but they aren't anything like what we are. They are born as the most knowledgeable things ever, but they die no smarter.


>you'd never be able to know for sure that thats what you've done.

Words mean what they're defined to mean. Talking about "general intelligence" without a clear definition is just woo, muddy thinking that achieves nothing. A fundamental tenet of the scientific method is that only testable claims are meaningful claims.


Looking back at the CUDA, deep learning, and now LLM hypes, I would bet it'll be cycles of giant groundbreaking leaps followed by long stretches of complete stagnation, rather than LLMs improving 3% per year for the coming 30 years.

They'll get cheaper and less hardware-demanding, but the quality improvements get smaller and smaller, sometimes hardly noticeable outside benchmarks.

What was the point of this comment? It's confrontational and doesn't add anything to the conversation. If you disagree, you could have just said that, or not commented at all.

There's been a complaint for several decades that "AI can never succeed" - because when, say, expert systems are developed from AI research, and they become capable of doing useful things, then the naysayers say "That's not AI, that's just expert systems".

This is somewhat defensible, because what the non-AI-researcher means by AI - which may be AGI - is something more than expert systems by themselves can deliver. It is possible that "real AI" will be the combination of multiple approaches, but so far all the reductionist approaches (that expert systems, say, are all that it takes to be an AI) have proven to be inadequate compared to what the expectations are.

The GP may have been riffing off of this "that's not AI" issue that goes way back.


The people who go around saying "LLMs aren't intelligent" while refusing to define exactly what they mean by intelligence (and hence not making a meaningful/testable claim) add nothing to the conversation.

OK, but the people who go around saying "LLMs are intelligent" are in the same boat...

I'll happily say that LLMs aren't intelligent, and I'll give you a testable version of it.

An LLM cannot be placed in a simulated universe, with an internally consistent physics system of which it knows nothing, and go from its initial state to a world-spanning civilization that understands and exploits a significant amount of the physics available to it.

I know that is true because if you place an LLM in such a universe, it's just a gigantic matrix of numbers that doesn't do anything. It's no more or less intelligent than the number 3 I just wrote on a piece of paper.

You can go further than that and provide the LLM with the ability to request sensory input from its universe and it's still not intelligent because it won't do that, it will just be a gigantic matrix of numbers that doesn't do anything.

To make it do anything in that universe you would have to provide it with intrinsic motivations and a continuous run loop, but that's not really enough because it's still a static system.

To really bootstrap it into intelligence you'd need to have it start with a very basic set of motivations that it's allowed to modify, and show that it can take that starting condition and grow beyond them.

You will almost immediately run into the problem that LLMs can't learn beyond their context window, because they're not intelligent. Every time they run a "thought" they have to be reminded of every piece of information they previously read/wrote since their training data was fixed in a matrix.

I don't mean to downplay the incredible human achievement of reaching a point in computing where we can take the sum total of human knowledge and process it into a set of probabilities that can regurgitate the most likely response to a given input, but it's not intelligence. Us going from flint tools to semiconductors, vaccines and spaceships, is intelligence. The current architectures of LLMs are fundamentally incapable of that sort of thing. They're a useful substitute for intelligence in a growing number of situations, but they don't fundamentally solve problems, they just produce whatever their matrix determines is the most probable response to a given input.


There's an interesting extension to Analytic Combinatorics by Flajolet/Sedgewick called "Analytic Combinatorics in Several Variables" by Pemantle and Wilson: https://www2.math.upenn.edu/~pemantle/papers/ACSV.pdf and https://acsvproject.com/

It extends the original framework with a lot of useful new primitives, like Hadamard products. Super useful!
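
For anyone unfamiliar: the Hadamard product of two generating functions is just the coefficient-wise product of their series,

   (Σ aₙ xⁿ) ⊙ (Σ bₙ xⁿ) = Σ aₙ bₙ xⁿ,   e.g.   1/(1-ax) ⊙ 1/(1-bx) = 1/(1-abx).

The multivariate machinery comes in because F(x) ⊙ G(x) is the diagonal of the two-variable product F(x)G(y).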


While this is interesting, what are its main applications?


I'm currently using it to analyze a version of the Kaczmarz algorithm, generalizing my answer here: https://mathoverflow.net/a/490506/5429

But more generally, why do we want to analyze combinatorics and algorithms? I suppose it gives us some confidence that we are actually making progress.
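
(For context, here is a minimal sketch of the basic randomized Kaczmarz iteration for Ax = b in Python; this is the textbook version, not the exact variant analyzed in the linked answer.)

    import numpy as np

    def randomized_kaczmarz(A, b, iters=10_000, seed=0):
        # Repeatedly project the iterate onto the hyperplane of one row a_i:
        #   x <- x + (b_i - <a_i, x>) / ||a_i||^2 * a_i
        rng = np.random.default_rng(seed)
        x = np.zeros(A.shape[1])
        p = np.linalg.norm(A, axis=1) ** 2   # Strohmer-Vershynin row sampling
        p /= p.sum()
        for _ in range(iters):
            i = rng.choice(A.shape[0], p=p)
            a = A[i]
            x += (b[i] - a @ x) / (a @ a) * a
        return x

    A = np.random.randn(200, 50)
    b = A @ np.random.randn(50)              # consistent system
    print(np.linalg.norm(A @ randomized_kaczmarz(A, b) - b))   # ~ 0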


Thanks!


> I optimistically imagine a world where we replace the relatively non-specific asymptotic bounds of an algorithm with a very specific quantification of the work an algorithm will take.

I think you're getting this backwards. The (good) point you and Mitzenmacher are making is that our analysis standards are _too strict_, and hence we avoid good algorithms that are just too hard to analyze.

If we instead require exact combinatorics, e.g., using Flajolet/Sedgewick, that will be _even harder_ than if we _just_ try to do asymptotic analysis. Remember that asymptotics are a way to _approximate_ the exact number of steps. It's supposed to make it _easier_ for you.
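
A concrete example of the gap: for quicksort the asymptotic statement is just Θ(n log n), while the exact expected comparison count is C_n = 2(n+1)H_n - 4n, which already takes real work to derive, and most algorithms don't admit anything that clean. A quick sanity check of that closed form against the recurrence (standard textbook facts, nothing from the article):

    from fractions import Fraction

    # Expected quicksort comparisons: C_0 = 0, C_n = (n-1) + (2/n) * (C_0 + ... + C_{n-1})
    C = [Fraction(0)]
    for n in range(1, 31):
        C.append(Fraction(n - 1) + Fraction(2, n) * sum(C))

    # Closed form 2(n+1)H_n - 4n with H_n = 1 + 1/2 + ... + 1/n
    H = Fraction(0)
    for n in range(1, 31):
        H += Fraction(1, n)
        assert C[n] == 2 * (n + 1) * H - 4 * n
    print(float(C[30]))   # exact expectation; asymptotics only tell you the leading 2 n ln n term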


Oh, I think there is no real doubt that this is true; it's why analytic combinatorics did not succeed in replacing asymptotic analysis. I just wish it had. If it were tractable in a similar set of areas, I think a lot of things would be quite a bit better.



That nice benchmark shows that multiple implementations of HNSW perform differently (my experience as well). It would therefore be helpful if HANN benchmarked its implementation against the others and tried to match the details of the best version.


> something interactive with exercises and engagement

Books have exercises. It's your job to engage.

This book, in particular, has 3 pages of Problems per chapter. The only way to learn the math is to do all of them.


The EU has to start working more with China, for better or worse.

Not as friends or allies, but there aren't a lot of those left anyway. It's only rational in this multi polar world to have some level of engagement with all parties.

Most of the sanctions Europe has on China were just to please the US anyway.


Why is it in the interest of the EU to work with an entity that doesn't condone concepts like democracy, due process, or the rule of law?

Shouldn't it be the mandate of liberal democracies to enable liberal democracies and to prevent authoritarian entities from growing power and reach?


At the end of the paper they mention "two problems from the 2025 MIT Integration Bee qualifying exam which the system continued to answer incorrectly".

They say the questions were among the most complex questions on the exam, but the first one is just

   ∫ ∛(x · ∜(x · ⁵√(x · ⁶√(x · ⁷√(x · ⋯ ))))) dx
which just requires you to compute

   1/3 + 1/(3*4) + 1/(3*4*5) + ...
So hardly very advanced math.
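
(Assuming the roots keep increasing (cube root, 4th root, 5th root, ...), that sum is Σ_{k≥3} 2/k! = 2e - 5, so the integral is x^(2e-4)/(2e-4) + C. Quick numeric check:)

    import math

    s, term = 0.0, 1.0
    for k in range(3, 20):
        term /= k          # term = 1/(3*4*...*k)
        s += term
    print(s, 2 * math.e - 5)   # both ~ 0.436564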


It's a 7B model, so while the problem isn't advanced, neither is the model.


I'd be curious what would happen if you SFTed a larger model with successful reasoning traces from the smaller model. Would it pick up the overall reasoning pattern, but be able to apply it to more cases?
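
(Something like plain SFT of a bigger causal LM on the smaller model's successful traces; a rough sketch, with the model name, data, and hyperparameters as placeholders rather than anything from the paper:)

    import torch
    from torch.utils.data import DataLoader
    from transformers import AutoModelForCausalLM, AutoTokenizer

    BASE = "some/larger-base-model"   # placeholder, e.g. a 30B-class checkpoint
    traces = ["<problem> ... <successful reasoning trace from the 7B model>"]  # placeholder data

    tok = AutoTokenizer.from_pretrained(BASE)
    tok.pad_token = tok.pad_token or tok.eos_token
    model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16, device_map="auto")

    def collate(batch):
        enc = tok(batch, return_tensors="pt", padding=True, truncation=True, max_length=4096)
        enc["labels"] = enc["input_ids"].clone()          # next-token objective on the full trace
        enc["labels"][enc["attention_mask"] == 0] = -100  # ignore padding in the loss
        return enc

    opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
    model.train()
    for batch in DataLoader(traces, batch_size=1, shuffle=True, collate_fn=collate):
        loss = model(**{k: v.to(model.device) for k, v in batch.items()}).loss
        loss.backward()
        opt.step()
        opt.zero_grad()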


I'm surprised they didn't benchmark it against Pixtral.

They test it against a bunch of different Multimodal LLMs, so why not their own?

I don't really see the purpose of the OCR form factor, when you have multimodal LLMs. Unless it's significantly cheaper.


> the model is allowed to train on pretty much the test case knowing the ground truth

The task is to solve the integral symbolically, though, right?

It's a hard problem to solve, even if the model is given access to a numerical integrator tool it can use on the main problem itself.


That's a fair point.


This is neat! I've been thinking about how to best express recursion with diagrams for tensorgrad (https://github.com/thomasahle/tensorgrad) and this might just be it.

So much computation these days is done on tensors, mostly for easy parallelism reasons. But we can't completely get rid of recursion. Great to have work like this making them more computationally compatible!

