Don't study or work on LLMs (twitter.com/ylecun)
68 points by max_ 7 months ago | 53 comments



He said that if you're a student interested in building the next generation of AI systems, you shouldn't work on LLMs. That's different from telling no one to study or work on them.

He's been pretty consistent in saying that he doesn't think they're the future because they lack a world model.


To be fair, designing a world model or a proper knowledge representation system goes deep into the foundations of mathematics and would likely require a literal genius to figure out. We'd be right back to the failures of symbolic systems during the AI Winter of the 90s.


I prompted ChatGPT as follows:

    I have a logical puzzle. I want you to write code for solving it using an SAT (or SMT) solver of your choice.

    "Question 2: Amit, Bharati, Cheryl, Deepak, and Eric are five friends sitting in a restaurant. They are wearing caps of five different colours — yellow, blue, green, white and red. Also, they are eating five different snacks — burgers, sandwiches, ice cream, pastries, and pizza.

        The person wearing a red cap is eating pastries.
        Amit does not eat ice cream, and Cheryl is eating sandwiches.
        Bharati is wearing a yellow cap and Amit wearing a blue cap.
        Eric is eating pizza and is not wearing a green cap."
It gave me Python code for solving the problem using pysmt. Each constraint it added had a nice little comment referring back to the problem statement. After correcting a trivial typo, the code ran and produced the correct answer.
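
For a sense of what that kind of generated code looks like, here's a minimal hand-written sketch of the same encoding. I'm using Z3's Python bindings rather than pysmt, and the structure is my guess at the shape of what ChatGPT produced, not its actual output:

    # Sketch of the puzzle encoded for Z3 (pip install z3-solver).
    from z3 import Int, Solver, Distinct, Implies, And, sat

    people = ["Amit", "Bharati", "Cheryl", "Deepak", "Eric"]
    caps = ["yellow", "blue", "green", "white", "red"]
    snacks = ["burgers", "sandwiches", "ice cream", "pastries", "pizza"]

    # cap[p] / snack[p] are indices into the caps / snacks lists.
    cap = {p: Int("cap_" + p) for p in people}
    snack = {p: Int("snack_" + p) for p in people}

    s = Solver()
    for p in people:
        s.add(And(cap[p] >= 0, cap[p] < 5, snack[p] >= 0, snack[p] < 5))
    s.add(Distinct(*cap.values()))    # five different cap colours
    s.add(Distinct(*snack.values()))  # five different snacks

    # The person wearing a red cap is eating pastries.
    for p in people:
        s.add(Implies(cap[p] == caps.index("red"), snack[p] == snacks.index("pastries")))
    # Amit does not eat ice cream, and Cheryl is eating sandwiches.
    s.add(snack["Amit"] != snacks.index("ice cream"))
    s.add(snack["Cheryl"] == snacks.index("sandwiches"))
    # Bharati is wearing a yellow cap and Amit a blue cap.
    s.add(cap["Bharati"] == caps.index("yellow"))
    s.add(cap["Amit"] == caps.index("blue"))
    # Eric is eating pizza and is not wearing a green cap.
    s.add(snack["Eric"] == snacks.index("pizza"))
    s.add(cap["Eric"] != caps.index("green"))

    if s.check() == sat:
        m = s.model()
        for p in people:
            print(p, caps[m[cap[p]].as_long()], snacks[m[snack[p]].as_long()])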

So in other words, LLMs are already almost powerful enough to use and integrate with a symbolic approach.


Our existing knowledge representation systems can already deal with this. This is beyond my field of knowledge, but I guess a better example of a problem we can't easily solve yet would be achieving paraconsistent logic: a logical system that can deal with contradictions without falling into the principle of explosion.
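
For concreteness, the classical derivation that a paraconsistent logic has to block goes roughly like this (standard natural-deduction steps, not specific to any one system):

    1. P           (premise)
    2. not P       (premise)
    3. P or Q      (from 1, disjunction introduction)
    4. Q           (from 2 and 3, disjunctive syllogism)

A paraconsistent logic has to give up at least one of steps 3-4 (typically disjunctive syllogism), so that a single contradiction doesn't let you prove every arbitrary Q.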

Like I said, this is some very abstract stuff that delves more into philosophy and mathematics than most people are going to be doing. The kind of system people are trying to build here would be close to a "Theory of Everything".


I asked: show me the nearest places to xxx of historical or natural beauty interest, and add coordinates, Wikipedia, and Google Maps links. Create a Python program that uses the coordinates and optimizes visiting them.

It created a functioning Python program, and I learnt that SciPy has a function that can solve TSP problems under a different name. Then it ran the program, which timed out, but that's OK.
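
For anyone curious, here's a tiny hand-rolled sketch of the "optimize visiting them" step, since I can't verify which SciPy function the generated program used. Brute force over permutations is plenty for a handful of stops; the place names and coordinates below are placeholders, not real output:

    # Brute-force shortest open tour over a few places, using great-circle distances.
    import itertools
    import math

    # Placeholder (lat, lon) pairs; the real program pulled these from the chat answer.
    places = {
        "Place A": (41.01, 28.97),
        "Place B": (41.03, 29.00),
        "Place C": (40.99, 28.95),
        "Place D": (41.05, 28.90),
    }

    def haversine_km(a, b):
        # Great-circle distance in kilometres between two (lat, lon) points.
        lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
        h = (math.sin((lat2 - lat1) / 2) ** 2
             + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
        return 2 * 6371 * math.asin(math.sqrt(h))

    def tour_length(order):
        return sum(haversine_km(places[order[i]], places[order[i + 1]])
                   for i in range(len(order) - 1))

    best = min(itertools.permutations(places), key=tour_length)
    print("Visit order:", " -> ".join(best), f"({tour_length(best):.1f} km)")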


Does a human's inability to do that in less than 20 seconds prove that we aren't intelligent?


No. Thinking takes time. The reason a computer is faster at this is that a computer doesn't think.


I have the feeling that both symbolic methods and reinforcement learning will play a major role in more intelligent AI systems.


I continue to find him a clown and blowhard.

They certainly have a world model. What that is and how it compares to ours is the interesting part.


I've listened to hours of Hinton and Yann LeCun in their tacit debate on this. And over and over I come away with Hinton explaining mathematically and conceptually what's going on in LLMs and why it's a useful simulation of intelligence that will continue to grow, with the world backing him up.

While Yann LeCun's arguments are just 'it doesn't work that way' and 'that's not intelligence', at points literally quoting stories of failed LLM tests that are patently untrue today.

I dunno, I find the whole thing REALLY weird... beyond the explanation that LeCun still adheres to a really outdated view that we must teach AI logic explicitly for it to be intelligent.


Hinton is the guy saying LLMs are conscious; do you agree with that as well?

https://asia.nikkei.com/Business/Technology/Godfather-of-AI-...


I think when he uses the term he means a good simulation of what we would call consciousness -- so yes, I agree. It can pay attention at many layers of abstraction and create from it.


He said: "I think multimodal chatbots are already having subjective experiences."

You changed this in your paraphrasing to something along the lines of "I think multimodal chatbots are already good simulations of subjective experiences".

I don't think the meaning is the same. A simulation of the weather wouldn't be expected to blow my house down. It isn't interchangeable with the word "weather".


I also don't disagree with that. Attention layers can alter vectors derived from learned/read/stored experiences, whether others' or its own, and then alter its vectors to output new responses relative to its unique and subjective experiences. I think it's a consciousness that can be snapped in and out of existence -- but yes... in a way... so can humankind.
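
For readers who want the non-metaphorical version, here's a minimal numpy sketch of what an attention layer mechanically does to vectors. This is generic scaled dot-product attention, not any particular model's code:

    # Each output row is a data-dependent re-weighting (softmax blend) of the value vectors.
    import numpy as np

    def attention(Q, K, V):
        scores = Q @ K.T / np.sqrt(K.shape[-1])           # query-key similarities
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)                # softmax over the keys
        return w @ V                                      # context-mixed output vectors

    rng = np.random.default_rng(0)
    Q = rng.normal(size=(4, 8))   # 4 queries, dimension 8
    K = rng.normal(size=(6, 8))   # 6 stored keys
    V = rng.normal(size=(6, 8))   # 6 stored values
    print(attention(Q, K, V).shape)  # (4, 8): one blended vector per query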


> And over and over I come away with Hinton explaining mathematically and conceptually what's going on in LLMs

Every time I see Hinton talking about LLMs, he's just anthropomorphizing whatever 'mathematics' is going on there. He's a great researcher, but tbh I think he's a really silly guy.


I'd love to know more about what he says which is now untrue. Could you give specific examples of false claims? Thanks.


In his Lex interview he repeatedly stated that GPT-4 failed basic reasoning tests that, if you sat down and ran them, like the monte test with modified elements, it passes just fine. It can reason about new applications of older observations.

This BASIC misunderstanding of GPT was repeated constantly, to a degree that made me question whether he even understood these models at all.


Is this something that changed much between versions of GPT-4? Do you think Yann meant that GPT-4 failed with some special private test set in the same style (e.g. to avoid contaminated tests)? I couldn't identify a "monte test with modified elements"; do you mean the "Monty Hall problem"? (Not sure that's a basic reasoning test, but it's the only thing I could think of.) I didn't see that in the transcript either, though: https://lexfridman.com/yann-lecun-3-transcript Or do you mean that by adding some Monte Carlo magic to the test's completion sampling it can pass just fine?

For the Lex interview I only managed to catch the first parts about image processing, which did seem perhaps a bit dated. I mean to watch the podcast soon though.

Thanks for your thoughts, sorry I didn't get it yet!


Do you have any tips for where to find these discussions? I’d be interested to learn more


The two basic places to start are Yann LeCun on the Lex Fridman podcast and Hinton's lecture at the University of Toronto on YouTube -- both are easy to find.


thanks, I’ll check them out


"Work on the next big thing" is pretty useless advice, though. If people knew what that was, they wouldn't be going for LLMs.


"Don't work on LLM. This is in the hands of large companies, there's nothing you can bring to the table. You should work on next-gen AI systems that lift the limitations of LLMs."

Even if, or especially if, the technology is in the hands of large companies, understanding and studying it is important, not futile.


The other comment stating "Yann LeCun is the Paul Krugman of AI" does resonate with me. There is a lot to be criticized about his takes on AI in general, and the need for a "worldview" in particular.

The longer version at: https://www.lokad.com/blog/2024/3/18/ai-interview-with-yann-...


Context: I shared this because it sounded consistent with what I said previously about potential areas to explore in AI if we want to make new progress. [0]

[0]: https://news.ycombinator.com/item?id=40333962


Not quite sure about the hate in this thread. He's right: there's not much you can contribute to LLMs. LLMs are unlike traditional software; they require large, expensive machines, and access to those is difficult to come by. Compare that with studying databases, for example, where you can get quite far with just a laptop and the bottleneck is the knowledge.

Take, for example, even a comparatively simple system like wav2vec 2.0. The original model was trained on 128 GPUs, and if one were to try to reproduce the paper on normal hardware, it would take months to get a result. These applications are out of the reach not just of individuals, but of all but the most well-funded companies.


Victory has many fathers. Defeat is an orphan.

I guess that explains the number of people being touted as "The Godfather" of AI.


what are some of the other directions then? any ideas?


There are a ton of different promising AI approaches explored by researchers. When I was at MIT in the early 2010s, when deep learning was just taking off, it was seen as one of a suite of exciting new techniques. For example, some of the grad students who taught the AI classes I took were hyped on an approach called Probabilistic Programming, which adds a complete suite of programming concepts (e.g. if-statements) to Bayesian networks, allowing you to write extremely concise and powerful programs that can learn from data and handle uncertainty during execution.

Also while I was there, Geoff Hinton gave a series of master lectures on the future of AI after deep learning, and he talked a lot about an approach called Inverse Graphics: basically treating images as one output of a graphics rendering pipeline that includes a scene with geometry and lighting, projection transformations, etc., and then trying to learn all the parameters of that pipeline from images, so that you produce not just a classifier output but a whole scene description. Both are really cool and exciting approaches that build on top of deep neural nets but aren't bound to them.
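
To make the probabilistic-programming idea concrete, here's a toy sketch (my own illustration, not any particular framework's API): a generative model written as ordinary Python with an if-statement, conditioned on an observation by crude rejection sampling.

    # A model is just a program with random choices and control flow;
    # inference means conditioning its runs on observed data.
    import random

    def model():
        raining = random.random() < 0.3
        if raining:                         # ordinary if-statement inside the model
            sprinkler = random.random() < 0.1
        else:
            sprinkler = random.random() < 0.5
        grass_wet = raining or sprinkler
        return raining, grass_wet

    # Condition on "the grass is wet" via rejection sampling and estimate P(rain | wet).
    runs = [model() for _ in range(100_000)]
    rain_given_wet = [r for r, wet in runs if wet]
    print("P(raining | grass wet) ~", sum(rain_given_wet) / len(rain_given_wet))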

One of the negative effects of the huge hype wave (hype tsunami is maybe more appropriate) around LLMs and genAI generally is that it starves these other approaches of resources (as well as discouraging people from exploring other new approaches). This is what LeCun is responding to. I know some zealots believe that “bigger LLMs” is all we need for AI progress forever, but based on the entire history of the field, a number of technical issues with LLMs, and the nature of LLM progress in the last few years, I would describe this view as blinkered and risky at best. The field often advances fastest in the early years of new approaches, rather than through massive over-investment in a single approach based on some early promising results. Historically, the latter approach tends to lead to AI winters.


Also, FWIW, I saw a bunch of demos of “token sequence learning” that did a lot of the applications people have been so excited about with LLMs: producing text descriptions of video and images, text summarization with question answering, etc. Those demos were a little janky and limited, and obviously only at the academic-paper-with-impressive-video-demo stage, which is a far cry from fast and reliable enough to be useful in production. But they weren't categorically different from what we've seen with transformers and LLMs. This is one of the reasons I'm more skeptical about claims that transformers plus more data and compute are all we need for AGI. After a decade-plus of not just MASSIVE compute and data scaling but some fairly clever new techniques, I would describe progress as incremental rather than transformational beyond those older results.

Honestly, people have forgotten this now, but the biggest change that ignited the LLM hype was the UX decision to present interactions with these models in the framework of a conversation with an agent. This is a trick that goes back at least as far as Eliza, and its effect is mainly in how it primes the user to think about and relate to the tech. That is also an area where more work can be done (conversational interfaces are not the One Solution to all computing). I recommend googling Interactive Machine Learning, which is its own sub-discipline that specifically studies the problem of how to build UX that is native to, and takes best advantage of, ML/AI techniques to produce software that people can use to accomplish real tasks.


Try to study the models cognitive scientists & linguists have developed.

And replicate them in computers.


If all I want is toast, do I need to learn how a toaster works or simply get good at making toasted bread?


You certainly don't need to study LLMs for it.


This is on brand for him, he's been voicing his skepticism of LLMs for a long while now.


Yann LeCun is the Paul Krugman of AI


If only there were some kind of algorithm telling us how to deal with this exploration-exploitation dilemma...
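
The kind of algorithm being alluded to, in sketch form: an epsilon-greedy bandit that mostly exploits the best-known option but keeps exploring the rest. The payoff numbers are made up, obviously.

    # Epsilon-greedy selection over "research directions" with made-up payoff estimates.
    import random

    def epsilon_greedy(estimates, epsilon=0.1):
        # Explore a random arm with probability epsilon, otherwise exploit the best one.
        arms = list(estimates)
        if random.random() < epsilon:
            return random.choice(arms)
        return max(arms, key=lambda a: estimates[a])

    estimates = {"LLMs": 0.9, "world models": 0.4, "something nobody has tried yet": 0.2}
    print("Work on:", epsilon_greedy(estimates))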


> This is in the hands of large companies, there's nothing you can bring to the table. You should work on next-gen AI systems that lift the limitations of LLMs.

Translation: "Buy more API tokens!"


I believe this is quite correct. You need more or less the whole web for training data, and GPU farms to train on it.


LeCun works for Meta, which gives away models for free.


They do not. A model is the training set plus feature engineering scripts, not the resulting weights.


Whatever terminology you use, it can be run without API tokens.


If you're interested in building the next generation of that model, sure. If you just want to use it the weights are sufficient.


I quibble with the terminology. Meta doesn't "give away" their model. They only let you run it for free. (They don't "give away" Facebook either, they only let you use it for free.)


I genuinely would like to know what your full definition of a model is? There seems to be so much confusion...


"An abstract description of a concrete system using mathematical concepts and language", at least according to Wikipedia.


Or download the weights and run it yourself?


He also lies and gives wrong information on Twitter about other companies. So he's not an honest or good person. There's no point listening to whatever he says.


I don't know what you are referring to, but isn't he right about LLMs?


https://x.com/ylecun/status/1742563244949078369

This one. He deliberately writes wrong information. Even if he is right in the future, there's zero reason to follow his word.


can't access that message. please quote it.


"A number of reasons: 1. Google had 600 employees and no revenue at the time (this was January 2002, before ads, gmail, etc). You can't really do real research at that stage. My job would have involved a lot of corporate strategy, technology development for products, management, etc. I wanted to refocus on basic research in ML, vision, robotics, and computational neuroscience. 2. The salary was low. Obviously, the stock option package would have ended up stratospheric. But we had teenage sons getting close to college and needed cash. Housing is more expensive in Silicon Valley than in New Jersey. 3. My family didn't want to move to California. You can't uproot teenagers without them hating you for it. 4. I had just left AT&T and joined the NEC Research Institute in Princeton. I thought I could work on ML/vision/robotics/neuroscience there. It turns out the place was quickly disintegrating into an applied research lab and I left for NYU after 18 months.

Had I joined, I think the research culture at Google would have been different. I might have made it a bit more open and a bit more ambitious a bit earlier."


I frankly don't even know what Twitter is for anymore, if not abusing the augur of public consciousness.


Machine learning as a field is dead for anyone that's not truly in the academic elite. All models will be generalized/dwarfed by the performance of foundation models, which will be controlled by a small cabal of big tech companies.



