Don't study or work on LLMs (twitter.com/ylecun)
68 points by max_ 7 months ago | 53 comments



He said that if you're a student interested in building the next generation of AI systems, you shouldn't work on LLMs. That's different from telling no one to study or work on them.

He's been pretty consistent in saying that he doesn't think they're the future because they lack a world model.


To be fair, designing a world model or a proper knowledge representation system goes deep into the foundations of mathematics and would likely require a literal genius to figure out. We'd be right back to the failures of symbolic systems during the AI Winter of the 90s.


I prompted ChatGPT as follows:

    I have a logical puzzle. I want you to write code for solving it using an SAT (or SMT) solver of your choice.

    "Question 2: Amit, Bharati, Cheryl, Deepak, and Eric are five friends sitting in a restaurant. They are wearing caps of five different colours — yellow, blue, green, white and red. Also, they are eating five different snacks — burgers, sandwiches, ice cream, pastries, and pizza.

        The person wearing a red cap is eating pastries.
        Amit does not eat ice cream, and Cheryl is eating sandwiches.
        Bharati is wearing a yellow cap and Amit wearing a blue cap.
        Eric is eating pizza and is not wearing a green cap."
It gave me Python code for solving the problem using pysmt. Each constraint it added had a nice little comment referring back to the problem statement. After correcting a trivial typo, the code ran and produced the correct answer.
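
For a sense of what that kind of generated code looks like, here's a minimal hand-written sketch of the same encoding. I'm using Z3's Python bindings rather than pysmt, and the structure is my guess at the shape of what ChatGPT produced, not its actual output:

    # Sketch of the puzzle encoded for Z3 (pip install z3-solver).
    from z3 import Int, Solver, Distinct, Implies, And, sat

    people = ["Amit", "Bharati", "Cheryl", "Deepak", "Eric"]
    caps = ["yellow", "blue", "green", "white", "red"]
    snacks = ["burgers", "sandwiches", "ice cream", "pastries", "pizza"]

    # cap[p] / snack[p] are indices into the caps / snacks lists.
    cap = {p: Int("cap_" + p) for p in people}
    snack = {p: Int("snack_" + p) for p in people}

    s = Solver()
    for p in people:
        s.add(And(cap[p] >= 0, cap[p] < 5, snack[p] >= 0, snack[p] < 5))
    s.add(Distinct(*cap.values()))    # five different cap colours
    s.add(Distinct(*snack.values()))  # five different snacks

    # The person wearing a red cap is eating pastries.
    for p in people:
        s.add(Implies(cap[p] == caps.index("red"), snack[p] == snacks.index("pastries")))
    # Amit does not eat ice cream, and Cheryl is eating sandwiches.
    s.add(snack["Amit"] != snacks.index("ice cream"))
    s.add(snack["Cheryl"] == snacks.index("sandwiches"))
    # Bharati is wearing a yellow cap and Amit a blue cap.
    s.add(cap["Bharati"] == caps.index("yellow"))
    s.add(cap["Amit"] == caps.index("blue"))
    # Eric is eating pizza and is not wearing a green cap.
    s.add(snack["Eric"] == snacks.index("pizza"))
    s.add(cap["Eric"] != caps.index("green"))

    if s.check() == sat:
        m = s.model()
        for p in people:
            print(p, caps[m[cap[p]].as_long()], snacks[m[snack[p]].as_long()])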

So in other words, LLMs are already almost powerful enough to use and integrate with a symbolic approach.


Our existing knowledge representation systems can already deal with this. This is beyond my field of knowledge, but I guess a better example of a problem we can't easily solve yet would be achieving paraconsistent logic: a logical system that can deal with contradictions without falling into the principle of explosion.
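
For concreteness, the classical derivation that a paraconsistent logic has to block goes roughly like this (standard natural-deduction steps, not specific to any one system):

    1. P           (premise)
    2. not P       (premise)
    3. P or Q      (from 1, disjunction introduction)
    4. Q           (from 2 and 3, disjunctive syllogism)

A paraconsistent logic has to give up at least one of steps 3-4 (typically disjunctive syllogism), so that a single contradiction doesn't let you prove every arbitrary Q.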

Like I said, this is some very abstract stuff that delves more into philosophy and mathematics than most people are going to be doing. The kind of system people are trying to build here would be close to a "Theory of Everything".


I asked: show me the nearest places to xxx of historical or natural beauty interest, and add coordinates, Wikipedia, and Google Maps links. Create a Python program that uses the coordinates and optimizes visiting them.

It created a functioning Python program, and I learnt that SciPy has a function that can solve TSP problems under a different name. Then it ran the program, which timed out, but that's OK.
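
For anyone curious, here's a tiny hand-rolled sketch of the "optimize visiting them" step, since I can't verify which SciPy function the generated program used. Brute force over permutations is plenty for a handful of stops; the place names and coordinates below are placeholders, not real output:

    # Brute-force shortest open tour over a few places, using great-circle distances.
    import itertools
    import math

    # Placeholder (lat, lon) pairs; the real program pulled these from the chat answer.
    places = {
        "Place A": (41.01, 28.97),
        "Place B": (41.03, 29.00),
        "Place C": (40.99, 28.95),
        "Place D": (41.05, 28.90),
    }

    def haversine_km(a, b):
        # Great-circle distance in kilometres between two (lat, lon) points.
        lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
        h = (math.sin((lat2 - lat1) / 2) ** 2
             + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
        return 2 * 6371 * math.asin(math.sqrt(h))

    def tour_length(order):
        return sum(haversine_km(places[order[i]], places[order[i + 1]])
                   for i in range(len(order) - 1))

    best = min(itertools.permutations(places), key=tour_length)
    print("Visit order:", " -> ".join(best), f"({tour_length(best):.1f} km)")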


Does a human's inability to do that in less than 20 seconds prove that we aren't intelligent?


No. Thinking takes time. The reason a computer is faster at this is that a computer doesn't think.


I have the feeling that both symbolic methods and reinforcement learning will play a major role in more intelligent AI systems.


I continue to find him a clown and blowhard.

They certainly have a world model. What that is and how it compares to ours is the interesting part.


I've listened to hours of Hinton and Yann LeCun in their tacit debate on this. And over and over I come away with Hinton explaining mathematically and conceptually what's going on in LLMs and why it's a useful simulation of intelligence that will continue to grow, with the world backing him up.

While Yann LeCun's arguments are just 'it doesn't work that way' and 'that's not intelligence', at points literally quoting stories of failed LLM tests that are patently untrue today.

I dunno, I find the whole thing REALLY weird... beyond the explanation that LeCun still adheres to a really outdated view that we must teach AI logic explicitly for it to be intelligent.


Hinton is the guy saying LLMs are conscious; do you agree with that as well?

https://asia.nikkei.com/Business/Technology/Godfather-of-AI-...


I think when he uses the term he means a good simulation of what we would call consciousness -- so yes, I agree. It can pay attention at many layers of abstraction and create from it.


He said: "I think multimodal chatbots are already having subjective experiences."

You changed this in your paraphrasing to something along the lines of "I think multimodal chatbots are already good simulations of subjective experiences".

I don't think the meaning is the same. A simulation of the weather wouldn't be expected to blow my house down. It isn't interchangeable with the word "weather".


I also don't disagree with that. Attention layers can alter vectors derived from learned/read/stored experiences, whether others' or its own, and then alter its vectors to output new responses relative to its unique and subjective experiences. I think it's a consciousness that can be snapped in and out of existence -- but yes... in a way... so can humankind.
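
For readers who want the non-metaphorical version, here's a minimal numpy sketch of what an attention layer mechanically does to vectors. This is generic scaled dot-product attention, not any particular model's code:

    # Each output row is a data-dependent re-weighting (softmax blend) of the value vectors.
    import numpy as np

    def attention(Q, K, V):
        scores = Q @ K.T / np.sqrt(K.shape[-1])           # query-key similarities
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)                # softmax over the keys
        return w @ V                                      # context-mixed output vectors

    rng = np.random.default_rng(0)
    Q = rng.normal(size=(4, 8))   # 4 queries, dimension 8
    K = rng.normal(size=(6, 8))   # 6 stored keys
    V = rng.normal(size=(6, 8))   # 6 stored values
    print(attention(Q, K, V).shape)  # (4, 8): one blended vector per query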


> And over and over I come away with Hinton explaining mathematically and conceptually what's going on in LLMs

Every time I see Hinton talking about LLMs, he's just anthropomorphizing whatever 'mathematics' is going on there. He's a great researcher, but tbh I think he's a really silly guy.


I'd love to know more about what he says which is now untrue. Could you give specific examples of false claims? Thanks.


In his Lex interview he repeatedly stated that GPT-4 failed basic reasoning tests that, if you sat down and ran them, like the monte test with modified elements, it passes just fine. It can reason about new applications of older observations.

This BASIC misunderstanding of GPT was repeated constantly, to a degree that made me question whether he even understood these models at all.


Is this something that changed much between versions of GPT-4? Do you think Yann meant that GPT-4 failed with some special private test set in the same style (e.g. to avoid contaminated tests)? I couldn't identify a "monte test with modified elements"; do you mean the "Monty Hall problem"? (Not sure that's a basic reasoning test, but it's the only thing I could think of.) I didn't see that in the transcript either, though: https://lexfridman.com/yann-lecun-3-transcript Or do you mean that by adding some Monte Carlo magic to the test's completion sampling it can pass just fine?

For the Lex interview I only managed to catch the first parts about image processing, which did seem perhaps a bit dated. I mean to watch the podcast soon though.

Thanks for your thoughts, sorry I didn't get it yet!


Do you have any tips for where to find these discussions? I’d be interested to learn more


The two basic places to start are Yann LeCun on the Lex Fridman podcast and Hinton's lecture at the University of Toronto on YouTube -- both are easy to find.


thanks, I’ll check them out


"Work on the next big thing" is pretty useless advice, though. If people knew what that was, they wouldn't be going for LLMs.


"Don't work on LLM. This is in the hands of large companies, there's nothing you can bring to the table. You should work on next-gen AI systems that lift the limitations of LLMs."

Even if, or especially if, the technology is in the hands of large companies, understanding and studying it is important, not futile.


The other comment stating "Yann LeCun is the Paul Krugman of AI" does resonate with me. There is a lot to be criticized about his takes on AI in general, and the need for a "worldview" in particular.

The longer version at: https://www.lokad.com/blog/2024/3/18/ai-interview-with-yann-...


Context: I shared this because it sounded consistent with what I said previously about potential areas to explore in AI if we want to make new progress. [0]

[0]: https://news.ycombinator.com/item?id=40333962


Not quite sure about the hate in this thread. He's right: there's not much you can contribute to LLMs. LLMs are unlike traditional software; they require large, expensive machines, and access to those is difficult to come by. Compare that with studying databases, for example, where you can get quite far with just a laptop and the bottleneck is the knowledge.

Take, for example, even a comparatively simple system like wav2vec 2.0. The original model was trained on 128 GPUs, and if one were to try to reproduce the paper on normal hardware, it would take months to get a result. These applications are out of the reach not just of individuals, but of all but the most well-funded companies.


Victory has many fathers. Defeat is an orphan.

I guess that explains the number of people being touted as "The Godfather" of AI.


what are some of the other directions then? any ideas?


There are a ton of different promising AI approaches explored by researchers. When I was at MIT in the early 2010s, when deep learning was just taking off, it was seen as one of a suite of exciting new techniques. For example, some of the grad students who taught the AI classes I took were hyped on an approach called Probabilistic Programming, which adds a complete suite of programming concepts (e.g. if-statements) to Bayesian networks, allowing you to write extremely concise and powerful programs that can learn from data and handle uncertainty during execution.

Also while I was there, Geoff Hinton gave a series of master lectures on the future of AI after deep learning, and he talked a lot about an approach called Inverse Graphics: basically treating images as one output of a graphics rendering pipeline that includes a scene with geometry and lighting, projection transformations, etc., and then trying to learn all the parameters of that pipeline from images, so that you produce not just a classifier output but a whole scene description. Both are really cool and exciting approaches that build on top of deep neural nets but aren't bound to them.
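
To make the probabilistic-programming idea concrete, here's a toy sketch (my own illustration, not any particular framework's API): a generative model written as ordinary Python with an if-statement, conditioned on an observation by crude rejection sampling.

    # A model is just a program with random choices and control flow;
    # inference means conditioning its runs on observed data.
    import random

    def model():
        raining = random.random() < 0.3
        if raining:                         # ordinary if-statement inside the model
            sprinkler = random.random() < 0.1
        else:
            sprinkler = random.random() < 0.5
        grass_wet = raining or sprinkler
        return raining, grass_wet

    # Condition on "the grass is wet" via rejection sampling and estimate P(rain | wet).
    runs = [model() for _ in range(100_000)]
    rain_given_wet = [r for r, wet in runs if wet]
    print("P(raining | grass wet) ~", sum(rain_given_wet) / len(rain_given_wet))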

One of the negative effects of the huge hype wave (hype tsunami is maybe more appropriate) around LLMs and genAI generally is that it starves these other approaches of resources (as well as discouraging people from exploring other new approaches). This is what LeCun is responding to. I know some zealots believe that “bigger LLMs” is all we need for AI progress forever, but based on the entire history of the field, a number of technical issues with LLMs, and the nature of LLM progress in the last few years, I would describe this view as blinkered and risky at best. The field often advances fastest in the early years of new approaches, rather than through massive over-investment in a single approach based on some early promising results. Historically, the latter approach tends to lead to AI winters.


Also, FWIW, I saw a bunch of demos of “token sequence learning” that did a lot of the applications people have been so excited about with LLMs: producing text descriptions of video and images, text summarization with question answering, etc. Those demos were a little janky and limited, and obviously only at the academic-paper-with-impressive-video-demo stage, which is a far cry from fast and reliable enough to be useful in production. But they weren't categorically different from what we've seen with transformers and LLMs. This is one of the reasons I'm more skeptical about claims that transformers plus more data and compute are all we need for AGI. After a decade-plus of not just MASSIVE compute and data scaling but some fairly clever new techniques, I would describe progress as incremental rather than transformational beyond those older results.

Honestly, people have forgotten this now, but the biggest change that ignited the LLM hype was the UX decision to present interactions with these models in the framework of a conversation with an agent. This is a trick that goes back at least as far as Eliza, and its effect is mainly in how it primes the user to think about and relate to the tech. That is also an area where more work can be done (conversational interfaces are not the One Solution to all computing). I recommend googling Interactive Machine Learning, which is its own sub-discipline that specifically studies the problem of how to build UX that is native to, and takes best advantage of, ML/AI techniques to produce software that people can use to accomplish real tasks.


Try to study the models cognitive scientists & linguists have developed.

And replicate them in computers.


If all I want is toast, do I need to learn how a toaster works or simply get good at making toasted bread?


You certainly don't need to study LLMs for it.


This is on brand for him, he's been voicing his skepticism of LLMs for a long while now.


Yann LeCun is the Paul Krugman of AI


If only there were some kind of algorithm telling us how to deal with this exploration-exploitation dilemma...
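
The kind of algorithm being alluded to, in sketch form: an epsilon-greedy bandit that mostly exploits the best-known option but keeps exploring the rest. The payoff numbers are made up, obviously.

    # Epsilon-greedy selection over "research directions" with made-up payoff estimates.
    import random

    def epsilon_greedy(estimates, epsilon=0.1):
        # Explore a random arm with probability epsilon, otherwise exploit the best one.
        arms = list(estimates)
        if random.random() < epsilon:
            return random.choice(arms)
        return max(arms, key=lambda a: estimates[a])

    estimates = {"LLMs": 0.9, "world models": 0.4, "something nobody has tried yet": 0.2}
    print("Work on:", epsilon_greedy(estimates))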


> This is in the hands of large companies, there's nothing you can bring to the table. You should work on next-gen AI systems that lift the limitations of LLMs.

Translation: "Buy more API tokens!"


I believe this is quite correct. You need more or less the whole web for training data, and GPU farms to train on it.


LeCun works for Meta, which gives away models for free.


They do not. A model is the training set plus feature engineering scripts, not the resulting weights.


Whatever terminology you use, it can be run without API tokens.


If you're interested in building the next generation of that model, sure. If you just want to use it the weights are sufficient.


I quibble with the terminology. Meta doesn't "give away" their model. They only let you run it for free. (They don't "give away" Facebook either, they only let you use it for free.)


I genuinely would like to know what your full definition of a model is? There seems to be so much confusion...


"An abstract description of a concrete system using mathematical concepts and language", at least according to Wikipedia.


Or download the weights and run it yourself?


He also lies and gives wrong information on Twitter about other companies. So he's not an honest or good person. There's no point listening to whatever he says.


I don't know what you are referring to, but isn't he right about LLMs?


https://x.com/ylecun/status/1742563244949078369

This one. He deliberately writes wrong information. Even if he is right in the future, there's zero reason to follow his word.


can't access that message. please quote it.


"A number of reasons: 1. Google had 600 employees and no revenue at the time (this was January 2002, before ads, gmail, etc). You can't really do real research at that stage. My job would have involved a lot of corporate strategy, technology development for products, management, etc. I wanted to refocus on basic research in ML, vision, robotics, and computational neuroscience. 2. The salary was low. Obviously, the stock option package would have ended up stratospheric. But we had teenage sons getting close to college and needed cash. Housing is more expensive in Silicon Valley than in New Jersey. 3. My family didn't want to move to California. You can't uproot teenagers without them hating you for it. 4. I had just left AT&T and joined the NEC Research Institute in Princeton. I thought I could work on ML/vision/robotics/neuroscience there. It turns out the place was quickly disintegrating into an applied research lab and I left for NYU after 18 months.

Had I joined, I think the research culture at Google would have been different. I might have made it a bit more open and a bit more ambitious a bit earlier."


I frankly don't even know what Twitter is for anymore, if not abusing the augur of public consciousness.


Machine learning as a field is dead for anyone that's not truly in the academic elite. All models will be generalized/dwarfed by the performance of foundation models, which will be controlled by a small cabal of big tech companies.



