I think a lot of people are really bad at evaluating world models. Feifei is right here that they are multimodal but really they must codify a physics. I don't mean "physics" but "a physics". I also think it's naïve to think this can be done through data alone. I mean just ask a physicist...[0].
But the reason people are really bad at evaluating them is that the details dominate. What matters here is consistency. We need invariance to some things and equivariance to others. As evaluators we tend to be hopeful, so the subtle changes frame to frame are overlooked, though that's kinda the most important part. It can't just be similar to the last frame, it needs to be exactly the same. You need equivariance to translation, yet that's still not happening in any of these models (and it's not a limitation of attention or transformers). You're just going to have a really hard time getting all this data, even though by doing that you'll look like you're progressing because you're better fitting it. But in the end the models will need to create some compact formulation representing concepts such as motion. Or in other words, a physics. And it's not like physicists aren't known for being detail oriented and nitpicky over nuances. That is bred into them with good reason.
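To make the invariance/equivariance distinction concrete, here is a minimal sketch on a toy 1D "frame". Everything in it (`shift`, `blur`, `position_gain`, `total`, the sample frame) is a hypothetical illustration and not taken from any real world model. An operation f is translation-equivariant if f(shift(x)) == shift(f(x)), and translation-invariant if f(shift(x)) == f(x).

```python
# Toy illustration only: all names here are hypothetical, not from any real model.

def shift(frame, k):
    """Circularly translate a 1D frame to the right by k positions."""
    k %= len(frame)
    return frame[-k:] + frame[:-k] if k else list(frame)

def blur(frame):
    """Circular 3-tap average. The same rule is applied at every position."""
    n = len(frame)
    return [(frame[i - 1] + frame[i] + frame[(i + 1) % n]) / 3 for i in range(n)]

def position_gain(frame):
    """Scales each value by its absolute index, so the result depends on position."""
    return [i * v for i, v in enumerate(frame)]

def total(frame):
    """Sum of all values. Moving the content around does not change it."""
    return sum(frame)

def is_equivariant(f, frame, k):
    """Check f(shift(x)) == shift(f(x)) for one frame and one shift."""
    return f(shift(frame, k)) == shift(f(frame), k)

def is_invariant(f, frame, k):
    """Check f(shift(x)) == f(x) for one frame and one shift."""
    return f(shift(frame, k)) == f(frame)

frame = [0, 3, 6, 9, 0, 0]
blur_ok = is_equivariant(blur, frame, 2)            # position-independent rule
gain_ok = is_equivariant(position_gain, frame, 2)   # position-dependent rule
total_ok = is_invariant(total, frame, 2)            # global summary

print(blur_ok, gain_ok, total_ok)
```

Note the checks use exact equality rather than "close enough", which is the point being made above: a single passing frame and shift proves nothing in general, but a single failure is enough to show the property does not hold.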
The YouTube video tells a fascinating story. Who would be our Fermi today who can tell the truth and save five years of work, billions of dollars and careers of Ph.D. students?
We wouldn’t expect an LLM to review a paper and tell us the truth like Fermi did. That is super-intelligence.
My problem with the current environment is that we are too rushed. I think in today's culture no one would have told Dyson not to continue. Or they just wouldn't care. You've got to publish or perish.
Hard things take time and deep thought. But in our age we seem to not want to think deep. The environment discourages it because it takes longer. It's incredibly difficult to have both speed and quality. They are always in contention.
Mind you, there are a number of Nobel laureates who have claimed they wouldn't have succeeded in today's environment because of this[0]. I'm confident we're so concerned with finding the best that we hinder our ability to.
I was trained as a physicist but went into software engineering. These days I would describe my job as a digital plumber. There are two kinds of holes we are dealing with: rabbit holes and potholes. We tend to get into the rabbit holes because of the love of tools and fall into potholes because of not paying attention.
It turns out the industry might do the same. But the cost would be billions of dollars and decades of work of young talented people.
I was originally a physicist too and then went into ML. I absolutely love research, yet this is my biggest gripe with the community. Everything is so product focused that I don't think we're even capable of making good products. Those small problems compound and add up to big problems.
The thing I don't understand is that we're really good at recognizing that big problems can be broken down into small ones and that's how we get them solved. So why is it so difficult to understand that small problems compound to create big ones? It's just going in the other direction, where you didn't explicitly do the breakdown. We're just too quick to dismiss rather than asking "does this lead to a big problem?" It is no wonder so much software seems so broken these days.
Given your comment I get the sentiment that you're also frustrated by the environment. I would also add that every pothole has a rabbit hole underneath. We can go around patching everything but we should be asking what causes the potholes to be created in the first place. If we can prevent them from happening to begin with then that's a far better solution than any amount of patching. I'm absolutely certain it will also be far cheaper. Though I'm also certain you'll be able to patch quite a few potholes before you're able to formulate the new formula of asphalt and start pouring it down. Though nothing prevents you from doing both....
I like your metaphor and will look into asphalt, from physics to chemistry…
This also reminds me of another famous metaphor: “Attention Is All You Need”. The question is: are we paying attention to the wrong things as a society? Could it be that we are pouring all the resources into getting the perfect solution to the wrong problem?
Attention is the most valuable resource we have, as individuals and society.
I think something to consider is that you can't have a paradigm shift by following the current paradigm. Over and over we have stories of scientists, inventors, leaders, and so on who went against the grain and whose resilience is what led to success and massive change. How many examples do we need to account for this phenomenon?
I'm not saying we should just have a free for all but that the frequency of so called "dark horses" should make us really question if we're doing things the right way. Sure, maybe it's the right way 99% of the time, but if that 1% is extremely impactful then we shouldn't discount it. And in certain areas, like academia, I think this should probably be actively encouraged rather than discouraged. It's a matter of what our objectives are. Is it to produce impact or is it simply to produce?
Another important factor, and I believe this ultimately leads to a "Great Filter" (longer discussion), is that as we advance, complexity grows faster. As just a naïve example we can look at a Taylor approximation of some given function. The trend will be that low order approximations are simple to calculate while computational complexity explodes as we move to higher order ones. It's not difficult to see how this is happening in our world. The first airplane was much easier to improve upon than any modern one. Just like the first computers were easier to improve than current ones. We could say the same about almost everything, from physics to chemistry. The barrier to entry for contributing is growing; even if some low-hanging fruit exists, there's certainly less of it than before.
So I agree. Attention, and specifically attention to detail, is a critical resource. But it's becoming more critical. The problem I see is we're actively discouraging this attention, and such attention is difficult in the first place. Low order approximations don't help us in a world where higher order effects dominate.
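One rough way to put numbers on the Taylor example, assuming a function of several variables (that is my assumption, the example above doesn't specify): a Taylor polynomial up to order k in n variables has C(n + k, k) distinct coefficients, one per monomial of degree at most k. The helper name `taylor_coefficients` and the choice of n = 10 are hypothetical, picked only for illustration.

```python
from math import comb

def taylor_coefficients(n_vars, order):
    """Count the distinct coefficients of a Taylor polynomial.

    This is the number of monomials of total degree <= order
    in n_vars variables, which equals C(n_vars + order, order).
    """
    return comb(n_vars + order, order)

# Assumed setting: a function of 10 variables, approximated at increasing order.
counts = {k: taylor_coefficients(10, k) for k in (1, 2, 4, 8)}

for k, c in counts.items():
    print(f"order {k}: {c} coefficients")
```

This only counts the terms that would need to be estimated; it says nothing about how hard each derivative is to compute, which is a separate cost on top.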
I really appreciate your wisdom hidden in this long thread embedded in a promotional post.
A lot of things changed since your last reply. I left the enterprise potholes and rabbit holes. (Ironically I got that job via a HN hire thread three years ago. I was very lucky to have a manager who would hire someone like me.)
Now I am relieved to explore what I see on the horizon. The first thing I did was go to the Apple Store to buy the latest M5 MacBook Pro with the maximum 24 GB of memory and get MLX running.
It was surprisingly good.
Now I am watching all the vibe coding videos by Andrej Karpathy to see what the trendy kids are doing.
Then I will sit down and walk through all the NN and LLM papers, dive deep into MLX, and imagine what Alan Turing would do if he had my M5.
[0] https://m.youtube.com/watch?v=hV41QEKiMlM