Fun fact (probably a well-known fact for the HN audience): both Lua and Elixir were created by Brazilians. Lua by Roberto Ierusalimschy and his team at PUC-Rio in 1993, and Elixir by José Valim in 2011.
This BLT approach is why "AI research is stalling" takes are wrong. Dynamic byte-level patches instead of tokens seem genuinely innovative, not just scaling up the same architecture. Better efficiency AND better handling of edge cases? Actual progress. The field is still finding clever ways to rethink fundamentals.
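For anyone wondering what "dynamic byte-level patches" actually means here: the rough idea, as I read the paper, is to cut the byte stream into variable-length patches wherever a small byte-level LM gets surprised. A minimal sketch, where `next_byte_probs` and the threshold are placeholders rather than anything from the actual implementation:

```python
import math

def next_byte_probs(prefix: bytes) -> list[float]:
    # Placeholder for a small byte-level language model that returns a
    # distribution over the 256 possible next bytes. BLT trains a small
    # transformer for this; a uniform stand-in is used here just so the
    # sketch runs.
    return [1.0 / 256] * 256

def entropy(probs: list[float]) -> float:
    # Shannon entropy (bits) of the predicted next-byte distribution.
    return -sum(p * math.log2(p) for p in probs if p > 0)

def dynamic_patches(data: bytes, threshold: float = 4.0) -> list[bytes]:
    # Start a new patch whenever the model is "surprised" (high entropy).
    # Predictable stretches end up in long patches, so compute gets spent
    # where the data is hard to predict, unlike fixed tokenization.
    patches, current = [], bytearray()
    for i, byte in enumerate(data):
        if current and entropy(next_byte_probs(data[:i])) > threshold:
            patches.append(bytes(current))
            current = bytearray()
        current.append(byte)
    if current:
        patches.append(bytes(current))
    return patches
```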
This paper is very cool, comes from respected authors, and is a very nice idea with good experiments (flop controlled for compute). It shouldn't be seen as a wall-breaking innovation though. From the paper:
> Existing transformer libraries and codebases are designed to be highly efficient for tokenizer-based transformer architectures. While we present theoretical flop matched experiments and also use certain efficient implementations (such as FlexAttention) to handle layers that deviate from the vanilla transformer architecture, our implementations may yet not be at parity with tokenizer-based models in terms of wall-clock time and may benefit from further optimizations.
And unfortunately, wall-clock deficiencies mean that any quality improvement needs to overcome that additional scaling barrier before any big (meaning expensive) runs can risk using it.
Absolutely, I have seen so many good ideas that have not yet made it into notable trained models.
A lot of that is because you need to have a lot more faith than "seems like a good idea" before you spend a few million in training that depends upon it.
Some of it is because when the models being released now began training, a lot of those ideas hadn't been published yet.
Time will resolve most of that: cheaper and more performant hardware will allow a lot of those ideas to be tested without the massive commitment required to build leading-edge models.
The big guys are almost certainly incinerating millions a day on training "maybe it could show some promise" techniques. With the way things are right now, they are probably green-lighting everything to find an edge.
I don't think you're understanding what the "stall" arguments are saying.
Certainly tweaks to performance continue, but as I understand it, the stalling argument looks at the tendency of broad, "subjective" LLM performance to not get beyond a certain level. Basically, that the massive projects to throw more data and training at the thing result in more marginal apparent improvements than the jumps we saw with GPT-2, 3, 3.5, and 4.
The situation imo is that at some point, once you've ingested and trained on all the world's digitized books, all the coherent parts of the Internet, etc., you hit a limit to what you get with just "predict the next token" training. More information after this is more of the same at a higher level.
But again, no doubt, progress on the level of algorithms will continue (DeepSeek was an indication of what's possible). But as things stand, such progress essentially gets us adequate LLMs faster rather than making any progress towards "general intelligence".
I think the sentiment (at least my sentiment) is that "mainstream ML" has fallen into the transformer local minimum, and given the weight of the players in that space it will take a huge amount of force to move them out of it.
The likes of this, Mercury Coder, and even RWKV are definitely hopeful - but there's a pitch-black shadow of hype and speculation to outshine.
I disagree. Most AI innovation today is around things like agents, integrations, and building out use cases. This is possible because transformers have made human-like AI possible for the first time in the history of humanity. These use cases will remain the same even if the underlying architecture changes. The number of people working on new architectures today is far higher than the number working on neural networks in 2017 when 'attention is all you need' came out. Nevertheless, actual ML model researchers are only a small portion of the total ML/AI community, and this is fine.
If you consider most of the dominant architectures in deep-learning-style approaches, transformers are remarkably generic. If you reduce transformer-like architectures to "position-independent iterated self-attention with intermediate transformations", they can support ~all modalities and incorporate other representations (e.g. convolutions, CLIP-style embeddings, graphs or sequences encoded with additional position embeddings). On top of that, they're very compute friendly.
Two of the largest weaknesses seem to be auto-regressive sampling (not unique to the base architecture) and expensive self attention over very long contexts (whether sequence shaped or generic graph shaped). Many researchers are focusing efforts there!
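To make the "position-independent iterated self-attention with intermediate transformations" framing concrete, here's a minimal pre-norm block in PyTorch. Dimensions are arbitrary; this is a sketch of the shape of the thing, not any particular model:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    # One "iteration": self-attention (mixing across positions) followed by an
    # MLP (per-position transformation). The core ops are permutation-
    # equivariant; order only enters via whatever position information you add
    # to the inputs (learned embeddings, RoPE, graph encodings, ...).
    def __init__(self, d: int = 256, heads: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(d)
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x):  # x: (batch, seq, d)
        h = self.norm1(x)
        # Attention is O(seq_len^2): the cost flagged above for very long contexts.
        a, _ = self.attn(h, h, h, need_weights=False)
        x = x + a
        return x + self.mlp(self.norm2(x))

x = torch.randn(2, 128, 256)   # any modality, once embedded to d=256
y = Block()(x)                 # stack these blocks to taste
```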
Transformers are very close to some types of feed-forward networks. The difference is that transformers can be trained in parallel without the need for auto-regression (which is slow for training, but kind of nice for streaming, low-latency inference). It's a mathematical trick. RWKV makes it obvious.
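A rough sketch of that trick, assuming some generic decoder-style model `lm(tokens) -> logits` (a placeholder, not any specific library API): training scores every next-token prediction in one forward pass thanks to the causal mask, while sampling still has to loop.

```python
import torch
import torch.nn.functional as F

def training_step(lm, tokens):
    # tokens: (batch, seq). One forward pass over the whole sequence; the
    # causal mask inside `lm` keeps position t from seeing t+1, so all
    # next-token losses are computed in parallel (teacher forcing).
    logits = lm(tokens[:, :-1])                    # (batch, seq-1, vocab)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        tokens[:, 1:].reshape(-1),
    )

@torch.no_grad()
def sample(lm, prompt, steps=32):
    # Inference is inherently sequential: each new token depends on the
    # previous ones. That's the auto-regressive bottleneck, but also what
    # makes streaming, token-by-token output natural.
    tokens = prompt
    for _ in range(steps):
        logits = lm(tokens)[:, -1]                 # only the last position matters
        next_tok = torch.multinomial(F.softmax(logits, dim=-1), 1)
        tokens = torch.cat([tokens, next_tok], dim=1)
    return tokens
```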
It's true, but you can't deny the importance of the architecture. It's pretty clear that using simple perceptrons would not have led us down the same path.
Sure, but I think a reasonable corollary is that new algorithms and architectures will show their strengths when new realms of computation become available.
I've secretly wondered if the next (ahem) quantum leap in output quality will arrive with quantum computing, wherein answering 10,000 if statements simultaneously would radically change the inference pipeline.
But I am also open to the fact that I may be thinking of this in terms of 'faster horses' and not asking the right question.
It's not clear how your perception of quantum computing would lead to 'faster horses' in the current view of NN architectures - keep in mind that the common view of 'exploring many paths simultaneously' is at best an oversimplification (https://scottaaronson.blog/?p=2026).
That said, perhaps advances in computing fundamentals would lead to something entirely new (and not at all horselike).
If you can tie a neural network's loss function to the excitation state of a quantum system, then presumably letting the system settle at its energy minimum would be equivalent to a training step, but perhaps much faster.
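Purely as a classical analogy for that intuition (simulated annealing on a toy loss surface, not actual quantum hardware or a real training loop):

```python
import math
import random

def loss(w: float) -> float:
    # Toy non-convex "energy landscape" standing in for a network's loss.
    return (w - 3.0) ** 2 + math.sin(5.0 * w)

def anneal(w: float = 0.0, temp: float = 2.0, cooling: float = 0.995, steps: int = 5000):
    # Propose random moves; always accept improvements, occasionally accept
    # worse moves while the "temperature" is high, then cool down so the
    # system settles into a low-energy (low-loss) configuration.
    energy = loss(w)
    for _ in range(steps):
        candidate = w + random.gauss(0.0, 0.3)
        cand_energy = loss(candidate)
        if cand_energy < energy or random.random() < math.exp((energy - cand_energy) / temp):
            w, energy = candidate, cand_energy
        temp *= cooling
    return w, energy

print(anneal())  # settles near a low point of the toy landscape
```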
There is more fundamental ML research today than at any other point in history, including in non-transformer architectures. That is my point. It doesn't seem that way because 90%+ of 'ML research' has nothing to do with fundamental ML and is instead research around applications, which are indifferent to the underlying model at the end of the day. That was the point of my comment.
> are now done in hours to a higher level of quality
However, I feel that there is a big difference between the models. In my tests, using Cursor, Claude 3.7 Sonnet has a much more refined "aesthetic sense" than other models. Many times I ask "make it more beautiful" and it manages to improve, where other models just can't understand it.
The LoveSims research brilliantly mirrors "Hang the DJ" [1] from Black Mirror (comparison inevitable). Technically impressive, just a bit creepy to reduce love's beautiful unpredictability to computational models. Or that's just my romantic view of the world.
A moving story, but also very interesting in many ways. It makes me think about Edgar Morin and his thesis on complexity, how current science has become so specialized that it has great difficulty in looking at the whole. In this case, these are very similar problems, but still. Of course, it is also worth mentioning that the potential of AI for research is enormous.
Agent-based modeling offers a more realistic approach to economic systems than traditional equilibrium models. New approaches, including generative agents (ABM + LLMs), are promising. J. Doyne Farmer's recent book "Making Sense of Chaos: A Better Economics for a Better World" is a great read for those interested in this field.
No. There is no macroeconomist who wouldn’t adopt these approaches if they were “better”.
Agent-based models have been around since the 1980s at least. No one uses them in central banks, no one uses them in industry, and you can be very confident that they’ve tried.
Neural network based computer vision models also existed for decades. They weren't very good and weren't really used, until the early 2010s when people figured out how to make them work. Now they are vastly superior to all the other approaches.
This is quite common across the sciences. Some technique doesn't seem to work because of missing crucial insights or technology. Then somebody fills the gaps, and the technique works.
These types of models in economics might or might not become viable at some point.
"The models might become viable at some point" is very different from the original claim at the top level of this comment chain, though, which made a positive assertion that agent-based models are better. The parent comment to yours was right to call out that claim, and it offered a reasonable basis for that counterargument at the same time.
"realistic" is an interesting word, because it can have different connotations.
We can write a very accurate quantum mechanical model of the oxygen atom, but you can't actually simulate it without a galaxy-sized classical computer. It is a very realistic=accurate model.
Or you can write and easily simulate a semiclassical model that isn't realistic in that first sense. This one is a realistic=efficiently-simulatable model.
Obviously a fully agent-based model is more realistic in the "accurate" sense, because it models the underlying reality more directly. But if you make realistic agents, you have something inefficient (non-realistic in the "simulatable" sense).
There are plenty of people working on the application of ML in economics, including NN methods, and including in macro.
That’s a totally different class of model from this agent-based approach. People in the (small) agent-based modeling community have been pushing their stuff for decades to no effect.
Sure it’s possible that there’s some amazing advance I can’t see coming in the future but as of now I would not recommend anyone pursue ABM.
You're taking this from the perspective of people who are stuck in a specific mental framework who want to prove that their mental framework is the right one, no matter how impossible it is in practice.
What if you don't care about tuning against a real macro-economy? What if the economy being fictional was the entire point?
Let's suppose you wanted to make a game that simulates a realistic economy as a gameplay element no different from say a physics engine. Why wouldn't you do it using agent based modeling? What you're saying sounds purely dogmatic now. It's more about thought termination than actually accomplishing something. After all, central banks and businesses don't give a damn that agent X did action Y at time Z for all agents, actions and times. Meanwhile in a game? It's actually essential, because the model is the reality inside that fictional world. The model is "perfect".
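For concreteness, a toy version of that in-game economy might look something like this (all numbers and rules invented for the sketch; a real game would obviously be richer):

```python
import random

class Trader:
    # Heterogeneous agents: each has its own gold, stock and price belief,
    # and only sees its own trades (local decisions, imperfect information).
    def __init__(self):
        self.gold = random.uniform(50, 200)
        self.goods = random.randint(0, 10)
        self.price_belief = random.uniform(5, 15)

    def adjust(self, sold: bool):
        # Simple adaptive rule: raise the asking price after a sale,
        # lower it after failing to sell.
        self.price_belief *= 1.05 if sold else 0.95

def tick(traders):
    # One simulation step: random pairwise encounters; a trade happens when
    # the buyer can afford the seller's asking price.
    random.shuffle(traders)
    for seller, buyer in zip(traders[::2], traders[1::2]):
        ask = seller.price_belief
        if seller.goods > 0 and buyer.gold >= ask:
            seller.goods -= 1; buyer.goods += 1
            seller.gold += ask; buyer.gold -= ask
            seller.adjust(sold=True)
        else:
            seller.adjust(sold=False)

traders = [Trader() for _ in range(200)]
for day in range(100):
    tick(traders)

# The "market price" is emergent: nobody set it, it comes out of the
# agents' local haggling.
print(sum(t.price_belief for t in traders) / len(traders))
```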
It's not like the classical models are any good in terms of performance in real situations, as they have proven unable to predict anything over the past 40 years better than a basic linear extrapolation from the trend would have.
Current macro-economics models are arguably not much better than a broken clock in terms of predicting power.
Right, agent-based models are only useful as an “exploration” tool; you can't really use them for forecasting because there's an impractically high number of parameters to tune.
Micro-founded macroeconomic models (say, DSGE) are much easier to tune on available historical data, so they are much preferred, and nobody seems to care that they have the same predictive power as astrology.
Thanks for the link. If anyone has additional links, including scholarly references, this is an area of huge interest to me but I wouldn't know where to turn to for the latest research and models.
Demis Hassabis seems to describe a process whereby an AI can accelerate the results of billions of simulations by efficiently encoding that predictive behavior, making something that is computationally expensive to simulate perhaps several orders of magnitude more tractable.
This has proven out in the acceleration of actual weather prediction using AI, which means it can feasibly be run on a single desktop machine.
I think it's not a stretch to imagine that a) there is a way to simulate the whole economy at the same level of quality as a weather or climate simulation, and b) AI can accelerate the computations to the point where they can run on accessible hardware.
We need this whole economy simulation ... to answer practical questions such as - if we dole out UBI to everyone to cover basic living costs, will that simply result in the cost of rent going up to absorb the whole amount?
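A minimal sketch of the "encode the expensive simulation into a fast model" idea, with a stand-in function in place of a real weather or economy simulator (everything here is a placeholder, just to show the surrogate-training loop):

```python
import torch
import torch.nn as nn

def expensive_simulation(x: torch.Tensor) -> torch.Tensor:
    # Stand-in for a slow numerical simulation (weather step, economy step, ...).
    # In practice this is the thing that takes hours on a cluster.
    return torch.sin(x).sum(dim=-1, keepdim=True) + 0.1 * x.pow(2).sum(dim=-1, keepdim=True)

# Small "emulator" network trained to reproduce the simulator's outputs.
surrogate = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)

for step in range(2000):
    x = torch.rand(256, 8) * 4 - 2        # sample simulator inputs
    with torch.no_grad():
        y = expensive_simulation(x)       # expensive label generation, done offline
    loss = nn.functional.mse_loss(surrogate(x), y)
    opt.zero_grad(); loss.backward(); opt.step()

# Once trained, the surrogate answers in microseconds on a laptop, which is
# where the orders-of-magnitude speedup comes from.
print(surrogate(torch.zeros(1, 8)))
```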
> We need this whole economy simulation ... to answer practical questions such as - if we dole out UBI to everyone to cover basic living costs, will that
"How will humans (in aggregate) behave under novel conditions?"
Models tend to behave poorly when asked about things outside their training distribution.
> if we dole out UBI to everyone to cover basic living costs, will that simply result in the cost of rent going up to absorb the whole amount?
For that to be true, rent-seeking would have to literally capture all surplus. In which case UBI wouldn’t be an option in the first place.
The marginal increase in the purchasing power of someone who went from having $0 to $n would always be greater than the increase in purchasing power for someone who went from $1 billion to $1 billion + n - even with inflation.
Agent-based models capture the messy reality of economic systems by simulating heterogeneous actors making local decisions with imperfect information, allowing for emergent phenomena, non-linear dynamics, and adaptation over time. These are precisely the features that equilibrium models abstract away through unrealistic assumptions of perfect rationality, homogeneity, and static optimization, which is why they fail to predict or explain the crises, bubbles, and technological disruptions that define actual economic evolution. To be honest, ABMs aren't perfect either. They face challenges with calibration, validation against empirical data, computational limitations, etc.
"Gemma Scope is a research tool for analyzing and understanding the inner workings of the Gemma 2 generative AI models. The tool allows you to examine the behavior of individual AI model layers of Gemma 2 models, while the model is processing requests. Researchers can apply this technique to examine and help address critical concerns such as hallucinations, biases, and manipulation, ultimately leading to safer and more trustworthy AI systems." [1]
Would it be a stretch to say that this type of output is the "abstraction" mode of a model? In other words, linking the semantics of a text or word to more abstract concepts (e.g.: cat -> animal, beans -> food).
For example, the capacity for abstraction is fundamental for scientific and creative development.
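For what it's worth, the mechanism under the hood is a sparse autoencoder trained on a layer's activations (Gemma Scope's released SAEs use a JumpReLU variant, if I'm remembering the paper right). A generic sketch of the idea only, with placeholder dimensions and a plain ReLU instead of the real thing:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    # Maps a model activation vector into a much wider, mostly-zero feature
    # vector and back. Each learned feature tends to fire on a human-
    # interpretable concept (the "cat -> animal" style abstraction asked
    # about above), which is what lets you inspect what a layer represents.
    def __init__(self, d_model: int = 2304, d_features: int = 16384):
        super().__init__()
        self.enc = nn.Linear(d_model, d_features)
        self.dec = nn.Linear(d_features, d_model)

    def forward(self, activation: torch.Tensor):
        features = torch.relu(self.enc(activation))   # sparse feature activations
        reconstruction = self.dec(features)           # approximate original activation
        return features, reconstruction

sae = SparseAutoencoder()
layer_activation = torch.randn(1, 2304)   # e.g. a residual-stream vector
features, recon = sae(layer_activation)
top = features.topk(5)                    # strongest features for this input
# Interpreting those top features (e.g. by finding the text that maximally
# activates them) is how you get from raw activations to concepts.
```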