I'll be honest, why even fuck around with any other kind of geoengineering besides high-altitude sulfur dioxide injection? We've literally seen a big jump in warming, probably from removing the sulfuric byproducts of cargo ship fuel. At its heart, global warming is an issue of energy in vs energy out. It's a lot harder to remove billions of tons of CO2 to increase energy out than to use a couple thousand tons of sulfur dioxide to reduce energy in. Maybe not as a permanent fix, but a better fix than this nonsense.
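To put rough numbers on the energy-in vs energy-out framing, here is a minimal zero-dimensional energy-balance sketch in Python (textbook values; it ignores the greenhouse effect and every feedback, so treat it as illustration only). Nudging the planetary albedo up by 0.01, roughly the kind of reflectivity change aerosol schemes aim at, drops the equilibrium temperature by about a degree:

    # Zero-dimensional energy balance: absorbed sunlight in, blackbody radiation out.
    # Illustrative only -- no greenhouse terms, no feedbacks.
    S = 1361.0       # solar constant, W/m^2
    sigma = 5.67e-8  # Stefan-Boltzmann constant, W/(m^2 K^4)

    def equilibrium_temp(albedo):
        # Energy in per unit area, averaged over the sphere: S * (1 - albedo) / 4
        # Energy out: sigma * T^4; solve for T where the two balance.
        return ((S * (1 - albedo)) / (4 * sigma)) ** 0.25

    t_base = equilibrium_temp(0.30)      # ~255 K effective temperature
    t_brighter = equilibrium_temp(0.31)  # a slightly more reflective planet
    print(f"albedo 0.30 -> {t_base:.1f} K, 0.31 -> {t_brighter:.1f} K, "
          f"delta {t_brighter - t_base:+.1f} K")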
Because (1) sulfur-dioxide injection is a short-term fix that doesn't solve the problem long term, (2) there is a huge risk of termination shock if civilization ever stops injecting it, (3) the only long-term route to a stable climate is to stop emitting and to remove the excess CO2 we're adding, and (4) all of this assumes that solar radiation management doesn't have terrible unexpected effects on the climate.
Why is the goal to get something "stable" in the first place?
The climate on Earth never was and never will be stable, so that should not be our goal in the first place.
The goal should be to keep the change fairly slow, because most living things have trouble with fast changes. That's it; we don't need more than that.
"Fairly slow" on an evolutionary timescale and "stable" across human timescales are functionally the exact same thing.
The difference between the two is negligible compared to the difference between either of them and what we currently have, which is "unprecedented" on human timescales and euphemistically "radical" on evolutionary ones.
You might as well say "look I just don't understand why people say we need to stop the car, obviously slowing down to walking speed would be enough" while the car continues to accelerate at full throttle towards a cliff edge.
But no one knows what "fairly slow" is.
Also, the climate collapse theories predict that once a certain tipping point is reached, it's game over. If that is true, then a slow and stable increase still gets us to that point, just a bit later.
In other words, to stabilize climate change, do we need to reduce CO2 emissions, stop them completely, or go further and actively remove CO2 from the atmosphere?
We don't even know which of these three options would lead to the "fairly slow/stable" we want.
It seems like we just do all three, with no evidence of any real-world effect whatsoever.
There's a lot of really interesting work in neuroevolution that has the potential to enable some novel unsupervised training regimes. I think there are real possibilities for unique encoding schemes like ACE encoding to speed up training and produce much smarter behavior out the other end, especially if "genes" can form reusable elements of neural topology that make scaling networks faster. Reusing components all over the body is how we fit so much complexity into the relatively little unique DNA we have. The other interesting thing about using genetic algorithms for a portion of training/network mapping is that it allows you to have heterogeneous networks, so simulations or representations of astrocyte/glial behavior can easily be integrated with neural networks. With traditional training methods it's a massive fucking pain to train a non-feedforward network.
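For concreteness, here is a minimal evolutionary-training sketch in Python (plain NumPy; it is not NEAT or ACE, just the bare select-and-mutate loop those build on). Because fitness is the only training signal, the forward pass could be swapped for any heterogeneous or non-differentiable rule without touching the training loop:

    # Bare-bones neuroevolution sketch: evolve the weights of a tiny fixed-topology
    # network by selection and mutation, with no backpropagation involved.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, (64, 2))
    y = (X[:, 0] * X[:, 1] > 0).astype(float)      # XOR-like target on the signs

    def forward(genome, x):
        # Genome = flattened weights of a 2 -> 4 -> 1 network (17 numbers total).
        W1, b1 = genome[:8].reshape(2, 4), genome[8:12]
        W2, b2 = genome[12:16], genome[16]
        h = np.tanh(x @ W1 + b1)
        return 1 / (1 + np.exp(-(h @ W2 + b2)))

    def fitness(genome):
        return -np.mean((forward(genome, X) - y) ** 2)   # higher is better

    pop = rng.normal(0, 1, (50, 17))                     # initial population of genomes
    for generation in range(200):
        scores = np.array([fitness(g) for g in pop])
        parents = pop[np.argsort(scores)[-10:]]          # keep the 10 fittest
        children = parents[rng.integers(0, 10, 40)] + rng.normal(0, 0.1, (40, 17))
        pop = np.vstack([parents, children])             # elitism + mutated offspring

    print("best fitness:", max(fitness(g) for g in pop))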
I do think that languages like Elixir and other tools with strong CPU concurrency can really be leveraged to make some dynamite libraries.
Seconding Dyson Sphere Program, it's great, though it can be a bit of a tale of two games, pre-ILs and post-ILs, since most of your production stacks become relatively samey later on. You can turn off the combat enemies, though honestly I'm enjoying my playthrough with them because they force a lot more consideration of not only how you expand but why and when you expand.
We can't come close to saying our current networks match synapses in performance or function, because architecturally we still use feedforward networks: no recursion, no timing elements, very static connections. Transistors will definitely have some advantages in being able to synchronize information and steps to a vastly better degree than biological neurons, but as long as we stick with transformers it's the equivalent of trying to get to space by stacking sand. Could you get there eventually? Yes, but there are better ways.
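To make the timing point concrete, here is a minimal leaky integrate-and-fire neuron in Python (the standard textbook model; the constants are made up for illustration). The same total input produces different output depending on when it arrives, which a static weighted sum can't express:

    # Leaky integrate-and-fire neuron: membrane potential leaks toward zero,
    # integrates input, and emits a spike when it crosses a threshold.
    import numpy as np

    def lif_spikes(input_current, dt=1.0, tau=20.0, v_thresh=1.0, v_reset=0.0):
        v, spikes = 0.0, []
        for t, i_t in enumerate(input_current):
            v += dt * (-v / tau + i_t)   # leak + integrate
            if v >= v_thresh:            # threshold crossing -> spike, then reset
                spikes.append(t)
                v = v_reset
        return spikes

    steps = 100
    burst = np.zeros(steps); burst[10:20] = 0.2   # total input 2.0, delivered fast
    spread = np.full(steps, 0.02)                 # total input 2.0, delivered slowly
    print("bursty input spikes at:", lif_spikes(burst))      # fires
    print("spread-out input spikes at:", lif_spikes(spread)) # never reaches threshold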
This is amazing work, but to me it highlights some of the biggest problems in the current AI zeitgeist: we are not really trying to work on any neuron or ruleset that is much different from the perceptron thats just a sumnation function. Is it really that surprising that we just see this same structure repeated in the models? Just because feedforward topologies with single neuron steps are the easiest to train and run on graphics cards does that really make them the actual best at accomplishing tasks? We have all sorts of unique training methods and encoding schemes that never get used because the big libraries don't support them. Until we start seeing real variation in the fundamental rulesets of neural nets, we are always just going to be fighting against the fact that these are just perceptrons with extra steps.
> Just because feedforward topologies with single neuron steps are the easiest to train and run on graphics cards does that really make them the actual best at accomplishing tasks?
You are ignoring a mountain of papers trying all conceivable approaches to creating models. It is evolution by selection; in the end, transformers won.
Just because papers are getting published doesn't mean any of it is actually gaining traction. I mean, we have known that the time series of signals a neuron receives plays a huge role in how biological neurons functionally operate, and yet we have nearly no examples of spiking networks being pushed beyond basic academic exploration. We have known glial cells play a critical role in biological neural systems, and yet you can probably count the number of papers that examine using an abstraction of that activity in a neural net on both your hands and toes. Neuroevolution using genetic algorithms has been basically looking for a big break since NEAT. It's the height of hubris to say that we have peaked with transformers when the entire field is based on not getting trapped in local maxima's. Sorry to be snippy, but there is so much uncovered ground it's not even funny.
"We" are not forbidding you to open a computer, start experimenting and publishing some new method. If you're so convinced that "we" are stuck in a local maxima, you can do some of the work you are advocating instead of asking other to do it for you.
You can think chemotherapy is a local maxima for cancer treatment and hope medical research seeks out other options without having the resources to do it yourself. Not all of us have access to the tools and resources to start experimenting as casually as we wish we could.
As I understand it a local maxima means you’re at a local peak but there may be higher maximums elsewhere. As I read it, transformers are a local maximum in the sense of outperforming all other ML techniques as the AI technique that gets the closest to human intelligence.
Can you help my little brain understand the problem by elaborating?
Also you may want to chill with the personal attacks.
Not a personal attack. These posters are smarter than I am, just ribbing them about misusing the terminology.
"Maxima" is plural, "maximum" is singular. So you would say "a local maximum," or "several local maxima." Not "a local maxima" or, the one that really got me, "getting trapped in local maxima's."
While "local maximas" is wrong, I think "a local maxima" is a valid way to say "a member of the set of local maxima" regardless of the number of elements in the set. It could even be a singleton.
No, a member of the set of local maxima is a local maximum, just like a member of the set of people is a person, because it is a definite singular.
The plural is also used for indefinite number, so “the set of local maxima” remains correct even if the set has cardinality 1, but a member of the set has definite singular number irrespective of the cardinality of the set.
You can't have one local maxima; it would be the global maxima. So by saying local maxima you're assuming the local is just a piece of a larger whole, even if that global state is otherwise undefined.
No, you can’t have one local maxima, or one global maxima, because it’s plural. You can have one local or global maximum, or two (or more) local or global maxima.
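For anyone who wants the underlying concept (separate from the grammar), a toy Python example with two local maxima, only one of which is the global maximum:

    # f has two peaks; both are local maxima, and only the taller one is the
    # global maximum. Purely a toy illustration of the terminology.
    import numpy as np

    x = np.linspace(-3, 3, 601)
    f = np.exp(-(x - 1.5) ** 2) + 0.6 * np.exp(-(x + 1.5) ** 2)

    local_maxima = [x[i] for i in range(1, len(x) - 1)
                    if f[i] > f[i - 1] and f[i] > f[i + 1]]
    global_maximum = x[np.argmax(f)]
    print("local maxima near:", [round(v, 2) for v in local_maxima])  # ~[-1.5, 1.5]
    print("global maximum near:", round(global_maximum, 2))           # ~1.5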
MNIST and other small, easy-to-train-against datasets are widely available. You can try out anything you like, even on a cheap laptop these days, thanks to a few decades of Moore's law.
It is definitely NOT out of your reach to try any ideas you have. Kaggle and other sites exist to make it easy.
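As a concrete baseline, something like this (assuming scikit-learn; any similar toolkit would do) trains on MNIST in a few minutes on a laptop, and the classifier line is where you'd swap in whatever you want to test:

    # Minimal MNIST baseline -- assumes scikit-learn; downloads the data from OpenML.
    from sklearn.datasets import fetch_openml
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
    X = X / 255.0                                  # scale pixel values to [0, 1]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=10000, random_state=0)

    clf = LogisticRegression(max_iter=200)         # swap in your own idea here
    clf.fit(X_train, y_train)
    print("test accuracy:", clf.score(X_test, y_test))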
My pet project has been trying to use Elixir with NEAT or HyperNEAT to make a spiking network, then, when that's working decently, drop in some glial interactions I saw in a paper. It would be kinda bad at purely functional stuff, but idk, it seems fun. The biggest problems are time and having to do a lot of both the evolutionary stuff and the network stuff. But yeah, the ubiquity of free datasets does make it easy to train.
Not to mention not everyone can be devoted to doing cancer research. Some doctors and nurses are necessary to, you know, actually treat the people who have cancer.
Are we sure there’s anything “net new” to find within the same old x86 machines, within the same old axiomatic systems of the past?
Math is a few operations applied to carving up stuff, and we believe we can do that infinitely in theory. So "all math that abides by our axiomatic underpinnings" is valid regardless of whether we "prove it" or not.
Physical space we can exist in, a middle ground of reality we evolved just so to exist in, seems to be finite; I can’t just up and move to Titan or Mars. So our computers are coupled to the same constraints of observation and understanding as us.
What about daily life will be upended by reconfirming decades-old experiments? How is this not living in a sunk cost fallacy?
Einstein didn't say that about insanity, but... systems exist and are consistently described by particular equations at particular scales. Sure, we can say everything is quantum mechanics; even classical physics can technically be translated into a series of wave functions that explain the same behaviors we observe, if we could measure it. But it's impractical, and some of the concepts we think of as fundamental at certain scales, like nucleons, don't exist at others, like the equations that describe the energy of empty space. So it's maybe not quite a fallacy to point out that not every concept we find useful, like deep learning inference, cogently encapsulates every rule at every scale we know about, down to the electrons. Because none of our theories do that, and even if they did, we couldn't measure or process all the things needed to check whether we're even right. So we use models that differ from each other, but that emerge from each other, but only when we cross certain scale thresholds.
If you abstract far enough, then yes, everything we are doing is somehow akin to what we have done before. But that then also applies to what Einstein did.
Do you really think that transformers came to us from God? They're built on the corpses of millions of models that never went anywhere. I spent an entire year trying to scale up a stupid RNN back in 2014. It went nowhere, because it didn't work. I am sure we are stuck in a local minima now - but it's able to solve problems that were previously impossible. So we will use it until we are impossibly stuck again. Currently, however, we have barely begun to scratch the surface of what's possible with these models.
Who said that we peaked with transformers? I sure hope we did not. The current focus on them is just institutional inertia. Worst case another AI winter comes, at the end of which a newer, more promising technology would manage to attract funding anew.
His point is that "evolution by selection" also includes the fact that transformers are easy to implement with modern linear algebra libraries and cheap to scale on current silicon, both of which are engineering details with no direct relationship to their innate efficacy at learning (though indirectly it means you can scale up the training data to compensate for less efficient learning).
I think it is correct to include practical implementation costs in the selection.
Theoretical efficacy doesn’t guarantee real world efficacy.
I accept that this is self-reinforcing, but I favor real gains today over potentially larger gains in a potentially achievable future.
I also think we are learning practical lessons on the periphery of any application of AI that will apply if a mold-breaking solution becomes compelling.
They barely work for a lot of cases (i.e., anything where accuracy matters, despite the bubble's wishful thinking). It's likely that something will sunset them in the next few years.
I think this is both a really astute and important observation and also one that's more true locally than of people broadly. Modern neoliberal business culture generally, and the consolidated current incarnation of the tech industry in particular, have strong "tunnel vision" and a belief in chasing optimality compared to many other cultures, both extant and past.
In neoclassical economics, there are no local maxima, because it would make the math intractable and expose how much of a load of bullshit most of it is.
It seems cloyingly performative grumpy-old-man to land on "it barely works and it's a bubble and blah blah" in response to a discussion about their comparative advantage (yeah, they won, and absolutely convincingly so).
I’d say it’s more that transformers are in the lead at the moment, for general applications. There’s no rigorous reason afaik that it should stay that way.
It doesn't seem promising: a one-man band has been on a quixotic quest based on intuition and it's gotten ~nowhere, and it's not for lack of interest in alternatives. There's never been a better time to have a different approach - is your metric "times I've seen it on HN with a convincing argument for it being promising"? I'm not embarrassed to admit that is/was mine, but alternatively, you're aware of recent breakthroughs I haven't seen.
RWKV has shown that you can scale RNNs to large parameter counts.
The fact that one person (initially) was able to do it highlights how much low hanging fruit there is for non transformers.
Also, the fact that a small number of people designed, trained, and published five versions of a perfectly serviceable model (as in, it has decent summarizing ability, the biggest LLM use case) which doesn't have the quadratic time complexity of transformers is a big deal.
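Roughly the property being pointed at, as a sketch (a generic recurrent update in Python, not RWKV's actual formulation): the state is a fixed-size vector, so each new token costs the same no matter how long the sequence already is, whereas self-attention revisits every previous token:

    # Generic recurrent state update -- illustrative, not RWKV's real equations.
    # Per-token cost is O(d^2) regardless of sequence length; attention instead
    # looks back over all previous tokens, which is where the quadratic cost comes from.
    import numpy as np

    d = 64
    rng = np.random.default_rng(0)
    W_h, W_x = rng.normal(0, 0.1, (d, d)), rng.normal(0, 0.1, (d, d))

    def step(state, token_embedding):
        # One token in, one fixed-size state out; no lookback over the past.
        return np.tanh(W_h @ state + W_x @ token_embedding)

    state = np.zeros(d)
    for token_embedding in rng.normal(0, 1, (1000, d)):   # a 1000-token "sequence"
        state = step(state, token_embedding)
    print("state shape after 1000 tokens:", state.shape)  # still (64,)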
> the perceptron thats just a sumnation[sic] function
What would you suggest?
My understanding of part of the whole NP-Complete thing is that any algorithm in the complexity class can be reduced to, among other things, a 'summation function'.
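For reference, the unit being quoted above really is just a weighted sum plus a nonlinearity; here is a textbook perceptron in Python (the weights are hand-picked and purely illustrative):

    # A textbook perceptron: sum the weighted inputs, add a bias, threshold.
    import numpy as np

    def perceptron(x, w, b):
        return 1.0 if np.dot(w, x) + b > 0 else 0.0   # summation, then threshold

    # Weights chosen by hand so the unit computes logical AND of two inputs.
    w, b = np.array([1.0, 1.0]), -1.5
    for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
        print(x, "->", perceptron(np.array(x, dtype=float), w, b))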
If you had to reduce it to one thing, it's probably that language models are capable few-shot and zero-shot learners. In other words, by training a model to simply predict the next word on naturally occurring text, you end up with a tool you can use for generic tasks, roughly speaking.
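Just to make the training objective concrete, here is a toy bigram counter in Python (the few-shot behavior described above only emerges from neural models at enormous scale, not from counts like these):

    # Toy next-word predictor: count which word follows which in the training text,
    # then predict the most frequent follower. Same objective, vastly simpler model.
    from collections import Counter, defaultdict

    text = "the cat sat on the mat and the cat slept on a mat".split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1                     # "training" is just counting

    def predict_next(word):
        return counts[word].most_common(1)[0][0]

    print(predict_next("the"))   # -> "cat", the most frequent follower of "the"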
I don't understand enough about the subject to say, but to me it seemed like yes, other models have better metrics at equal model size in terms of number of neurons or asymptotic runtime, but the most important metric will always be accuracy/precision/etc. for money spent. Or in other words, if GPT requires 10x the number of neurons to reach the same performance, but buying compute and memory for those neurons is cheaper, then GPT is a better means to an end.
I'm fascinated by Elixir, but more so for using it to do neural network experiments with more unique topologies, training methods, and activation functions. Being able to spin every neuron out into a process just seems too damn elegant for doing work with spiking neural networks. Will it be as fast as the Python stacks? Probably not, but not being constrained by the implicit assumptions in a lot of those libraries will be worth a lot. Plus, there are already good neural network and genetic algorithm libraries for Elixir. I really think this language is gonna go places.