Lol basically we're saying AI isn't AI if we utilize the strength of computers (being able to compute). There's no reason why AGI should have to be as "sample efficient" as humans if it can achieve the same result in less time.
Let's say an agent needs to do 10 brain surgeries on a human to remove a tumor and a human doctor can do it in a single surgery.
I would prefer the human.
"Steps" are important to optimize if they have negative externalities.
I think your logic isn't sound: wouldn't we want an "intelligence" to solve problems efficiently rather than brute-force a million monkeys? There's definitely a limit to compute, the same way there's a limit to how much oil we can use, etc.
In theory, sure: if I can throw a million monkeys at a problem and ramble my way into a solution, it doesn't matter how I got there. In practice, though, every attempt has direct and indirect externalities. You can argue those externalities are minor, but the sheer amount of money going into data centers suggests otherwise.
Lastly, humans use way less energy to solve these in fewer steps, so of course it matters when you throw kilowatts at something that takes milliwatts to solve.
> Lastly, humans use way less energy to solve these in fewer steps,
Not if you count all the energy that was necessary to feed, shelter and keep the human at his preferred temperature so that he can sit in front of a computer and solve the problem.
It's kind of the point? To test AI where it's weak instead of where it's strong.
"Sample efficient rule inference where AI gets to control the sampling" seems like a good capability to have. Would be useful for science, for example. I'm more concerned by its overreliance on humanlike spatial priors, really.
ARC has always had that problem, but for this round the score is just too convoluted to be meaningful. I want to know how well the models can solve the problems. I may want to know how 'efficient' they are, but really I don't care as long as they solve them in reasonable clock time and/or at reasonable cost. I certainly do not want all of that jumbled into one messy, convoluted score.
'Reasoning steps' here is just arbitrary and meaningless. Not only does it lack the utility of the two measures above (time and cost), but it's also incredibly silly to me to think we should directly compare something like that across entities operating on wildly different substrates.
If I can't look at the score and immediately get a good idea of where things stand, then throw it away. 5% here could mean anything from 'solving only a tiny fraction of problems' to 'solving everything correctly, but with more reasoning steps than the best human scores.' Those have wildly different implications. What use is a score like that?
The measurement metric is in-game steps. Unlimited reasoning between steps is fine.
This makes sense to me. Most actions have some cost associated, and as another poster stated it's not interesting to let models brute-force a solution with millions of steps.
Same thing in this case: no utility, and just as arbitrary. None of the issues with the score change.
Models do not brute force solutions in that manner. If they did, we'd wait the lifetimes of several universes before we could expect a significant result.
Regardless, since there's a 5x step cutoff, 'brute-forcing with millions of steps' was never on the table.
Cost has utility in the real world and this doesn't. That's the only reason I would tolerate thinking about cost, and even then, I would never bundle it into the same score as intelligence, because that's just silly.
It's an interesting point but I too find it questionable. Humans operate differently than machines. We don't design CPU benchmarks around how humans would approach a given computation. It's not entirely obvious why we would do it here (but it might still be a good idea, I am curious).
HN in general always does this. I got a lot of push back when I said that in general consumers don't care at all about open source, and the majority of them probably have no clue what it even means.
You can really sense the SF-centric bubble HN lives in.
Open source is a supply chain specific issue and consumers don’t care about supply chain.
Anyone with illusions about this: quickly name the top vendor for the third item on the materials list of the first thing you can get your hands on that has one. (For me it’s usually food. Who is the main vendor for citric acid? Or sugar? Or that red dye that causes ADHD? I have no clue.)
General consumers could not care less about open source.
Like, the whole point of open source is that this thread is not a thing. The whole point is "if this software is taken on by a malevolent dictator for life, we'll just fork it and keep going with our own thing." Or like if I'm evaluating whether to open-source stuff at a startup, the question is "if this startup fails to get funding and we have to close up shop, do I want the team to still have access to these tools at my next gig?" -- there are other reasons it might be in the company's interests, like getting free feature development or hiring better devs, but that's the main reason it'd be in the employees' best interests to want to contribute to an open-source legacy rather than keep everything proprietary.
The leadership and product direction work are at least as hard as the code work. Astral/uv has absolutely proven this, otherwise Python wouldn't be a boneyard for build tools.
Projects - including forks - fail all the time because the leadership/product direction on a project goes missing despite the tech still being viable, which is why people are concerned about these people being locked up inside OpenAI. Successfully forking is much easier said than done.
I had a lot of trouble convincing people that a correct Python package manager was even possible. uv proved it was possible and won people over with speed.
I had a sketched-out design for a correct package manager in 2018, but when I talked to people about it I couldn't get any interest in it. I think the brilliant idea uv had that I missed was that it can't be written in Python, because if it's written in Python, developers are going to corrupt its environment sooner or later and you lose your correctness.
I think that now that people are used to uv it won't be that hard to develop a competitor and get people to switch.
Off of taxpayer money, sadly. IMO we really need a fix for this. When cops are grossly negligent the money should come out of their aggregate pension fund (or at least partially).
> we really need a fix for this. When cops are grossly negligent the money should come out of their aggregate pension fund
This is on us as voters. If we didn’t piss our pants every time a police union sneezed, we’d realize that wholesale restarting of police departments has precedent even in our largest cities.
Yes, this is the key point. Tax payers get a nice big bill while the people who caused the problem get a nice paid vacation while they conduct an internal "investigation" that typically finds they did nothing wrong.
Yeah, of course they need to be held accountable, and we need to vote in people who will do so. What I'm suggesting is an alignment of incentives so that police have a direct reason to try not to be negligent.
Of course there's a balance that has to be struck so that police are empowered enough to act. So perhaps something like settlements against the police being 30% borne by the police pension fund and 70% by taxpayers is sufficient. I think this will also make police very enthusiastic about bodycams and holding each other accountable.
I'm usually a big supporter of labor unions, but police unions in the US generally have an outsized amount of power, and even when mayors etc. want to hold police accountable, the union ends up bending the mayor over a barrel.
I'm not sure what the solution is here. Forbid police from unionizing? That would probably have some bad consequences too.
despite this being something practically everybody wants, the fact that it hasn't happened is not a coincidence and speaks to the power of police unions/guilds and their lobbying arms. outside a few toothless instances, those groups are extremely good at reframing these attempts and mobilizing their bases to vote against the broader public interest.
> despite this being something practically everybody wants,
No, everybody does not want police accountability. Half the population will fall on a grenade to prevent that. They know that the purpose of the police is to keep the undesirables in line, and they never envision that they will ever fall in that category.
oh, i generally don't disagree with you on that point; i specifically meant that when presented with the question "do you want your tax dollars to pay for police liabilities?" the answer is probably almost always "no".
Sure. But when you ask "Do you want the police to be unable to do their job and live in a lawless hellscape run by gangbangers and ISIS cartels?", the answer is also "No."
The problem is that the mass media sets the framing of acceptable discourse, and that mass media is in large part an ideological monoculture. And even when it's not, it is happy to present absolutely insane batshit lunacy as 'one of the two sides' of an issue.
yeah - i think the media is certainly culpable, but i also think this speaks to the power of police unions like i mentioned earlier. media is happy to present stories presented to them on silver platters by "respected" institutions because they carry all the hallmarks of legitimacy.
Almost all taxpayer funded pension funds are already underfunded. It makes no difference if the funding decreases or increases, the government employee will still get their benefit. The government would have to go through bankruptcy to get the benefit amount reduced.
Less than 5% of the population knew what it meant to install an app when the iPhone launched. I believe Steve Ballmer ridiculed the idea when asked about it.
A great many people use Android to this day because of its more open nature, and that's despite Google's involvement. If Motorola could go back to its roots, shake the perception of Chinese influence, and do open source properly, I bet a lot more than 5% of the market would be ready for it.
Try "aware, even vaguely, of the privacy issues standard smartphones pose".
(I would bet more than 5% have at least a vague notion of open source, though, and a positive a priori view of it, possibly mixing it up with source-available, which would put them on par with some people we read on HN.)
I'm not arguing against that, I'm just saying that open source labelling isn't a feature to users.
The downstream effects of something being open source might attract users, but being open source in and of itself doesn't do anything except for a very tiny slice of the population. I'd say (in the US) more than half of the software developers I know use an Apple phone despite Android being much more open.
Whenever I'm on HN I feel like most of the posters here live in a bubble where they think most people are anywhere near as tech literate as they are. (You can really feel how this forum is SF-coded).
I actually like rust more than Haskell, but `You can even have most of your code base be sans-IO, which is the exact same pattern you'd use in Haskell.` glosses over the fact that in Haskell it's enforced at compile time.
Another argument as to why rust isn't the forever-language. My forever language should include effects!
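For readers unfamiliar with the term, here is a minimal sketch of what "sans-IO" looks like in Rust. The names (`LineDecoder`, `read_lines`) and the newline-delimited protocol are hypothetical, chosen only for illustration: the protocol logic is a pure state machine that takes bytes and returns results, and a thin outer function is the only code that touches a reader. Note that nothing in the types stops `feed` from doing I/O; in Rust the separation is a convention, which is exactly the gap being pointed out versus Haskell's `IO` type.

```rust
use std::io::{self, Read};

/// Hypothetical sans-IO decoder: bytes in, complete lines out.
/// It never reads or writes anything itself, but Rust does not enforce
/// that; a Haskell version would have to put any I/O in the `IO` type.
#[derive(Default)]
pub struct LineDecoder {
    buf: Vec<u8>, // bytes of the current, not yet terminated line
}

impl LineDecoder {
    /// Consume one chunk of input and return every line it completes.
    /// Bytes after the last '\n' stay buffered for the next call.
    pub fn feed(&mut self, chunk: &[u8]) -> Vec<String> {
        let mut out = Vec::new();
        for &b in chunk {
            if b == b'\n' {
                out.push(String::from_utf8_lossy(&self.buf).into_owned());
                self.buf.clear();
            } else {
                self.buf.push(b);
            }
        }
        out
    }
}

/// Thin I/O shell: the only place that performs reads.
pub fn read_lines<R: Read>(mut reader: R) -> io::Result<Vec<String>> {
    let mut decoder = LineDecoder::default();
    let mut lines = Vec::new();
    // Deliberately tiny buffer so lines get split across reads.
    let mut chunk = [0u8; 4];
    loop {
        let n = reader.read(&mut chunk)?;
        if n == 0 {
            break; // EOF; an unterminated trailing line is dropped
        }
        lines.extend(decoder.feed(&chunk[..n]));
    }
    Ok(lines)
}
```

Because `feed` takes plain byte slices, it can be unit-tested without sockets or files, and the same decoder could be driven by a blocking reader, an async runtime, or a test harness.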
Even rust has need of this. For example, I want a nopanic effect I can put on a function which makes it a compile error for anything that function calls to panic.
Though I think it's the closest language right now, ideally you have something that is close to "zero-overhead" as your forever language.
I really like how flix.dev looks, but there's always a little nagging at the back of my head that something like rust will always produce more performant software.
> Even rust has need of this. For example, I want a nopanic effect I can put on a function which makes it a compile error for anything that function calls to panic.
This!
This apart from build times is my biggest request for the language.
I think AI-generated Rust likely falls further short of typical human-written Rust than AI-generated Python does of typical Python. But thanks to the compiler, it's possible you might get more correctness out of AI-generated Rust code on average.
No, they raise prices because they can, and demand isn’t showing signs of stopping despite the increases. This won’t affect whether there’s a shortage or not; besides, we’re not talking about direct-to-consumer float product, they inked commitments.
If they didn’t have a documented history of running cartel price fixing schemes for LCD/OLED display tech, NAND, and DRAM, I’d maybe agree with you but we have the history. They cry every time about China ‘dumping’ for not going along with the racket.
They can choose to increase production or embrace the scarcity. The latter might look delicious on paper: more profit, less effort, less short-term risk. All you have to do is ignore the whale in the room.
They are increasing production. Fabs take multiple years to come online. The modern semiconductor industry moves only very slowly and at very great cost.