
bwestergard · 2026-03-19T21:28:50 1773955730

I'm shocked to see how poorly these models, which I find useful day to day, do in solving virtually any of the problems in Unlambda.

Before looking at the results my guess was that scores would be higher for Unlambda than any of the others, because humans that learn Scheme don't find it all that hard to learn about the lambda calculus and combinatory logic.

But the model that did the best, Qwen-235B, got virtually every problem wrong.

joshmoody24 · 2026-03-20T19:07:49 1774033669

This surprises me too. I've experimented with using LLMs to convert lambda calculus expressions into combinatory logic. There is a simple deterministic way to do this, and LLMs claim to know it, and then they confidently fail.
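For reference, the "simple deterministic way" is usually called bracket abstraction. What follows is a minimal sketch of the textbook S/K/I version, not necessarily the exact procedure the commenter used. The term encoding (plain strings for variables and combinators, `("app", f, x)` and `("lam", v, body)` tuples) and the function names are assumptions made for this illustration, and variables are assumed not to be named S, K, or I.

```python
# Terms (hypothetical encoding for this sketch):
#   "x"                 a variable, or one of the combinators "S", "K", "I"
#   ("app", f, x)       application of f to x
#   ("lam", v, body)    lambda abstraction binding variable v

def free_in(v, t):
    """True if variable v occurs free in term t."""
    if isinstance(t, str):
        return t == v
    if t[0] == "app":
        return free_in(v, t[1]) or free_in(v, t[2])
    return t[1] != v and free_in(v, t[2])

def abstract(v, t):
    """Eliminate v from a lambda-free term t, so that (result x) reduces to t[v := x]."""
    if t == v:
        return "I"                      # \v. v      =>  I
    if not free_in(v, t):
        return ("app", "K", t)          # \v. t      =>  K t   (v not free in t)
    # t is lambda-free and mentions v, so it must be an application here.
    return ("app", ("app", "S", abstract(v, t[1])), abstract(v, t[2]))  # S (\v. f) (\v. x)

def to_ski(t):
    """Translate a lambda term into S/K/I form, innermost lambdas first."""
    if isinstance(t, str):
        return t
    if t[0] == "app":
        return ("app", to_ski(t[1]), to_ski(t[2]))
    return abstract(t[1], to_ski(t[2]))

def show(t):
    """Render a term with explicit parentheses."""
    if isinstance(t, str):
        return t
    return "(" + show(t[1]) + " " + show(t[2]) + ")"

identity = ("lam", "x", "x")
const = ("lam", "x", ("lam", "y", "x"))
print(show(to_ski(identity)))   # I
print(show(to_ski(const)))      # ((S (K K)) I)
```

The output is correct but not minimal: `\x. \y. x` comes out as `S (K K) I` rather than plain `K`, because this sketch omits the usual eta-style optimisation.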

__alexs · 2026-03-19T21:31:40 1773955900

They are also weirdly bad at Brainfuck which is basically just a subset of C.
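To make the "subset of C" remark concrete: each of the eight Brainfuck commands has a well-known one-line C equivalent. The sketch below emits C source from a Brainfuck program using that mapping. The function name `bf_to_c`, the 30000-cell tape, and the `unsigned char` cell type are assumptions chosen for illustration; real implementations differ on cell width and on how end-of-input is handled.

```python
# Conventional command-to-C mapping; any other character is a Brainfuck comment.
BF_TO_C = {
    ">": "++ptr;",
    "<": "--ptr;",
    "+": "++*ptr;",
    "-": "--*ptr;",
    ".": "putchar(*ptr);",
    ",": "*ptr = getchar();",
    "[": "while (*ptr) {",
    "]": "}",
}

def bf_to_c(src, tape_size=30000):
    """Return C source text equivalent to the Brainfuck program src."""
    lines = [
        "#include <stdio.h>",
        "",
        "int main(void) {",
        f"    static unsigned char tape[{tape_size}];",
        "    unsigned char *ptr = tape;",
    ]
    depth = 1  # current indentation level inside main()
    for ch in src:
        stmt = BF_TO_C.get(ch)
        if stmt is None:
            continue  # skip comment characters
        if ch == "]":
            depth -= 1
            if depth < 1:
                raise ValueError("unmatched ]")
        lines.append("    " * depth + stmt)
        if ch == "[":
            depth += 1
    if depth != 1:
        raise ValueError("unmatched [")
    lines += ["    return 0;", "}"]
    return "\n".join(lines)

print(bf_to_c("+[-]."))
```

The translation is purely character by character, which is the sense in which the language maps onto a small corner of C: pointer increments, dereferenced increments, `while` loops, and two stdio calls.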

astrange · 2026-03-19T23:59:40 1773964780

BF involves a lot of repeated symbols, which is hard for tokenized models. Same problem as r's in strawberry.

bwestergard · 2026-03-20T00:34:43 1773966883

Interesting. So why do the models seem to handle deeply nested Lisp expressions just fine?

kgeist · 2026-03-20T01:17:49 1773969469

Probably because there's a ton of code that deals with nested parentheses across languages in the training data, and models have learned how to work around tokenization limitations when it comes to parentheses.

culi · 2026-03-20T05:17:59 1773983879

Yeah well they also still struggle with "4 + 6 / 9" so I'm not sure why anyone is surprised with these findings
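The comment does not say what the wrong answers look like. Assuming the usual precedence rules, the expression works out as follows; exact fractions are used so that no rounding is involved, and the variable names are just for illustration.

```python
from fractions import Fraction

# Division binds tighter than addition, so 4 + 6 / 9 means 4 + (6 / 9).
correct = 4 + Fraction(6, 9)

# The typical slip is evaluating strictly left to right: (4 + 6) / 9.
left_to_right = Fraction(4 + 6, 9)

print(correct)         # 14/3
print(left_to_right)   # 10/9
```

So the expected value is 14/3 (about 4.67), while the left-to-right reading gives 10/9 (about 1.11).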

bwestergard · 2026-03-19T21:16:43 1773955003

This is how I would deal with the problem if I maintained node: "Please, use your tokens and experimental energies to port to Rust and pass the following test suite. Let us know when you've got something that works."

bwestergard · 2026-03-19T20:03:23 1773950603

Not only is it pushing production down, but the resulting high prices are almost certainly going to cause permanently lower demand in certain sectors and countries ("demand destruction").

I would love to see a complete accounting in a year or so.

pseudohadamard · 2026-03-20T07:11:56 1773990716

Not necessarily. It's going to massively drive up the demand for coal and wood consumption where it used to be (comparatively) less-polluting gas. We won't really know until 6-12 months have passed and we've collected the data.

bwestergard · 2026-03-18T18:29:24 1773858564

That can't be the whole story, right? Because there are an arbitrarily large number of (e.g.) Rust programs that will implement any given spec given in terms of unit tests, types, and perhaps some performance benchmarks.

But even accounting for all these "hard" constraints and metrics, there are clearly reasons to prefer some possible programs over others even when they all satisfy the same constraints and perform equally on all relevant metrics.

We do treat programs as efficient causes[1] of side effects in computing systems: a file is written, a block of memory is updated, etc. and the program is the cause of this.

But we also treat them as statements of a theory of the problem being solved[2]. And this latter treatment is often more important socially and economically. It is irrational to be indifferent to the theory of the problem the program expresses.

[1]: https://en.wikipedia.org/wiki/Four_causes#Efficient

[2]: https://pages.cs.wisc.edu/~remzi/Naur.pdf

MeetingsBrowser · 2026-03-18T18:59:15 1773860355

> there are clearly reasons to prefer some possible programs over others even when they all satisfy the same constraints

Maintainability is a big one missing from the current LLM/agentic workflow.

When business needs change, you need to be able to add on to the existing program.

We create feedback loops via tests to ensure programs behave according to the spec, but little to nothing in the way of code quality or maintainability.

bwestergard · 2026-03-18T18:22:00 1773858120

Looks like a great implementation. I want to question the basic user story, which seems to be: "I am a software developer who wants to improve productivity by running multiple simultaneous agents that are roughly isomorphic to a human software developer team."

I am burning a lot of tokens every day at work and on personal projects. It's helpful. I generally work in tmux with github copilot in one pane, and a few other terminal panes showing tests and current diff.

I find it really important to avoid the temptation to multi-task by running multiple agents. For quite varied tasks, productivity gains from multi-tasking have proven to be illusory. Why would it be different with writing software?

https://en.wikipedia.org/wiki/Human_multitasking

bwestergard · 2026-03-17T21:07:49 1773781669

"exporting your own oil and gas to be able to have a 'clean' (and up to recently heavily subsidized) transportation network is in a way just a gigantic bookkeeping trick"

How so?

If every oil exporter used some of their oil revenue to switch to EVs, that would, all things equal, hasten the transition to EVs. The U.S. is not doing that.

yndoendo · 2026-03-17T21:55:34 1773784534

I still find it funny when it comes to oil between the USA and Saudi Arabia.

Saudi Arabia started moving its electrical system to renewables, whereas the USA is doubling down on fossil fuels.

Saudi Arabia is the drug dealer that knows you don't consume your own supply unless you must, whereas the USA consumes the crack they sell.

My next vehicle will 100% be pure EV, not Tesla.

appreciatorBus · 2026-03-17T22:06:55 1773785215

> the drug dealer that knows you don't consume your own supply unless you must

So true. There's nothing incompatible at all with: a) realizing that earth has gifted you with a valuable but limited & polluting energy source b) realizing that you'd be foolish to get your own country hooked on it, but it's not a bad business if you can get other countries hooked on it.

Instead we get oil rich areas seemingly determined to show off how much of their oil they can waste.

rob74 · 2026-03-17T23:04:41 1773788681

Wow, so now the US oil barons who lobbied Trump to kill renewables and EVs are even worse than Mohammed "Bonesaw*" bin Salman Al Saud? That's really something, if you look at it that way...

* https://en.wikipedia.org/wiki/Assassination_of_Jamal_Khashog...

jacquesm · 2026-03-17T23:11:05 1773789065

Either you're too smart for me or I just can't follow you, but could you please expand a bit on your comment? I find it hard to link it to the parent, but I realize that may be on me.

rob74 · 2026-03-17T23:24:51 1773789891

Sorry, it was referring more to the grandparent comment, which referred to Saudi Arabia behaving more responsibly than the US, and Mohammed bin Salman is of course the crown prince and prime minister of Saudi Arabia.

svpk · 2026-03-18T00:58:24 1773795504

They're comparing Saudi Arabia to a drug dealer; I don't think they're ascribing any moral virtue to the Saudi regime. They just believe the Saudis are acting more intelligently.

Retric · 2026-03-18T00:04:43 1773792283

How you use "worse" implies a wider judgment than how someone behaves on a single issue. Real people are more complicated than Disney characters.

deaux · 2026-03-18T11:21:18 1773832878

Yes? I don't think you can argue in good faith that the latter causes more total harm and damage than the former. It's really quite something to look at it in a different way.

raw_anon_1111 · 2026-03-18T00:25:00 1773793500

How many people have Trump’s wars in Venezuela and Iran killed?

Tagbert · 2026-03-18T00:26:07 1773793567

The funny thing is the US doesn’t really consume much Saudi Oil. The US is a net exporter of oil, though they do import some specific types of oils and export more of others.

The US’s interest in the Middle East oil is a lot about stabilizing oil prices. At least it used to be when there was a rational policy and competent executors.

laughing_man · 2026-03-18T00:22:40 1773793360

Transitioning to renewables makes economic sense for the Saudis because they make more money selling a barrel of oil for transportation fuel and generating power with wind and solar than they would by burning that oil for power.

The US has vast reserves of coal and natural gas. We generally don't use oil to generate power either -- oil is something like 0.4% of the total power generated, because we have vast amounts of natural gas and coal to use instead.

The situation isn't the result of some crafty master plan on the part of the Saudis. It's just what makes sense.

ZeroGravitas · 2026-03-18T09:11:09 1773825069

But in the context of the current topic, USA could be demonstrating their technical prowess and running EVs off this amazing coal and gas bounty.

Instead they seem to be in a cycle of buying massive inefficient vehicles and then getting annoyed at gas prices.

Oil is 2/5ths of US energy use.

ericmay · 2026-03-17T23:26:28 1773789988

The oil market is global and the US is a big part of that but it’s not the only one. You can always make changes to energy sources later and as new technologies are unlocked perhaps we can even skip some headaches now. Obviously there’s the geostrategic angle now which you see play out in Iran and Venezuela.

As other countries move to reliance on Chinese rare earth processing for renewable technology, it drives their oil and gas consumption down which means more oil and gas for those who are still using it.

If you really want to look at this analogy about drug dealers then really what you see is that America is the big boss here and an energy and military super power, and Saudi Arabia is just another dealer under American protection and if they don’t do what we tell them to do they’ll get the boot.

spicymaki · 2026-03-17T23:59:52 1773791992

Like the drug dealers where I grew up they are making the neighborhood a really terrible place to live. They might have a nice house right now, but the homes around them are burning.

kortilla · 2026-03-18T06:11:25 1773814285

The electrical system is unrelated to oil for transportation.

bluGill · 2026-03-17T22:59:42 1773788382

The US is moving the grid renewable. The guys at top might not think so and yell loudly not to, but they can't stop things, only put the brakes on a little.

ourmandave · 2026-03-17T23:24:40 1773789880

They've pumped the brakes pretty hard by cutting EPA standards, subsidizing coal, suing to stop wind and solar projects, cutting green energy grants by $8B, yoinking solar tax credits, trying to rewrite the Clean Air Act to block states from regulating emissions, shielding Big Oil from litigation for climate deception, and repeating Big Oil's lies and disinformation.

jdlshore · 2026-03-17T23:46:55 1773791215

The economics are against them nonetheless. Solar + battery is seeing massive rollouts.

deaux · 2026-03-18T11:23:19 1773832999

Those rollouts are seeing massive cutbacks from what I've read, as half the country is straight up banning new solar. Good luck ever getting that off the books.

bluGill · 2026-03-18T13:14:57 1773839697

I don't think it will be that hard. Banning solar is a feel-good thing now that doesn't affect many people, but that means once the next election has passed it won't be opposed when lobbyists (and greens) try to roll it back. Of course each state is different, so in some it will take more than a few elections. In some states solar is already widespread enough that you can't ban it because too many people already have it and know enough about it to tell their friends. Those friends who live in other states will start to ask why they don't.

Remember you need to keep the 20 year plan in mind. If you only look to the end of 2026 things are hopeless, but look to 2050 (and compare to 2000) and things look much better.

deepsk · 2026-03-19T10:01:55 1773914515

https://news.ycombinator.com/item?id=47034087

Sorry for going completely off-topic; YC cuts off replies after two weeks. You wrote a bit lower in the discussion from the linked thread:

>Because the AGENTS.md, to perform well, needs to point out the _non_-obvious.

Could you briefly elaborate on how to do this?

deaux · 2026-03-19T11:02:02 1773918122

As I said there, it's inherently something the LLM can't do, at least not without lots of engineering. So I'm assuming you're talking about "as a human" here.

Some of it is just trial and error. You notice it makes an incorrect assumption, it takes longer to find something than it should, and so on. Some of that can be predicted, simply by you knowing the codebase. If you sat down with a new hire to walk them through it and get them up to speed, what would you tell them? It'd be a waste of time to tell them about things they can easily figure out on their own within a minute by looking at filenames and so on. It's the low effort thing to do, but it also achieves nothing.

For example, "A's B component has a default C which should be overridden unless desired". If A is an internal library then you could just fix that if it goes against the LLM's common assumptions, but maybe it's an external dependency and it's not worth it.

Or maybe you're building a game, and there are a few core mechanics that are relevant to much of the logic. Then you can likely explain in a few sentences what would otherwise need hundreds of lines of code read across multiple files. So you put that in an AGENTS.md file in a relevant folder so it gets autoloaded when touching any of that code.

deepsk · 2026-03-20T07:39:37 1773992377

Thank you.

jasonfarnon · 2026-03-17T23:14:16 1773789256

"If every oil exporter used some of their oil revenue to switch to EVs, that would, all things equal, hasten the transition to EVs."

The premise is that all things aren't equal. The oil Norway would have used just gets used somewhere else, so what difference does it make what Norway does instead? I don't know if that's the reality of the situation, but if it is just an offset, it does sound like a bookkeeping trick, doesn't it?

blargey · 2026-03-17T23:37:59 1773790679

Norway switching from ICEs to EVs objectively reduces global oil consumption+burning by exactly that much.

Norway exporting oil increases oil supply, but doesn't increase consumption. The world's oil consumers are not supply-constrained; the producers are not running at 100% capacity, and they'll happily pick up the slack if Norway just stopped exporting oil for no reason. And there's a large amount of consumption that can't be offset by electrification in the first place (petrochemicals, long distance flight, etc) so there's not even a theoretical future end-state where they require a non-EV-using counterparty to buy their oil to fund their EV usage.

Calling it a "bookkeeping trick" is just verbal sleight of hand.

jasonfarnon · 2026-03-19T23:09:50 1773961790

"Norway switching from ICEs to EVs objectively reduces global oil consumption+burning by exactly that much."

Meaning what they are in fact doing has the same effect as if they stopped producing/exporting oil exactly to the extent that it gets replaced by EVs over there? I could only see that happening if they undersell everyone in the world so they create no new consumers. I guess the truth is somewhere in the middle. I imagine the truth can be known, though? When Norway enters the market, how much do other producers' sales go down?

patmorgan23 · 2026-03-18T00:24:13 1773793453

Increases in supply also increase consumption: we use lots of cheap stuff, but not very much of expensive stuff.

jonasdegendt · 2026-03-18T07:47:12 1773820032

This would be true, but you're not accounting for OPEC and other groups (e.g. historically the Texas Railroad Commission in the United States, not sure how relevant they still are) balancing production and price per barrel to what they think is agreeable.

Oil hasn't been supply constrained since the '50s; its price is largely based on what producing countries agree on, as well as geopolitics.

Additionally, governments levy a decent amount of taxes on certain end products such as gasoline. They might very well, as they have in the past, decide to simply up their tax revenue as prices of crude and derivatives go down.

paulryanrogers · 2026-03-17T23:38:23 1773790703

Only if Norway's lack of internal consumption must be met with equal and similarly destructive consumption elsewhere.

Consider if others followed their lead. Then oil would be used less for transportation, one of its most destructive and singular uses, and more for manufacturing or medical or less wasteful uses.

bwestergard · 2026-03-17T13:05:55 1773752755

Your job sounds really different from what's typical here on HackerNews. I'm really curious - can you tell us more about it?

metalman · 2026-03-17T20:15:39 1773778539

Metal: the bending and joining of metal for humans to use for something™, humans who find me through the interwebs, which I have been using since the dawn, off and on, clumsily, but since grade school. The Apple store was one room above a Chinese restaurant and had painted chipboard walls. I have two web sites, one is a rental and I own the other, but I am focusing more and more on my core strengths in dealing with physical realities, which sometimes I call "applied geometry", though often there are curves and shapes that don't really have names. But as a good deal of the work is designed and communicated about with the use of computers and phones, I also spend a lot of time thinking about how that could be better, so hanging out here, trying to fight the good fight, is part of most days.

bwestergard · 2026-03-16T18:09:03 1773684543

Let's ask the LLM to score how good it would be at scoring jobs from LLM exposure... /s

bwestergard · 2026-03-16T04:20:38 1773634838

Yes, this is from Niko Tinbergen's classic monograph "The Herring Gull". It's the origin of the term "supernormal stimulus".

https://en.wikipedia.org/wiki/Supernormal_stimulus

gavmor · 2026-03-16T06:01:59 1773640919

Thank you, I've been looking for this term for twenty five years.

casey2 · 2026-03-16T18:28:39 1773685719

Impressive, they must be using some optimizing algorithm to get that many pseudoscientific claims per word.

bwestergard · 2026-03-14T19:53:47 1773518027

The framing of this makes it seem like this is a sharp change in trend, but this long-running layoff tracker shows no evidence of this.

2020 and 2023 both had serious layoff spikes, but the 2023 spike trailed off to an asymptote that we're still hovering around.

https://layoffs.fyi/

phyzix5761 · 2026-03-14T21:18:44 1773523124

This is missing a lot of data. Companies that I know for a fact are doing mass tech layoffs are not listed here.

bwestergard · 2026-03-15T16:46:18 1773593178

Then please do add your data through the submission form.

I'm sure some underreporting occurs, but what evidence is there that underreporting is worse this year than last year or the year before?

cyanydeez · 2026-03-14T20:34:01 1773520441

If you go into the larger field, the trend since 2021 is overall concerning, particularly if you factor in Trump's desire to just stop reporting: https://www.macrotrends.net/3208/us-layoffs-and-discharges

corysama · 2026-03-14T21:00:26 1773522026

According to that chart 2021 was anomalously low and it has been linearly returning to normal for the past four years.

AFAICT, the general populace is anxious about AI. So, the news knows they can get clicks with “You are right to be afraid. AI bad.” Meanwhile, CEOs know they can get stock boosts by saying “We are so AI we don’t need expenses. Infinite ROI!”

Put together we’re getting a ton of scary reporting on what looks like a quite normal business cycle (at least as far as layoffs go). And, everyone being afraid to hire is the only thing actually making it self-fulfilling.

Forgeties79 · 2026-03-14T21:19:31 1773523171

I wouldn’t call the massive levels of investment by both private equity and municipal/state governments “business as usual.” The sums being thrown down and/or promised are staggering. People/groups that lose are going to lose big.