One test case I've used with code-generating LLMs: generate a stoplight.
In short,
Generate a stoplight with unit tests.
I have not yet found a model that produces sufficiently safe code. I haven't tested this in a while, but I don't expect any current LLM to be sufficient at this task.
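For concreteness, here's a minimal sketch of the shape of the task, with hypothetical names (Light, Stoplight, tick): a two-direction controller whose core safety property is that both directions are never green at once. A real controller also needs a yellow phase and an all-red clearance interval for the cross direction, which this sketch deliberately omits; gaps like that are part of what "sufficiently safe" has to cover.

```python
from enum import Enum
import unittest


class Light(Enum):
    RED = "red"
    YELLOW = "yellow"
    GREEN = "green"


# Fixed cycle for the north-south direction; the east-west direction is
# derived from it so the controller cannot emit green/green by construction.
CYCLE = [Light.GREEN, Light.YELLOW, Light.RED]


class Stoplight:
    def __init__(self):
        self._i = 0  # index into CYCLE for the north-south direction

    @property
    def ns(self) -> Light:
        return CYCLE[self._i]

    @property
    def ew(self) -> Light:
        # Cross traffic may only be green while north-south is red.
        return Light.GREEN if self.ns is Light.RED else Light.RED

    def tick(self) -> None:
        self._i = (self._i + 1) % len(CYCLE)


class StoplightTests(unittest.TestCase):
    def test_never_both_green(self):
        light = Stoplight()
        for _ in range(100):
            self.assertFalse(light.ns is Light.GREEN and light.ew is Light.GREEN)
            light.tick()

    def test_green_only_after_red(self):
        light = Stoplight()
        for _ in range(100):
            before = light.ns
            light.tick()
            if light.ns is Light.GREEN:
                self.assertIs(before, Light.RED)


if __name__ == "__main__":
    unittest.main()
```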
Maybe an LLM could generate a safe stoplight with formal methods as a primary meta-procedure? From https://news.ycombinator.com/item?id=44889967 re: formal methods:
> We should not expect an LLM trained solely on formally-verified code to produce formally-verified code. I don't think that also training on specs and hateful training material will fix that.
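That said, for a system this small the gap between testing and verification can at least be made concrete: the sketch above has only three reachable states, so the safety invariant can be checked exhaustively rather than sampled, which is a toy version of what a formal-methods-first procedure would buy you. (This reuses the hypothetical Stoplight and Light from the sketch above.)

```python
# Toy model check: enumerate every reachable state of the Stoplight sketch
# and assert the safety invariant in each one. Unlike the unit tests, this
# covers the whole (tiny) state space rather than a sampled trace.
def check_invariant_exhaustively() -> None:
    light = Stoplight()
    seen = set()
    while light._i not in seen:  # stop once the cycle repeats
        seen.add(light._i)
        assert not (light.ns is Light.GREEN and light.ew is Light.GREEN), \
            "unsafe state: both directions green"
        light.tick()
    print(f"invariant holds in all {len(seen)} reachable states")


check_invariant_exhaustively()  # -> invariant holds in all 3 reachable states
```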