> As a countermeasure to key disclosure laws, some personal privacy products such as BestCrypt, FreeOTFE, and TrueCrypt have begun incorporating deniable encryption technology, which enables a single piece of encrypted data to be decrypted in two or more different ways, creating plausible deniability.
Doesn’t this make the whole idea of legislating this a non-starter?
On the issue of “are LLMs good at lisp” I have a bit of a tangential response/observation.
I saw this [paper](https://ai.meta.com/research/publications/logic-py-bridging-...) a while ago. Long story short, they made a Python-looking DSL for LLMs to convert natural language logic puzzles into. Then they converted the DSL expression to something a SAT/SMT solver could munch on.
My initial reaction was “why don’t they just have the LLM write smtlib2.” And I guess the answer is that LLMs are probably better at writing python-looking smtlib2. Probably an oversimplification of their work on my part. But I didn’t see any comparison between their work and a direct encoding into smtlib.
Makes me wonder if your idea could work along similar lines. Instead of using lisp directly, could you use a DSL that looks like more traditional languages? Would that help?
Interesting, I'm not familiar with that paper, but I guess the performance gain comes from raising the abstraction level (hiding solver boilerplate).
PTC-Lisp could achieve the same thing by defining constraint-building functions as tools and having the LLM write high-level PTC-Lisp that calls a solver. In fact, Lisp has a natural advantage here: instead of building a separate DSL with its own compiler (as I guess Logic.py did), Lisp can extend itself with macros (code is data, data is code). However, this is not implemented in PTC-Lisp yet.
I think the translation from L0 to L1 is going to become more and more important. There have been a lot of discussions here on HN about how natural language specs “aren’t code” and how LLMs provide no formal relationship between their inputs and outputs.
One way to sidestep this is to have the LLM translate the NL into a formal language and persuade the human that the formal language captures their intent. This reduces the burden because the user only has to look at and understand the formal language spec, rather than all the code produced by the LLM.
Also once a formal spec is obtained, you can do lots of interesting computation on it. Property based testing comes to mind. Or even full-blown verification of the formal spec. Or, LLMs might be good at recognizing ambiguity. An LLM could generate two formal specs, use an SMT solver to find an input where the specs differ, and help the user use this diff to ask clarifying questions and resolve the ambiguity.
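This spec-diffing idea can be sketched in a few lines of plain Python: two predicates stand in for the two formal specs an LLM might generate, and a brute-force search over a small domain stands in for the SMT query (all names and the $100 scenario are made up for illustration):

```python
# Two candidate formalizations of "orders over $100 get a discount".
# Whether $100 itself qualifies is exactly the kind of ambiguity the
# LLM could surface by emitting both readings.
def spec_strict(total):      # reading 1: strictly over $100
    return total > 100

def spec_inclusive(total):   # reading 2: $100 or more
    return total >= 100

def find_divergence(spec_a, spec_b, domain):
    """Brute-force stand-in for an SMT query: return an input where
    the two specs disagree, or None if they agree on the domain."""
    for x in domain:
        if spec_a(x) != spec_b(x):
            return x
    return None

witness = find_divergence(spec_strict, spec_inclusive, range(0, 1000))
print(witness)  # → 100, the input to turn into a clarifying question
```

An actual SMT solver would do the same job symbolically over an unbounded domain; the workflow (two specs in, one diverging witness out, one clarifying question back to the user) is the point here.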
One comment I have is that layers L1 and L2 _might_ be reinventing the wheel slightly. Your ensure statements remind me of Dafny or Verus, for instance, which have a lot of tooling behind them.
Indeed, both Verus and Dafny are very close to home here — thanks for the pointer. I did feel at times that I might be reinventing the wheel a bit.
I also thought about pushing more toward Lean and theorem proving instead of a lighter SMT-style direction, but programming and mathematics diverge quickly unless that abstraction boundary is handled carefully. To me, though, that mostly concerns the L1 -> L2 step.
L2 is still the least settled layer and the one I am actively exploring. The idea there is to let relations acquire semantic interpretations: this relation may be realized as a class, an embedding, a parser, etc. Since there is often more than one valid implementation route, I want that choice to be explicit.
And yes, part of the point is exactly to constrain and verify that the end result still matches the original intent at some IR/formal level.
So I am beginning to think the layering does offer something beyond plain Markdown specs. L1, at least as I currently see it, is intentionally very small: just types and relations, in a logic-oriented style somewhat closer to Prolog (or even SQL), where the world is modeled relationally and constraints are expressed over that. Essentially, having more than one target at the L2 level (at the same time, in one module) will help generate both the code and the semantic verification. That's actually a very fruitful observation - big thanks!
> I also thought about pushing more toward Lean and theorem proving instead of a lighter SMT-style direction
I think your intuition here is good. For these settings I think you want formal methods that are highly automated.
Question about your calculator example: is the intended use case that the user would write these L0-L2 files? Or is it expected that an LLM would write them with user intervention? And the Go program: how is that obtained from L2? Is that a symbolic transformation, or is the LLM doing it?
Apologies if this is written somewhere and my skimming missed it.
> The proofs stop at the language boundary. The bugs don’t.
In formal verification, you have to model everything you care about. I suspect we’ll see large fragments of popular languages being more thoroughly modeled in languages like Dafny and Lean.
An alternative that sidesteps all of this is to not use an external language at all. ACL2 might be a better fit in this regime than Dafny or Lean because of how close it sits to SBCL.
I suspect the same but I wonder how effective it will be in reality. I mention a group that has been trying to verify SQL for 12 years, and they haven't fully formalised all basic queries.
I think LLMs will speed things up but will it result in popular libraries becoming verified or an entirely new class of libraries being built and verified from the ground up?
I do use ACL2, although I don’t do many proofs. When I do, the proofs usually go through automatically or require me to state only a few lemmas or tell the tool how to do the induction.
This is partially a commentary on how good the automation is in ACL2 and partially a commentary on the fact that I don’t use it for hard problems :)
I use it more for property based testing. The counterexample generator is very powerful because it leverages the knowledge base of the theorem prover (all the accumulated lemmas, etc.) among other things.
> The counterexample generator is very powerful because it leverages the knowledge base of the theorem prover (all the accumulated lemmas, etc.) among other things.
That's a really interesting application. Could be very powerful for what I'm doing. Thank you!
Happy to help. In general, I think counterexample generation is more important than proofs when it comes to software (most of our software is “trivially” wrong). The world might not be ready for full-blown formal verification, but I think it is ready for formal specification and counterexample generation.
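A toy illustration of why counterexamples are so cheap to get: exhaustive search over small inputs (a crude stand-in for the guided generation a tool like ACL2 provides) quickly finds a concrete failing input for a plausible-looking but wrong function. The function and property here are hypothetical:

```python
from itertools import product

def bad_dedup_sort(xs):
    # A plausible-looking "sort" that silently drops duplicates.
    return sorted(set(xs))

def prop_preserves_length(xs):
    # Property a real sort must satisfy: output has the same length.
    return len(bad_dedup_sort(xs)) == len(xs)

def find_counterexample(prop, max_len=3, values=range(3)):
    """Enumerate all small inputs and return the first one that
    violates the property, or None if none is found."""
    for n in range(max_len + 1):
        for xs in product(values, repeat=n):
            if not prop(list(xs)):
                return list(xs)
    return None

cex = find_counterexample(prop_preserves_length)
print(cex)  # → [0, 0]: a concrete, debuggable failing input
```

No proof effort at all, and you still walk away with a specific input to paste into a debugger, which is exactly the "software is trivially wrong" regime.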
> Recently I’ve been thinking about coding with AI in terms of it being a process of navigating a tree of probabilistic outcomes.
I think this is the correct way to think about automated coding. The natural question is: why are we using a statistical model to control this search, and not a symbolic model?
Modern SAT and SMT solvers are excellent at searching spaces. There has already been work on biasing them with machine learning. Symbolic models for control, statistical models for bias. We should be thinking about how to apply this idea to practical, industrial software engineering tasks. There is already some work on program synthesis from formal specifications, but it has yet to scale.
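One way to picture "symbolic models for control, statistical models for bias": a toy DPLL SAT search whose branching heuristic is a pluggable scoring function. A simple frequency count plays the role a learned model would; swapping the heuristic changes speed, never soundness or completeness. This is a sketch, not how production solvers are built:

```python
def dpll(clauses, assignment, choose):
    """Minimal DPLL. `clauses` is a list of sets of int literals
    (negative = negated). `choose` is the pluggable branching
    heuristic; correctness never depends on it, only search speed."""
    simplified = []
    for c in clauses:
        if any(lit in assignment for lit in c):
            continue                       # clause already satisfied
        c = {lit for lit in c if -lit not in assignment}
        if not c:
            return None                    # clause falsified: backtrack
        simplified.append(c)
    if not simplified:
        return assignment                  # every clause satisfied
    var = choose(simplified)               # bias lives here, and only here
    for lit in (var, -var):
        result = dpll(simplified, assignment | {lit}, choose)
        if result is not None:
            return result
    return None

def frequency_bias(clauses):
    # Stand-in for a learned model: branch on the most frequent variable.
    counts = {}
    for c in clauses:
        for lit in c:
            counts[abs(lit)] = counts.get(abs(lit), 0) + 1
    return max(counts, key=counts.get)

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
cnf = [{1, 2}, {-1, 3}, {-2, -3}]
model = dpll(cnf, frozenset(), frequency_bias)
print(model)  # a satisfying assignment as a set of literals
```

A learned ranking function could replace `frequency_bias` with zero changes to the engine, which is the separation being argued for: the symbolic part guarantees the answer, the statistical part only steers.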
Maybe you’re right about modern LLMs. But you seem to be making an unstated assumption: “there is something special about humans that allows them to create new things, and computers don’t have this thing.”
Maybe you can’t teach current LLM-backed systems new tricks. But do we have reason to believe that no AI system can synthesize novel technologies? What reason do you have to believe humans are special in this regard?
After thousands of years of research we still don’t fully understand how humans do it, so what reason (besides a sort of naked techno-optimism) is there to believe we will ever be able to replicate the behavior in machines?
Well, understanding how it works is not a prerequisite to being able to do it.
People were doing things for millennia before they understood them. Did primitive people understand the mechanism by which certain medicinal plants worked in the body, or did they just see that when they, e.g., boiled and consumed them, they had a certain effect?
We've only had the tech to be able to research this in some technical depth for a few decades (both scale of computation and genetics / imaging techniques).
And then we discover that DNA in (not only brain) cells behaves like an ideal quantum computer, that DNA's reactions generate coherent light (as in lasers) used to communicate between cells, and that a single dendrite of a cerebral cortex neuron can compute at the very least an XOR function, which requires at least 9 coefficients and one hidden layer. Neurons have from one or two to dozens of thousands of dendrites.
Even skin cells exchange information in a neuron-like manner, including using light, albeit thousands of times slower.
This raises the complexity of the human brain to "86 billion quantum computers operating thousands of small neural networks, exchanging information over laser-based optical channels."
The Church-Turing thesis comes to mind. It would at least suggest that humans aren’t capable of doing anything computationally beyond what can be instantiated in software and hardware.
But sure, instantiating these capabilities in hardware and software is beyond our current abilities. It seems likely that it is possible, though, even if we don’t know how to do it yet.
The Church-Turing thesis is about following well-defined rules. It is not about the system that creates such rules or decides whether to follow them. Such a system (the human mind) must exist for rules to be followed, yet that system must be outside mere rule-following, since it embodies a faculty that does not exist in rule-following itself: e.g., the faculty of deciding which rules are to be followed.
Church-Turing is about computable functions. Uncomputable functions exist.
For example, how much rain is going to be in the rain gauge after a storm is uncomputable. You can hook up a sensor to perform some action when the rain gets so high. This rain "algorithm" is outside of anything Church-Turing has to say.
There are many other natural processes that are outside the realm of what is computable. People are bathed in them.
Church-Turing suggests only what people can do when constrained to a bunch of symbols and squares.
That example is completely false: how much rain will fall is absolutely a computable function, just a very difficult and expensive function to evaluate with absurdly large boundary conditions.
This is in the same sense that while it is technically correct to describe all physically instantiated computer programs, and by extension all AI, as being in the set of "things which are just Markov chains", it comes with a massive cost that may or may not be physically realisable within this universe.
Rainfall to the exact number of molecules is computable. Just hard. A quantum simulation of every protein folding and every electron energy level of every atom inside every cell of your brain on a classical computer is computable, in the Church-Turing sense, just with an exponential slowdown.
The busy beaver function, however, is actually un-computable.
You just compute the brains of a bunch of immortal mathematicians. At which point it's "a very difficult and expensive function to evaluate with absurdly large boundary conditions."
One of the most consequential aspects of the busy beaver game is that, if it were possible to compute the functions Σ(n) and S(n) for all n, then this would resolve all mathematical conjectures which can be encoded in the form "does ⟨this Turing machine⟩ halt".[5] For example, there is a 27-state Turing machine that checks Goldbach's conjecture for each number and halts on a counterexample; if this machine did not halt after running for S(27) steps, then it must run forever, resolving the conjecture.[5][7] Many other problems, including the Riemann hypothesis (744 states) and the consistency of ZF set theory (745 states[8][9]), can be expressed in a similar form, where at most a countably infinite number of cases need to be checked.[5]
"Uncomputable" has a very specific meaning, and the busy beaver function is one of those things, it is not merely "hard".
> You just compute the brains of a bunch of immortal mathematicians. At which point it's "a very difficult and expensive function to evaluate with absurdly large boundary conditions."
Humans are not magic, humans cannot solve it either, just as they cannot magically solve the halting problem for all inputs.
That humans come in various degrees of competence at this, rather than an, ahem, boolean have/don't-have; plus the fact that we can already do a bad approximation of it, in a field whose rapid improvements hint at a lot of low-hanging fruit, is a reason for techno-optimism.
Something I think about frequently is that 20 years ago, there weren’t machines that could do visual object recognition/categorization, and we didn’t really have a clue how humans did it either. We knew that neurons built fancier and fancier receptive fields that became “feature detectors”, but there was a sense of “is that all it takes? There has to be something more sophisticated in order to handle illumination changes or out-of-plane rotation.”
But then we got a neural net that was big enough, and it turns out that feedforward receptive fields ARE enough. We don’t know whether this is how our brains do it, but it’s a humbling moment to realize that you just overthought how complex the problem was.
So I’ve become skeptical when people start claiming that some class of problem is fundamentally too hard for machines.
Are modern visual recognition & categorisation systems comparable to human capabilities? From what I can tell, they aren't even close (although still impressive!).
In the grand scale of things, a computer is not much more than a fancy brick. Certainly it is much closer to a brick than to a human. So the question is more 'why should this particularly fancy brick have abilities that so far we have only encountered in humans?'
The claim being made is not "no computer will ever be able to adapt to and assist us with new technologies as they come out."
The claim being made is "modern LLMs cannot adapt to and assist us with new technologies until there is a large corpus of training data for those technologies."
Today, there exists no AI or similar system that can do what is being described. There is also no credible way forward from what we have to such a system.
Until and unless that changes, either humans are special in this way, or it doesn't matter whether humans are special in this way, depending on how you prefer to look at it.
Note that I prefaced my comment by saying the parent might be right about LLMs.
> That's irrelevant.
My comment was relevant, if a bit tangential.
Edit: I also want to say that our attitude toward machine vs. human intelligence does matter today because we’re going to kneecap ourselves if we incorrectly believe there is something special about humans. It will stop us from closing that gap.
It’s not an assumption; it is a fact about how computers function today. LLMs interpolate; they do not extrapolate. Nobody has shown a method to get them to extrapolate. The insistence to the contrary involves an unstated assumption that technological progress toward human-like intelligence is in principle possible. In reality, we do not know.
As long as agnosticism is the attitude, that’s fine. But we shouldn’t let mythology about human intelligence/computational capacity stop us from making progress toward that end.
> unstated assumption that technological progress towards human-like intelligence is in principle possible. In reality, we do not know.
For me this isn’t an assumption, it’s a corollary that follows from the Church-Turing thesis.
That certainly doesn’t follow from the Church-Turing thesis, because the Church-Turing thesis doesn’t demonstrate that human intelligence is computational. So it is still an unstated assumption.
First, I have some advice if you are open to it. Apply a spell/grammar checker to your post before sharing it with other people. Your post has several typos, and many people will stop reading after the first or second one: “The author didn’t care enough to proofread, so why should I bother reading?”
> Remember, this is important: do not look at the tests. If you let them into your context
This is a bad idea. You are trusting the LLM’s ability to follow instructions. Worse, depending on your harness the LLM might not even be able to follow these instructions. The harness may indiscriminately place code into the context in a way that is uncontrolled by the LLM.
A better idea is to modify your harness so that certain files are excluded from the context.
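For example, a harness-side filter could drop test files before they ever reach the context; the patterns and function below are hypothetical, not any real tool's API:

```python
import fnmatch

# Hypothetical exclusion list; adjust patterns to your test layout.
EXCLUDED = ["tests/*", "*_test.py", "conftest.py"]

def context_files(candidates):
    """Drop test files before they ever reach the model's context,
    instead of trusting the model to ignore them."""
    return [p for p in candidates
            if not any(fnmatch.fnmatch(p, pat) for pat in EXCLUDED)]

files = ["src/app.py", "tests/test_app.py", "utils_test.py", "README.md"]
print(context_files(files))  # → ['src/app.py', 'README.md']
```

The enforcement happens in deterministic code, so no amount of prompt drift or indiscriminate file loading can leak the tests into the context.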
> I asked another AI to carefully review the tests and identify those that don't make sense.
Test validation is an entire area of research, and I’ve yet to be convinced that this is a task for LLMs.
> AI must control the proof search, not delegate to a black box.
Cannot disagree more. Take a look at modern SAT solvers, SMT solvers, and automated theorem provers like ACL2. We should _not_ be letting AIs white-knuckle any search process. AI is not going to give us completeness, even if we can tack on a post-hoc soundness check. Moreover, AI can act as a heuristic, but it is not going to automatically exploit symmetries and other symbolic contours of the space. Exploiting such contours is what makes these symbolic tools so powerful.
There has been a lot of work on combining symbolic search with machine learning as a heuristic. To me, this is the most promising route. My take is that we should use the statistical models to bias, and the symbolic models to control.
> Verifying software is mathematics: the same reasoning that proves a theorem in abstract algebra proves that a cryptographic library correctly implements its specification.
I also want to push back on this. If you have written a mechanical proof about software and a mechanical proof about a mathematical object, you’ll note that they feel qualitatively different.
In my experience, proofs about software have way more case splits, involving thousands or even millions of cases. Proofs about more traditional mathematical objects tend to have far fewer case splits of this nature. The obligations in proofs about software tend to be easier to discharge automatically. Mathematical objects tend to require a lot more insight and infrastructure around them.
Of course, proofs about software are still mathematics, and the same infrastructure can support both. I am merely suggesting that proofs about software may be amenable to more specialization than proofs about mathematics generally.
I will also say that finding a counterexample tends to be more important in software settings than finding a proof. This is because software tends to have bugs whereas mathematics tends to eliminate such bugs before we bother to mechanically formalize it.
> The AI community has already made its choice.
I wonder if this is the wrong choice. Lean is great for reasoning about mathematics. I don’t think it’s the right choice for reasoning about industry-scale software. I hate the ACL2 user experience, but I think it is fundamentally more fit to reason about software. I wish the AI community would take a broader look at what formal methods tools are available instead of just picking up the hottest, most popular tool and shoehorning everything into it.
https://www.allencarr.com/easyway-stop-smoking/top-tips-to-s...
> You’re going to stop naturally so carry on smoking as usual until then.
> Avoid cutting down beforehand because reducing cigarettes often makes them seem more valuable rather than less. Setting a quit date helps you prepare mentally and physically to stop smoking on that day.