
>> Program synthesis has been mentioned as a promising approach by François Chollet, and that's exactly what this is.

To be precise, "this" (a bog-standard generate-and-test approach) is the dumbest possible way to do program synthesis. It's like sorting lists with bogosort and a very big computer.

It's exactly like bogosort: generate permutations and test. Except, of course, the system that generates the permutations costs a few million dollars(?).
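
In loop form, for the analogy (a toy sketch with made-up primitives and examples, not the system under discussion):

    import random

    # Toy primitives: a "program" is a pipeline of these functions.
    PRIMITIVES = [lambda x: x + 1, lambda x: x * 2, lambda x: -x]

    def run(program, x):
        for f in program:
            x = f(x)
        return x

    def bogo_synthesize(examples, max_tries=100_000):
        # Blind generate-and-test: sample a random program and keep it
        # only if it satisfies every input/output pair. Nothing guides
        # the generator, which is the point of the bogosort comparison.
        for _ in range(max_tries):
            program = [random.choice(PRIMITIVES)
                       for _ in range(random.randint(1, 4))]
            if all(run(program, x) == y for x, y in examples):
                return program
        return None

    # e.g. recover "double, then increment" from two examples:
    print(bogo_synthesize([(1, 3), (5, 11)]))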




Bogosort is driven by zero heuristics: just shuffle and pray. Using an LLM as a high-level prior over your search is very different, and the author had to do a lot of problem-specific tuning to make it work well.
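
Schematically (a toy sketch; the weighted sampler is a crude stand-in for an LLM conditioned on the task, not anyone's actual setup):

    import random

    # Same toy setup as the bogosort sketch above.
    PRIMITIVES = [lambda x: x + 1, lambda x: x * 2, lambda x: -x]

    def run(program, x):
        for f in program:
            x = f(x)
        return x

    def synthesize(examples, propose, max_tries=1000):
        # The test loop is identical; only the generator differs.
        for _ in range(max_tries):
            program = propose()
            if all(run(program, x) == y for x, y in examples):
                return program
        return None

    # Uniform proposals = bogosort. A learned prior (the LLM's role)
    # concentrates probability mass on plausible programs; here, a
    # crude stand-in that favours short pipelines of useful primitives.
    def biased():
        length = random.choices([1, 2], weights=[1, 3])[0]
        return random.choices(PRIMITIVES, weights=[5, 5, 1], k=length)

    print(synthesize([(1, 3), (5, 11)], biased))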


But he tuned the "test" side of the generate-and-test loop, not the "generate" side. The "generate" side remains a big permutation generator that is also very hard to control. The current highest-ranked system on the ARC-AGI private test set (at 34%) is another LLM, fine-tuned on manually created examples of ARC tasks, so that one is indeed messing with the generator part of the loop. I'm guessing performance will jump when someone puts the two together.

A heuristic, btw, is something completely different from fine-tuning or filtering. Heuristic search is the closest thing we have to an approximation of the kind of goal-driven behaviour we see in animal intelligence.
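
To spell out what I mean (a generic best-first sketch, nothing specific to ARC):

    import heapq

    def best_first_search(start, goal, neighbours, h):
        # Expand states in order of the heuristic h's estimate of their
        # distance to the goal: h actively steers the search, which is
        # the goal-directedness that fine-tuning alone doesn't give you.
        frontier = [(h(start), start)]
        seen = {start}
        while frontier:
            _, state = heapq.heappop(frontier)
            if state == goal:
                return state
            for nxt in neighbours(state):
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(frontier, (h(nxt), nxt))
        return None

    # e.g. walking the integers toward 42, guided by |state - goal|:
    print(best_first_search(0, 42, lambda s: [s - 1, s + 1],
                            lambda s: abs(s - 42)))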

I think you could argue that gradient optimisation, or any kind of optimisation of an objective function, amounts to the same thing (Rich Sutton co-authored a paper titled "Reward is Enough"). I'm not sure where I stand on that.


There's a list of things the author did to change the "generate" side in the first two paragraphs of the article.

The heuristic isn't the fine-tuning; it's the LLM itself, which is clearly pruning the set of possibilities massively. That's a reasonably common usage of the word. I agree combining it with some kind of search would be interesting, but I still think you're being overly negative about the results here.

I'm actually busy training an AlphaZero for the ARC problems, which I plan to try to hook up to a language model for reward generation, so we'll see how that fares!

I've read that paper, but thanks for the reference; this comment section is a goldmine.


>> There's a list of things the author did to change the "generate" side in the first two paragraphs of the article.

I can't see where that is. All I can see the author saying they did is prompting and filtering the returned answers, neither of which goes anywhere near the weights of the language model (which is where I'm claiming the "generator" resides).

>> I'm actually busy training an AlphaZero for the ARC problems, which I plan to try to hook up to a language model for reward generation, so we'll see how that fares!

That sounds exciting. Good luck with your effort!


Yeah, you don't play with the weights in language models; you play with the residual stream, by prompting (and occasionally by direct modification, if you're being clever). But that does affect the model's generation! (Obviously? Otherwise there would be no need for a prompt in the first place, and all the recent residual-stream modification research wouldn't work.)
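
For anyone following along, "direct modification" means something like activation steering: adding a vector to the hidden states between blocks. A toy NumPy illustration (shapes and values made up, no particular paper's method):

    import numpy as np

    rng = np.random.default_rng(0)
    seq_len, d_model = 4, 8

    # Toy residual stream: one hidden-state vector per token position.
    resid = rng.normal(size=(seq_len, d_model))

    # A "steering vector", e.g. the difference between mean activations
    # on two contrasting prompt sets (random stand-in values here).
    steer = rng.normal(size=(d_model,))

    # Nudge every position along the steering direction before the
    # stream flows into the next block; downstream generation shifts.
    resid = resid + 2.0 * steer / np.linalg.norm(steer)
    print(resid.shape)  # (4, 8): same stream, nudged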

But I think if we just banned the word "generator" we probably wouldn't disagree on much here.

> Good luck with your effort!

Thanks =)



