Show HN: Linguistic RL – A 7B model discovers Occam's Razor through reflection (github.com/drawson5570)
2 points by drawson5570 29 days ago
Author here. I built a system where a small language model (qwen2.5:7b) learns through reflection rather than weight updates.
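A minimal sketch of such a reflection loop (all names here are hypothetical, and `ask_model` is a stub standing in for a call to qwen2.5:7b so the sketch runs on its own):

```python
# Sketch of a reflection loop: the model attempts tasks, journals about
# its failures, and carries a distilled strategy (plain text) forward.
# No weights are updated; only the strategy prompt changes.

def ask_model(prompt: str) -> str:
    """Stub for a call to the LLM (e.g. qwen2.5:7b via a local API)."""
    return "answer"

def reflection_loop(tasks, rounds=3):
    strategy = ""  # the model's accumulated, human-readable strategy
    journal = []
    for _ in range(rounds):
        results = []
        for task, expected in tasks:
            answer = ask_model(f"Strategy so far:\n{strategy}\n\nTask: {task}")
            results.append((task, answer, answer == expected))
        # Reflection step: the model reviews its failures in writing.
        failures = [t for t, _, ok in results if not ok]
        entry = ask_model(f"You failed these tasks: {failures}. "
                          "What went wrong, and what simpler rule would help?")
        journal.append(entry)
        # Distillation step: compress the journal into a new strategy text.
        strategy = ask_model("Distill these reflections into a short strategy:\n"
                             + "\n".join(journal))
    return strategy, journal
```

The journal entries are what get read later; the strategy text is the only state that persists between rounds.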

The unexpected finding: the model discovered Occam's Razor on its own.

Starting accuracy: 51.3% (zero-shot baseline)
After learning: 78.0% (+26.7 percentage points)

But the numbers don't tell the full story. The learning journals reveal something profound:

Phase 1: The model hallucinated complex solutions ("use interval trees!", "apply graph theory!"). Accuracy stayed low (~35%).

Phase 2: Journal entries started showing doubt: "Since the problem is straightforward, focusing on basic interval checking..."

Phase 3: The breakthrough came when the model wrote: "This suggests a fundamental misunderstanding of how to handle overlapping intervals."

It admitted it was wrong. From that moment, everything changed.

The distillation process acts as evolutionary selection: simple ideas that work survive, complex ideas that fail get filtered out.
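That selection pressure can be sketched as a simple accept/reject filter over candidate strategies (hypothetical names throughout; `evaluate` is a toy stub here, where the real system would score each strategy by running the model over a validation set):

```python
def evaluate(strategy: str, tasks) -> float:
    """Stub: fraction of tasks solved when prompting with `strategy`.
    Toy scoring: shorter strategies score higher, mimicking the
    observed pressure toward simpler explanations."""
    return max(0.0, 1.0 - len(strategy) / 100.0)

def select(current: str, candidates, tasks):
    """Greedy evolutionary selection: a candidate strategy survives
    only if it beats the current strategy on the task set."""
    best, best_score = current, evaluate(current, tasks)
    for cand in candidates:
        score = evaluate(cand, tasks)
        if score > best_score:
            best, best_score = cand, score
    return best
```

Complex strategies that fail never make it into the distilled text, so over rounds the surviving strategy tends toward the simplest one that works.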

Key advantages:
- Fully interpretable (read the complete thought process)
- Runs on consumer hardware (no GPU training)
- Strategies are transferable text documents
- Models learn to doubt themselves (an AI safety implication)

All code and papers are open source. The experiment takes ~40 minutes to reproduce on a laptop.

Happy to answer questions about the approach, results, or implementation!


