Hacker Newsnew | past | comments | ask | show | jobs | submit | nielstron's commentslogin

noted. we'll make sure to critizise turing complete type systems more thoroughly next time :))


Yes this work is super cool too! Note that LSPs can not guarantee resolving the necessary types that we use to ensure the prefix property, which we leverage to avoid backtracking and generation loops.


thank you!


re detecting and switching language: you could run several constraint systems in parallel and switch as soon as one of them rejects the input and another accepts it

re backtracking: a core part of this paper is ensuring a prefix property. that is there is always a legitimate completion and the model can not "corner" itself!

research needs to be done for what kind of languages and language features this prefix property can be ensured


the problem with LSPs is that they don't guarantee generating a type annotation that we can use for constraints, i.e. we can not ensure the prefix property using LSPs. so we had to roll our own :)

Pulling in more features to help the system is definitely worth looking into!


The downside is that you need to properly preprocess code, have less non-code Training Data, and can not adapt easily to new programming languages


we were thinking about doing exactly this, the closest current work is probably the amazing "Learning Formal Mathematics from Intrinsic Motivation" by Poesia et al (they use constraints too increase the likelihood of generating correct theorems/proofs during RL)

https://arxiv.org/abs/2407.00695


Yes, RL works well in fields where answer can be verified in different degree. That's why AlphaGo success, it also should work in code generation and math.


Your reward function can simply be the distance between the constrained output and the unconstrained output, that way you won't even need synthetic data, just a dataset of prompts to RL against.


How to get "unconstrained output" and evaluate the distance between them?

Evaluation method which can decide distance between two sentences is hard to find, best option is closed-source LLM API even it's not the most ideal option. As a result, we also must use current LLM to improve our models.


Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: