Hacker News | remilouf's comments

This is actually pretty funny.


That’d be a pretty inefficient way to generate bullshit at scale


Automating the creation of false testimonials is inefficient at scale? Go on...

What's the alternative?


LLM evaluations are very sensitive to the details of the prompt's structure. This post shows how using structured generation reduces the results' variance and the ranking shifts.


Looks like it’s quite the opposite: http://blog.dottxt.co/performance-gsm8k.html


What do you mean by "semantic dimension"?


That whole structured generation line of work looks promising. I hope someone else takes this and runs evaluations on other benchmarks. Curious to see if the results translate!


Agreed! While these results are very promising, there's still a lot to explore in this space.

In addition to the "prompt consistency" and "thought-control" ideas mentioned in the post, I'm definitely curious how well it performs on more complex structured data (things like codegen).


Awesome work! I am really impressed by how much structured generation improves model performance.


This article presents a way to make structured generation with LLMs much faster than standard generation, but what I find most interesting is how, towards the end, it highlights the issues that tokenization entails.


We already support regex-guided generation in the library, and could easily make an API to serve this as well if that's a feature people want!
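To make "regex-guided generation" concrete, here is a toy sketch of the general idea, not the library's actual API or implementation. It assumes a hand-written DFA for the regex `[0-9]+\.[0-9]{2}`, a made-up vocabulary, and fixed made-up scores standing in for model logits; all names (`step`, `walk`, `allowed_tokens`, `generate`, `VOCAB`, `SCORES`) are hypothetical.

```python
import re

DIGITS = "0123456789"
ACCEPT = 4  # DFA state reached after exactly two decimal digits


def step(state, ch):
    """Next state of a hand-written DFA for [0-9]+\\.[0-9]{2}, or None if `ch` is rejected."""
    if ch in DIGITS:
        # 0: start, 1: integer part, 2: just after '.', 3: one decimal, 4: two decimals
        return {0: 1, 1: 1, 2: 3, 3: 4}.get(state)
    if ch == ".":
        return 2 if state == 1 else None
    return None


def walk(state, token):
    """Feed a whole token through the DFA, one character at a time."""
    for ch in token:
        state = step(state, ch)
        if state is None:
            return None
    return state


def allowed_tokens(state, vocab):
    """Map each token that keeps the output on a valid path to its resulting state."""
    return {tok: nxt for tok in vocab if (nxt := walk(state, tok)) is not None}


def generate(vocab, scores, max_steps=10):
    """Greedy decoding where tokens the DFA rejects are masked out at every step."""
    state, out = 0, ""
    for _ in range(max_steps):
        if state == ACCEPT:
            break
        options = allowed_tokens(state, vocab)
        if not options:
            break
        tok = max(options, key=lambda t: scores[t])  # best-scoring *allowed* token
        out += tok
        state = options[tok]
    return out, state == ACCEPT


# Made-up vocabulary and scores (assumption: a real model would produce new logits each step).
VOCAB = ["hello", " the", ".5", "42", "07", "1", ".", "0"]
SCORES = {"hello": 9, " the": 8, ".5": 6, "42": 5, "07": 3, "1": 2, ".": 1, "0": 0.5}

text, complete = generate(VOCAB, SCORES)
print(text, complete, bool(re.fullmatch(r"[0-9]+\.[0-9]{2}", text)))
```

Even though "hello" and " the" score highest here, they are never emitted, because the mask only leaves tokens that can still lead to a match.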


I need it, yes. That would be amazing tbh.


It is currently limited by the time it takes to build the index. There are obvious optimizations we can apply to this; however, in a production setting it does not matter much, since you only need to build the index once for each (schema, vocabulary) pair.


Is there a rough guide as to how long to wait? I think it's definitely important: if building takes 10+ minutes (or hours?) for even very basic models, that's a fundamentally different production architecture (launching from a blank slate is no longer feasible). It's also a big devx issue.

I'd highlight this somewhere in the README, as I wasn't sure whether it was just broken or how long to wait.

