AlphaAndOmega0's comments

It's a reference to the practice of scavenging steel from sources produced before nuclear testing began, since steel made afterwards is contaminated with radioactive isotopes from the fallout. The sources are mostly shipwrecks, and WW2 means there are plenty of those. The pun is that his project tries to source text that hasn't been contaminated with AI-generated material.

https://en.m.wikipedia.org/wiki/Low-background_steel


OAI doesn't show the actual CoT (chain of thought), on the grounds that it's potentially unsafe output and also to stop competitors from training on it. You only see a sanitized summary.

I for one am glad I can offload all the regex to LLMs. Powerful? Yes. Human-readable for beginners? No.
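As an illustration (not from the thread), Python's `re.VERBOSE` flag lets a pattern carry whitespace and comments, which helps readability whether a human or an LLM wrote it. The ISO-style date format is an assumed example; both compiled patterns below should accept exactly the same strings.

```python
import re

# Compact form: matches dates like 2024-03-15.
COMPACT = re.compile(r"^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")

# The same pattern under re.VERBOSE, where whitespace and comments are ignored.
VERBOSE = re.compile(
    r"""
    ^
    (\d{4})                  # year: exactly four digits
    -
    (0[1-9]|1[0-2])          # month: 01-12
    -
    (0[1-9]|[12]\d|3[01])    # day: 01-31 (month lengths are NOT checked)
    $
    """,
    re.VERBOSE,
)

for text in ["2024-03-15", "2024-13-01", "2024-02-31"]:
    print(text, bool(COMPACT.match(text)), bool(VERBOSE.match(text)))
```

Note that "2024-02-31" still matches: the comments make that limitation visible, whereas the one-line version hides it.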


Why, though? To me, it seems more prone to issues (hallucinations, prompt injections, etc.). It is also slower and more expensive. I also think it is harder to implement properly, and you need to add far more tests to be confident it works.


Personally, when I am parsing structured data I prefer to use parsers that won't hallucinate data, but that's just me.

Also, don't parse HTML with regular expressions.
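A minimal sketch of the "use a real parser" approach, using Python's standard-library `html.parser`. The `LinkExtractor` class and `extract_links` helper are hypothetical names chosen for illustration. The output is deterministic: the same input always yields the same list.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags via the stdlib's tokenizing HTML parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of (name, value).
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Mixed case, single quotes, and '>' inside a quoted attribute value are handled
# by the tokenizer, which is where hand-written regexes tend to break.
sample = """<p><A title="a > b" HREF='/one'>One</A> <a href="/two">Two</a></p>"""
print(extract_links(sample))
```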


Generally I agree with your point, but there is some value in a parser that doesn’t have to be updated when the underlying HTML changes.

Whether or not this benefit outweighs the significant problems (cost, speed, accuracy and determinism) is up to the use case. For most use cases I can think of, the speed and accuracy of an actual parser would be preferable.

However, in situations where one is parsing highly dynamic HTML (e.g. if each business type had slightly different output, or you are scraping a site that changes its structure frequently and breaks your hand-written parser), this could be worth the accuracy loss.


You could employ an LLM to give you updated queries when the format changes. This is where they should shine. And you get something that you can audit and exhaustively test.
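One way to read this suggestion: treat the LLM only as a generator of candidate queries, and accept a candidate only if it passes a deterministic check against saved page snapshots. Below is a minimal sketch using the standard library's `xml.etree.ElementTree`, which supports only a limited XPath subset and only well-formed markup. `FIXTURES`, `run_query`, and `validate_query` are hypothetical names, and the LLM call itself is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical saved fixtures: (markup snapshot, values we expect to extract).
# ElementTree needs well-formed XML/XHTML; real-world HTML would first need a
# tolerant parser, which is omitted here.
FIXTURES = [
    ('<div><span class="price">9.99</span><span class="sku">A1</span></div>', ["9.99"]),
    ('<ul><li><span class="price">5.00</span></li>'
     '<li><span class="price">7.50</span></li></ul>', ["5.00", "7.50"]),
]


def run_query(markup, query):
    """Apply an ElementTree path expression and return each match's stripped text."""
    root = ET.fromstring(markup)
    return [(el.text or "").strip() for el in root.findall(query)]


def validate_query(query, fixtures=FIXTURES):
    """Accept a candidate query only if it reproduces every fixture's expected output."""
    for markup, expected in fixtures:
        try:
            got = run_query(markup, query)
        except (SyntaxError, KeyError):
            # ET.ParseError subclasses SyntaxError; invalid path expressions
            # also surface as SyntaxError (or occasionally KeyError).
            return False
        if got != expected:
            return False
    return True


# Candidate queries, e.g. as proposed by an LLM after the page layout changed.
print(validate_query(".//span[@class='price']"))
print(validate_query(".//span"))  # rejected: also picks up the SKU span
```

The point of this design is that the non-deterministic step (proposing a query) is separated from the deterministic step (running it), so only queries that pass the fixture tests ever reach production.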


Deterministic? No.

