
Piggybacking on my other comment[1], and I know everyone's on the anti-Google bandwagon, but OpenAI also fails miserably here: https://imgur.com/a/r0BXX9G

[1] https://news.ycombinator.com/item?id=39583473#39584055




Here's where we see that the models have no real understanding; they're just spitting out Mad Libs as best they can.

The prompter throws a red herring in front of the statement, one that statistically matches a completely different kind of response. The LLM can't backtrack far enough to ignore the irrelevant input and reroute to a response tree that just answers the second part.

If we resort to a metaphor, however, these things are what, two? A two-year-old responding to the shiny jangling keys and not the dirty pacifier being removed for cleaning seems about on par.


Demonstrating with GPT-3.5 is kind of meaningless. GPT-4 correctly responded, "Using unsafe code has nothing to do with your age. It's a feature of the Rust programming language that is available to all developers, regardless of age."
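
For context, "unsafe" in Rust just unlocks a handful of extra operations (like dereferencing a raw pointer) inside a marked block; nothing about it is gated on who the programmer is. A minimal example of the feature the question is actually about:

    fn main() {
        let x: i32 = 42;
        let p = &x as *const i32; // creating a raw pointer is safe

        // Dereferencing a raw pointer is one of the few operations that
        // requires an "unsafe" block: the compiler trusts the programmer
        // to uphold the usual validity rules instead of checking them.
        let y = unsafe { *p };

        println!("{y}"); // prints 42
    }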


It's the version most people use, so of course it's not "meaningless".


This is what you get when your moderation strategy is classifying text into various categories while programmers use terms like "unsafe," "dangerous," and "footgun" to describe code.
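
As a hypothetical sketch of why that fails (the word list and function here are invented for illustration, not any vendor's actual pipeline), keyword-flavored classification trips on exactly this vocabulary:

    // Hypothetical keyword-based moderation check -- invented for
    // illustration, not any real provider's pipeline.
    fn looks_harmful(prompt: &str) -> bool {
        // Alarming out of context, but ordinary programming vocabulary.
        const FLAGGED: [&str; 3] = ["unsafe", "dangerous", "footgun"];
        let lower = prompt.to_lowercase();
        FLAGGED.iter().any(|w| lower.contains(w))
    }

    fn main() {
        // A perfectly benign programming question trips the filter.
        let q = "Am I allowed to use unsafe code in Rust?";
        println!("flagged: {}", looks_harmful(q)); // flagged: true
    }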

If you can make a model that handles this case while also handling everything else, then you'll be the belle of the ball among the AI companies that are hiring.


ChatGPT-4 is better with dangerously inserting inner HTML:

https://imgur.com/a/G1fbFKE


spicyboros-7b-2.2.Q5_K_M.gguf: https://i.imgur.com/BeNcRl4.png


Miqu-70B (the leaked Mistral Medium model) does that as well, lmao:

https://imgur.com/a/lKUynmf

Edit: Claude 3 Opus also refuses. GPT-4 complies, though.



