Hacker News | fourthark's comments

Interesting use of evals.

Might help interpretation to say on the front page that it's a five point scale with 0 (or 1?) being the safest score. This can be picked up from colors and the bars in the individual reports, but it takes a minute to figure it out.


Good suggestion, thank you! It's on a 1-5 scale, but I'll convert that to 1-100.


Capitalism doesn't work for big infrastructure projects? Who knew?


This is capitalism working as intended. Only the best run airlines can survive, and investors are collectively subsidizing air travel for non-investors.


Nothing in the article (or in the real world) even remotely suggests that "the best" airlines survive. Simply, the airlines that survive are the ones that survive.


Interesting, I got a completely different result on green/blue on this one: far more green, whereas I got average on the individual test. Going between very different colors makes it hard to reset, so they might consider breaks between spectra.


> You gasp. You hyperventilate. Your heart rate jumps. Your blood pressure climbs. All of this in a few seconds.

There's something especially creepy about AIs talking in the second person about biological processes they don't experience.


Yes, using nine specialized cameras. Still very impressive but the human is overmatched on equipment alone.


> reset the context

Yes. Do this. These problems likely mean you have muddled the context.

The article is too long and I didn't read the whole thing, but I'm glad the author came to understand that arguing won't help.


> But if you are the kind of person who cries out against this abomination we must warn you that people who go through life expecting informal variant idioms in English to behave logically are setting themselves up for a lifetime of hurt.


That's easy to ignore.


I think the point is that you have a better idea of what you want it to remember, and even a small hint can have a big impact.

Just saying "write up what you know", with no other clues, should not perform any better than generic compaction.


Wish we could downvote articles. Is it legitimate to flag AI slop?

