Hacker News | NinjaTrance's comments

As far as I remember, SolidGoldMagikarp was a bug caused by millions of posts on Reddit by the same user ("SolidGoldMagikarp") in a specific subreddit.

The problem was not the token per se, but the fact that it was like a strange attractor in multidimensional space, disconnected from any useful information.

When the LLM was induced to use it in its output, the next predicted token would be random gibberish.


More or less. The string was given its own token by the tokeniser because of the above, but it did not appear in the training data, so it basically had no meaning for the LLM. (I think there are some theories that the parts of the network associated with such tokens may have been repurposed for something else, which is why the presence of the token in the input messed them up so much.)
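
To make the mechanism concrete, here is a toy sketch. It assumes a simple whitespace word-level tokeniser (real tokenisers use BPE or similar), and the corpora, `build_vocab` and `untrained_tokens` are hypothetical names made up for illustration. The vocabulary is built from one corpus, the model is trained on a filtered one, and any token that is in the vocabulary but never seen in training is a candidate glitch token.

```python
from collections import Counter


def build_vocab(corpus, min_count):
    """Give every word seen at least min_count times its own token."""
    counts = Counter(word for doc in corpus for word in doc.split())
    return {word for word, n in counts.items() if n >= min_count}


def untrained_tokens(vocab, training_corpus):
    """Return tokens in the vocabulary that never occur in the training data."""
    seen = {word for doc in training_corpus for word in doc.split()}
    return sorted(vocab - seen)


# Hypothetical data: the tokeniser corpus still contains the repetitive posts,
# while the training corpus has had them filtered out.
tokeniser_corpus = [
    "the cat sat",
    "the cat ran",
    "SolidGoldMagikarp 1",
    "SolidGoldMagikarp 2",
]
training_corpus = ["the cat sat", "the cat ran"]

vocab = build_vocab(tokeniser_corpus, min_count=2)
print(untrained_tokens(vocab, training_corpus))
```

In this toy setup the only token that gets a vocabulary slot without ever appearing in training is the username, so a model would never have learned anything about it.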


gpt-oss has similar bad tokens.

https://fi-le.net/oss/


Haha, I don't know why but I also "see" it as a light blue seahorse, and it's facing left.


Is this by any chance what you're seeing? https://images.wikidexcdn.net/mwuploads/wikidex/6/6c/latest/...


I saw it as orange


Green, facing left?


The easy solution would be to use something like Amazon S3 to store documents as objects and let Amazon worry about backups, but governments are worried (and rightly so) about the US government spying on them.

Thus, the not-so-easy-but-arguably-better solution would be to self-host an open source S3-compatible object storage solution.

Are there any good open source alternatives to S3?


I recently learned about https://garagehq.deuxfleurs.fr/ but I have no experience using it.

