Hacker News new | comments | ask | show | jobs | submit login

It's not completely clear to me which encoding the blns.txt file uses. Since this project is all about weird/evil bytestrings, the encoding of the file itself is very important.

Using a newline as a delimiter in that file excludes newlines from being part of the strings you are testing - but newlines are an important "naughty" character to consider. Unfortunately the same is true of basically any other common delimiter character.

Maybe base64-encoding the strings would be one way to solve for this? You could use base64-encoded values in JSON, for example.

Fair question. Encoding is UTF-8. This is fine for time being since UTF-8 is ubiquitous.

I had it set as UTF-16 for the two-byte characters when first writing it, but that had caused issues. If there is a demand, a second list can be added.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact