The Big List of Naughty Strings (strings likely to cause issues as input) (github.com)
Creator/Maintainer of the repo here.

I apologize for the lack of updates to the BLNS. Since I'm free today and this is on the HN front page, I'll do a cleanup pass.

Even though it's a GitHub repository with 12.3k stars, there's not much to say about, or improve on, what is effectively a .txt file based around a good idea. (I recently removed mentions of my maintainership of the BLNS from my resume for that reason, despite its crazy popularity.)

The HN submitter here. :)

I happened across it this afternoon and thought it was great!

Do you know of any automation around this? I was thinking that a script that grabbed your list and then hammered a given input-filtering library would be awesome. It's not something you'd want to run all the time, but before a major release it could be useful.

That is the primary purpose of the JSON files and the parser that converts the .txt to JSON: get the list, run it against a text input field, see what happens.
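A minimal sketch of that loop in Python, for anyone who wants a starting point. It assumes the JSON form of the list is a plain array of strings; `find_crashes` and `fragile_filter` are hypothetical names made up for illustration, and the inline sample stands in for the real file.

```python
import json

def find_crashes(strings, target):
    """Call target on every string and collect (input, exception) pairs."""
    failures = []
    for s in strings:
        try:
            target(s)
        except Exception as exc:  # broad on purpose: any crash is a finding
            failures.append((s, exc))
    return failures

def fragile_filter(text):
    # Hypothetical code under test: wrongly assumes the input is non-empty.
    return text[0].upper() + text[1:]

# Stand-in for the real list; swap in json.load() on a local copy of the file.
sample = json.loads('["", "undefined", "\\u200b", "<script>alert(1)</script>"]')
failures = find_crashes(sample, fragile_filter)

for bad_input, exc in failures:
    print(repr(bad_input), type(exc).__name__)
```

This only catches exceptions. Strings that are accepted but stored or rendered wrongly need a check on the output as well.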

The human injection phrase is priceless

It's a nice collection of text snippets to test against many systems

I like the idea of providing such a list for testing purposes. I also like the idea of storing these as Base64, so you don't trigger issues by accident.
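For illustration, decoding such a Base64 copy is only a few lines of Python. This is a sketch that assumes each entry is the Base64 encoding of a UTF-8 string; `decode_entries` is a made-up helper name, and the sample entries are built inline rather than read from the repo.

```python
import base64

def decode_entries(encoded):
    """Turn Base64 text entries back into the original strings."""
    return [base64.b64decode(e).decode("utf-8") for e in encoded]

# Build a tiny encoded sample; a real run would load the encoded list from disk.
originals = ["\u200b", "'; DROP TABLE users; --"]
encoded_sample = [
    base64.b64encode(s.encode("utf-8")).decode("ascii") for s in originals
]

decoded = decode_entries(encoded_sample)
```

The encoded form is pure ASCII, so it can sit in source files, logs, and editors without any of the strings taking effect until they are deliberately decoded.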

However, I also imagine how such a list could be misused to actually decrease the security of a system:

Imagine this list is handled the same way as virus signatures in so-called anti-virus software. Instead of properly handling user input, an application would check against this list and call itself "secure". Maybe with partial and/or fuzzy comparison. If you demonstrate that this approach is deeply flawed by showing another unsafe input, they'd simply add that to the list and call themselves "secured" against this attack.
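To make the difference concrete, here is a rough Python sketch using the standard-library `sqlite3` module. The blocklist, the table, and the "novel" string are all invented for the example. The blocklist check passes a string it has never seen, while a parameterized query stores that same string as inert data without needing any list at all.

```python
import sqlite3

# Hypothetical signature-style blocklist: only knows inputs seen before.
BLOCKLIST = {"'; DROP TABLE users; --"}

def blocklist_allows(text):
    return text not in BLOCKLIST

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

# A variation that is not on the list, so the blocklist waves it through.
novel = "x'); DROP TABLE users; --"
allowed = blocklist_allows(novel)

# Proper handling: the placeholder keeps the value out of the SQL text.
conn.execute("INSERT INTO users (name) VALUES (?)", (novel,))
stored = conn.execute("SELECT name FROM users").fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```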

Such an application is not likely to be secure in the first place. If you've gotten as far as trying this list, you're probably well above the median.

If someone uses this list for security purposes I think that someone has a bigger problem.

>Although this is not a malicious error, and typical users aren't Tweeting weird unicode, an "internal server error" for unexpected input is never a positive experience for the user

What would the user expect from inputting "U+200B ZERO WIDTH SPACE" into a form, anyway?

At minimum, no error at all. Ideally, the same behavior you would get from putting in either nothing or a space.

Let's try it on Facebook. Here's what happens when you put only a blank or space into a post and try to submit: http://i.imgur.com/bNtgky8.png

Here's what happens when you put a zero width space and try to submit: http://i.imgur.com/NMgyZqc.png

I've observed ZWSes appearing in user input for an application I maintain. They appear in text pasted from either Outlook or OWA, I believe. In our case, it is necessary that the application handle them gracefully; indeed, the user has no reason to know anything is amiss.
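One way to handle them gracefully, sketched in Python. The set of characters treated as invisible here is an assumption for the example, not a complete list, and `is_effectively_blank` is a made-up helper. Note that `str.isspace()` returns False for U+200B, which is why it has to be listed explicitly.

```python
# Assumed set of invisible characters for this sketch; extend as needed.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}

def is_effectively_blank(text):
    """True if the text is empty or holds only whitespace/zero-width chars."""
    return all(ch.isspace() or ch in ZERO_WIDTH for ch in text)

def strip_zero_width(text):
    """Remove the zero-width characters, leaving visible text untouched."""
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)
```

With this, a form can give a ZWS-only submission the same "empty post" treatment as a blank or a space, and pasted text can be cleaned before it is stored.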

I actually tweeted a zero width space some time ago and it worked. The tweet contained no text though.

Probably a 4xx error not a 5xx.

Well, depends. If copying a table from a technical document, maybe a zero width space?

This is a fun issue https://github.com/minimaxir/big-list-of-naughty-strings/iss...

