
The Big List of Naughty Strings - polm23
https://github.com/minimaxir/big-list-of-naughty-strings
======
folkhack
Solid list for a quick SQL injection and XSS reference with lots of examples.
Even the Unicode/accents/two-byte characters etc. are super useful for checking
handling all the way from the front end to the persistent storage solution
(DB, etc.).

Lost it laughing at the "Human Injection" section:

> # Strings which may cause human to reinterpret worldview

> If you're reading this, you've been in a coma for almost 20 years now. We're
> trying a new technique. We don't know where this message will end up in your
> dream, but we hope it works. Please wake up, we miss you.

~~~
Konohamaru
That was a bastardization of the original one on 4chan's /x/. This one is the
real one:

> It has been reported that some victims of torture, during the act, would
> retreat into a fantasy world from which they could not WAKE UP. In this
> catatonic state, the victim lived in a world just like their normal one,
> except they weren’t being tortured. The only way that they realized they
> needed to WAKE UP was a note they found in their fantasy world. It would
> tell them about their condition, and tell them to WAKE UP. Even then, it
> would often take months until they were ready to discard their fantasy world
> and PLEASE WAKE UP.

~~~
yakshaving_jgt
> This one is the real one

Are you sure? I think this idea has been explored in several novels and films
over the past several decades.

~~~
dejj
Never seen it before on film. The Manchurian Candidate isn't it. Can you give
me some pointers?

~~~
yakshaving_jgt
Doesn’t The Matrix explore this idea?

~~~
rsecora
Wake up, Neo... The Matrix has you...

------
afandian
This is deliciously ironic:

> Also, do not send a null character (U+0000) string, as it changes the file
> format on GitHub to binary and renders it unreadable in pull requests.

~~~
Seb-C
For a long time I had the same problem with Japanese language files being shown as
binary in the GitHub diffs, and it is solved by having something like this in
the .gitattributes file:

    *.php diff

Overall I am amazed that everything shows properly in GitHub.
[https://github.com/minimaxir/big-list-of-naughty-strings/blo...](https://github.com/minimaxir/big-list-of-naughty-strings/blob/master/blns.json)

------
dang
See also:

2018
[https://news.ycombinator.com/item?id=18466787](https://news.ycombinator.com/item?id=18466787)

2017
[https://news.ycombinator.com/item?id=13406119](https://news.ycombinator.com/item?id=13406119)

Show HN from 2015:
[https://news.ycombinator.com/item?id=10035008](https://news.ycombinator.com/item?id=10035008)

------
harunurhan
OK, seeing "﷽" [1] was unexpected :). For those who do not know, it's very
important for Muslims and it's all over the Quran.

[1] [https://github.com/minimaxir/big-list-of-naughty-strings/blo...](https://github.com/minimaxir/big-list-of-naughty-strings/blob/9c25300f66fd968ad863412608025908c8aa5efd/naughtystrings/internal/resource.go#L899)

~~~
cheez
what does it mean?

~~~
ctdonath
[https://www.urbandictionary.com/define.php?term=%EF%B7%BD](https://www.urbandictionary.com/define.php?term=%EF%B7%BD)

Fun fact: it’s a single Unicode character.

~~~
atomwaffel
Yup, you can put 280 of them into a single tweet.

~~~
robinhouston
I don’t _think_ that’s right. I looked into the way Twitter counts characters
when I was trying to work out the largest prime number that could be written
out in full, in base ten, in a single tweet[1]; the rules are more complicated
than you might expect, and have changed several times.

The current rule seems to be that all Unicode characters count as two, except
for the ranges 0–4351, 8192–8205, 8208–8223 and 8242–8247 which count as one.

[1] In case you’re wondering, I think it’s, arguably:
[https://twitter.com/robinhouston/status/1197294154738544641](https://twitter.com/robinhouston/status/1197294154738544641)
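
The counting rule described above can be sketched in Python. This only illustrates the rule as stated in this comment, not Twitter's actual implementation; the names `weighted_length` and `fits_in_tweet` and the default limit of 280 are assumptions made for the example, and real counting involves details (URLs, normalization) that are ignored here.

```python
# Code point ranges that count as one unit, taken from the rule quoted above.
# Every other code point counts as two. Helper names are hypothetical.
SINGLE_WEIGHT_RANGES = [(0, 4351), (8192, 8205), (8208, 8223), (8242, 8247)]


def weighted_length(text: str) -> int:
    """Return the length of text under the one-or-two weighting rule."""
    total = 0
    for ch in text:
        cp = ord(ch)
        if any(lo <= cp <= hi for lo, hi in SINGLE_WEIGHT_RANGES):
            total += 1
        else:
            total += 2
    return total


def fits_in_tweet(text: str, limit: int = 280) -> bool:
    """Check the weighted length against an assumed limit of 280."""
    return weighted_length(text) <= limit
```

Under this sketch, U+FDFD (code point 65021) falls outside all four ranges, so each copy counts as two and at most 140 copies pass `fits_in_tweet`.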

~~~
atomwaffel
Good point! Still, I could swear I saw someone (@FakeUnicode?) do exactly this
once, but of course I can’t find that tweet any more, partly because it turns
out that search engines don’t handle ﷽ well at all, and I don’t feel like
testing it on my own followers somehow.

Edit: it looks like it might count it as two characters, so that’s only 140
per tweet.

~~~
robinhouston
That’s definitely possible! @FakeUnicode mentioned in the discussion that,
when 280-character tweets were first introduced in September 2017, it was
possible to tweet 280 single-codepoint emoji using TweetDeck.

[https://twitter.com/fakeunicode/status/1197282221503041537](https://twitter.com/fakeunicode/status/1197282221503041537)

There are several amusing examples in the thread linked from this tweet.

------
dhosek
I encountered an amusing instance of this recently watching my six-year-old
son playing music on the kitchen Alexa. Alexa felt it was necessary to censor
the name of a children’s song entitled, “Pussy Cat, Pussy Cat.”

~~~
inetsee
When I saw the title I thought it was a list of profanity that one might want
to filter out from an open web application (i.e. a list that also includes
swear words from multiple languages).

------
jzl
Also tangentially related: the big list of usernames that should be disallowed
in any online system:
[https://github.com/forwardemail/reserved-email-addresses-lis...](https://github.com/forwardemail/reserved-email-addresses-list)

~~~
DominikPeters
Ugh, that list might be why my email address mail@[personal domain] is
forbidden more and more often.

~~~
Mandatum
I use a catch-all and mark some handles as spam once a website has been
hacked, or the email has been caught by spammers.

I trust you've probably adopted a similar workaround for problem websites.

For throwaway websites (where I use a password I have no intention of keeping
track of), I often sign up as "spam@domain". Surprisingly, this is blocked in
a lot of instances.

------
montroser
Almost related:
[https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and...](https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words)

~~~
dorgo
What? Only 151 Russian words? Russian has its own dedicated sub-language
that consists solely of bad words. No idea or concept is too complicated
to be expressed in bad words alone. Speakers switch from normal Russian to
bad-word Russian as soon as the situation allows it.

~~~
egypturnash
[citation needed], please, I wanna read about this beautiful gem of a sub-
language!

All I can find with some quick searching is Wikipedia's page on 'mat'[1],
which seems to be pretty similar to Carlin's list of Seven Words You Can't Say
On TV[2] rather than an entire language of vulgarity.

1:
[https://en.wikipedia.org/wiki/Mat_(Russian_profanity)](https://en.wikipedia.org/wiki/Mat_\(Russian_profanity\))
2:
[https://en.wikipedia.org/wiki/Seven_dirty_words](https://en.wikipedia.org/wiki/Seven_dirty_words)

~~~
dorgo
I don't really have a citation, just personal experiences. But your wiki link
already states: "David Remnick believes that mat has thousands of variations"

The German version of the wiki article has some examples:

пиздеть (pisdet′) = to (tell a) lie, but also possible: to steal.

пиздец (pisdez) = ruin or catastrophe, fubar, fucked up

As you can see, small variations of the same word have different meanings. And
the meaning can vary with context.

Edit:

The Russian version of the article has some example verbs (all derived from
the same word). I tried to translate them as far as I understand:

ебануть = ?,

ебануться = to get stupid ( to say or do something stupid ),

ебаться = to do something - very generic,

ебиздить = ?,

ёбнуть = ?,

ёбнуться = ?,

ебстись = ?,

въебать = to get something into something?,

выебать = to get something out of something?,

выёбываться = to fuck around. To decline something to somebody.

доебать = ?

доебаться = to annoy somebody by asking too many questions or trying to get
something out of somebody.

доёбывать = to bring something to an end. to finish something,

заебать = to annoy somebody, to get to somebody

заебаться = to get fed up with something, to get tired of something,

наебать = to betray somebody?,

наебаться = to get saturated by something, to get enough of something

наебнуть = to cheat somebody,

наебнуться = for the fun of it, just do it,

объебать = to get a piece of something, to get familiar with something?,

объебаться = to get into something nasty, to fuck up,

остоебенить = blow your mind (not sure about this one)

остоебеть = ?,

отъебать = to fuck with something/somebody, to fight, to degrade somebody,

отъебаться = get rid of something, for example to pass exams or get rid of
obligations,

переебать = (maybe) to understand something,

переебаться = to fuck over something, to get something done,

поебать = do something (depending on context),

поебаться = to do nonsense, try hard to no avail, fool around,

подъебать = ?,

подъебаться = ?,

подъебнуть = to make a joke, for example on April 1st, even flirt?,

разъебать = to (accidentally or purposefully) break something,

разъебаться = to clear a relationship, to bring everything to order?,

съебать = to get away (maybe in a cool way),

съебаться = to fuck off, to stop annoying,

уебать = to run away (maybe after stealing something)

~~~
teetertater
This list mostly corresponds to my personal experiences as a native Russian
speaker, and I've heard 70% used in the wild. To fill two of the ?'s:

    ёбнуть = to f*ck
    ёбнуться = to stumble or injure oneself

What's interesting is that this whole list uses only one root word; there are
3 more such roots.

------
Udik
.

.

.

.

.

.

ด้้้้้็็็็็้้้้้็็็็็้้้้้้้้็็็็็้้้้้็็็็็้้้้้้้้็็็็็้้้้้็็็็็้้้้้้้้็็็็็้้้้้็็็็
ด้้้้้็็็็็้้้้้็็็็็้้้้้้้้็็็็็้้้้้็็็็็้้้้้้้้็็็็็้้้้้็็็็็้้้้้้้้็็็็็้้้้้็็็็
ด้้้้้็็็็็้้้้้็็็็็้้้้้้้้็็็็็้้้้้็็็็็้้้้้้้้็็็็็้้้้้็็็็็้้้้้้้้็็็็็้้้้้็็็็

Wow, what's this? :)

~~~
majewsky
Layers upon layers of combining diacritics.
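
A rough way to see this is to count, for each base character, how many combining marks follow it. The sketch below uses Python's standard `unicodedata` module and treats any code point whose general category starts with `M` as a mark; the function name `marks_per_base` is made up for this example.

```python
import unicodedata


def marks_per_base(text: str) -> list[tuple[str, int]]:
    """Pair each base character with the number of marks stacked on it."""
    result: list[tuple[str, int]] = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if is_mark and result:
            base, count = result[-1]
            result[-1] = (base, count + 1)
        else:
            # A base character, or a stray mark with nothing before it.
            result.append((ch, 0))
    return result
```

Run on the text above, each ด (U+0E14) should come back paired with a large count, since the tone marks and vowel signs after it are all category `Mn`.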

------
13415
I don't quite understand the purpose of this list. It contains potentially
malicious input, but also emoticons based on Unicode characters that are
completely harmless and used in every second post on Reddit.

~~~
MauranKilom
It's essentially a test suite for character encoding all throughout your
application. If you input all those strings (e.g. send chat message) and they
arrive incorrectly at some other end (e.g. other user receiving chat message)
then there's a problem somewhere.
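
As a rough illustration of that idea, the sketch below pushes a handful of strings through a stand-in `transport` function and reports which ones come back changed. The sample strings, the `round_trip_failures` name, and the two example transports are assumptions for the sketch; in practice one would load the repository's blns.json and replace the transport with a real path through the application.

```python
import json
from typing import Callable, Iterable

# A few illustrative inputs; not taken verbatim from the list.
SAMPLES = ["plain ascii", "caf\u00e9", "\u65e5\u672c\u8a9e", "\U0001F600", "' OR 1=1 --"]


def round_trip_failures(strings: Iterable[str], transport: Callable[[str], str]) -> list[str]:
    """Return every string that does not survive the transport unchanged."""
    return [s for s in strings if transport(s) != s]


def json_utf8_transport(s: str) -> str:
    """Serialize to JSON, encode as UTF-8 bytes, then reverse both steps."""
    return json.loads(json.dumps(s).encode("utf-8").decode("utf-8"))


def ascii_only_transport(s: str) -> str:
    """Simulate a broken layer that replaces non-ASCII characters with '?'."""
    return s.encode("ascii", errors="replace").decode("ascii")
```

With these samples, `json_utf8_transport` should report no failures, while `ascii_only_transport` flags every string that contains a non-ASCII character.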

~~~
13415
That makes sense. Thanks a lot! Of course, it's very useful for testing. I
erroneously assumed it was for input validation.

------
chris_wot
Strongly advise not using cat on the list, you will get beeped at.

~~~
fareesh
would that be considered animal abuse :D

------
monax
Yup, can't view the file using the GitHub app for Android

~~~
minimaxir
Out of curiosity, what happens when you try to do so?

~~~
Johnjonjoan
Something went wrong

<button>TRY AGAIN</button>

Edit: as far as I could see, only opening blns.txt causes this error;
the other files are fine in the app.

------
gitnewbie
The latest commit message of README is "Merge branch 'master' into master"
[1]. As someone who doesn't do git, what does that even mean? Does git allow
multiple branches with the same name?

[1]: [https://github.com/minimaxir/big-list-of-naughty-strings/com...](https://github.com/minimaxir/big-list-of-naughty-strings/commit/84c8b77529dfdfaf3d28d597681dff5f633aa91b)

~~~
colinchartier
The GitHub flavor of git essentially has two parts:

Repository (e.g., minimaxir/big-list-of-naughty-strings), and branch (e.g.,
master)

In this case, someone merged eliabieri/big-list-of-naughty-strings at master
into minimaxir/big-list-of-naughty-strings at master via a pull request [1].

[1] [https://github.com/minimaxir/big-list-of-naughty-strings/com...](https://github.com/minimaxir/big-list-of-naughty-strings/commit/eec4732aac0d42107e1cc3701cc4a8be7ff351de)

------
gerdesj
Jimmy Clitheroe - the Clitheroe Kid. That brings back some memories. It's also
nice to see that England is suitably represented in the place names, obviously
Scunthorpe is the classic. I'll tender Somerset for first amongst equals for
daft and downright odd place names.

------
bloody-crow
We need a similar list of weird unconventional emails to make sure every new
registration form won't erroneously reject a valid email.

The number of times I get validation errors or some unexpected crashes when I
enter my fairly pedestrian email with + sign in it... Jeez.

------
toolslive
[https://github.com/minimaxir/big-list-of-naughty-strings/blo...](https://github.com/minimaxir/big-list-of-naughty-strings/blob/master/naughtystrings/internal/resource.go#L1228)

just lovely ;)

~~~
duggable
This one got me:

> "If you're reading this, you've been in a coma for almost 20 years now.
> We're trying a new technique. We don't know where this message will end up
> in your dream, but we hope it works. Please wake up, we miss you."

Strangely terrifying....

~~~
willismichael
The question is, which one of us is the message meant for?

~~~
naniwaduni
Why would there be more than one of you?

~~~
myself248
Why are there so many of me posting in this thread?

~~~
jbay808
Username checks out.

------
hunter2_
Neat. I recently found that googling the Japanese Post Office emoji results in
a totally borked SERP (cross-browser, desktop, including desktop mode on
Android Chrome). I assume there are other characters as well.

------
bullen
I just do:

    .matches("[a-zA-Z0-9.\\-]+")

And prepared statements or my own NoSQL.
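
A minimal Python sketch of the same two ideas, an allowlist regex plus a parameterized query. `re.fullmatch` mirrors the whole-string behavior of Java's `String.matches`; the `users` table, the in-memory SQLite database, and the function names are assumptions for illustration only.

```python
import re
import sqlite3

# Same character class as the pattern above: letters, digits, dot, hyphen.
ALLOWED = re.compile(r"[a-zA-Z0-9.\-]+")


def is_allowed(value: str) -> bool:
    """True only if the whole string consists of allowlisted characters."""
    return ALLOWED.fullmatch(value) is not None


def insert_user(conn: sqlite3.Connection, name: str) -> None:
    """Insert via a placeholder so the value is never spliced into the SQL."""
    if not is_allowed(name):
        raise ValueError("name contains characters outside the allowlist")
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
insert_user(conn, "alice-01.test")
```

An allowlist this strict also rejects every non-ASCII string, so it only suits fields where that restriction is acceptable; the placeholder query is what guards against injection regardless of the input.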

