
Great stuff. Found a real word:

  refactoring

  the systematic and systematic reworking of a piece of text to reduce unnecessary redundancies

Interesting that the made-up definition is pretty much the real one.

https://www.thisworddoesnotexist.com/w/refactoring/eyJ3IjogI...




Got a few comments about that one and a few others! I just updated the existing word blacklist and re-deployed.


"fibroblastosis" -- appears to feature in some medical journals

"bryosphere" -- something to do with moss

"biosprint" -- a brand of yeast.

Many of these 'non-words' have already been taken....


Oh, that second one is cute. I read up on bryophytes (moss and friends) for an exceedingly brief stint back in my undergrad days. The similarity in pronunciation to "bio" made for many "bryo" puns.


I got Undercrowded

  undercrowded
  un·der·crowded
  (of a place) not full of people or vehicles
  "the area was undercrowded with traffic"


Oh, I'm surprised the "blacklist" isn't just the standard English dictionary. I'm sure I'm just being naive though. Why not just blacklist any word that already exists in English?


There are some subtleties (e.g. hyphens, derived forms, bigrams, etc.) but the biggest problem is that most English dictionaries don't have entries for every scientific word / piece of internet slang. I ended up tokenizing Wikipedia for a blacklist and still missed a lot :(
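For anyone curious what those subtleties look like in practice, here is a minimal sketch of a corpus-derived blacklist that folds hyphens, joins adjacent words into bigrams, and strips a few suffixes. It is not the site's actual code: the function names (normalize, build_blacklist, is_known) and the crude suffix list are assumptions made purely for illustration.

```python
import re
from typing import Set


def normalize(word: str) -> str:
    """Lowercase and drop hyphens/whitespace so 'Re-Factoring' matches 'refactoring'."""
    return re.sub(r"[\s\-]+", "", word.lower())


def build_blacklist(corpus_text: str) -> Set[str]:
    """Collect normalized single tokens and adjacent-token bigrams from raw text."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", corpus_text.lower())
    blacklist = {normalize(t) for t in tokens}
    # Bigrams catch compounds written as two words, e.g. "hard style" -> "hardstyle".
    blacklist |= {normalize(a + b) for a, b in zip(tokens, tokens[1:])}
    return blacklist


def is_known(word: str, blacklist: Set[str]) -> bool:
    """Check the word itself plus a few crudely de-suffixed forms (illustrative only)."""
    w = normalize(word)
    candidates = {w}
    for suffix in ("s", "es", "ed", "ing"):
        if w.endswith(suffix) and len(w) > len(suffix) + 2:
            candidates.add(w[: -len(suffix)])
    return any(c in blacklist for c in candidates)


corpus = "The hard style DJ played. Moss forms the bryosphere; re-factoring helps."
bl = build_blacklist(corpus)
print(is_known("Hardstyle", bl), is_known("bryospheres", bl), is_known("glorptastic", bl))
```

Even with all three tricks, anything absent from the corpus slips through, which matches the "still missed a lot" experience.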


> I ended up tokenizing Wikipedia for a blacklist and still missed a lot :(

That sounds like an impressive project in itself :)


Words not on Wikipedia, found on other sources, listed by frequency (perhaps with a date-weighting of the source document to reduce rating of older sources), would be an interesting way to find holes in Wikipedia's coverage.
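A rough sketch of that idea, assuming you already have a set of Wikipedia tokens and a pile of dated documents. The exponential half-life weighting, the function name rank_missing_words, and the input shapes are all assumptions chosen for illustration, not an established method.

```python
from collections import defaultdict


def rank_missing_words(documents, wikipedia_vocab, current_year, half_life_years=10.0):
    """Score words absent from wikipedia_vocab by date-weighted frequency.

    documents: iterable of (year, text) pairs.
    Returns a list of (word, score), highest score first.
    """
    scores = defaultdict(float)
    for year, text in documents:
        age = max(0, current_year - year)
        # Each half-life of age halves a document's contribution.
        weight = 0.5 ** (age / half_life_years)
        for token in text.lower().split():
            word = token.strip(".,;:!?\"'()")
            if word and word.isalpha() and word not in wikipedia_vocab:
                scores[word] += weight
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


docs = [(2020, "yeet yeet moss"), (2000, "forsooth moss")]
print(rank_missing_words(docs, {"moss"}, current_year=2020))
```

A recent slang term used twice outranks an archaic word used once in an old source, which is the bias the date weighting is meant to introduce.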


Someone should make a Wikipedia page of that list. Oh, wait.


I like how you had information, made a sarcastic comment about it, but didn't share the actual information ... just in case your comment might prove helpful ...


Are you saying the URL of that Wikipedia page is “actual information” that patrickthebold failed to share?

I think that page doesn’t exist. patrickthebold wasn’t sarcastically mocking people who were too lazy to look up that page. He was just making the point that as soon as a hypothetical list like that was uploaded to Wikipedia, it should be deleted, since those words would then be words found on Wikipedia.


That may be because there isn't a single definitive list of all words in the English language.

https://www.lexico.com/explore/how-many-words-are-there-in-t...


If you're using a blacklist, is it really machine learning? Or are you using it to re-train?


The blacklist is probably there to avoid cases where the model randomly generates a real word, like the two cases above, so the blacklist filter is probably applied after the ML stuff.


Yup, the line for the blacklist lookup is here: https://github.com/turtlesoupy/this-word-does-not-exist/blob...
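To make the "filter after generation" idea concrete, here is a minimal sketch of the general pattern: sample, check against the blacklist, retry. It is not the code at the link above; generate_novel_word, the generate callable, and max_attempts are hypothetical names used only for illustration.

```python
def generate_novel_word(generate, blacklist, max_attempts=10):
    """Call generate() until it yields a word not in blacklist.

    generate: zero-argument callable standing in for the model's sampler.
    Returns the first non-blacklisted word, or None if every attempt was rejected.
    """
    for _ in range(max_attempts):
        word = generate()
        if word.lower() not in blacklist:
            return word
    return None


# Stand-in for a model: replays a fixed sequence of candidate words.
candidates = iter(["refactoring", "Invention", "blorptastic"])
print(generate_novel_word(lambda: next(candidates), {"refactoring", "invention"}))
```

The model itself is untouched; the blacklist only rejects samples after the fact, so no re-training is implied.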


Data scientist here. It's common to define boundaries for a machine learning algorithm by hand. Think of telling a chess AI that it can't move pieces off the board.


Unspooled and Hardstyle popped up for me. Perhaps you should do a Google search for generated words before displaying them, to prevent existing words from being shown.


Or maybe the ML algorithm behind it is just reading from an existing dictionary and doing nothing else.


You flatter me by thinking so; it is a bot! The source code is open here: https://github.com/turtlesoupy/this-word-does-not-exist


It gave me "Intermodulate" a minute ago. https://en.wikipedia.org/wiki/Intermodulation


That one's a little fuzzier; "intermodulate" doesn't occur very much in discourse (e.g. it's not in the wiki article at all), even though it would naturally be related.


https://www.thisworddoesnotexist.com/w/bordellum/eyJ3IjogImJ...

Bordellum. It's a word, I think?

https://www.thisworddoesnotexist.com/w/disaproval/eyJ3IjogIm...

Disaproval. So close to a real word it looks like a misspelling.

Anyway, this is really interesting.


For the latter, press the "Write Your Own" button and it'll do exactly that

In general, I don't fix a random seed, so you can get different definitions each time (though sometimes I cache data).


Thanks!


I got "cyberpolice" which seems to be a real word.

I also got "deflategate" which I love - it must be an upcoming scandal! :-)


Deflategate was a National Football League (NFL) controversy involving the allegation that New England Patriots quarterback Tom Brady ordered the deliberate deflation of footballs used in the Patriots' victory against the Indianapolis Colts in the 2014 American Football Conference (AFC) Championship Game.

https://en.wikipedia.org/wiki/Deflategate


> it must be an upcoming scandal!

It was, in 2015.

https://en.wikipedia.org/wiki/Deflategate


Multiple people managed to find it? How likely is it to generate the same non-word more than once? Is there a limited set?


I also got «invention» up as a result.


I just saw "gofundme".


I got "pataphysical".



