I really like your Hidden Unit Zoo here http://colinmorris.github.io/rbm/zoo/ as a window into what this thing is actually "thinking" about. The "top matches" for a given hidden unit are pretty helpful.
Your examples just look like spelling errors ("contact", "framework", "Word", "square"). I got a lot that looked indistinguishable from real repos, at least at first glance, e.g. "WebDashApp", "PlayFrameProject", "check-bat", "language-1", "data-cores". And those are just the first 5 I got.
I wonder what libdog would do. Is it meant to be used by dogs, or used by humans to interface with dogs? Or perhaps it's an accessibility library to make regular apps usable by dogs.
Yep, I got tons of normal-looking ones as well - had to dig to find some of these, after I found HelloWaurd and was tickled by it.
The thing I find interesting and amusing - which I guess is common to both Markov chain generative models and this RBM-driven generator - is that the 'errors' still follow the rules of the English language pretty well, and they're all pronounceable.
I wish this was restricted to just the 3.9 million repos of the JavaScript build systems and task runners. The remainder of the repos just tainted the training set.
I can only assume that "Scalp_game" is either a new PC title that involves phrenology, or a means by which shampoo companies can alter product ratings.
It's guaranteed to not exist in the training set (i.e. to not be a real GitHub repo that existed before 2015). But the model can certainly repeat itself.
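For the curious, here is a minimal sketch of how a guarantee like that could be enforced. I don't know how the site actually does it; `novel_names` and its arguments are hypothetical names, and it assumes the training set is small enough to hold in memory as a set. Unlike what's described above, this version also drops repeats of names it has already emitted.

```python
def novel_names(candidates, training_names):
    """Yield generated names that are not in the training set.

    Hypothetical helper: also skips names it has already yielded,
    which goes beyond the "not in the training set" guarantee.
    """
    seen = set(training_names)
    for name in candidates:
        if name not in seen:
            seen.add(name)  # remember it so a repeat is filtered out
            yield name
```

Fed a stream of samples from any generator, this passes through only names that are new relative to the training data.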
One thing that I find amazing about this work is how it retargets the purpose of a neural network. This is not a classifier; this is not a here-are-a-bunch-of-fuzzy-images-tell-me-what-character-it-is task.
This uses what it has learned, combining the fundamentals to create something new and plausible. A thin shadow of imagination?
It's not a new idea. Since char-rnn came out, people have been using it to generate music, generate fake Linux code, write Shakespeare, make chatbots, etc. And before that people were using Markov models to do the same things. There have been a lot of Markov generators, including a fake headline generator and a subreddit populated entirely by Markov chain bots.
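To make the Markov approach concrete, here is a rough character-level sketch. It is not the RBM from the post and not any particular generator mentioned above. The function names, the `order` parameter, and the `^`/`$` padding markers are my own choices, and it assumes those two characters never appear in the names themselves.

```python
import random
from collections import defaultdict

def train(names, order=2):
    """Map each `order`-character context to the characters seen after it."""
    model = defaultdict(list)
    for name in names:
        # "^" pads the start, "$" marks the end (assumed absent from names)
        padded = "^" * order + name + "$"
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model

def generate(model, order=2, max_len=30, rng=random):
    """Sample one name by walking the chain until the end marker or max_len."""
    context, out = "^" * order, []
    while len(out) < max_len:
        nxt = rng.choice(model[context])
        if nxt == "$":
            break
        out.append(nxt)
        context = context[1:] + nxt  # slide the context window forward
    return "".join(out)

# Toy training data borrowed from names mentioned in this thread
sample = ["WebDashApp", "PlayFrameProject", "check-bat", "language-1", "data-cores"]
model = train(sample)
print(generate(model, rng=random.Random(0)))
```

With only five training names the output will mostly echo the inputs; the interesting mash-ups only show up once the model has seen far more names.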
It would be great if you could add some optional features to the interface, like the programming language (Python, Ruby, Java, etc.) or the category of the repo (game, development, database, etc.).