
Show HN: This Question Does Not Exist - yeldarb
https://stackroboflow.com
======
yeldarb
I created this site using a Fast.ai trained language model using the Stack
Overflow data dump.

Full writeup available here:
[https://stackroboflow.com/about/index.html](https://stackroboflow.com/about/index.html)

Interesting things I’ve noticed so far:

* It does a remarkably good job of context switching between programming languages based on the semantics of the question! If the question is about SQL it often includes SQL in < code > tags. If it’s about JavaScript it will include JavaScript! The syntax isn’t perfect due to the tokenizer mangling some things but it’s pretty close!

* The English grammar isn’t perfect but it’s pretty good.

* It doesn’t seem to lose track of closing tags and quotes.

* It's learned to sometimes pre-emptively thank people for their answers and to "edit" in "updates" at the end of the post.

If you find any interesting ones you can share them with the permalink! Use
the "Fresh Question" button to load a new one.

~~~
arendtio
> _Answering Questions_ Right now the model only generates questions. In
> version 2 I want to train it to answer questions. If I could get this
> working it'd actually become a useful tool instead of a fun toy.

Looking forward to that part :D

I mean, those answers are probably not going to be correct, but I wonder how
close they will be to something useful.

~~~
drilldrive
Yes, many times the questioner does not actually need an answer to the
question, he just needs to look a little closer to the situation, which is
potentially able to be automated. But one should not disguise such automation
as an 'answer': more like a query autocheck but more tooled-up.

~~~
wpasc
I wonder what percentage of questions just need a correctly working example
because the questioner is unsure of how to use a given API. Automation of this
I imagine could actually be doable.

------
jerf
"I have creating a PNG Image file where I am printing out the image's with
different colors and Image Types. Now I am sure I am drawing properly, but
what I'm seeing is that the image is not differently jpeg (ie FF or Chrome)
and Safari (for Firefox) is different from the one in Firefox. "

As a bit of a connoisseur of babblebots over the decades, one of the
interesting things about this generation is that it is producing text that has
a very interesting effect in my mind. There is a part of the parsing process
where the above text went down smooth; yup, that's what Stack Overflow
questions from early developers tend to look like. That part of my brain
issues no objection. But the next layer up screams bloody murder about how
nonsensical that is. And it's not just "that's a bad question but I still see
the order under it", but nonsense.

It's a combination I've not experienced before. Previous generation babblebots
could often produce a lot of fun text, but every processing level above raw
word processing has always been able to tell it's computer garbage, even when
it blundered onto a particularly entertaining piece of garbage. We've actually
successfully moved up a level here.

As I'm describing subjective experience, YMMV.

~~~
x1798DE
The experience you are describing reminds me of the comparative illusion [0],
which is a grammatical illusion where certain sentences seem grammatically
correct when you read them, but upon further reflection actually make no
sense, example:

"More people have been to Berlin than I have."

[0]
[https://en.wikipedia.org/wiki/Comparative_illusion](https://en.wikipedia.org/wiki/Comparative_illusion)

~~~
Zanni
Fascinating. There's a sentence I picked up from a friend in childhood,
"Although the Moon is only an eighth the size of the Earth, it is much farther
away," which seems to be similar, but not quite a CI, if I'm reading the
Wikipedia article correctly. Thanks for the link.

------
aur09
This one is golden:

"What's the best way to indeed start a process on an OS x machine?

What is the best way to start a process on Mac OS x Snow Leopard?

There I just need to be able to run the OS x.exe from the command line and
it's working fine (make it available in Windows). But I'm on an Mac and I
haven't figured out how to do this for a Linux machine.

Another reason I ask is that I only have a Unix shell running with the Python
process in it (it's my an Ubuntu machine, nothing didn't work in the shell).

Thank you in advance"

------
DoctorPenguin
[https://stackroboflow.com/#!/question/16993](https://stackroboflow.com/#!/question/16993)
Thanks Steve

Without too much sarcasm I have received support requests that were far too
close to this.

---

Laughing way too hard, but is it having a stroke?
[https://stackroboflow.com/#!/question/14791](https://stackroboflow.com/#!/question/14791)

~~~
turtlebraile
This reminds me of Pozzo's famous line in Beckett's Waiting for Godot.

------
apo
I propose a new kind of Turing Test.

Gather equal numbers of the least intelligible questions from SO (possibly
using a metric based on low views/upvotes/comments/answers over long time) and
a random selection from stackroboflow.

Present human judges with both sets of questions and ask them to tell the
difference.

Having read numerous SO questions from newbie developers whose grasp of
English was tenuous at best, I doubt I could tell the difference.

The next step up: the same test, but with mathematics or scientific papers
judged by non-experts in the field.

We may actually be there already - I'm not sure.

All of which makes me wonder when we'll reach the point where the bar has been
raised so high that the comparison will need to be against the best SO
questions and scientific/mathematics papers judged by subject matter experts.

------
faizshah
This is the most prototypical stackoverflow question I have ever seen:

[https://stackroboflow.com/#!/question/21467](https://stackroboflow.com/#!/question/21467)

~~~
anonytrary
Hah. I raise you
[https://stackroboflow.com/#!/question/2222](https://stackroboflow.com/#!/question/2222)

------
autechr3
"View from View to View, Need to open this new View in View"

[https://stackroboflow.com/#!/question/4781](https://stackroboflow.com/#!/question/4781)

A common problem for everyone, i'm sure.

~~~
dylan-m
I got a surprisingly comprehensible (and similarly recursive) ListView
question:
[https://stackroboflow.com/#!/question/15327](https://stackroboflow.com/#!/question/15327).

~~~
kruczek
Similarly this one:
[https://stackroboflow.com/#!/question/19297](https://stackroboflow.com/#!/question/19297)

Sounds like jumpstarting your userbase would be easy, once you allow users to
define other users :)

------
avip
[https://stackroboflow.com/#!/question/22733](https://stackroboflow.com/#!/question/22733)
I hate when fellow coders do that.

Another pearl: _Creating a PDF from PDF. The situation is as follows: We have
a video file hosted by Google Map_.

It's like reading a doco-satire about my life.

------
Findus23
I can claim to have experience [0] with generating funny nonsense based on
Stackoverflow data (what a weird thing to say :))

Seems like you beat me to my plan to make a Neural Network based variant and I
really like the results (especially that they stay on topic instead of totally
drifting off into fun nonsense like my Markov Chains).

Have you tried also using other Stackexchange sites as a source? In my
experience they result in more fun questions as they have more "human"
interactions (especially the more personal advice based sites) which creates
things like:

* Do Greeks driving affect the whaling industry?
* Essential windsurfing equipment to fish?
* Do mountaineers eat grass?
* Can I toast
[0]
[https://news.ycombinator.com/item?id=16947038](https://news.ycombinator.com/item?id=16947038)

~~~
yeldarb
I haven't yet! It's on my list of things I'd like to try.

------
gudok
I reviewed 1600 edits at StackOverflow. And I can say that some of the
automatically generated questions are more intelligible than the average SO
question. For example, this one looks fine to me:
[https://stackroboflow.com/#!/question/11235](https://stackroboflow.com/#!/question/11235)

~~~
inostia
It's so close to being intelligible, but I still can't quite parse it, like so
many actual SO posts.

------
natch
Fascinating. I wonder if our current discussion boards on the interwebs can
survive the coming influx of content like this and the next generations of it
that follow.

There are a lot of SO questions posted by very weak non-native speakers of
English and some of these are hard to distinguish from those. Kind of scary!

What possible positive outcomes do you see for this kind of (admittedly
inevitable) capability?

~~~
yeldarb
I am actually a bit worried that I’m already starting to see search engine
traffic coming in...

I hope that the good will outweigh the bad. I’d love to create an _answer_
generator, for example.

Once enough questions are generated I’m going to try creating a classifier to
see if a neural net can differentiate between real questions and fake ones.

~~~
triplewipeass
You could put up a robots.txt denying all search engines.
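
For reference, a deny-all robots.txt is only two lines. The sketch below checks one with Python's standard-library robot parser. The helper name `is_allowed` is made up for illustration, and the file only affects crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# A deny-all robots.txt: applies to every user agent and every path.
DENY_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://stackroboflow.com/browse/index.html"
    print(is_allowed(DENY_ALL, "Googlebot", url))
```

An empty robots.txt, by contrast, allows everything, which is the default a site starts from.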

~~~
natch
But the issue is not just what he could do, but what malicious content
generation systems could do.

~~~
triplewipeass
The issue is, as stated:

> I am actually a bit worried that I’m already starting to see search engine
> traffic coming in...

We can discuss hypothetical systems that could maliciously flood us with
generated content. The creator of this particular service which is being
discussed here and now could also begin taking steps to ensure that his
creation does not inadvertently create a problem for some hapless Google user.

~~~
natch
Well, no. You have to read further up the thread to see the issue I was
referring to.

>I wonder if our current discussion boards on the interwebs can survive the
coming influx of content like this and the next generations of it that follow.

Yes the robots.txt is a good and trivial step he could take to ensure well
behaved robots do not pick up his content. So your comment suggesting
robots.txt is a good comment in its narrow frame, but one that missed the
larger picture. That minor problem is solved. The interesting problem is of a
different nature.

------
9dev
I was cycling through some answers, when suddenly the following, completely
unrelated text shows up in a random code block:

I can feel the admin is different

You sure you didn't just accidentally create a self-aware AI? Forgot to
permalink sadly

------
motohagiography
There is immense value in training these to synthesize test data sets for
sensitive information you can't safely put in a preprod environment.

Health information would be the main case I can think of now.

Having synthesized data for testing new services in govt would be a huge
improvement.

De-identification is basically impossible and there are a bunch of companies
who will lie to you if you pay them to, but synthesized data covers many use
cases for de-identification and for homomorphic encryption.

------
ggambetta
Reminds me of [https://git-man-page-generator.lokaltog.net/](https://git-man-page-generator.lokaltog.net/),
which I always found hilarious :)

~~~
arendtio
> This is NOT real git documentation!

YMMD :D

------
kristianc
Awesome. Can you create a neural net that arbitrarily closes questions as off
topic or non constructive? ;)

~~~
yeldarb
It’s something I’m interested in!

Unfortunately I’ve come to the conclusion that upvotes on Stackoverflow aren’t
correlated with question content (or I’m not skilled enough to be able to
differentiate between “good” and “bad” questions). Check out the linked write
up for more detailed info.

~~~
deckar01
> arbitrarily

I think the original comment is being sarcastic and suggesting that the actual
humans that close discussions and mark them as "off topic" don't understand
the question and perform these actions at random. This is a sentiment shared
by many who don't "live" in those types of forums.

~~~
yeldarb
Ah, missed the operative word there. I could definitely do that! ;D

------
jefb
I think we've all been here before:

"i've been asked to use Json to call a webservice. I don't modify a JSON
object at all. However, when calling JSON returned by the Json object, it
fails because the object life isn't array!"

[https://stackroboflow.com/#!/question/24101](https://stackroboflow.com/#!/question/24101)

------
tom_usher
Excellent! It's great when it tries to generate code:
[https://stackroboflow.com/#!/question/8138](https://stackroboflow.com/#!/question/8138)
(the last line here made me laugh)

~~~
MagnificentSpam
This one looks like a genuine java program.

[https://stackroboflow.com/#!/question/29584](https://stackroboflow.com/#!/question/29584)

``` Cat cat = new Cat(); Cat cat = new 2nd Cat ");" ```

------
syllable_studio
Very fun. But how am I supposed to help Charset solve their urgent problem?
I'll just answer here.

Q: How to use a JSON string in a funky way
[https://stackroboflow.com/#!/question/49913](https://stackroboflow.com/#!/question/49913)

A: Dear Charset, I hope this might resolve your issue.

    window.location = JSON.parse('[{"use": "https://www.youtube.com/embed/0ROzGihgCj8?rel=0&amp;autoplay=1;fs=0;autohide=0;hd=0;"}]')[0].use;

------
rcthompson
[https://stackroboflow.com/#!/question/12875](https://stackroboflow.com/#!/question/12875)
"It works fine in GCC but it does not work in GCC / GCC."

------
joshvm
Absolute gold: "Is there a animal out there that someone can apply to do the
sort of thing I'm looking for?"

Not sure what happened to the title that time.

[https://stackroboflow.com/#!/question/11716](https://stackroboflow.com/#!/question/11716)

(perhaps op is a vim user)

~~~
kyle-rb
Try a Python or a Pony, or maybe even an OCaml.

~~~
owl57
Probably OCaml: that looks like an ML-style function signature.

------
kyle-rb
Oh wow, this is amazing. My favorite so far is:
[https://stackroboflow.com/#!/question/22890](https://stackroboflow.com/#!/question/22890)

>How can I do this software?

Although it sounds more like a question from Quora.

~~~
woodrowbarlow
having spent some time in the triage & edit queues, this 100% sounds like
stackoverflow.

------
silveroriole
This is great... “I'm getting errors with Line 1, Line 39, Column million” lol

~~~
skykooler
When trying to pack your code into a one-liner goes too far...

------
sampleinajar
Nice! This is also what every question looked like when I was new to
programming.

------
mannykannot
I'm voting to close as duplicate.

~~~
mormegil
Right, after adding tags and answers, comments need to be added as well...

------
ZoomZoomZoom
Oh, great, now I know what my clueless questions look like to a knowledgeable
person! Example:

>"I need to create an image from a imported wav file (for a user - friendly
format find enough header for the cookie). I looked for a solution, but that
didn't work either."

------
hiccuphippo
> ... Thinking I have to use the first two but it's not possible to use
> Jquery.

> So: Is it recommended to use a Perl function

This is just like the real thing.

------
jpatokal
I presume this is due to tokenization or something, but there's a lot of extra
whitespace in the code samples that make them look very unrealistic:

    
    
      def _ _ init__(self, default): 
      " " " 
      See if the default value for the field on a view is 
      timespan. 
      " " " 
    
      < select > 
      < option > value < / option > 
      < option > value < / option > 
      < / select >
    

And indentation is also missing completely. Maybe you need to use another NN
to guess which language the fake code is in and autoformat it accordingly!

~~~
yeldarb
It is, the tokenizer isn't reversible (and it adds spaces all over the place).

But a lot of these I should be able to add to my regex that converts the
output back into more human readable format (in the raw output, there's a
space before every punctuation mark so I already remove those extraneous
spaces from periods, commas, etc).

I just haven't gotten around to adding in any heuristics specifically for code
but adding a bit more post-processing is on my to-do list.
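
A rough sketch of what that kind of regex post-processing could look like, in Python rather than the site's JavaScript. Both patterns and the function name are hypothetical and are not the site's actual rules. The first strips the space the tokenizer puts before punctuation; the second is one possible code heuristic that collapses spaced-out tags such as `< / option >`.

```python
import re

# Hypothetical rules for illustration; the site's actual regexes are not shown here.
_SPACE_BEFORE_PUNCT = re.compile(r" +([.,;:!?])")
_SPACED_TAG = re.compile(r"<\s*(/?)\s*(\w+)\s*>")


def clean_tokenizer_spacing(text: str) -> str:
    """Undo two kinds of tokenizer spacing noise."""
    # "< / option >" -> "</option>", "< select >" -> "<select>"
    text = _SPACED_TAG.sub(r"<\1\2>", text)
    # "a solution , but" -> "a solution, but"
    text = _SPACE_BEFORE_PUNCT.sub(r"\1", text)
    return text


if __name__ == "__main__":
    print(clean_tokenizer_spacing("< option > value < / option >"))
    print(clean_tokenizer_spacing("I looked for a solution , but it failed ."))
```

The tag pattern is a heuristic: an expression like `a < b > c` would be collapsed too, so a real formatter would want to limit it to known tag names.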

~~~
yeldarb
I updated my regexes to clean up some of the tokenizer noise last night, so
much of the formatting in the code snippets should look a bit more natural
now.

------
hyperpallium
Congratulations, you have simulated a million monkeys at typewriters with a
million monkeys at typewriters. Has anyone really been far even as decided to
use even go want to do look more like?

------
inostia
This one boggles my mind, it even has code:

[https://stackroboflow.com/#!/question/25131](https://stackroboflow.com/#!/question/25131)

------
code_duck
Final question didn’t end in a question mark - perfect!

“I want to do something like this

$ _ -1 = object();”

We all do....

Now I see that the virtual question is different every time. Great work. It
reads better than most SO questions.

------
8bitsrule
This one is clearly written by a broken agent:
[https://stackroboflow.com/#!/question/1450](https://stackroboflow.com/#!/question/1450)

Reminds me of those online chatbots I used to torture back 10 or 15 years ago.
One I started asking about personal information about its creator. It was
remarkably evasive, constantly attempting to switch the subject.

------
joshvm
Another gem:
[https://stackroboflow.com/#!/question/17035](https://stackroboflow.com/#!/question/17035)

"I have got a big "someone" who will be going to be using the asp.net site.

I have a black box and a background in firefox, where they have a width of
100%.

They will never know of a color.

They come from a background color."

Film starring Liam Neeson?

------
dugluak
Every good invention can be terrifying if it falls in the hands of bad guys
(Nuclear technology for example). It's true for AI also. I am sure bad guys
must be training similar AI agents by only feeding fake news, conspiracy
theories etc. and it's easy to build AI agents as there is so much Open Source
material online about AI.

~~~
chrisco255
I'm trying to imagine a productive use case for this? Maybe in reverse for
attempting to answer questions?

~~~
jdefr89
Think things like election meddling. Propagating truly fake news to cater to
the emotions of what people simply want to be true. Humans are weak against
Confirmation Bias, ten minutes on Facebook will show you for sure.

~~~
cpeterso
Yes. That was the rationale OpenAI made just a few weeks ago to not release
their new language models:

[http://approximatelycorrect.com/2019/02/17/openai-trains-lan...](http://approximatelycorrect.com/2019/02/17/openai-trains-language-model-mass-hysteria-ensues/)

------
aboutruby
I think it would still be considered a "Does Not Exist"-valid website if the
generated questions would have some auto-formatter for the code. Main issue I
see is extra spaces everywhere, often in a syntax breaking way (and missing
spaces for formatting) (not that all SO questions have those).

~~~
yeldarb
Yeah this is a shortcoming of the tokenizer. It splits things up in ways that
are not 1:1 mappable back to their source unfortunately.

I did a bit of post-processing to get it formatted a bit better (re-combining
the “would” and “n’t” tokens and changing HTML tags to Markdown, for example)
but there’s still room for improvement.

Spacing specifically is different based on the context. Outside of code blocks
you want a space after a period. Inside you probably don’t. But since the
tokenizer has one in both places there’s no opportunity for the neural net to
learn this (it can’t see any difference). And my naive formatter doesn’t know
the difference either. (If you’re curious you can find it in the JS file)
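
To make the context problem concrete, here is a minimal Python sketch, not the site's formatter. It assumes code spans are still wrapped in literal `<code>` tags when the formatter runs, and the function names are invented for illustration. Contraction tokens are re-joined everywhere, while periods are handled differently inside and outside code.

```python
import re

# Assumption: code is delimited by literal <code>...</code> tags at this stage.
_CODE_SPAN = re.compile(r"(<code>.*?</code>)", re.DOTALL)
_SPLIT_CONTRACTION = re.compile(r"(\w) (n't|'(?:s|d|ll|re|ve|m))\b")


def join_contractions(text: str) -> str:
    """Re-combine tokens such as "would" + "n't" into "wouldn't"."""
    return _SPLIT_CONTRACTION.sub(r"\1\2", text)


def fix_periods(text: str) -> str:
    """Tighten spacing around periods, depending on context."""
    # Splitting on a capturing group puts the code spans at odd indices.
    parts = _CODE_SPAN.split(text)
    fixed = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            # Inside code: "os . path" -> "os.path"
            fixed.append(re.sub(r"[ \t]*\.[ \t]*", ".", part))
        else:
            # In prose: "works . Try" -> "works. Try"
            fixed.append(re.sub(r"[ \t]+\.", ".", part))
    return "".join(fixed)


if __name__ == "__main__":
    raw = "It would n't work . Try <code>os . path . join</code> instead ."
    print(fix_periods(join_contractions(raw)))
```

The design choice is that the context comes from the markup and not from the model: the formatter can see code boundaries even though the tokenized text looks the same in both places.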

~~~
yeldarb
I updated my regexes to clean up some of the tokenizer noise last night, so
much of the formatting in the code snippets should look a bit more natural
now.

------
TheAsprngHacker
Huh, in this question, there are a lot of words that get repeated five
consecutive times:
[https://stackroboflow.com/#!/question/13733](https://stackroboflow.com/#!/question/13733)

Is there a reason why? (I don't know anything about AI.)

~~~
yeldarb
The way the language model is trained is by rewarding it for correctly
predicting the next word in a sequence.

The output of the model is a predicted probability distribution of the next
word and a “state”; the next iteration takes the state output of the previous
iteration and generates another word and state (and this process repeats many
times).

Since there’s a probabilistic dimension, what may have happened in this case
is that it happened to repeat once by chance and the model had learned that if
something repeats 2x it’s likely that it will repeat a third, fourth, and
fifth time.

Basically it’s just trying to game the loss function which rewarded it for
predicting the next word in the sequence correctly.
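
A toy version of that generation loop, as a sketch only. The `toy_model` function below is a hand-written stand-in for the trained network, with made-up probabilities chosen so that a word becomes more likely to repeat the longer it has already been repeating. It is not the real model, whose state is a learned vector.

```python
import random

VOCAB = ["view", "list", "json", "error"]


def toy_model(state, word):
    """Stand-in for the network: (state, word) -> (next-word probabilities, new state).

    The state is (last word seen, length of its current run).
    """
    last_word, run = state
    run = run + 1 if word == last_word else 1
    # Made-up rule: the longer the current run, the likelier another repeat.
    p_repeat = min(0.9, 0.25 + 0.35 * (run - 1))
    p_other = (1.0 - p_repeat) / (len(VOCAB) - 1)
    probs = {w: (p_repeat if w == word else p_other) for w in VOCAB}
    return probs, (word, run)


def generate(length, seed=None):
    """Sample `length` words, feeding each state back into the next step."""
    rng = random.Random(seed)
    state = (None, 0)
    word = rng.choice(VOCAB)
    words = [word]
    for _ in range(length - 1):
        probs, state = toy_model(state, word)
        word = rng.choices(list(probs), weights=list(probs.values()))[0]
        words.append(word)
    return words


if __name__ == "__main__":
    print(" ".join(generate(20, seed=0)))
```

Because each word is sampled and not always the most likely one, a chance repeat can push the state into a regime where further repeats dominate, which is the effect described above.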

~~~
TheAsprngHacker
Thanks for the explanation. Your description superficially reminds me of a
Markov chain
([https://en.wikipedia.org/wiki/Markov_chain](https://en.wikipedia.org/wiki/Markov_chain)).
Is this related or is it totally different?

~~~
LeanderK
I haven't read the paper the work is based on, but if the RNN outputs a
probability distribution for the next letter/word then they form Markov Chains
(since then they only depend on the current state and not the previous state)!

RNNs are just fancy parametric functions that take a (state, input)-pair and
return a new (state', output)-pair.
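
As an illustration of that (state, input) to (state', output) shape, here is a deliberately tiny sketch with a scalar state and made-up weights. A trained RNN would use learned matrices and a vector-valued state, so treat the numbers as placeholders.

```python
import math


def rnn_step(state, x, w_h=0.5, w_x=1.0, w_out=(1.0, -1.0)):
    """One step: (state, input) -> (new_state, output distribution).

    All weights here are arbitrary placeholders, not learned values.
    """
    new_state = math.tanh(w_h * state + w_x * x)
    logits = [w * new_state for w in w_out]
    # Softmax turns the logits into a probability distribution.
    peak = max(logits)
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return new_state, [e / total for e in exps]


def run(inputs, state=0.0):
    """Thread the state through a sequence of inputs."""
    outputs = []
    for x in inputs:
        state, probs = rnn_step(state, x)
        outputs.append(probs)
    return state, outputs


if __name__ == "__main__":
    final_state, outputs = run([1.0, 0.0, -1.0])
    print(final_state, outputs[-1])
```

Each step depends only on the current state and input, which is the Markov property mentioned above. The difference from a simple word-level Markov chain is that the state can summarize the whole history, not just the last word.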

------
drdaeman
This desperately needs some AI-generated expert answers!

------
deepsy
This is similar to

thispersondoesnotexist.com and thisresumedoesnotexist.com

~~~
yeldarb
Yes, I was heavily inspired by them :) Glad someone made the connection!

I actually hadn't seen thisresumedoesnotexist.com yet; but I loved
[https://thiscatdoesnotexist.com](https://thiscatdoesnotexist.com) and
[https://thisrentaldoesnotexist.com](https://thisrentaldoesnotexist.com)

~~~
iforgotpassword
[https://thiscatdoesnotexist.com](https://thiscatdoesnotexist.com)

Oh my god, some of these look terrifying, pure nightmare fuel.

~~~
gwern
You might enjoy
[https://www.thiswaifudoesnotexist.net](https://www.thiswaifudoesnotexist.net)

Like the Airbnb, it's StyleGAN+GPT-2 (finetuned in this case on anime plot
synopses+summaries: [https://www.gwern.net/TWDNE#gpt-2-anime-plot-synopses-for-gp...](https://www.gwern.net/TWDNE#gpt-2-anime-plot-synopses-for-gpt-2-small) ).

I'm currently training an improved 'portrait' anime StyleGAN to fix up some of
the faces' issues.

------
yeldarb
Just added a browsable archive of all of the questions it has generated thus
far:
[https://stackroboflow.com/browse/index.html](https://stackroboflow.com/browse/index.html)

~~~
yeldarb
And now, tooltip previews on that page for browsing convenience.

------
sachin18590
This looks absolutely amazing! I would be very curious to know how you went
about conceptualizing the project and the AI beneath. Do you have a blog post
on it, or are you planning to write one?

~~~
yeldarb
Yep, there's a writeup on the site:
[https://stackroboflow.com/about/index.html](https://stackroboflow.com/about/index.html)

------
stabbles
This is very refreshing!

"I'm starting a new website using VB. I make a migration file and save it to a
local Azure database"

------
drinane
This comment is worth about as much as this website.

------
mitchtbaum
The software that wrote this comment does not exist.

------
TomMckenny
It has better grammar than the real one anyway.

------
turtlebraile
I really would like a bot like this to produce ideas of things to create with
programming in general.

Any ideas on the possible dataset?

------
booleandilemma
Can we please get a Jon Skeet neural network to provide answers?

------
droptablemain
_Giggles_ Love it.

~~~
chrisco255
Funny, yet terrifying at the same time. How can I be sure that HN isn't just a
really well trained Neural Net?

~~~
chrisco256
How can I be sure that i'm not just a really well trained Neural Net?

~~~
jdefr89
You are a very well trained neural net... The concept is based off of actual
Neurons in our brain. Can't tell if you're serious or trolling though lol.

------
hyperpallium
stackroboflow

------
drinane
This is lame.

~~~
drinane
I protest getting -4 points. They used github and stackoverflow. Wrote a
function to connect the two based on tags and then randomly generate a
question off of that. It's lame. Do something useful or cool.

