
Generating Factoid Questions with RNNs: The 30M Factoid Question-Answer Corpus - fitzwatermellow
http://arxiv.org/abs/1603.06807
======
mark_l_watson
Great project and I enjoyed reading the paper. Thanks for posting it.

I was on a DARPA neural network advisory panel in 1998 and 1999, and I used
simple 1 hidden layer or 2 hidden layer back prop networks for several
projects.

My mind is blown by both the computational tricks for training many layer
networks and the advances of using GPUs to train large networks. We built our
own neural network hardware that was excellent for the time, but the progress
in the last decade is enormous.

Still, I also believe in the power and utility of so-called 'symbolic AI', but
I think I am in the minority.

------
pserwylo
This is the type of project which really interests me. In the future we plan
on building a service to wholesale trivia questions to pubs, schools,
fundraisers, etc. I'll read this paper with interest, because at a quick
glance the generated questions do sound great.

The next step beyond generating good-quality questions that sound natural is
to make sure the questions are up to date and topical. This means keeping up
with current affairs, memes, popular culture, and other trends. Questions
about "which politician is in trouble for x" really only remain relevant for
about a week or two, and sit in a different category from the pure fact-style
questions presented here.

It would be great to see this research team, or others, continue in this
direction and look at increasingly broad metrics of what makes a "good"
question, in addition to whether the English phrasing sounds good.

Nevertheless, this really does sound great.

~~~
lifeisstillgood
This was my first thought too - not "hey pushing back the boundaries of
computer science" but "hmmm, way to put pub quizmasters out of business".

~~~
dave_sullivan
Read this paper from July of last year and it'll make more sense as to why
this research is interesting:
[http://arxiv.org/pdf/1506.05869.pdf](http://arxiv.org/pdf/1506.05869.pdf).

Seeding a dataset with human labels, then generating new, better data is
pretty cool. Similar to DeepMind watching Go games, then learning to play
better than said games. Add to which, humans can't tell the difference between
the human-generated data and the algorithmically generated data.

We're about 3 months from where you can bootstrap a system like Microsoft's
with freely available code and data. From there, if "memory modules" start
working better so models can "remember specific context" (they can already
remember general context), you'll have a bot that's a pretty good candidate
for passing the Turing test.

I'm defining a practical Turing test as text-based: the human asks 5
questions, the other party replies 5 times, then the human must predict:
human or not human.
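The protocol above is concrete enough to sketch as a small harness. This is
only an illustration of the commenter's definition, not anything from the
paper; `ask`, `respond`, and `judge` are hypothetical callables standing in
for the human interrogator, the candidate (human or bot), and the final
verdict.

```python
def practical_turing_test(ask, respond, judge, n_rounds=5):
    """Run the text-only protocol described above: the human asks
    n_rounds questions, the other party replies once to each, then
    the human issues a single verdict: 'human' or 'not human'."""
    transcript = []
    for _ in range(n_rounds):
        question = ask(transcript)      # human poses the next question
        answer = respond(question)      # candidate replies
        transcript.append((question, answer))
    return judge(transcript)            # final human/not-human prediction

# Toy usage with stub callables (purely illustrative):
verdict = practical_turing_test(
    ask=lambda t: f"question {len(t) + 1}?",
    respond=lambda q: "a canned reply",
    judge=lambda t: "not human"
    if all(a == "a canned reply" for _, a in t)
    else "human",
)
```

With these stubs, the judge flags the identical canned replies as a bot; the
interesting part of the test is, of course, entirely inside the real `respond`
and `judge`.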

------
tyingq
I'm confused as to how this ends up being useful.

Using an example from the paper:

Q: who’s an american singer that plays pop music?

A: nikki flores

So, the generated question and answer match, but the answer isn't the only
right answer. There are thousands of "right answers".

They do note this: _" We have also observed that the questions are often
ambiguous: that is, one can easily come up with several possible answers that
may fit the specifications of the question."_

But they don't say anything more about it. I suppose that's just out of scope
for them, and something someone else works on?

------
0xdada
Unfortunate title.

factoid - an invented fact believed to be true because it appears in print.

~~~
return0
\- or -

2 : a briefly stated and usually trivial fact.

~~~
dingaling
_-oid_

    Of similar form to, but not the same as.

~~~
Jtsummers
This doesn't make the second definition of factoid (in the GP post) wrong.
Prefixes, suffixes, and even roots end up being used in words that don't
follow the literal definition they suggest. Words typically don't start life
that way; they usually begin closer to the literal meaning pedants will later
expect, but, over time, usage causes a definitional shift.

