
How We Used Machine Learning to win at HQ Trivia - bdod6
https://mux.com/blog/how-we-used-machine-learning-to-predict-hq-trivia-answers/
======
jtokoph
To any future hackers: You don't need OCR. HQ has a simple websocket server
that will stream the questions and possible answers in real time. Set up an
http proxy on your phone to inspect the requests the app is making. You'll
find lots of helpful stuff.

~~~
applecrazy
Ooh. Sounds interesting. I've taken an OCR approach before (see my profile for
the post) since I thought the iOS app had cert pinning, but this method takes
the cake and (presumably) will be faster in a game situation.

------
bdod6
Author here: At Mux, we experimented with using machine learning to predict HQ
Trivia answers. We managed to get 80-90% accuracy across a dataset of around
500 questions.

The trickiest questions were relational questions (e.g. What's heavier, a
pineapple or a Siamese cat?). Would appreciate any feedback on our approach
(and happy to answer questions!).

~~~
conanbatt
Time to bring in the big questions. The tortoise is on its back. And you are
not helping it. Why?

~~~
bdod6
I think anyone playing HQ should be encouraged by our results. I know a lot of
people are turned off from playing because of all the "bots". Based on our
analysis and results from over a hundred games, I think it's clear that bots
are not sophisticated enough to solve HQ.

We're also using a more sophisticated approach than most bots I've read about,
and we continuously train our model on new data. Even so, we would only expect
to win 7 out of 100 games.
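
A rough sanity check on that 7-in-100 figure, as a back-of-the-envelope sketch
rather than the model from the article. It assumes a game has 12 questions
(not stated in this thread) and that every question is answered independently
with the same accuracy; both are simplifications, and `win_probability` is
just an illustrative name.

```python
def win_probability(per_question_accuracy: float, questions_per_game: int = 12) -> float:
    """Chance of getting every question in a game right.

    Assumes questions are independent and equally hard, which is a
    simplification; 12 questions per game is also an assumption.
    """
    return per_question_accuracy ** questions_per_game


# Per-question accuracies in the 80-90% range quoted above.
for accuracy in (0.80, 0.85, 0.90):
    print(f"{accuracy:.0%} per question -> {win_probability(accuracy):.1%} per game")
```

Under those assumptions, 80% per-question accuracy works out to roughly a 7%
chance of a perfect game, which is in line with the estimate above.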

Our goal was not to hurt the HQ community, but rather to challenge ourselves
into solving a difficult data science problem.

~~~
mezzode
Pretty sure they were making a Blade Runner reference

~~~
bdod6
whoosh.

------
calbear81
I thought this was going to be a retrospective from the HQ Trivia team about
how they were mediocre given the scaling challenges and hiccups they are
facing and then they solved it through ML!

------
argonaut
This seems pretty misleading, since honestly 99% of the machine learning that
goes on here happens when running the questions/answers through Google Search.
There are probably millions of man-years of machine learning / information
retrieval that have gone into Google Search.

~~~
xkcd-sucks
The concept of machine learning is pretty misleading, because it's founded
upon billions of man-years of human learning

------
petercooper
Then we find HQ eventually pivots to being a machine learning research
platform once someone invents a perfectly scoring bot ;-)

Joking aside, I'd say HQ Trivia are getting savvier with the questions. A
final question the other day was along the lines of "Which two female artists
collectively have the same number of Grammys as Beyoncé?" with the answer
being "Adele + Madonna", I believe.

~~~
bdod6
Yep. I actually mention that specific question in the article as an unsolvable
question for machine learning, at least given our current constraints.

Those questions are generally rare, though, because questions that are
difficult for bots are also difficult for humans. HQ can't have too many of
them without degrading the player experience.

Because of that, I don't think we will ever get beyond 10/11 questions right
per game. That still leads to a decent chance at winning at least one game per
week though.
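
To put a rough number on "at least one game per week", here is a minimal
sketch. It takes the 7-in-100 per-game win rate mentioned elsewhere in this
thread and assumes 12 games played in a week; that game count is a
hypothetical input, not something stated here, and games are treated as
independent.

```python
def chance_of_at_least_one_win(per_game_win_rate: float, games_played: int) -> float:
    """Probability of winning one or more games out of `games_played`.

    Computed as the complement of losing every game, assuming each game
    is an independent trial with the same win rate.
    """
    return 1.0 - (1.0 - per_game_win_rate) ** games_played


# 0.07 is the per-game estimate from this thread; 12 games a week is an assumption.
weekly_chance = chance_of_at_least_one_win(0.07, 12)
print(f"Chance of at least one win in 12 games: {weekly_chance:.0%}")
```

With those inputs the chance comes out a little better than a coin flip, and
it changes quickly with the number of games actually played.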

~~~
petercooper
Ha, so you did! I got down to the Nick Hornby question which reminded me of it
and then commented here ;-)

You are right about the experience issue, though. It feels almost like they're
_trying_ to make it so you can't win with questions worded in that way, since
even someone who actually _knew_ the numbers of Grammys all the artists listed
had would struggle to add them together in time.

~~~
bdod6
Yep! We felt the same way when we saw that question. I think HQ will
prioritize questions that are hard but still feel possible. Otherwise their
engagement will start dropping off.

------
nicolashahn
Is there a dataset of past HQ questions and answers?

~~~
bdod6
Yes, we have been archiving each game going back to October. We augment the
questions and answers, though, so that we get more relevant results when we
run our web scrapes.

~~~
nicolashahn
I don't suppose you'd be willing to publish what you've gathered at some point
in the near future?

~~~
bdod6
We might in the future, but we're unsure what the copyright is on those
questions.

