How We Used Machine Learning to win at HQ Trivia (mux.com)
49 points by bdod6 on Jan 12, 2018 | 23 comments

To any future hackers: You don't need OCR. HQ has a simple WebSocket server that streams the questions and possible answers in real time. Set up an HTTP proxy on your phone to inspect the requests the app is making. You'll find lots of helpful stuff.

Ooh. Sounds interesting. I've taken an OCR approach before (see my profile for post) since I thought the iOS app had cert pinning, but this method takes the cake and (presumably) will be faster in a game situation.

That works until the app starts using certificate pinning.

Author here: At Mux, we experimented with using machine learning to predict HQ Trivia answers. We managed to get 80-90% accuracy across a dataset of around 500 questions.

The trickiest questions were relational questions (e.g. What's heavier, a pineapple or a Siamese cat?). Would appreciate any feedback on our approach (and happy to answer questions!).

Time to bring in the big questions. The tortoise is on its back. And you are not helping it. Why?

I think anyone playing HQ should be encouraged by our results. I know a lot of people are turned off from playing because of all the "bots". Based on our analysis and results from over a hundred games, I think it's clear that bots are not sophisticated enough to solve HQ.

We're also using a more sophisticated approach than most bots I've read about, and we continuously train our model on new data. Even so, we would only expect to win 7 out of 100 games.

Our goal was not to hurt the HQ community, but rather to challenge ourselves into solving a difficult data science problem.
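A rough back-of-the-envelope check of the "7 out of 100 games" figure, as a sketch only. It assumes a game has 12 questions, that every question must be answered correctly to win (extra lives ignored), and that questions are independent. None of these assumptions are stated in the thread, and the function name is made up for illustration.

```python
def win_probability(per_question_accuracy: float, questions_per_game: int = 12) -> float:
    """Chance of a perfect game, assuming each question is answered independently.

    questions_per_game=12 is an assumption, not a figure from the thread.
    """
    return per_question_accuracy ** questions_per_game


# Sweep the 80-90% per-question accuracy range quoted in the thread.
for accuracy in (0.80, 0.85, 0.90):
    print(f"{accuracy:.0%} per question -> {win_probability(accuracy):.1%} per game")
```

Under these assumptions, 80% per-question accuracy works out to just under 7% per game, which lines up with the low end of the quoted range.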

Pretty sure they were making a Blade Runner reference


How is this different from coding, say, a wall hack in an online FPS?

There's a lot less teabagging when we win.

I thought this was going to be a retrospective from the HQ Trivia team about how they were mediocre given the scaling challenges and hiccups they are facing and then they solved it through ML!

This seems pretty misleading, since honestly 99% of the machine learning that goes on here happens when running the questions/answers through Google Search. There are probably millions of man-years of machine learning / information retrieval that have gone into Google Search.

The concept of machine learning is pretty misleading, because it's founded upon billions of man-years of human learning

Then we find HQ eventually pivots to being a machine learning research platform once someone invents a perfectly scoring bot ;-)

Joking aside, I'd say HQ Trivia are getting savvier with the questions. A final question the other day was along the lines of "Which two female artists collectively have the same number of Grammys as Beyoncé?" with the answer being "Adele + Madonna", I believe.

Yep. I actually mention that specific question in the article as an unsolvable question for machine learning, at least given our current constraints.

Those are generally rare questions though because difficult questions for bots are also difficult questions for humans. HQ can't have too many of those questions without degrading the player experience.

Because of that, I don't think we will ever get beyond 10/11 questions right per game. That still leads to a decent chance at winning at least one game per week though.
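To put a rough number on "at least one game per week": a minimal sketch assuming a 7% chance of winning any single game (the 7 out of 100 figure quoted earlier in the thread), independent games, and 12 games played per week. The games-per-week value and the function name are illustrative assumptions, not numbers from the thread.

```python
def at_least_one_win(per_game_win_rate: float, games_played: int) -> float:
    """Probability of winning one or more games, assuming games are independent."""
    return 1.0 - (1.0 - per_game_win_rate) ** games_played


# 0.07 comes from the "7 out of 100 games" estimate; 12 games/week is assumed.
weekly_chance = at_least_one_win(0.07, 12)
print(f"Chance of at least one win in a week: {weekly_chance:.0%}")
```

With these inputs the chance comes out at a bit better than even, so "decent" seems fair, though the result is sensitive to how many games are actually played.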

Ha, so you did! I got down to the Nick Hornby question which reminded me of it and then commented here ;-)

You are right about the experience issue, though. It feels almost like they're trying to make it so you can't win with questions worded in that way, since even someone who actually knew how many Grammys each of the listed artists had would struggle to add them together in time.

Yep! We felt the same way when we saw that question. I think HQ will prioritize questions that are hard but still feel possible. Otherwise their engagement will start dropping off.

Is there a dataset of past HQ questions and answers?

Yes, we have been archiving each game going back to October. We augment the questions and answers, though, so that we get more relevant results when we run our web scrapes.

I don't suppose you'd be willing to publish what you've gathered at some point in the near future?

We might in the future, but we're unsure what the copyright is on those questions.

Would be awesome if you guys publish the raw data (questions and answers). Would love to try to build a model as a learning exercise.

We're not sure about the copyright, but that's something we will look at doing if that's not an issue.
