Magic Sudoku – Solve Sudoku with the power of AR (magicsudoku.com)
134 points by mikerg87 on Sept 21, 2017 | 57 comments

Hi, thanks for submitting. I'm the developer of the app! Happy to answer any questions

(I was going to write up a Medium post about the technical side of building the app but haven't gotten a chance yet)

Hi yeldarb! This is a really neat app. I don't want to diminish your creation, so please don't think I'm trying to put you down with this question:

How much of this was just glue-code to integrate a few modules, vs. how much did you have to do from scratch?

I'm asking to get an idea of whether ARKit is within the reach of an average Joe: Sudoku solver code could be a copy/paste, digit recognition could be a copy/paste, and ARKit should provide you with some environment/location info.

It's mostly glue (but some of it is special glue).

Oddly enough, digit recognition was one of the harder parts. iOS11's Vision API has text detection, but it only gives you a bounding box; it doesn't do the OCR!

So for that I ended up ripping up a whole bunch of Sudoku puzzle books from the second-hand book store and scanning them in to create a dataset. Then I trained my own neural network in Keras and converted it to CoreML.

The actual ARKit part was really easy, even for me who has never done iOS programming before.

Have you tried to render individual digits in the existing empty squares instead of rendering a whole new grid? Coupled with a font detection algorithm this could end up looking more magical.

Originally that was the goal. Unfortunately it wouldn't align perfectly (either the size was slightly off, depending on whether the rectangle detection snagged the inside or outside of the borders, or the depth was slightly off, so there was slippage when moving around).

Didn't quite work right.

I do have some things on my to-do list to experiment with though that might help the magic

Hey, congrats on shipping! Unfortunately your app doesn't work for me :(. I've tried it on a few sudoku designs on my laptop screen (don't have anything on hand right now) and there were two problems. First, the camera would not focus on the screen, which resulted in a blurry image. Second, it didn't recognize the puzzle correctly and returned incorrect results. I did put my screen flat on the table to simulate a paper lying on it, because I noticed the app doesn't work when the screen is upright: it rendered the results horizontally anyway.

Thanks for giving it a shot! Sorry it's not working for you.

I'm working on a hack to get non-horizontal puzzles working right now.

Unfortunately ARKit can only detect horizontal planes at this point, so I'm trying to find a workaround, since I now realize most people won't have a paper sudoku lying around to try it out and will logically go to the web to find one.

I suspect what's happening with the bad reads is that my neural network for digit classification is trained only on paper puzzles (I chopped up a bunch of books and scanned them to make my dataset). I'll have to get some scans of computer screen puzzles to get better accuracy.

Can you let me know what site you tried to use so I can make sure to get that font included in the dataset as well?

Hoping to get v1.1 submitted tonight; that will fix the horizontal plane deficiency. Collecting, labeling, and retraining the network may take a bit longer than that.

Your app is awesome! What type of neural network did you use? Please write a post, I’m teaching myself machine learning and iOS development, and would love to learn from a successful developer!

Thanks! It's using a convolutional neural network which I minimally adapted from an MNIST tutorial (one of the reasons I picked this project to start with is that there are so many MNIST tutorials for how to do digit recognition out there and it seemed like a good way to dip my toe into ML).

Converting it to CoreML after it was trained was really easy.

The hardest part was actually creating the dataset and wrestling with pixel buffers to get the image data into a format CoreML liked.

Just tried it out, and it looks very good. It was a perfect way to demonstrate ARKit to colleagues. But for some reason it flips the board upside down.

Sorry about that!

That was happening when the phone traveled to the other side of the scene's origin from where it started. The angle calculation got confused.

I just got that fixed and submitted to Apple in v1.1 this morning. Should be inbound today or tomorrow as soon as they approve it.

Were there any particular advantages in using a neural network to solve the sudoku as opposed to an existing algorithm?

Actually the solving of the puzzle itself is just done with a simple recursive algorithm.
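
For anyone curious what "simple recursive algorithm" can look like, here is a minimal backtracking sketch in Python. It is an illustration of the general technique, not the app's actual code; the names `solve` and `is_valid` and the convention of using 0 for an empty cell are my own assumptions.

```python
def is_valid(grid, row, col, digit):
    """Check that digit does not already appear in the row, column, or 3x3 box."""
    if digit in grid[row]:
        return False
    if any(grid[r][col] == digit for r in range(9)):
        return False
    box_row, box_col = 3 * (row // 3), 3 * (col // 3)
    return all(
        grid[box_row + r][box_col + c] != digit
        for r in range(3)
        for c in range(3)
    )


def solve(grid):
    """Fill a 9x9 grid in place by recursive backtracking; 0 marks an empty cell.

    Returns True if a solution was found, False if the puzzle is unsolvable.
    """
    for row in range(9):
        for col in range(9):
            if grid[row][col] != 0:
                continue
            for digit in range(1, 10):
                if is_valid(grid, row, col, digit):
                    grid[row][col] = digit
                    if solve(grid):
                        return True
                    grid[row][col] = 0  # undo and try the next digit
            return False  # no digit fits this cell, so backtrack
    return True  # no empty cells left
```

Call `solve(grid)` on a list of nine lists of nine ints; the grid is mutated into the solution when the call returns True.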

I trained a neural network to do the computer vision part to interpret the puzzles.

And there are a bunch of pre-trained networks built into iOS11's Vision API and ARKit that the app just uses as black boxes.

ahh, sorry for the misunderstanding.

Does the network do well with handwriting, e.g. puzzles that have been half-solved?

Well I wasn't going to spill the beans just yet but if you promise to keep a secret.. working on that for v2 :)

Edit: Although so far I've learned that some people have VERY bad handwriting. I'm having a hard time sanitizing a dataset because I can't even tell what digit they were trying to write

Tried those first but no, they didn't work great with the data I was able to extract; those datasets are only handwritten digits which apparently weren't close enough to the computer printed ones.

I think the artifacts from the square borders (eg if it's slightly mis-cropped or misaligned) were also tripping up the model before I had my custom dataset loaded in there.

Edit: I should say this was my first stab at doing machine learning so I may be missing something; definitely open to ideas or suggestions!

Shouldn't you be able to generate a lot of synthetic "printed-looking" digits very easily? Just get say 20 common fonts and render each digit with random rotations and scaling etc.

That's definitely on my list of things to try! Thanks for the push in that direction

I'm that guy. Sorry :(

I have the worst handwriting.

People are like, "Dude, wtf?" when I try to whiteboard stuff.

All good; validates that I'm at least on the right track! Had to pare back my ambitions quite a bit to get _something_ finished before the iOS11 launch.

I didn't expect the reception to be so enthusiastic for v1 tbh; but I guess I chose a good place to call it an "MVP"

I'm really interested in the growth of ML on mobile apps. Did you test calling existing ML APIs in the cloud and find they weren't responsive enough, or did it just feel more natural to run the net locally?

I've played with AWS Rekognition a little bit. It's definitely WAY faster to do it on-device. Plus it works even in Airplane mode.

I'm running the iOS11 Vision stuff, my own text recognition CoreML neural network (81 times per loop), plus all the ARKit stuff and I can still get 30-60 fps on my iPhone 7+
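
The 81 is one classifier call per cell. As a rough sketch of that slicing step in Python (not the app's code): assume the puzzle has already been straightened into a square image `size` pixels wide. The function name `cell_boxes` and the `margin` parameter are made up for illustration; the inset is there to keep the grid lines out of each crop.

```python
def cell_boxes(size, margin=0.1):
    """Return 81 (left, top, right, bottom) pixel boxes in row-major order.

    size   -- width and height of the straightened square puzzle image
    margin -- fraction of a cell trimmed from each side to avoid border lines
    """
    cell = size / 9
    inset = cell * margin
    boxes = []
    for row in range(9):
        for col in range(9):
            boxes.append((
                round(col * cell + inset),
                round(row * cell + inset),
                round((col + 1) * cell - inset),
                round((row + 1) * cell - inset),
            ))
    return boxes
```

Each box would then be cropped out, resized to the network's input size, and classified as a digit or an empty cell.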

There shouldn’t be a need for that. All the information you need to solve it comes with the puzzle.

They want to recognize the handwriting to give the player feedback on errors made.

Are you the same yeldarb as the one from kirupaForum?

I am! Good to see you :)

I miss that place; I feel like we grew up there.

Awesome! It's surreal to see a familiar name making such a neat app. I saw this earlier today and had a small moment of triumph since I'm terrible at solving sudokus myself. A few people visit kF every once in a while; feel free to stop by!

I feel like this is solving the symptom, and not the actual problem which is that newspapers are printing sudoku with some of the numbers missing.

Did something similar some time back before ARKit and vision frameworks were around.

Are you using the rectangle detection from Vision? https://www.youtube.com/watch?v=wt96OomJY9A

FYI I tried building something similar running completely client-side a little while ago; I did not get very far TBH: http://users.telenet.be/bull/sudoku/ As for solving the sudoku: that's the easy part: https://github.com/ToJans/learninghaskell/blob/master/0003Su...


Yeah, I am using Vision's rectangle detection.. but it's really touchy. I've got a lot of heuristics in there to throw out bad results and smooth things out.

It doesn't do well if there's something near the puzzle though (like a horizontal rule or text like you might see in a newspaper layout).

Any suggestions on a better way to segment the rectangles? Or any preprocessing tips (I couldn't really find anything that made it detect things more reliably)

I've got a very old and slightly broken blog post on what I was doing:


My approach was to adaptively threshold the image and then assume that the object of interest was the biggest connected region in the image. Then run that through rectangle detection.
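
A dependency-free Python sketch of that idea, for illustration only (it is not the code from my old app). It assumes a grayscale image stored as a list of rows of 0-255 ints, uses a plain local-mean threshold as a stand-in for whatever adaptive threshold you prefer, and the names `adaptive_threshold` and `largest_region_bbox` are invented here.

```python
from collections import deque


def adaptive_threshold(gray, radius=1, offset=0):
    """Mark a pixel as ink (1) if it is darker than its local mean minus offset."""
    height, width = len(gray), len(gray[0])
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                gray[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            mean = sum(window) / len(window)
            mask[y][x] = 1 if gray[y][x] < mean - offset else 0
    return mask


def largest_region_bbox(mask):
    """Return (left, top, right, bottom) of the largest 4-connected ink region, or None."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    best = []
    for y in range(height):
        for x in range(width):
            if mask[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill to collect one connected region.
            region, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < height and 0 <= nx < width \
                            and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):
                best = region
    if not best:
        return None
    ys = [p[0] for p in best]
    xs = [p[1] for p in best]
    return (min(xs), min(ys), max(xs), max(ys))
```

Because the grid lines of a sudoku are all joined together, the puzzle tends to be one big connected region, so its bounding box is a reasonable crop to hand to the rectangle detector. On real images you would want a much larger `radius` and a nonzero `offset`.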

Have you tried setting the aspect ratio and minimum width on the Vision rectangle detector? That might filter out any nonsense.

The parameters are not very well documented. (Or at least they weren't when I was coding that part.) But yeah, I did get those set finally.

Whatever I do though I can't get it to recognize any rectangles here: http://www.telegraph.co.uk/news/science/science-news/9359579...

For our demo video we ended up photoshopping out the horizontal line right above it and that seemed to work.

Just tried it on my old app and it worked well :)

I think you'll have to roll your own rectangle detection and use heuristics to make it better.

So in mine I assume that the connected object in the thresholded image with the largest number of pixels must be the puzzle. That makes it a lot easier.

Good luck!

Cool I'll put that on my todo list. Thanks!

Other good ideas:

- point at a board game and either do all the boring scorekeeping for me, or else advise me on best strategy

- take a photo of a bunch of lottery tickets and tell me whether I've won

The UK's National Lottery Android (and maybe Apple?) app already does the second one.

It's just done with a QR code.

Unfortunately... there was a bug and it told people they hadn't won: http://www.independent.co.uk/news/uk/home-news/national-lott...

Love these ideas! I have a long list of ARKit apps I want to build.. if only there were more months in a year.

I think my favorite is a Mario Kart style "ghost mode" for running, where you could tell it what pace you wanted to go and it'd show a ghost running in front of or behind you at that pace!

I remember using Google Goggles for this back in 2011 or so.

The cool new thing is projecting the solution onto the paper and having it track correctly!

Wow, I was at a hack night and saw a demo by someone who had made exactly this type of app on his computer using its webcam. He built a neural network and trained it, and it could even handle angles that weren't straight on. I don't think he's the same guy, but yeah, it's a clever idea.

Do you have a link or other info on the hack night?

It's a group called OpenHack organized on Meetup.com and it's located in the southwest suburbs of Chicago. It was just one of like 30 people who attended it.

Hey, what if I don't have a Sudoku puzzle to provide? Can't you just generate one on the table ? :P

I just submitted v1.1 to the App Store that supports puzzles on a vertical plane so once that lands you can load up https://websudoku.com on your computer monitor to try it out!

Sneak peek of that update: https://twitter.com/braddwyer/status/910861205442527233

Several bug fixes included as well based on input from early users.

How would HN attack the problem of solving a crossword using this type of tech + character recognition?

A crossword solver was what I originally set out to build. iOS11's Vision APIs for rectangle detection weren't good enough to even have a shot at it though.

With better rectangle detection I think it'd be pretty doable!

I had assumed that the AR side of the problem would be the easy bit and that the real issue would be solving the riddles.

I think that bit would be easy; but I'd probably cheat. You don't need to understand the context of the clues. You just need a big enough wordlist to brute force with. Start with the longest words since they will have the fewest options.
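
To make the wordlist idea concrete, here is a small Python sketch of the two pieces it needs: filtering the wordlist against a partially filled slot, and picking which slot to attack first. This is only a sketch; the `.`-for-unknown pattern format and the names `candidates` and `fill_order` are assumptions for illustration, and a real solver would add backtracking across crossing slots.

```python
def candidates(pattern, wordlist):
    """Words of the right length that agree with every known letter.

    pattern uses '.' for an unknown square, e.g. 'c.t'.
    """
    return [
        word
        for word in wordlist
        if len(word) == len(pattern)
        and all(p in (".", ch) for p, ch in zip(pattern, word))
    ]


def fill_order(patterns, wordlist):
    """Order slots longest first, breaking ties by fewest matching words."""
    return sorted(
        patterns,
        key=lambda p: (-len(p), len(candidates(p, wordlist))),
    )
```

Once a slot has been filled, its letters get written into the patterns of the slots it crosses, which shrinks their candidate lists in turn.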

I'll bet you could scrape all the previously used words/phrases from the last 100 years of NYTimes, etc to get things rolling.

I think a word2vec guided brute force approach to the riddles would be the winning combination.

Check out Dr. Fill and maybe Cruciform.

Oh god mods, please add the word "magic" to the title, it's painful to read.

Or remove "like", which is probably what was intended and slightly less sensationalist

On the contrary, the author almost certainly intended to say "like magic", since that's what it says on the page: "Solve sudoku like magic with the power of AR."

I think what parent meant was that the word magic was removed from the HN post to make it less sensational, but that whoever removed that word overlooked removing the word "like".

Email them. They don't see it otherwise.
