Hi yeldarb! This is a really neat app. I don't want to diminish your creation, so please don't think I'm trying to put you down with this question:
How much of this was just glue-code to integrate a few modules, vs. how much did you have to do from scratch?
I'm asking to get an idea of whether ARKit is within the reach of an average joe: the sudoku solver code could be a copy/paste, digit recognition could be a copy/paste, and ARKit should provide you with some environment/location info.
It's mostly glue (but some of it is special glue).
Oddly enough, digit recognition was one of the harder parts. iOS11's Vision API has character detection, but it only gives you a bounding box; it doesn't do the OCR!
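For the curious, the Vision side is roughly this shape (just a sketch, not the app's actual code; it hands back character boxes and nothing else, so classifying what's inside each box is on you):

    import Vision
    import CoreGraphics
    import Foundation

    func detectCharacterBoxes(in image: CGImage,
                              completion: @escaping ([CGRect]) -> Void) {
        // Vision (iOS 11) finds where characters are, but not which characters they are.
        let request = VNDetectTextRectanglesRequest { request, error in
            guard error == nil,
                  let observations = request.results as? [VNTextObservation] else {
                completion([])
                return
            }
            // characterBoxes are per-glyph bounding boxes in normalized image coordinates.
            let boxes = observations.flatMap { observation in
                (observation.characterBoxes ?? []).map { box in box.boundingBox }
            }
            completion(boxes)
        }
        request.reportCharacterBoxes = true

        let handler = VNImageRequestHandler(cgImage: image, options: [:])
        DispatchQueue.global(qos: .userInitiated).async {
            try? handler.perform([request])
        }
    }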
So for that I ended up ripping up a whole bunch of Sudoku puzzle books from the second-hand book store and scanning them in to create a dataset. Then I trained my own neural network in Keras and converted it to CoreML.
The actual ARKit part was really easy, even for someone like me who had never done iOS programming before.
Have you tried to render individual digits in the existing empty squares instead of rendering a whole new grid? Coupled with a font detection algorithm this could end up looking more magical.
Originally that was the goal. Unfortunately it wouldn't align perfectly: either the size was slightly off (depending on whether the rectangle detection snagged the inside or outside of the borders) or the depth was slightly off, so there was slippage when moving around.
Didn't quite work right.
I do have some things on my to-do list to experiment with, though, that might help the magic.
Hey, congrats on shipping! Unfortunately your app doesn't work for me :(. I've tried it on a few sudoku designs on my laptop screen (don't have anything else on hand right now) and there were two problems: first, the camera would not focus on the screen, which resulted in a blurry image; and second, it didn't recognize the puzzle correctly and returned incorrect results. I did put my screen flat on the table to simulate a paper lying on it, because I noticed the app doesn't work when the screen is upright - it rendered the results horizontally anyway.
Thanks for giving it a shot! Sorry it's not working for you.
I'm working on a hack to get non-horizontal puzzles working right now.
Unfortunately ARKit can only detect horizontal planes at this point, so I'm trying to find a workaround, since I now realize most people won't have a paper sudoku lying around to try it out and will logically go to the web to find one.
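For context, the ARKit setup itself is tiny; the problem is just that .horizontal is the only plane-detection option right now. Roughly (sketch; the class name is made up):

    import UIKit
    import ARKit

    class PuzzleViewController: UIViewController, ARSCNViewDelegate {
        let sceneView = ARSCNView()

        override func viewDidLoad() {
            super.viewDidLoad()
            sceneView.frame = view.bounds
            sceneView.delegate = self
            view.addSubview(sceneView)
        }

        override func viewWillAppear(_ animated: Bool) {
            super.viewWillAppear(animated)
            let configuration = ARWorldTrackingConfiguration()
            configuration.planeDetection = .horizontal   // currently the only supported option
            sceneView.session.run(configuration)
        }

        // Called when ARKit finds a plane; the solved grid gets anchored relative to it.
        func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
            guard anchor is ARPlaneAnchor else { return }
            // ... add the overlay node as a child of `node` here ...
        }
    }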
I suspect what's happening with the bad reads is that my neural network for digit classification is trained only on paper puzzles (I chopped up a bunch of books and scanned them to make my dataset). I'll have to get some scans of computer screen puzzles to get better accuracy.
Can you let me know what site you tried to use so I can make sure to get that font included in the dataset as well?
Hoping to get v1.1 submitted tonight, which will fix the horizontal-plane limitation. Collecting, labeling, and retraining the network may take a bit longer than that.
Your app is awesome! What type of neural network did you use? Please write a post, I’m teaching myself machine learning and iOS development, and would love to learn from a successful developer!
Thanks! It's using a convolutional neural network which I minimally adapted from an MNIST tutorial (one of the reasons I picked this project to start with is that there are so many MNIST tutorials for how to do digit recognition out there and it seemed like a good way to dip my toe into ML).
Converting it to CoreML after it was trained was really easy.
The hardest part was actually creating the dataset and wrestling with pixel buffers to get the image data into a format CoreML liked.
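If anyone's curious, the pixel-buffer wrangling boils down to something like this (a sketch of the idea rather than the app's exact code; the 28x28 grayscale input size is an assumption borrowed from the MNIST-style tutorials):

    import CoreGraphics
    import CoreVideo

    // Cram a cropped cell image into the grayscale CVPixelBuffer format a CoreML image model expects.
    func grayscalePixelBuffer(from image: CGImage, size: Int = 28) -> CVPixelBuffer? {
        let attrs = [kCVPixelBufferCGImageCompatibilityKey: kCFBooleanTrue,
                     kCVPixelBufferCGBitmapContextCompatibilityKey: kCFBooleanTrue] as CFDictionary
        var buffer: CVPixelBuffer?
        guard CVPixelBufferCreate(kCFAllocatorDefault, size, size,
                                  kCVPixelFormatType_OneComponent8,
                                  attrs, &buffer) == kCVReturnSuccess,
              let pixelBuffer = buffer else { return nil }

        CVPixelBufferLockBaseAddress(pixelBuffer, [])
        defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, []) }

        // Draw the cropped cell into the buffer, scaling it to the model's input size.
        guard let context = CGContext(data: CVPixelBufferGetBaseAddress(pixelBuffer),
                                      width: size, height: size,
                                      bitsPerComponent: 8,
                                      bytesPerRow: CVPixelBufferGetBytesPerRow(pixelBuffer),
                                      space: CGColorSpaceCreateDeviceGray(),
                                      bitmapInfo: CGImageAlphaInfo.none.rawValue) else { return nil }
        context.draw(image, in: CGRect(x: 0, y: 0, width: size, height: size))
        return pixelBuffer
    }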
Just tried it out, and it looks very good. It was a perfect way to demonstrate ARKit to colleagues. But for some reason it flips the board upside down.
Well I wasn't going to spill the beans just yet but if you promise to keep a secret.. working on that for v2 :)
Edit: Although so far I've learned that some people have VERY bad handwriting. I'm having a hard time sanitizing a dataset because I can't even tell what digit they were trying to write
Tried those first but no, they didn't work great with the data I was able to extract; those datasets are only handwritten digits which apparently weren't close enough to the computer printed ones.
I think the artifacts from the square borders (eg if it's slightly mis-cropped or misaligned) were also tripping up the model before I had my custom dataset loaded in there.
Edit: I should say this was my first stab at doing machine learning so I may be missing something; definitely open to ideas or suggestions!
Shouldn't you be able to generate a lot of synthetic "printed-looking" digits very easily? Just take, say, 20 common fonts and render each digit with random rotations, scaling, etc.
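Something along these lines (untested sketch; the font list, sizes, and jitter ranges are just placeholders) would give you a decent pile of labeled samples:

    import UIKit

    func syntheticDigitImages(size: CGFloat = 28, samplesPerDigit: Int = 50) -> [(Int, UIImage)] {
        let fontNames = ["Helvetica", "TimesNewRomanPSMT", "Courier", "Georgia", "Verdana"]
        let renderer = UIGraphicsImageRenderer(size: CGSize(width: size, height: size))
        var samples: [(Int, UIImage)] = []

        for digit in 1...9 {
            for _ in 0..<samplesPerDigit {
                let fontSize = size * CGFloat.random(in: 0.6...0.9)   // random scale
                let angle = CGFloat.random(in: -0.1...0.1)            // random rotation, radians
                let font = UIFont(name: fontNames.randomElement()!, size: fontSize)
                    ?? UIFont.systemFont(ofSize: fontSize)

                let image = renderer.image { ctx in
                    UIColor.white.setFill()
                    ctx.fill(CGRect(x: 0, y: 0, width: size, height: size))

                    // Rotate around the center of the cell, then draw the digit centered.
                    ctx.cgContext.translateBy(x: size / 2, y: size / 2)
                    ctx.cgContext.rotate(by: angle)
                    let text = NSAttributedString(string: "\(digit)",
                                                  attributes: [.font: font,
                                                               .foregroundColor: UIColor.black])
                    let bounds = text.size()
                    text.draw(at: CGPoint(x: -bounds.width / 2, y: -bounds.height / 2))
                }
                samples.append((digit, image))
            }
        }
        return samples
    }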
All good; validates that I'm at least on the right track! Had to pare back my ambitions quite a bit to get _something_ finished before the iOS11 launch.
I didn't expect the reception to be so enthusiastic for v1 tbh; but I guess I chose a good place to call it an "MVP"
I'm really interested in the growth of ML in mobile apps. Did you test calling existing ML APIs in the cloud and find they weren't responsive enough, or did it just feel more natural to run the net locally?
I've played with AWS Rekognition a little bit. It's definitely WAY faster to do it on-device. Plus it works even in Airplane mode.
I'm running the iOS11 Vision stuff, my own digit-recognition CoreML neural network (81 times per pass, once per square), plus all the ARKit stuff, and I can still get 30-60 fps on my iPhone 7+.
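The classification pass is basically this shape (sketch using the generic CoreML API; the feature names "image" and "classLabel" are assumptions about how the model was exported, and grayscalePixelBuffer(from:) is a crop-to-grayscale-buffer helper like the one I sketched further up the thread):

    import CoreML
    import CoreGraphics

    // One prediction per cell, 81 in total per pass.
    func classifyCells(_ cellImages: [CGImage], with model: MLModel) -> [Int?] {
        return cellImages.map { cell in
            guard let buffer = grayscalePixelBuffer(from: cell),
                  let input = try? MLDictionaryFeatureProvider(
                      dictionary: ["image": MLFeatureValue(pixelBuffer: buffer)]),
                  let output = try? model.prediction(from: input),
                  let label = output.featureValue(for: "classLabel")?.stringValue else {
                return nil   // unreadable cell: treat it as empty
            }
            return Int(label)
        }
    }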
Awesome! It's surreal to see a familiar name making such a neat app. I saw this earlier today and had a small moment of triumph since I'm terrible at solving sudokus myself. A few people visit kF every once in a while; feel free to stop by!
Yeah, I am using Vision's rectangle detection.. but it's really touchy. I've got a lot of heuristics in there to throw out bad results and smooth things out.
It doesn't do well if there's something near the puzzle though (like a horizontal rule or text like you might see in a newspaper layout).
Any suggestions on a better way to segment the rectangles? Or any preprocessing tips? (I couldn't really find anything that made it detect things more reliably.)
My approach was to adaptively threshold the image and then assume that the object of interest was the biggest connected region in the image. Then run that through rectangle detection.
Have you tried setting the aspect ratio and minimum width on the Vision rectangle detector? That might filter out any nonsense.
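i.e., something like this (sketch; the thresholds are guesses, not tuned values):

    import Vision

    func makeGridRectangleRequest(completion: @escaping ([VNRectangleObservation]) -> Void) -> VNDetectRectanglesRequest {
        let request = VNDetectRectanglesRequest { request, _ in
            completion(request.results as? [VNRectangleObservation] ?? [])
        }
        request.minimumAspectRatio = 0.8   // a sudoku grid is roughly square
        request.maximumAspectRatio = 1.0
        request.minimumSize = 0.3          // ignore rectangles smaller than ~30% of the frame
        request.minimumConfidence = 0.8
        request.maximumObservations = 1    // only the single best candidate
        return request
    }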
I think you'll have to roll your own rectangle detection and use heuristics to make it better.
So in mine I assume that the connected object in the thresholded image with the largest number of pixels must be the puzzle. That makes it a lot easier.
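The blob-picking step is roughly this (toy sketch that operates on an already-thresholded mask; the adaptive threshold and the final rectangle fit aren't shown):

    // mask is row-major, true = foreground pixel.
    struct Box { var minX: Int; var minY: Int; var maxX: Int; var maxY: Int }

    func largestComponentBoundingBox(mask: [Bool], width: Int, height: Int) -> Box? {
        var visited = [Bool](repeating: false, count: mask.count)
        var best: (size: Int, box: Box)? = nil

        for start in mask.indices where mask[start] && !visited[start] {
            // Flood fill (stack-based) from this seed pixel.
            var stack = [start]
            visited[start] = true
            var size = 0
            var box = Box(minX: width, minY: height, maxX: 0, maxY: 0)

            while let index = stack.popLast() {
                size += 1
                let x = index % width, y = index / width
                box.minX = min(box.minX, x); box.maxX = max(box.maxX, x)
                box.minY = min(box.minY, y); box.maxY = max(box.maxY, y)

                // 4-connected neighbours
                for (dx, dy) in [(1, 0), (-1, 0), (0, 1), (0, -1)] {
                    let nx = x + dx, ny = y + dy
                    guard nx >= 0, nx < width, ny >= 0, ny < height else { continue }
                    let neighbor = ny * width + nx
                    if mask[neighbor] && !visited[neighbor] {
                        visited[neighbor] = true
                        stack.append(neighbor)
                    }
                }
            }
            if best == nil || size > best!.size {
                best = (size, box)
            }
        }
        return best?.box
    }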
Love these ideas! I have a long list of ARKit apps I want to build.. if only there were more months in a year.
I think my favorite is a Mario Kart style "ghost mode" for running, where you could tell it what pace you wanted to go and it'd show a ghost running in front of or behind you at that pace!
Wow, I was at a hack night and saw a demo by someone who had made exactly this type of app on his computer using its webcam. He built a neural network and trained it, and it could even handle angles that weren't straight on. I don't think he's the same guy, but yeah, it's a clever idea.
It's a group called OpenHack organized on Meetup.com, located in the southwest suburbs of Chicago. He was just one of the 30 or so people who attended.
I just submitted v1.1 to the App Store that supports puzzles on a vertical plane so once that lands you can load up https://websudoku.com on your computer monitor to try it out!
A crossword solver was what I originally set out to build. iOS11's Vision APIs for rectangle detection weren't good enough to even have a shot at it though.
With better rectangle detection I think it'd be pretty doable!
I think that bit would be easy, but I'd probably cheat. You don't need to understand the context of the clues; you just need a big enough wordlist to brute force with. Start with the longest words since they will have the fewest options.
I'll bet you could scrape all the previously used words/phrases from the last 100 years of the NYTimes, etc., to get things rolling.
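The core of the fill step is just pattern-matching against the wordlist, longest slots first. Something like (toy sketch; slot/crossing bookkeeping left out):

    // "." marks an unknown letter in a slot's pattern.
    func candidates(for pattern: String, in wordlist: [String]) -> [String] {
        return wordlist.filter { word in
            word.count == pattern.count &&
            zip(word, pattern).allSatisfy { $1 == "." || $0 == $1 }
        }
    }

    func orderedSlots(_ slots: [String]) -> [String] {
        // Longest slots first: they have the fewest candidate words.
        return slots.sorted { $0.count > $1.count }
    }

    // Example:
    // candidates(for: "c..ss..rd", in: ["crossword", "crosswind", "classware"])
    //   -> ["crossword"]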
On the contrary, the author almost certainly intended to say "like magic", since that's what it says on the page: "Solve sudoku like magic with the power of AR."
I think what parent meant was that the word magic was removed from the HN post to make it less sensational, but that whoever removed that word overlooked removing the word "like".
(I was going to write up a medium post about the technical side of building the app but haven't gotten a chance yet)