(I was going to write up a Medium post about the technical side of building the app but haven't gotten a chance yet.)
How much of this was just glue-code to integrate a few modules, vs. how much did you have to do from scratch?
I'm asking to get an idea of whether ARKit is within the reach of an average joe: the sudoku-solver code could be a copy/paste, digit recognition could be a copy/paste, and ARKit should provide you with the environment/location info.
Oddly enough, digit recognition was one of the harder parts. iOS 11's Vision API has character detection, but it only gives you a bounding box; it doesn't do the OCR!
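To illustrate (a simplified sketch; `puzzleImage` and the classifier hookup are stand-ins for your own code):

```swift
import Vision

// iOS 11's Vision will find character boxes, but classifying them is on you.
let request = VNDetectTextRectanglesRequest { request, _ in
    guard let words = request.results as? [VNTextObservation] else { return }
    for word in words {
        for box in word.characterBoxes ?? [] {
            // box.boundingBox is a normalized CGRect within the image.
            // Crop that region out and run it through your own digit
            // classifier; Vision never tells you *which* character it found.
        }
    }
}
request.reportCharacterBoxes = true
let handler = VNImageRequestHandler(cgImage: puzzleImage, options: [:])
try? handler.perform([request])
```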
So for that I ended up ripping up a whole bunch of sudoku puzzle books from the secondhand bookstore and scanning them in to create a dataset. Then I trained my own neural network in Keras and converted it to CoreML.
The actual ARKit part was really easy, even for me, someone who had never done iOS programming before.
Didn't quite work right.
I do have some things on my to-do list to experiment with, though, that might help the magic.
I'm working on a hack to get non-horizontal puzzles working right now.
Unfortunately, ARKit can only detect horizontal planes at this point, so I'm trying to find a workaround; I now realize most people won't have a paper sudoku lying around to try it out and will logically go to the web to find one.
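To give an idea of the kind of workaround I mean (a rough sketch only; a feature-point hit gives you no surface normal, so orientation would still have to come from the detected rectangle):

```swift
import ARKit

// Skip plane detection entirely and anchor against raw feature points, so a
// puzzle on a screen or wall can still get a world position.
func anchorPuzzle(at screenPoint: CGPoint, in sceneView: ARSCNView) {
    let hits = sceneView.hitTest(screenPoint, types: .featurePoint)
    guard let hit = hits.first else { return }
    let anchor = ARAnchor(transform: hit.worldTransform)
    sceneView.session.add(anchor: anchor)
}
```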
I suspect what's happening with the bad reads is that my neural network for digit classification is trained only on paper puzzles (I chopped up a bunch of books and scanned them to make my dataset). I'll have to get some scans of computer screen puzzles to get better accuracy.
Can you let me know what site you tried to use so I can make sure to get that font included in the dataset as well?
Hoping to still get v1.1 submitted tonight; that will fix the horizontal plane deficiency. Collecting, labeling, and retraining the network may take a bit longer than that.
Converting it to CoreML after it was trained was really easy.
The hardest part was actually creating the dataset and wrestling with pixel buffers to get the image data into a format CoreML liked.
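For anyone who hits the same wall, the plumbing looks something like this (a simplified sketch; the 28x28 grayscale input size is just an assumption about the model):

```swift
import CoreGraphics
import CoreVideo

// Render a cropped digit cell into a CVPixelBuffer so it can be fed to a
// CoreML model input.
func pixelBuffer(from image: CGImage, width: Int = 28, height: Int = 28) -> CVPixelBuffer? {
    let attrs = [kCVPixelBufferCGImageCompatibilityKey: kCFBooleanTrue,
                 kCVPixelBufferCGBitmapContextCompatibilityKey: kCFBooleanTrue] as CFDictionary
    var buffer: CVPixelBuffer?
    guard CVPixelBufferCreate(kCFAllocatorDefault, width, height,
                              kCVPixelFormatType_OneComponent8, attrs, &buffer) == kCVReturnSuccess,
          let pb = buffer else { return nil }

    CVPixelBufferLockBaseAddress(pb, [])
    defer { CVPixelBufferUnlockBaseAddress(pb, []) }

    // Draw the image into the buffer's backing memory as 8-bit grayscale.
    guard let context = CGContext(data: CVPixelBufferGetBaseAddress(pb),
                                  width: width, height: height,
                                  bitsPerComponent: 8,
                                  bytesPerRow: CVPixelBufferGetBytesPerRow(pb),
                                  space: CGColorSpaceCreateDeviceGray(),
                                  bitmapInfo: CGImageAlphaInfo.none.rawValue) else { return nil }
    context.draw(image, in: CGRect(x: 0, y: 0, width: width, height: height))
    return pb
}
```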
That was happening when the phone traveled to the other side of the scene's origin from where it started. The angle calculation got confused.
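Assuming it's the usual wrap-around discontinuity at ±π, the standard guard looks something like this (illustrative, not the exact code):

```swift
import Foundation

// A naive difference of two yaw angles jumps by ~2π when the camera crosses
// the wrap point; routing it through atan2 keeps the delta continuous.
func normalizedAngleDelta(_ a: Float, _ b: Float) -> Float {
    return atan2(sin(a - b), cos(a - b))  // always in (-π, π]
}
```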
I just got that fixed and submitted to Apple in v1.1 this morning. Should be inbound today or tomorrow as soon as they approve it.
I trained a neural network to do the computer vision part that interprets the puzzles.
And there are a bunch of pre-trained networks built into iOS 11's Vision API and ARKit that the app just uses as a black box.
Does the network do well with handwriting, e.g. puzzles that have been half-solved?
Edit: Although so far I've learned that some people have VERY bad handwriting. I'm having a hard time sanitizing a dataset because I can't even tell what digit they were trying to write.
I think the artifacts from the square borders (e.g. if it's slightly mis-cropped or misaligned) were also tripping up the model before I had my custom dataset loaded in there.
Edit: I should say this was my first stab at doing machine learning so I may be missing something; definitely open to ideas or suggestions!
I have the worst handwriting.
People are like, "Dude, wtf?" when I try to whiteboard stuff.
I didn't expect the reception to be so enthusiastic for v1, tbh, but I guess I chose a good place to call it an "MVP".
I'm running the iOS 11 Vision stuff, my own text-recognition CoreML neural network (81 times per loop, once per cell of the 9x9 grid), plus all the ARKit stuff, and I can still get 30-60 fps on my iPhone 7+.
I miss that place; I feel like we grew up there.
Are you using the rectangle detection from Vision?
Yeah, I am using Vision's rectangle detection, but it's really touchy. I've got a lot of heuristics in there to throw out bad results and smooth things out.
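To give a flavor of the smoothing (heavily simplified; the real heuristics are messier):

```swift
import CoreGraphics

// Low-pass filter the detected corners so one jittery frame doesn't yank
// the overlay around. alpha is a made-up smoothing factor.
struct CornerSmoother {
    private(set) var corners: [CGPoint]?
    let alpha: CGFloat = 0.25

    mutating func update(with detected: [CGPoint]) {
        guard let previous = corners else { corners = detected; return }
        corners = zip(previous, detected).map { prev, new in
            CGPoint(x: prev.x + alpha * (new.x - prev.x),
                    y: prev.y + alpha * (new.y - prev.y))
        }
    }
}
```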
It doesn't do well if there's something near the puzzle though (like a horizontal rule or text like you might see in a newspaper layout).
Any suggestions on a better way to segment the rectangles? Or any preprocessing tips? (I couldn't really find anything that made it detect things more reliably.)
My approach was to adaptive-threshold the image and then assume that the object of interest was the biggest connected region in the image. Then run that through rectangle detection.
Have you tried setting the aspect ratio and minimum width on the Vision rectangle detector? That might filter out any nonsense.
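i.e. something along these lines (the values are guesses for a roughly square grid):

```swift
import Vision

let request = VNDetectRectanglesRequest()
request.minimumAspectRatio = 0.8  // a sudoku grid is close to square
request.maximumAspectRatio = 1.0  // Vision expresses aspect ratio in [0, 1]
request.minimumSize = 0.3         // ignore rectangles under ~30% of the frame
request.minimumConfidence = 0.8   // drop low-confidence candidates
request.maximumObservations = 1   // keep only the best one
```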
Whatever I do, though, I can't get it to recognize any rectangles here: http://www.telegraph.co.uk/news/science/science-news/9359579...
For our demo video we ended up photoshopping out the horizontal line right above it and that seemed to work.
I think you'll have to roll your own rectangle detection and use heuristics to make it better.
So in mine I assume that the connected object in the thresholded image with the largest number of pixels must be the puzzle. That makes it a lot easier.
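In code, that "biggest blob wins" step is roughly this (a from-scratch sketch, no OpenCV; it assumes the image is already thresholded to 0/255 bytes):

```swift
// Flood-fill each unvisited foreground pixel (255 = ink) and keep the
// connected component with the most pixels.
func largestComponentPixels(_ bitmap: [UInt8], width: Int, height: Int) -> [Int] {
    var visited = [Bool](repeating: false, count: bitmap.count)
    var best: [Int] = []
    for start in bitmap.indices where bitmap[start] == 255 && !visited[start] {
        var component: [Int] = []
        var stack = [start]
        visited[start] = true
        while let p = stack.popLast() {
            component.append(p)
            let x = p % width, y = p / width
            for (dx, dy) in [(1, 0), (-1, 0), (0, 1), (0, -1)] {
                let nx = x + dx, ny = y + dy
                guard nx >= 0, nx < width, ny >= 0, ny < height else { continue }
                let n = ny * width + nx
                if bitmap[n] == 255 && !visited[n] {
                    visited[n] = true
                    stack.append(n)
                }
            }
        }
        if component.count > best.count { best = component }
    }
    return best  // pixel indices of the biggest blob, presumably the grid
}
```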
- point at a board game and either do all the boring scorekeeping for me, or else advise me on best strategy
- take a photo of a bunch of lottery tickets and tell me whether I've won
It's just done with a QR code.
Unfortunately... there was a bug and it told people they hadn't won: http://www.independent.co.uk/news/uk/home-news/national-lott...
I think my favorite is a Mario Kart style "ghost mode" for running, where you could tell it what pace you wanted to go and it'd show a ghost running in front of or behind you at that pace!
The cool new thing is projecting the solution onto the paper and having it track correctly!
Sneak peek of that update: https://twitter.com/braddwyer/status/910861205442527233
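Conceptually the overlay boils down to something like this (a toy sketch; the real version has to get orientation from the detected rectangle rather than assuming a flat table):

```swift
import ARKit
import SceneKit
import UIKit

// Texture a plane with the rendered solution and parent it to the puzzle's
// anchor node, so ARKit keeps it registered to the paper as the camera moves.
func addSolutionOverlay(to anchorNode: SCNNode, solution: UIImage, sideMeters: CGFloat) {
    let plane = SCNPlane(width: sideMeters, height: sideMeters)
    plane.firstMaterial?.diffuse.contents = solution
    let overlay = SCNNode(geometry: plane)
    overlay.eulerAngles.x = -.pi / 2  // SCNPlane stands vertical by default; lay it flat
    anchorNode.addChildNode(overlay)
}
```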
Several bug fixes are included as well, based on input from early users.
With better rectangle detection I think it'd be pretty doable!
I'll bet you could scrape all the previously used words/phrases from the last 100 years of the NYTimes, etc. to get things rolling.