I’d be happy to answer anything else you’d like to know!
Original thread: https://news.ycombinator.com/item?id=14347211
Demo of the app (in the show): https://www.youtube.com/watch?v=ACmydtFDTGs
App for iOS: https://itunes.apple.com/app/not-hotdog/id1212457521
App for Android (just released yesterday): https://play.google.com/store/apps/details?id=com.seefoodtec...
I am kind of curious how well you do. If you ever do get it going, there may be a use for it - but, again, liability is a huge factor in something like this.
I am not a lawyer but I have spent a whole lot of time with lawyers and in court rooms. It was part of my business, indirectly. Thankfully, I sold and retired years ago.
Why not let users search over a list, and once they're on an item, show frequently related items (e.g. garlic if you search onion)? It may not involve any "AI," but it'll be far more accurate and easier to implement.
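For illustration, here is a minimal Python sketch of that idea; the food list, the related-items map, and the function names are all made up:

    # Hypothetical sketch: plain substring search over a food list, plus a hand-curated
    # "frequently related" map. No ML involved; all names and data are made up.
    RELATED = {
        "onion": ["garlic", "shallot", "chives"],
        "hot dog": ["mustard", "ketchup", "relish"],
    }
    FOODS = sorted(RELATED) + ["apple", "banana"]

    def search(query: str) -> list[str]:
        """Return foods whose name contains the query string."""
        q = query.lower().strip()
        return [food for food in FOODS if q in food]

    def related(item: str) -> list[str]:
        """Return the curated list of frequently related items."""
        return RELATED.get(item.lower(), [])

    print(search("onion"))   # ['onion']
    print(related("onion"))  # ['garlic', 'shallot', 'chives']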
I don't think there's a person that could tell you whether food in a picture has allergens, let alone an app.
GPS was considered, but rejected as insufficiently cool.
Did you, or the writing staff or other consultants, start with hotdogs or dick pics?
It made me laugh because it reminded me of this popular lecture(o) that was, and is, passed around tech circles.
Any relation? Or fun coincidence?
(o) https://youtu.be/uJnd7GBgxuA?t=5555s ... the lecture was given by Andrej Karpathy on 2016-09-24, and the timestamp points to the part where he discusses an interface built to have humans compete with convnets at identifying hot dogs.
I suppose it's a clear testament to the quality of the show and its commentary.. thanks for your contribution to that end!
Probably only 20% of the world's hot dogs are just a basic hot dog with mustard on it. Once you move past one or two condiments, the domain of hot dog identification along with fixings gets confusing from a computer vision standpoint.
Pinterest's similar-images feature is able to identify hot dogs with single condiments fairly well.
They appear to be using deep CNNs.
Having embedded TensorFlow for on-device identification is all well and good for immediacy and cost, but if it can't properly tell whether something is a hot dog vs. a long skinny thing with a mustard squiggle, what good does that do me? What would be the next step up in your mind?
I ask this as someone who is sincerely interested in building low-cost, fun projects.
My condiments to the author, I see what you did there ;)
Mike Judge, Alec Berg, Clay Tarver, and all the awesome writers that actually came up with the concept: Meghan Pleticha (who wrote the episode), Adam Countee, Carrie Kemper, Dan O’Keefe (of Festivus fame), Chris Provenzano (who wrote the amazing “Hooli-con” episode this season), Graham Wagner, Shawn Boxee, Rachele Lynn & Andrew Law…
Todd Silverstein, Jonathan Dotan, Amy Solomon, Jim Klever-Weis and our awesome Transmedia Producer Lisa Schomas for shepherding it through and making it real!
Our kick-ass production designers Dorothy Street & Rich Toyon.
Meaghan, Dana, David, Jay, Jonathan and the entire crew at HBO that worked hard to get the app published (yay! we did it!)
When you get into the final, long training runs, I would say the developer experience advantages start to come down, and not having to deal with the freezes/crashes or other eGPU disadvantages (like keeping your laptop powered on in one place for an 80-hour run) makes moving to the cloud (or a dedicated machine) very appealing indeed. You will also sometimes be able to parallelize your training in such a way that the cloud will be more time-efficient (if still not quite money-efficient). For cloud, I had my best experience using Paperspace. I'm very interested to give Google Cloud's Machine Learning API a try.
If you’re pressed for money, you can’t do better than buying a top of the line GPU once every year or every other year, and putting it in an eGPU enclosure.
If you want the absolute best experience, I’d build a local desktop machine with 2–4 GPUs (so you can do multiple training runs in parallel while you design, or do a faster, parallelized run when you are finalizing).
Cloud does not quite make sense to me until the costs come down, unless you are 1) pressed for time and 2) not going to do more than one machine learning training run in your lifetime. Building your own local cluster becomes cost-efficient after 2 or 3 AI projects per year, I'd say.
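To make that break-even intuition concrete, here is a quick back-of-the-envelope sketch in Python; every price and run count in it is an illustrative assumption, not a real quote:

    # Back-of-the-envelope cost comparison; all numbers below are illustrative assumptions.
    cloud_gpu_per_hour = 0.90    # assumed hourly rate for a single cloud GPU
    hours_per_run = 80           # the kind of long final training run mentioned above
    runs_per_project = 5         # assumed number of long runs per project

    local_gpu_cost = 700.0       # assumed price of a top-of-the-line consumer GPU
    egpu_enclosure_cost = 300.0  # assumed one-time eGPU enclosure cost

    cloud_cost_per_project = cloud_gpu_per_hour * hours_per_run * runs_per_project
    local_upfront_cost = local_gpu_cost + egpu_enclosure_cost

    print(f"Cloud: ~${cloud_cost_per_project:.0f} per project")
    print(f"Local: ~${local_upfront_cost:.0f} up front")
    print(f"Break-even after ~{local_upfront_cost / cloud_cost_per_project:.1f} projects")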
I guess the interpretation is that the first few normalize->convolution->pool->dropout layers are basically achieving something broadly analogous to the initial feature extraction steps that used to be the mainstay in this area (PCA/ICA, HOG, SIFT/SURF, etc.), and are reasonably problem-independent.
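For reference, a minimal Keras sketch of such an opening block might look like the following; the layer sizes are arbitrary placeholders rather than the app's actual architecture:

    # Minimal sketch of a normalize -> convolution -> pool -> dropout opening block.
    # Layer sizes are placeholders, not the actual DeepDog architecture.
    from tensorflow import keras
    from tensorflow.keras import layers

    inputs = keras.Input(shape=(128, 128, 3))
    x = layers.BatchNormalization()(inputs)           # normalize
    x = layers.Conv2D(32, 3, activation="relu")(x)    # convolution: generic edge/texture filters
    x = layers.MaxPooling2D()(x)                      # pool: downsample, keep the strongest responses
    x = layers.Dropout(0.25)(x)                       # dropout: regularize the early features
    # ...more such blocks and a classifier head would follow in a full network.
    model = keras.Model(inputs, x)
    model.summary()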
I am waiting until the next-gen enclosures/cards come out which play nicer with the OS for deep learning.
For those of you like me that never knew they existed, here is what they look like: http://img.food.com/img/recipes/25/97/2/large/picMKCD0m.jpg
I could have a good half-hour conversation about the nuances of Alex Trebek's vocal inflections... shudders
We had a telesync'd demo that let you play along with a Jeopardy episode by yelling answers at your phone. The app knew the timing markers for when the question was asked and when a contestant answered, so it would only give you credit if you beat the contestant with the correct answer.
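Roughly, the credit check could be sketched like this; the field names and timestamps are made up for illustration and are not the demo's actual code:

    # Hypothetical sketch of the "only get credit if you beat the contestant" rule.
    from dataclasses import dataclass

    @dataclass
    class ClueMarkers:
        question_end: float     # seconds into the episode when the clue finished being read
        contestant_buzz: float  # when the first contestant answered

    def credit_viewer(markers: ClueMarkers, answer_time: float, answer_correct: bool) -> bool:
        """Credit the viewer only for a correct answer given after the clue
        was read but before the on-screen contestant answered."""
        return answer_correct and markers.question_end <= answer_time < markers.contestant_buzz

    # Example: clue ends at 312.0s, contestant buzzes at 316.5s, viewer answers at 314.2s.
    print(credit_viewer(ClueMarkers(312.0, 316.5), 314.2, answer_correct=True))  # True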
Our model user was "people who yell answers at the screen when Jeopardy is on."
Still think it would have made a decent companion app to the show though...
Trebek's elocution is just something you pick up on after rewatching an episode enough times. He has really interesting ways of emphasizing things, but they seem normal if you're just listening to them once through.
Did you try quantizing the parameters to shrink the model size some more? If so, how did it affect the results? It also runs slightly faster on mobile from my experience.
It’s also my understanding at the moment that quantization does not help with inference speed or memory usage, which were my chief concerns. I was comfortable with the binary size (<20MB) that was being shipped and did not feel the need to save a few more MBs there. I was more worried about accuracy, and did not want to ship a quantized version of my network without being able to assess the impact.
Finally, it now seems that quantization may be best applied at training time rather than at shipping time, according to a recent paper by the University of Iowa & Snapchat, so I would probably want to bake that into my design phase earlier next time around.
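For anyone curious what the post-training flavor of this looks like in code, here is a minimal sketch using TensorFlow Lite's converter on a throwaway placeholder model; this is not the pipeline the app shipped with:

    # Minimal sketch of post-training weight quantization with TensorFlow Lite.
    # The model below is a throwaway placeholder, not DeepDog.
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(8, 3, activation="relu", input_shape=(128, 128, 3)),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(2, activation="softmax"),
    ])

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enables weight quantization
    tflite_model = converter.convert()

    with open("model_quantized.tflite", "wb") as f:
        f.write(tflite_model)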
Have a look at their sample PlateSpace app. Very cool new service and some excellent tutorials as well, for example for the PlateSpace web app: https://docs.mongodb.com/stitch/getting-started/platespace-w...
I'd definitely recommend having a look.
Any chance the full source will ever be opened up? Would be an excellent companion to the article.
In the meantime, if there are any details you'd like to see, don't hesitate to chime in and I'll try to respond with specifics!
As for the gear, I think it’s really damaging that so many people think Deep Learning is only for people with large datasets, cloud farms (and PhDs) — as the app proves, you can do a lot with just data you curate by hand, a laptop (and a lowly Master’s degree :p)
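// JS side of the React Native bridge: hand the image path to the native AIManager module and await its confidence score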
var percentage = await NativeModules.AIManager.analyzeImage(path)
How did you source and categorize the initial 150K of hotdogs & not hotdogs?
That would have beaten what I ended up shipping, but the problem of course was the size of those networks. So really, if we're comparing apples to apples, I'll say none of the "small", mobile-friendly neural nets (e.g. SqueezeNet, MobileNet) I tried to retrain did anywhere near as well as my DeepDog network trained from scratch. The training runs were really erratic and never reached any sort of upper bound asymptotically as they should. I think this has to do with the fact that these very small networks contain data about a lot of ImageNet classes, and it's very hard to tune what they should retain vs. what they should forget, so picking your learning rate (and possibly adjusting it on the fly) ends up being very critical. It's like doing neurosurgery on a mouse vs. a human, I guess: the brain is much smaller, but the blade stays the same size :-/
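As a concrete illustration of that retraining setup (not the actual DeepDog training code), a Keras sketch of retraining a small ImageNet-pretrained network with a deliberately small, decaying learning rate might look like this; the paths and hyperparameters are placeholders:

    # Illustrative sketch only: retrain MobileNet on a two-class (hotdog / not hotdog) dataset
    # with a small, decaying learning rate. Paths and hyperparameters are placeholders.
    import tensorflow as tf

    inputs = tf.keras.Input(shape=(224, 224, 3))
    x = tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1)(inputs)  # MobileNet expects inputs in [-1, 1]
    base = tf.keras.applications.MobileNet(include_top=False, weights="imagenet",
                                           input_tensor=x, pooling="avg")
    outputs = tf.keras.layers.Dense(2, activation="softmax")(base.output)
    model = tf.keras.Model(inputs, outputs)

    # A small learning rate, reduced further when validation loss plateaus:
    # this is the "neurosurgery on a mouse" knob mentioned above.
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-4, momentum=0.9),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])

    train_ds = tf.keras.utils.image_dataset_from_directory(
        "data/train", image_size=(224, 224), batch_size=32)
    val_ds = tf.keras.utils.image_dataset_from_directory(
        "data/val", image_size=(224, 224), batch_size=32)

    model.fit(train_ds, validation_data=val_ds, epochs=20,
              callbacks=[tf.keras.callbacks.ReduceLROnPlateau(patience=2, factor=0.5)])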
Honestly I think the biggest gains would be to go back to a beefier, pre-trained architecture like Inception, and see if I can quantize it to a size that’s manageable, especially if paired with CoreML on device. You’d get the accuracy that comes from big models, but in a package that runs well on mobile.
What I'm most proud of is the remote neural network injection, which I'm surprised no one has commented on here. I just think it's absolutely huge to be able to set large parts of your app's behavior in TensorFlow code, and replace that on the fly in your users' apps as needed. But maybe I'm the only one excited about this :D
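To make the idea concrete, here is a rough sketch of what swapping the model out over the wire could look like; the URL, file names, and update check are hypothetical and not the app's actual mechanism:

    # Hypothetical sketch of "remote neural network injection": periodically download a newer
    # frozen TensorFlow graph and load it in place of the bundled one. The URL and paths are
    # made up; this is not the app's actual update mechanism.
    import os
    import urllib.request

    MODEL_URL = "https://example.com/models/deepdog_v2.pb"  # hypothetical endpoint
    LOCAL_PATH = "models/current_model.pb"

    def maybe_update_model(url: str = MODEL_URL, path: str = LOCAL_PATH) -> str:
        """Download the latest model graph if we don't have it yet, and return its path."""
        os.makedirs(os.path.dirname(path), exist_ok=True)
        if not os.path.exists(path):
            urllib.request.urlretrieve(url, path)
        return path

    # The native inference code then loads whatever graph this returns, so the model
    # (and with it much of the app's behavior) can change without an app-store release.
    model_path = maybe_update_model()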
Beyond that, I would recommend making sure you have a concrete project you pursue. ML is very theoretical otherwise, and to be honest, our shared understanding of what works and what doesn't is still fairly limited, with major discoveries seemingly coming every other week. So without a concrete project to anchor your thoughts, it can be hard to learn what "works" and what doesn't, just because different things work on different projects.
If you have any questions, I highly recommend the FastAI forums or the Machine Learning subreddit!