
Teachable Machine: Teach a machine using your camera, live in the browser - jozydapozy
https://www.blog.google/topics/machine-learning/now-anyone-can-explore-machine-learning-no-coding-required/
======
nsthorat
deeplearn.js author here...

We do _not_ send _any_ webcam / audio data back to a server, all of the
computation is totally client side. The storage API requests are just
downloading weights of a pretrained model.

We're thinking about releasing a blog post explaining the technical details of
this project; would people be interested?

~~~
amelius
Yes please! :)

And some quick questions:

What network topology do you use, and on what model is it based (e.g.
"inception")?

What kind of data have you used to pretrain the model?

~~~
nsthorat
We're using SqueezeNet
([https://github.com/DeepScale/SqueezeNet](https://github.com/DeepScale/SqueezeNet)),
which is similar to Inception (trained on the same ImageNet dataset) but much
smaller (5MB instead of Inception's 100MB), and inference is much, much
quicker.

The application takes webcam frames and runs them through SqueezeNet,
producing a 1000-D logits vector for each frame. These can be thought of as
unnormalized probabilities for each of ImageNet's 1000 classes.
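
To make "unnormalized probabilities" concrete, here is a minimal sketch (not the project's code; the function name and the tiny 3-class vector are purely illustrative) of how a logits vector would be normalized with a softmax:

```javascript
// Convert a logits vector into normalized probabilities.
// Illustrative only; the demo compares raw logits vectors directly.
function softmax(logits) {
  const max = Math.max(...logits);                  // subtract max for numerical stability
  const exps = logits.map(v => Math.exp(v - max));  // exponentiate
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);                    // normalize to sum to 1
}
```

On real ImageNet logits this would be a 1000-element vector; the demo skips this normalization step entirely.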

During the collection phase, we collect these vectors for each class in
browser memory, and during inference we pass the frame through SqueezeNet and
do k-nearest neighbors to find the class with the most similar logits vector.
KNN is quick because we vectorize it as one large matrix multiplication.
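
As a rough sketch of that vectorized KNN step (all names are my own assumptions, not the demo's code, and real code may well use squared Euclidean distance rather than the dot-product similarity here): stack every stored logits vector into a matrix, score them all against the query in one pass of dot products, then let the top-k most similar examples vote.

```javascript
// Hypothetical KNN over stored logits vectors. `examples` holds one logits
// vector per collected frame; `labels` holds the class each frame was
// collected under.
function dot(a, b) {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

function classify(examples, labels, query, k) {
  // The "one large matrix multiplication": every stored row dotted with the query.
  const scored = examples.map((row, i) => ({ label: labels[i], score: dot(row, query) }));
  scored.sort((a, b) => b.score - a.score);  // most similar first
  const votes = {};
  for (const { label } of scored.slice(0, k)) votes[label] = (votes[label] || 0) + 1;
  return Object.keys(votes).reduce((a, b) => (votes[a] >= votes[b] ? a : b));
}
```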

I'll go deeper in a blog post soon :)

~~~
amelius
Interesting!

I'm curious why you've used a different classification algorithm on top of a
neural network. I would expect that a neural network on top of a pretrained
network could give similar results, with the benefit of simpler code. Is
performance the reason?

Anyway, I'm looking forward to your blog post.

~~~
nsthorat
Training a neural network on top would require a "proper" training phase, and
finding the right hyperparameters that work everywhere turned out to be
tricky. Actually, this is what we did originally; in the blog post we'll try
to show demos of each approach and explain why they don't work.

KNN also makes training "instant", and the code much, much simpler.

~~~
amelius
That makes sense.

By the way, I think your software could become very popular on the Raspberry
Pi, because it would be very cheap and fun to use it for all sorts of
applications (e.g. home automation).

~~~
nsthorat
[https://github.com/PAIR-code/deeplearnjs/issues/158](https://github.com/PAIR-code/deeplearnjs/issues/158)

------
celim307
Pretty neat! Good overview without overwhelming right off the bat. Would be
cool if they showed off common pitfalls like overfitting, or even segued into
general statistics!

------
melling
How long before I can teach my computer gestures that are mapped to real
computer functions? For example, scroll up/down, switch apps, save document,
cut/copy/paste, etc.

One could probably map each gesture to a regular USB device that acts as a
second keyboard and mouse? The hard part is identifying enough unique
gestures?

~~~
wildebaard
You mean a device such as the Leap Motion Controller?
[http://store-eur.leapmotion.com/products/leap-motion-control...](http://store-eur.leapmotion.com/products/leap-motion-controller)
They seem to offer a VR headset add-on these days, but I've only ever seen the
'basic' controller in action, which worked okay-ish.

~~~
melling
I think that’s infrared but the same idea. That never quite worked. Also, Leap
didn’t continue to refine their hardware for consumers. They have next
generation hardware that’s going directly into VR headsets, but you can’t buy
it.

[https://www.engadget.com/2014/08/28/leap-motion-s-next-senso...](https://www.engadget.com/2014/08/28/leap-motion-s-next-sensor-is-designed-specifically-for-virtual-r/)

~~~
wildebaard
Seems to me that you can buy the SDK[0] (which is not much more than a
Controller and a bracket to hold it against your VR headset of choice), so at
least they've made some progress since 2014.

[0]: [http://store-eur.leapmotion.com/products/universal-vr-develo...](http://store-eur.leapmotion.com/products/universal-vr-developer-bundle)

------
amelius
I don't have a camera here. Did anyone try it? How does it work?

~~~
IanCal
Surprisingly well!

It's a _really_ well put together demo & tutorial.

I held a pen up next to me and held the green button.

Then did the same with a mouse.

It would flick between the two if I was holding nothing, so I held the orange
button for a bit while holding nothing.

Worked pretty much every time.

Training is fast enough with a few hundred images per class that I didn't
notice any delay.

~~~
amelius
What do you mean exactly by "held the green button"?

I can't run the demo here (browser not capable enough, and no camera) and I'm
getting really curious what this is about.

~~~
icc97
Watch the demo video in the link, it explains the green button

~~~
amelius
Thanks!

------
crypticlizard
The value-add for this demo is amazing; it's going to be many people's first
approachable experience of ML, or things just like this will be. I expect a
lot more of this stuff to appear in UI/UX. It's fun, intuitive, and a game
changer: away from dumb screens, toward fully interactive machines with their
own knowledge graph.

------
lelima
You've been able to solve problems using machine learning without coding for
a while now; Azure Machine Learning has had these features for more than a
year.

I've solved regression, classification, and recommendation problems with it,
and the best part is that it deploys a web service with a few clicks.

~~~
thanksgiving
But you would need to have:

1. a working phone

2. a valid credit card

to use Azure, which sets too high a bar for students. I've tried to argue for
graduated restrictions (basically, students with .edu emails should be able to
do some things without entering a credit card number), but the fact that this
is not possible suggests it isn't a priority for Azure.

Google says this runs in your browser, so there's little infrastructure cost
for this demo, right?

~~~
jlian
There is a student offer that doesn’t require credit card
[https://azure.microsoft.com/en-us/offers/ms-
azr-0144p/](https://azure.microsoft.com/en-us/offers/ms-azr-0144p/)

~~~
marksomnian
But IIRC it doesn't include ML tools.

------
StavrosK
Does anyone know what this uses under the hood? I loved the demo, but I would
like a similarly easy way to get started locally with Python, for example.

Is there an ML library that can easily start capturing images from the webcam
so you can play around with training a model?

~~~
icc97
It says it in the video: [https://deeplearnjs.org](https://deeplearnjs.org)

~~~
make3
he is not asking for the learning library

~~~
Geee
What is he asking for, then? The whole code for this is on GitHub.

------
greggman
Be aware: at least in Chrome, once you give teachablemachine.withgoogle.com
permission to use your camera, unless you revoke that permission it has
access to your camera without further prompting, including from iframes. In
other words, every ad and analytics script from Google could start injecting
camera access.

I wish Chrome would give the option to only grant permission "this time", and
I wish it didn't allow camera access from cross-domain iframes.

~~~
azinman2
But won’t it be scoped to that FQDN alone? Google Analytics and ads are served
from a totally different domain. What’s the actual concern here?

~~~
amigoingtodie
So you are contending we are secure via DNS?

~~~
matt4077
That's one of these arguments that may attack the parent in isolation, but
makes absolutely no sense in the context of the thread they were replying to.

Because if you assume an attacker to have control over DNS, the security model
of giving permission on a per-domain basis is broken anyway, and the initial
concern with granting google this access is already subsumed in your general
paranoia.

~~~
ec109685
No, it isn’t. TLS helps ensure you aren’t talking to a rogue server, and HSTS
ensures you can’t be spoofed on the first HTTP request to a new server.

------
netcraft
What makes it non-mobile? Is it something about the expected performance of
the JS? Or are there APIs being used that I'm not thinking of?

~~~
nsthorat
It works on mobile, it's just slow. Every time we read from or write to
memory we have to pack and unpack 32-bit floats as 4 bytes without bit-shift
operators >.>
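
For a feel of what that packing constraint looks like, here is a toy sketch using only arithmetic (no `>>` or `&`), which is the restriction GLSL ES 1.0 imposes inside WebGL 1 shaders. It packs a 32-bit unsigned integer rather than an IEEE-754 float, so it illustrates the arithmetic-only trick, not the demo's actual encoder:

```javascript
// Split a 32-bit unsigned int into 4 bytes using only floor/mod/divide,
// the way a WebGL 1 shader must (no bitwise operators available).
function packUint32(x) {
  return [
    x % 256,
    Math.floor(x / 256) % 256,
    Math.floor(x / 65536) % 256,
    Math.floor(x / 16777216) % 256,
  ];
}

// Reassemble the original value from its 4 bytes.
function unpackUint32(b) {
  return b[0] + b[1] * 256 + b[2] * 65536 + b[3] * 16777216;
}
```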

~~~
white-flame
Isn't that what ArrayBuffers can do for you at nearly the same amortized speed
as C unions?
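
A quick sketch of the suggestion (plain JS, not the demo's code): a `DataView` over an `ArrayBuffer` yields a float's raw bytes with no manual bit manipulation, though this only helps on the JS side, not inside a WebGL shader.

```javascript
// One buffer, two interpretations: store a float, read its raw bytes.
const buf = new ArrayBuffer(4);
const view = new DataView(buf);
view.setFloat32(0, 1.5, true);              // store 1.5 as little-endian IEEE-754
const bytes = new Uint8Array(buf);          // its 4 raw bytes, no bit twiddling
const roundTrip = view.getFloat32(0, true); // read it back as a float
```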

------
f00_
This is really cool: openFrameworks-esque, in-browser JavaScript.

If you like this, I would highly recommend looking at openFrameworks.

The interactive browser part excites me; I want to try to make something with
deeplearn.js.

------
mschuster91
Hmm. I wonder if one could train this with dick pics and embed it into popular
messenger apps client-side ("this picture was classified as a penis") to
counter morons sending their dick as a first message.

------
peepopeep
Am I the only paranoid one who thinks this is just Google's way of capturing
millions of faces in their database? Or did Apple beat them to it?

~~~
jamesmishra
The faces are unlabeled, and I'm not sure what that data would be good for. If
Google really wanted face data, they could look at:

- Gmail / Google Plus / Google Apps profile pictures

- Google Street View

- Google Hangouts

- implementing a primitive Face ID or Snapchat-style camera on Google Android

- the large mass of face pictures that they index with Google Images

~~~
runj__
Google Photos seems like the absolute best bet there, they're "organizing"
them there by default.

~~~
jamesmishra
Can't believe I forgot about that one! I'm an avid Google Photos user, and
they definitely have some pretty amazing unsupervised clustering for faces.

------
eggie5
i bet it's fine-tuning an ImageNet CNN

~~~
nsthorat
kinda, ya

~~~
eggie5
What do u think?

