
Show HN: GPU-Accelerated Digit Recognition with WebGL - erkaman
https://erkaman.github.io/regl-cnn/src/demo
======
erkaman
The demo uses WebGL. If you can't get the demo to work, you can find a
recorded gif here
([https://github.com/Erkaman/regl-cnn](https://github.com/Erkaman/regl-cnn))
that shows what it is supposed to look like.

This demo does handwritten digit recognition by evaluating a Convolutional
Neural Network on the GPU with WebGL. The network was trained in TensorFlow by
this script
([https://github.com/Erkaman/regl-cnn/blob/gh-pages/scripts/create_cnn.py](https://github.com/Erkaman/regl-cnn/blob/gh-pages/scripts/create_cnn.py)),
and the network was then reimplemented on the GPU by hand with WebGL. The main
purpose of the demo was to demonstrate how our WebGL framework regl
([https://github.com/mikolalysenko/regl](https://github.com/mikolalysenko/regl))
can be used to greatly simplify GPGPU programming in WebGL. The secondary
purpose was to test whether evaluating Deep Learning networks in WebGL is
doable. To our knowledge (but we may be wrong!), ours is the first
implementation to attempt GPU-accelerating neural networks with WebGL, and we
hope that it will provide a foundation for people who, like us, wish to
experiment with Deep Learning and WebGL. The GPU implementation can be found
here:
[https://github.com/Erkaman/regl-cnn/blob/gh-pages/src/gpu.js](https://github.com/Erkaman/regl-cnn/blob/gh-pages/src/gpu.js)

Note that this network will probably be slower than the corresponding network
implemented on the CPU, because of the overhead associated with transferring
data to and from the GPU. But in the future we will attempt to implement more
complex networks in the browser, such as Neural Style
([https://arxiv.org/pdf/1508.06576v2.pdf](https://arxiv.org/pdf/1508.06576v2.pdf)),
and there we think we will see a significant speedup compared to the CPU.

Lastly, if anyone has any questions, I will be glad to answer them here.

~~~
transcranial
Nice work! I've also tried my hand at using WebGL to do deep learning in the
browser (e.g.
[https://github.com/scienceai/neocortex/blob/master/src/lib/webgl/matmul-webgl.js](https://github.com/scienceai/neocortex/blob/master/src/lib/webgl/matmul-webgl.js)).
The conclusion I came to was that there are just way too many limitations for
it to really pay off. The need to encode everything in textures, etc., limits
the data shape and dimensionality, not to mention the complexity cost. If you
can get more complex networks working I'll be really impressed!

MXNet.js [1] is an emscripten port of the base C++ framework. It runs entirely
in the browser and works fairly well. The actual code produced by emscripten
isn't that large, but the model weights can become an issue. I've tried to get
emscripten working on TensorFlow, even just for forward prediction, but have
pretty much gotten nowhere. Of course, this doesn't let you harness GPU power.

Lots of cool potential applications of doing deep learning over the web are
just waiting to be discovered and built.

[1] [https://github.com/dmlc/mxnet.js](https://github.com/dmlc/mxnet.js)

~~~
erkaman
Wow, so there are other people who have tried doing deep learning in WebGL!
But I will also give it a try, and see if I can do it better than you.

------
imadfy
When I handwrite a 7, it has a dash through the middle. Everyone I know does
this. Your app doesn't like these; it sees a 3.

~~~
Fifer82
Really?? It is the opposite here. No one really uses the dash. I like wee
differences like this.

~~~
eatbitseveryday
> It is the opposite here. No one really uses the dash.

Who is no one? Where is here?

~~~
esrauch
In my experience the dash is unusual (but not unheard of) in the US. I've
heard people refer to it as a German or European 7. I've never seen the German
1 that is like an upside down V used by an American.

------
gecgooden
I keep getting incorrect results for 4's in this style:
[http://imgur.com/BIlhnN9](http://imgur.com/BIlhnN9)

However if I draw 4's like this:
[http://imgur.com/akifdRs](http://imgur.com/akifdRs) I get the correct result.

Is this a limitation with the training set?

~~~
erkaman
That kind of 4 doesn't exist in the training set. This works:
[http://imgur.com/gallery/xnFnF](http://imgur.com/gallery/xnFnF)

------
flohofwoe
Interesting that it returns the wrong numbers when they're written in a
'non-American' style. Drawing a 1 is recognized as 7, and drawing a 7 (with a
'strike-through') is recognized as 2 :)

([https://en.wikipedia.org/wiki/Regional_handwriting_variation...](https://en.wikipedia.org/wiki/Regional_handwriting_variation#Arabic_numerals))

------
DanWaterworth
Since the network has already been trained, I can't imagine using WebGL is
worthwhile. It would be interesting to see either drawing samples from the
equilibrium distribution or training in the browser.

------
n00b101
It must have been quite painstaking to hand-code this neural network into
WebGL shaders. It would have been easier if the browser vendors would just
implement the WebCL standard.

This seems like a throwback to the pre-CUDA "GPGPU" era, when people were
implementing numerical algorithms in OpenGL to be able to leverage GPUs for
general purpose computing.

~~~
erkaman
Yeah, it was really painful. In order to speed up the process, I first
implemented the network on the CPU, so that I could quickly verify my GPU
implementation.
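That workflow, keeping a slow but obviously-correct CPU version around as an
oracle for the optimized one, can be sketched generically like this (a
hypothetical Python example, not the project's actual test code): run both
implementations on identical input and compare within a float tolerance.

```python
import numpy as np

def conv_reference(image, kernel):
    """Slow, obviously-correct 'valid' convolution: the CPU oracle."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

def conv_fast(image, kernel):
    """The 'optimized' implementation under test (here: stride tricks,
    standing in for the GPU path)."""
    kh, kw = kernel.shape
    windows = np.lib.stride_tricks.sliding_window_view(image, (kh, kw))
    return np.einsum('ijkl,kl->ij', windows, kernel)

rng = np.random.default_rng(0)
image = rng.random((28, 28))
kernel = rng.random((5, 5))

# Same input into both paths; compare with a tolerance, since float
# rounding differs between implementations.
ok = np.allclose(conv_reference(image, kernel), conv_fast(image, kernel),
                 atol=1e-6)
```

The same idea applies directly to GPU debugging: read the texture back with
`glReadPixels` and `allclose`-compare it against the CPU result, layer by
layer.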

------
jvdl
I draw my 1's the way it's often displayed on screen, with a little tail at
the top and a line at the base. Your system keeps detecting this as a 2. I've
also had 3's detected as 2's.

~~~
erkaman
Yeah, the MNIST dataset is not that great, it seems, at least for recognising
real handwritten digits.

~~~
jvdl
That's bad news for me, because I also plan to use the MNIST data in a project
of mine. :)

------
piggycurse
It may have some issues with 9, depending on how you write it.
[https://i.imgur.com/URYuOSO.png](https://i.imgur.com/URYuOSO.png)

------
eatbitseveryday
Pretty cool. I realize it is mainly a proof-of-concept, but decided to try out
variations of scribbles[1]. Does the code make explicit assumptions about
orientation, or did the training data make certain assumptions?

[1] [http://imgur.com/a/HCFzy](http://imgur.com/a/HCFzy)

~~~
erkaman
The original MNIST training data assumes that the digits are not flipped. You
could address that by augmenting the training data with flipped copies of the
original digits, but then you suddenly end up with an awful lot of data, and
the training process would literally take days.
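Mechanically, that kind of augmentation is a one-liner over the image array.
A hedged NumPy sketch (hypothetical helper names, 28x28 MNIST-shaped images
assumed):

```python
import numpy as np

def augment_with_flips(images, labels):
    """Double a dataset by adding horizontally flipped copies.

    Caveat: for digits this only makes sense if you *want* the network
    to accept mirrored input; a flipped 3 is no longer a valid
    MNIST-style 3, which is part of why the dataset balloons without
    necessarily helping.
    """
    flipped = images[:, :, ::-1]  # flip each 28x28 image left-right
    return (np.concatenate([images, flipped]),
            np.concatenate([labels, labels]))

# Toy stand-in for MNIST: 10 images of shape 28x28 plus their labels.
images = np.arange(10 * 28 * 28, dtype=float).reshape(10, 28, 28)
labels = np.arange(10)
aug_images, aug_labels = augment_with_flips(images, labels)
```

Each flip variant (left-right, up-down, both) doubles the set again, which is
where the "awful lot of data" comes from.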

------
vtange
Drew 8s skewed slightly toward the left, and ended up with "6" or "3".

------
willvarfar
Intriguing! I was under the illusion that webGL was emit-only, and that
Javascript programs can't read back any output generated by WebGL. I must be
wrong or out of date :) So how does the script do it?

~~~
deafcalculus
With the right extensions, you can render to a texture which can be sampled
from in subsequent draw calls.

The framebuffer object extensions that allow you to write to 32-bit RGBA
textures are widely supported on desktops and mobiles (OpenGL ES 2). But
floating point textures are not. So, shaders resort to encoding 32-bit floats
in RGBA textures. This unfortunately isn't a simple cast. More here:
[http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/](http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/)
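For a sense of what that packing looks like, here is a Python sketch of the
encode/decode pair in the style of aras-p's scheme (a port for illustration,
not the shader code itself): a value in [0, 1) is spread across four 8-bit
channels, because a plain OpenGL ES 2 framebuffer stores only 8 bits per
channel.

```python
import numpy as np

def encode_float_rgba(v):
    """Pack a float in [0, 1) into four 8-bit channels."""
    # Successive scales by powers of 255 push more precision into
    # later channels; fract() keeps each channel in [0, 1).
    enc = np.modf(v * np.array([1.0, 255.0, 255.0**2, 255.0**3]))[0]
    # Remove the part already carried by the next channel.
    enc -= np.array([enc[1], enc[2], enc[3], 0.0]) / 255.0
    # Simulate 8-bit texture storage by quantizing each channel.
    return np.round(enc * 255.0) / 255.0

def decode_float_rgba(rgba):
    """Reassemble the float from the four channels."""
    return float(np.dot(rgba, [1.0, 1/255.0, 1/255.0**2, 1/255.0**3]))

# Round-trip a few values through the simulated 8-bit texture.
roundtrip_error = max(abs(decode_float_rgba(encode_float_rgba(v)) - v)
                      for v in [0.0, 0.25, 0.5, 0.7, 0.99])
```

Even with the 8-bit quantization the round-trip error stays tiny, but as the
comment above notes, this is far more ceremony than a simple cast would be.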

------
Sander_Marechal
I have trouble with the 9's being returned as 3's.

~~~
zargath
Same here - I thought for a second that my handwriting was THAT bad after
years with a keyboard.

------
deutronium
Very nice! For some reason I thought it was for letters, and was wondering why
it thought my 'h' looked like a '6' heh

------
ythl
I've never gotten a correct result.

7 returned 9, 3 returned 8, 1 returned 9...

~~~
eyelidlessness
666 returns 4

~~~
cordite
When you break the contract of a function (a single digit here), you might as
well be playing dice. [https://xkcd.com/221/](https://xkcd.com/221/)

------
mp3geek
Might be helpful in a Captcha-type setup.

~~~
wongarsu
What are you envisioning? It would seem that drawing digits is fairly easy for
bots.

~~~
MasterScrat
One way would be to _crack_ captchas, e.g. make a Chrome extension that
automatically fills them in. I remember someone implemented that at some point
for MegaUpload.

Another way is to gather a data set of people writing digits with their mouse,
and make a classifier that tells you if an input is realistic or not. Of
course you'd need to store previous user inputs to make sure someone is not
just reusing the same digit over and over again.

------
amitmerchant
I have tried this. It is always returning 0.

~~~
erkaman
Hmm, it may be a problem with your graphics card. What graphics card and
browser are you using?

~~~
eksrow
Getting the same issue with Chrome 52, Windows 10 and an R9 Fury.

[.Offscreen-For-WebGL-043C4318]GL ERROR :GL_INVALID_OPERATION : glReadPixels:
demo:1

An issue with AMD cards, maybe?

~~~
erkaman
Yeah, reading from floating point textures with `glReadPixels` is not really
supported on some cards or browsers, it seems.

~~~
eksrow
I just checked chrome://gpu and it shows:

Canvas: Software only, hardware acceleration unavailable

and some other 'Problems detected' so the problem is probably on my side.

Cool project though!

