
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks - jarmitage
http://arxiv.org/abs/1603.05279
======
hacker42
Fantastic work. They do rely on floating point numbers for the weight updates
(I was secretly hoping that they would have binarized gradient descent).

~~~
Houshalter
It can't work with plain gradient descent, which needs roughly 16 bits of
precision. The problem is that the gradients are small and get rounded down to
zero; when many of them are added together the sum stays zero, even though the
true value should be higher.

However, it's been shown that this can be solved with stochastic rounding. I
believe an efficient implementation would require specialized hardware, though
I'm not sure.
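To make the idea concrete, here is a minimal Python sketch of stochastic rounding (the function name, step size, and the gradient value 0.001 are all illustrative choices, not anything from the paper): instead of always truncating to the nearest representable value, round up with probability proportional to the fractional remainder, so the rounded value is unbiased in expectation and tiny gradients survive on average.

```python
import random

random.seed(0)  # for a reproducible demonstration

def stochastic_round(x, step=1/256):
    """Round x to a multiple of `step`, choosing the upper neighbour
    with probability equal to the fractional remainder. The expected
    value of the result equals x, so gradients far below `step` are
    preserved on average rather than always truncated to zero."""
    scaled = x / step
    lower = int(scaled // 1)       # floor of the scaled value
    p_up = scaled - lower          # fractional part in [0, 1)
    return (lower + (1 if random.random() < p_up else 0)) * step

# A gradient of 0.001 is below the step size 1/256 ≈ 0.0039, so
# deterministic rounding would always give 0. Averaged over many
# stochastic updates, the true value is recovered.
n = 100_000
acc = sum(stochastic_round(0.001) for _ in range(n)) / n
```

Each individual update is still 0 or 1/256, which is why cheap random number generation in hardware (rather than extra precision) is what an implementation would need.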

~~~
kuschku
Can’t you just store the logarithm of the value instead of the actual value?

Then rounding should be irrelevant.

~~~
Houshalter
Logarithms are both expensive to compute and don't magically add any more bits
of accuracy.

~~~
kuschku
Well, (a) 2-based logarithms are cheap to compute, and (b) add quite some bits
of accuracy if the alternative is rounding the value to 0.

~~~
Houshalter
They are not cheap to compute; I believe the best approximations use
polynomial evaluation, which costs several cycles and operations.

All a logarithm does is map the numbers into a different range. A logarithmic
encoding can represent more values near 0, but in order to add two
log-encoded numbers, they need to be converted back to linear form, where the
small values are rounded down to zero again.

There's no way around that: adding a very small number to a very large one
will always require many bits of precision to do accurately, regardless of
what transforms you apply to the numbers.
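A toy Python sketch of this point, using an 8-fractional-bit fixed-point quantizer as a stand-in for a low-precision format (the quantizer and the specific values 1.0 and 1e-4 are illustrative assumptions): the log encoding represents the small number fine, but the addition itself still loses it.

```python
import math

def to_fixed(x, frac_bits=8):
    """Quantize x to fixed point with `frac_bits` fractional bits,
    i.e. round to the nearest multiple of 2**-frac_bits."""
    step = 2 ** -frac_bits
    return round(x / step) * step

big, small = 1.0, 1e-4

# Linear domain: `small` is below the quantization step (1/256), so it
# rounds to 0 and the sum is just `big`.
linear_sum = to_fixed(to_fixed(big) + to_fixed(small))

# Log domain: log(small) ≈ -9.21 quantizes fine, but the log-domain
# addition log(a+b) = log_a + log1p(exp(log_b - log_a)) produces a
# correction term of about 1e-4, which falls below the step again
# when the result is quantized.
log_big, log_small = to_fixed(math.log(big)), to_fixed(math.log(small))
log_sum = to_fixed(log_big + math.log1p(math.exp(log_small - log_big)))
```

Both paths end up discarding the small addend, which is the claim above: the transform changes which *values* are representable, not how many bits the *addition* needs.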

------
xiphias
58x speedup sounds great; the question left unanswered here is whether using
more layers can recover the accuracy, and what speedup can then still be
achieved compared to the original version.

------
lucciano
Sounds interesting :-) But is there any GitHub repo or Dockerfile cookbook
available? Last year, I noticed these 2 repos:
[https://github.com/kevinlin311tw/caffe-cvprw15](https://github.com/kevinlin311tw/caffe-cvprw15),
[https://github.com/kevinlin311tw/Caffe-DeepBinaryCode](https://github.com/kevinlin311tw/Caffe-DeepBinaryCode)

How different are they?

