
AI’s big leap to tiny devices opens world of possibilities - happy-go-lucky
https://blogs.microsoft.com/next/2017/06/29/ais-big-leap-tiny-devices-opens-world-possibilities/
======
idlewords
AI + tiny devices = automatic surveillance.

There's just no way to train dumb little computers on enough observed data.
You have to centralize the AI training at least, and therefore centralize data
collection. Most existing services (like Amazon Echo or Siri) also centralize
the AI logic, requiring them to be online to function.

This setup of ubiquitous, always-online smart devices reporting to a central
collection server is very hazardous. At the very least, we need to require
that these devices be disconnectable with a hardware switch (without losing
other functionality), and that the training data they send home be pre-cooked
as much as possible.

~~~
EGreg
The article is about doing the calculations on-board. This is about
decentralizing what is now centralized!

~~~
idlewords
The training remains centralized, and drives the data collection.

The article mentions de-centralized training (in the "bottom up" section), but
that begins by saying we have to reinvent the entire field.

~~~
candiodari
Even that still limits data collection compared to before (before: everything,
as with any "cloud" product; now: only what they need for training)

------
oh_teh_meows
"A technique called weight quantization, for example, represents each neural
network parameter with only a few bits, sometimes a single bit, instead of the
standard 32...The models are equally accurate, but the compressed version runs
about 20 times faster."

This is pretty great. How can we tell if a particular ML problem is amenable
to weight quantization without sacrificing accuracy?

~~~
justifier
weight quantization basically uses a short list of shortened values as indices
into a lookup table that holds the desired full-precision values

if you have a 24-bit value, say a 24-bit color, that means you have ~16
million (2^24 = 16,777,216) possible colors

but if you only want to use 200 colors, then instead of storing each one as
the full 24-bit value you can store an 8-bit value (2^8 = 256 > 200) and have
those 8 bits index into a table that points to the desired full 24-bit value

so you have to ask yourself: which parameters of my neural net can be
represented as an index? or, which parameters take fewer distinct values than
their full bit-width allows?
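The index-plus-lookup-table idea above can be sketched in a few lines. This is a minimal illustration using uniform quantization levels (an assumption; real schemes often learn the codebook, e.g. by clustering the weights):

```python
import numpy as np

def quantize(weights, bits=8):
    """Map each float32 weight to a small integer index into a shared
    codebook of 2**bits full-precision values (uniformly spaced here)."""
    lo, hi = weights.min(), weights.max()
    levels = 2 ** bits
    # codebook: the full-precision values the indices point to
    codebook = np.linspace(lo, hi, levels).astype(np.float32)
    # index: which codebook entry is nearest to each weight
    index = np.round((weights - lo) / (hi - lo) * (levels - 1)).astype(np.uint8)
    return codebook, index

def dequantize(codebook, index):
    return codebook[index]

rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)   # stand-in for a layer's weights
codebook, idx = quantize(w, bits=8)
w_hat = dequantize(codebook, idx)

# storage drops from 32 bits to 8 bits per weight, plus one 256-entry table
print(idx.dtype, idx.nbytes, w.nbytes)                       # uint8 1000 4000
print(np.abs(w - w_hat).max() < (w.max() - w.min()) / 255)   # True
```

The rounding error per weight is bounded by the spacing between codebook levels, which is what the paper below quantifies when it studies accuracy loss from quantization decisions.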

Wikipedia defines ANN parameters as (o):

An ANN is typically defined by three types of parameters:

    The connection pattern between the different layers of neurons
    The weights of the connections, which are updated in the learning process
    The activation function that converts a neuron's weighted input to its output activation

here is a great paper that tries to answer this question for you in a way
that highlights the error resulting from quantization decisions (i)

(o)
[https://en.wikipedia.org/wiki/Artificial_neural_network#Netw...](https://en.wikipedia.org/wiki/Artificial_neural_network#Network_function)

(i)
[https://www.cmpe.boun.edu.tr/~ethem/files/papers/fatih_icann...](https://www.cmpe.boun.edu.tr/~ethem/files/papers/fatih_icann01.pdf)

------
pcunite
_realizing the promise of a world populated with tiny intelligent devices at
every turn – embedded in our clothes, scattered around our homes and offices_

What an incredible responsibility we have to protect our families from the
misuse of this capability.

------
Aron
I wonder if there are certain patterns that take a lot of parameters to
express in an NN, that could be more efficiently represented using some other
kind of logic, and that could be automatically discovered by some variety of
algorithm such that subsection of the NN is replaced with this alternate form.
I mean a simple case is that I'm sure that an NN trained to do multiplication
is less efficient than just running a multiply op in the hardware. I'm talking
about the complicated scenario where some subset of the NN is performing a
replaceable and inefficient function.

~~~
visarga
Yes, this is a real technique. There are NNs that are mixed with regular
programming. As data propagates through the code, a graph is created and
gradients flow automatically backwards, training the various neural-net bits.
All this is fully mixable with functions, loops, ifs, and math expressions;
the only condition is that every instruction used has to allow gradients to
flow, i.e. it needs to be able to assign blame correctly from outputs to
inputs.
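The graph-recording idea can be shown with a toy hand-rolled autograd (frameworks like PyTorch or JAX do this for real; this sketch only handles scalars, single-use intermediates, and `+`/`*`):

```python
class Var:
    """Scalar that records the compute graph as ordinary code runs over it."""
    def __init__(self, value, parents=()):
        self.value, self.parents, self.grad = value, parents, 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self, seed=1.0):
        # assign blame from output back to inputs via local derivatives
        # (a full implementation would visit nodes in topological order)
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)

def model(x, w):
    # ordinary Python control flow mixed with the differentiable bits
    y = x * w
    for _ in range(2):      # loop: multiply by w twice more
        y = y * w
    if y.value > 0:         # branch chosen by the forward values
        y = y + 1.0
    return y

x, w = Var(2.0), Var(3.0)
out = model(x, w)           # out = x * w**3 + 1 = 55
out.backward()
print(out.value)            # 55.0
print(w.grad)               # d(out)/dw = 3 * x * w**2 = 54.0
```

The loop and the `if` are plain Python; gradients flow because every arithmetic op along the taken path recorded its local derivative.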

A second technique is to use deep learning to learn from stack traces. Any old
software could be stack-traced by inserting a few prints here and there. Then
a NN could learn recursive algorithms just by trying to recreate the whole
stack trace, not just the actual outputs. It's a way to distill plain old
programming into NNs, by incorporating side information that is cheap to get.
This would be useful to quickly teach a NN some algorithm while making it less
brittle than symbolic approaches. Imagine how many algorithms could be
extracted from conventional software.
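The cheap-instrumentation half of this can be sketched directly (the NN that learns to reproduce the trace is out of scope here; this only shows how "a few prints here and there" turn old code into supervision):

```python
trace = []

def traced(fn):
    """Decorator: log every call and return -- cheap side information
    that a NN could later be trained to reproduce step by step."""
    def wrapper(*args):
        trace.append(("call", fn.__name__, args))
        result = fn(*args)
        trace.append(("return", fn.__name__, args, result))
        return result
    return wrapper

@traced
def merge_sort(xs):
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

print(merge_sort([3, 1, 2]))   # [1, 2, 3]
# the full recursive trace, not just the final output, is the training target
for event in trace:
    print(event)
```

Because the decorator wraps the recursive calls too, the trace exposes the whole call tree, which is exactly the extra structure the comment suggests distilling into a NN.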

~~~
Aron
Thanks for that. Very interesting.

On the first one, I assume we are still talking about hand-generating and
coordinating the procedural bits. I was waving my hands at possibly
_learning_ the topology of those bits, possibly even by recognizing them as
being reproduced [inefficiently] in a trained NN.

I don't think I've ever pondered that second technique before and it's very
intriguing. Is there a canonical best-of-class in that category? Offhand, it
sounds brutally hard to do.

Also, I think DeepMind might have published something on an NN that learned to
write a procedural program that wrote a sort algorithm. Is that related?

~~~
visarga
After some considerable digging I came out with these papers:

Differentiable Programs with Neural Libraries - [https://www.microsoft.com/en-
us/research/wp-content/uploads/...](https://www.microsoft.com/en-
us/research/wp-content/uploads/2017/03/main2.pdf)

Making Neural Programming Architectures Generalize via Recursion -
[https://openreview.net/forum?id=BkbY4psgg](https://openreview.net/forum?id=BkbY4psgg)

Neural Programmer Interpreters -
[https://arxiv.org/abs/1511.06279](https://arxiv.org/abs/1511.06279)

~~~
Aron
Your first paper references the one I was thinking of: the Neural Turing Machine. Thanks.
I hope you learned something useful while digging.

------
teekert
I wonder what OS they run on the Pi. Since they don't mention Windows IoT Core
at all... I think machine learning is mainly a Linux field, right?

------
Aron
If you train a large parameter NN, and then somehow prune it to a lower
parameter NN, will it generally outperform an equally-sized lower parameter NN
trained from scratch? I'm not talking about reducing bit-counts per node here,
but total nodes.
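The pruning half of that question can be made concrete with neuron-level magnitude pruning (one common heuristic, assumed here for illustration: score each hidden neuron by the norm of its incoming weights and drop the weakest whole nodes):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 32))    # hidden layer: 32 neurons
W2 = rng.normal(size=(32, 4))    # output layer reads those 32 neurons

# score each hidden neuron by the L2 norm of its incoming weights,
# then keep the top half -- removing whole nodes, not reducing bit-counts
scores = np.linalg.norm(W1, axis=0)
keep = np.argsort(scores)[-16:]

W1_small = W1[:, keep]           # (8, 16): smaller net, consistent wiring
W2_small = W2[keep, :]           # (16, 4)
print(W1_small.shape, W2_small.shape)   # (8, 16) (16, 4)
```

Whether this pruned-then-fine-tuned network beats an equally sized network trained from scratch is exactly the open empirical question being asked.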

------
louithethrid
Pack the whole miracle world into a box? Maybe: if you have enough boxes you
can pave some well-trodden path with them. Or you could stack them up and put
some marketing salesman on top, praising cowardice as innovation. It would
all be a joke if something new were at least tried.

But what the article tells us is, basically, that it's time to go back to
where we were before the cloud hype and put statistical models on the
machines, in the machines, and hooray. Full circle. Till we are all in boxes.

~~~
walterbell
Each time the cheese is moved, the economic order mutates, creating new
winners and losers. That's reason enough to move in a circle, when you are not
currently a winner.

------
homarp
Hidden in the article is the link to the GitHub repo:
[https://github.com/Microsoft/ELL/](https://github.com/Microsoft/ELL/)

