
Generative Teaching Networks: Accelerating Neural Architecture Search - yigitdemirag
https://eng.uber.com/generative-teaching-networks/
======
CShorten
I made a video explaining this research if you are interested:
[https://www.youtube.com/watch?v=lmnJfLjDVrI&t=4s](https://www.youtube.com/watch?v=lmnJfLjDVrI&t=4s)

------
Eug894
Isn't it interesting to ask which is more efficient: neural nets or a learning
Mealy machine? Anyway, an optimized exhaustive search is a slow but assured
way of solving the self-driving problem. You don't need the most accurate
simulation for it, as Elon says here:

[https://www.youtube.com/watch?v=Ucp0TTmvqOE&t=7358](https://www.youtube.com/watch?v=Ucp0TTmvqOE&t=7358)

A "brute-force" algorithm (an exhaustive search, in other words) is the
easiest way to find an answer to almost any engineering problem. But it often
must be optimized before being computed. The optimization may be done by an AI
agent based on Neural Nets, or on a Learning Mealy Machine.

A learning Mealy machine is a finite automaton in which the training data
stream is remembered by constructing disjunctive normal forms of the
automaton's output function and of the transition function between its
states. Those functions are then optimized (lossily compressed by logic
transformations such as De Morgan's laws, arithmetic rules, loop
unrolling/rolling, etc.) into generalized forms. That introduces random
hypotheses into the automaton's functions, so it can be used for inference.
The optimizer for the automaton's functions may be another AI agent, or any
heuristic algorithm you like...
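
Here is a minimal Python sketch of how I imagine the record-then-generalize
loop; the bit-string rule format and the one-bit don't-care merge are just
illustrative choices of mine, not a fixed definition:

```python
from itertools import combinations

class LearningMealyMachine:
    def __init__(self):
        self.rules = {}  # (state, input_bits) -> (output, next_state)

    def record(self, state, input_bits, output, next_state):
        """Remember one training transition verbatim (one DNF term)."""
        self.rules[(state, input_bits)] = (output, next_state)

    def generalize(self):
        """Lossy compression: merge pairs of terms that differ in one
        input bit but agree on output/next state, replacing that bit
        with a don't-care '*' (a random hypothesis about unseen inputs)."""
        merged = True
        while merged:
            merged = False
            for a, b in combinations(list(self.rules), 2):
                if a[0] != b[0] or self.rules[a] != self.rules[b]:
                    continue
                diff = [i for i, (x, y) in enumerate(zip(a[1], b[1])) if x != y]
                if len(diff) == 1:
                    pattern = a[1][:diff[0]] + '*' + a[1][diff[0] + 1:]
                    out = self.rules.pop(a)
                    self.rules.pop(b)
                    self.rules[(a[0], pattern)] = out
                    merged = True
                    break

    def step(self, state, input_bits):
        """Inference: the first rule whose pattern matches wins."""
        for (s, pat), (out, nxt) in self.rules.items():
            if s == state and all(p in ('*', x) for p, x in zip(pat, input_bits)):
                return out, nxt
        return None  # no hypothesis yet

m = LearningMealyMachine()
m.record(0, '00', 'a', 0)
m.record(0, '01', 'a', 0)
m.generalize()          # collapses the two terms into the rule (0, '0*')
print(m.step(0, '00'))  # ('a', 0)
```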

Some interesting engineering (and scientific) problems are:

- finding machine code for a car controller that makes the car able to drive autonomously;
- finding machine code for a bipedal-robot controller that makes the robot able to work in warehouses and factories;
- finding a CAD file that describes the design of a spheromak working with a guiding-center drift generator (a hypothetical device, idk!);
- finding a CAD file that describes some kind of working Smoluchowski's trapdoor (under some specific conditions, of course);
- finding a file that describes an automaton working in accordance with the data of a scientific experiment;
- finding a file that describes the manufacturing steps to produce the first molecular nanofactory in the world.

Related work by Embecosm is here: superoptimization.org. Though it seems
people have superoptimized only tiny programs so far, as you can see from the
ICLR 2017 paper (App. D): arxiv.org/abs/1611.01787. And loops can also be
rolled, not just unrolled; that kind of loop optimization seems to be absent
here: en.wikipedia.org/wiki/Loop_optimization

If you have any questions, ask me here:
[https://www.facebook.com/eugene.zavidovsky](https://www.facebook.com/eugene.zavidovsky)

~~~
heyitsguay
That sounds very different from what Uber is doing here, which is basically
using synthetic data to speed up training, in order to accelerate otherwise
standard neural architecture search tools. The focus is on the data-synthesis
network.

Also, the system you describe sounds impractical for any of the complex
learning tasks you suggest, especially if it hasn't even been demonstrated on
much simpler things yet. Why would machine code be the right level of
abstraction for a vision or robotics problem?

~~~
Eug894
> Why would machine code be the right level of abstraction for a vision or
> robotics problem?

That code would be used to compute the output function and the transition
function of the automaton. At first, as the automaton tries some action and
receives a reaction, those functions are constructed accordingly out of plain
movs and cmps with jmps (suppose the x86 ISA here). Then the whole machine
code of all actions-reactions is optimized by arithmetic rules, loop
_rolling_ and unrolling, etc., so its size is reduced. That optimization may
also include some hypotheses about the don't-care values of the functions,
which will be corrected in future passes if they turn out to be wrong...
Imagine that code running on something like Thomas Sohmers' Neo processor or
the Sunway SW26010.
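
And here is a toy Python illustration of the verbatim construction stage,
emitting x86-like pseudo-assembly from a recorded rule table (the register
assignments and case layout are, of course, arbitrary choices of mine):

```python
# Compile a recorded (state, input) -> (output, next_state) table into
# x86-like pseudo-assembly: plain cmps with jmps, before any optimization.
rules = {
    (0, '00'): ('a', 0),
    (0, '01'): ('a', 0),
    (1, '10'): ('b', 0),
}

def emit(rules):
    lines = []
    for i, ((state, pattern), (out, nxt)) in enumerate(rules.items()):
        lines += [
            f"case{i}:",
            f"    cmp  eax, {state}      ; eax holds the current state",
            f"    jne  case{i + 1}",
            f"    cmp  ebx, 0b{pattern}  ; ebx holds the input bits",
            f"    jne  case{i + 1}",
            f"    mov  ecx, '{out}'      ; write the output",
            f"    mov  eax, {nxt}        ; move to the next state",
            "    jmp  done",
        ]
    lines += [f"case{len(rules)}:", "    ; unseen input: no rule yet", "done:"]
    return "\n".join(lines)

print(emit(rules))
```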

Yeah, it is completely different from neural nets. I posted it here because I
feel the urge to popularize the idea : ) I am actually a dilettante in
machine learning.

------
lettergram
For those interested, I also work in this area: [https://medium.com/capital-one-tech/why-you-dont-necessarily-need-data-for-data-science-48d7bf503074](https://medium.com/capital-one-tech/why-you-dont-necessarily-need-data-for-data-science-48d7bf503074)

Arguably, this is still a new field, but IMO it will eventually become
standard practice. You can completely separate humans from data and still do
machine learning (and likely analytics). This would dramatically limit data
breaches if implemented properly.

------
w_t_payne
I really like the idea of optimising the 'direct' training data, and wonder
how it would interact with the use of synthetic data as the 'indirect'
training data. Or perhaps some sort of restriction on the (optimised) 'direct'
training data as a form of regularisation. Lots of potential ideas to explore
here.

~~~
felipepsuch
The generator we use in our paper is a form of restriction on the 'direct'
training data. You can think of it as a weird encoding for images.

PS: I'm the author of the GTN paper. Feel free to ask any questions!
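
Roughly, the training loop looks like this (a heavily simplified PyTorch
sketch, not our actual code; the linear learner, the two inner steps, and the
random stand-ins for real data are toy choices):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Teacher: maps (noise z, one-hot label y) -> a synthetic 28x28 "image".
# All training data the learner ever sees passes through this bottleneck.
teacher = nn.Sequential(nn.Linear(64 + 10, 256), nn.Tanh(), nn.Linear(256, 784))
outer_opt = torch.optim.Adam(teacher.parameters(), lr=1e-3)

def loss_fn(w, x, y):
    return F.cross_entropy(x @ w, y)  # linear learner: logits = x @ w

for outer_step in range(100):
    # Fresh learner every outer step, so the teacher can't overfit one init.
    w = (0.01 * torch.randn(784, 10)).requires_grad_()

    # Inner loop: the learner trains ONLY on teacher-generated data;
    # create_graph=True keeps the path from w back to the teacher.
    for _ in range(2):
        y = torch.randint(0, 10, (32,))
        z = torch.randn(32, 64)
        x_syn = teacher(torch.cat([z, F.one_hot(y, 10).float()], dim=1))
        g, = torch.autograd.grad(loss_fn(w, x_syn, y), w, create_graph=True)
        w = w - 0.1 * g  # differentiable SGD step

    # Outer step: the meta-loss is the trained learner's loss on "real" data
    # (random tensors here, standing in for e.g. MNIST batches).
    x_real, y_real = torch.randn(32, 784), torch.randint(0, 10, (32,))
    outer_opt.zero_grad()
    loss_fn(w, x_real, y_real).backward()  # backprops through both inner steps
    outer_opt.step()
```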

~~~
gcucurull
Hey, congrats on the paper, I read it a while ago and thought it was really
interesting.

I tried implementing it, and the samples generated by the Teacher seem to
suffer from mode collapse (as if the generator is ignoring the random vector z
but not the label condition). Do you recall having that issue at some point?

I have to say I'm using a simpler generator than the one in the paper, and I'm
not changing the learner architecture at each batch, only its weights.

Thanks!

~~~
felipepsuch
Thanks, I'm glad you liked it! Mode collapse was actually the one thing I
never encountered during my exploration (which was the reason we looked into
using GTNs as a mode-collapse solution for GANs). That said, I found meta-
learning to be surprisingly hard to implement efficiently and ran into more
bugs in both PyTorch and TensorFlow than I can count.

Changing the learner architecture is actually not that important, so that's
probably not your problem.
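
One guess, since it bit me repeatedly in PyTorch: if the inner-loop gradients
are taken without create_graph=True, the teacher is silently detached from
the meta-loss, so the learner still trains but the teacher never gets a
gradient. A tiny self-contained demo of the failure mode:

```python
import torch

# theta plays the teacher, w the learner: one differentiable inner SGD
# step, then a meta-loss on the updated weights.
theta = torch.tensor(3.0, requires_grad=True)
w = torch.tensor(1.0, requires_grad=True)

inner_loss = (w * theta - 4.0) ** 2
g, = torch.autograd.grad(inner_loss, w, create_graph=True)  # flip to False to see the bug
w2 = w - 0.1 * g

meta_loss = (w2 - 1.0) ** 2
print(torch.autograd.grad(meta_loss, theta))
# With create_graph=True this prints a nonzero gradient; with False,
# autograd errors out because theta no longer appears in the graph.
```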

~~~
gcucurull
Ok, I'll keep digging to figure out where the problem might be, thanks!

------
SubiculumCode
Not my area of expertise. Is the innovation here searching over generated
training examples that appear to optimize training efficiency/rate of
learning for the target task?

------
SubiculumCode
Does this technique potentially allow training on smaller datasets? I am
thinking of applications to neuroimaging datasets, which usually number in
the hundreds.

