
Darpa Goes “Meta” with Machine Learning for Machine Learning (2016) - sethbannon
http://www.darpa.mil/news-events/2016-06-17
======
siavosh
I'm curious to hear from current researchers:

Back when I did some work in neural networks (10-15 years ago), the field had
become saturated with 'meta' papers on topology generation and on optimization
techniques (slightly better gradient descent, pruning techniques, etc.).
Neural networks had become more art than science. At the end of that period, I
got the distinct sense that these were signs the field had stalled on any
significant breakthroughs. Soon after, they fell out of favor for well-known
reasons.

Setting aside the current hype, I'm curious whether practitioners currently in
the field sense anything similar now, given the saturation of papers.

~~~
cr0sh
First off, I'm not an expert - right now, I'm an amateur student of the field.
I don't have any credentials or anything published. In other words, what I'm
going to say probably has little to no merit to it.

What I have seen in recent weeks, as I've looked into various news and
articles about machine learning, neural networks, self-driving vehicles, etc.,
is that, at least for a certain class of problems, there seems to be
convergence on a generalized pattern, if not a solution, for implementing the
network.

This pattern or solution seems built off of LeCun's LeNet MNIST convolutional
neural network; specifically, the pattern seems to be roughly:

(input) -> (1:n - conv layers) -> (flatten) -> (1:n - fully connected layers)
-> (output)

where:

(1:n - conv layers) = (conv layer) -> (loss layer) -> (activation layer)

and:

(1:n - fully connected layers) = (fully connected layer) -> (loss layer) ->
(activation layer)

The (loss layer) is optional, but given that the (activation layer) seems to
have converged (in most cases I've seen, which is probably not representative)
to ReLU, a (loss layer) is sometimes needed to prevent overfitting (simple
dropout can work well, for instance).

I don't want to say that this pattern is the "be-all-end-all" of deep
learning, but I have found it curious just how many different problems it can
be successfully applied to. The ReLU operation, while not differentiable at
zero, seems to work regardless (and there are other activation functions
similar to ReLU that are differentiable, like softplus, if needed).

Anyhow, these "building blocks" seem like the basics, the "lego" for deep
learning, taking it from art to engineering. As a student, I've been playing
with TensorFlow, and recently Keras (a high-level API that runs on top of
TensorFlow). These tools make it quick and easy to build deep learning neural
networks like the one I described.
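
For example, a minimal Keras sketch of that pattern (hypothetical layer sizes,
just to show the shape of it, not any particular published network) might look
roughly like this:

    # (input) -> conv blocks -> flatten -> fully connected blocks -> (output)
    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

    model = Sequential([
        # 1:n convolutional blocks (conv -> activation, optional pooling/dropout)
        Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),
        Dropout(0.25),                      # optional regularization

        # flatten, then 1:n fully connected blocks
        Flatten(),
        Dense(128, activation='relu'),
        Dropout(0.5),
        Dense(10, activation='softmax'),    # output, e.g. 10 MNIST classes
    ])

    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])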

My gut feeling is that I'm talking from my rear; my level of knowledge isn't
that great in this field. So - grain 'o salt and all that.

/disclaimer: I am currently enrolled in the Udacity Self Driving Car Engineer
Nanodegree; in the past I've taken the Udacity CS373 course (2012) and the
Stanford-sponsored ML Class (2011).

~~~
malux85
Nice summary!

I would like to augment a couple of your observations with a few explanations.
I think the reason the architectures you mention work so well is that they
exploit hierarchical structure inherent in the data. Lots of things in the
universe have hierarchical structure, e.g. vision has spatial hierarchy, video
has spatial and temporal hierarchy, audio has temporal and spatial hierarchy.

(Side note: that's how a baby learns vision "unsupervised" -- the spatial
patterns on the retina have temporal proximity, so the supervisor is "time".)

ReLU is good, but you have to remember how you're vectorizing your data. If
your vectorizer normalizes to [0, 1], then ReLU fits nicely; but if you're
scaling to mean 0 and std dev 1, then half your samples are negative and
you'll get information loss when ReLU does this:
[http://7pn4yt.com1.z0.glb.clouddn.com/blog-relu-perf.png](http://7pn4yt.com1.z0.glb.clouddn.com/blog-relu-perf.png)
So remember to keep your vectorizer in mind.
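
A tiny illustration of what I mean (made-up numbers, plain numpy):

    import numpy as np

    x = np.array([2.0, 5.0, 9.0, 14.0])                # some hypothetical feature
    scaled_01 = (x - x.min()) / (x.max() - x.min())     # min-max to [0, 1]
    standardized = (x - x.mean()) / x.std()             # mean 0, std dev 1

    relu = lambda v: np.maximum(v, 0.0)

    print(relu(scaled_01))      # every value survives
    print(relu(standardized))   # the values below the mean get clamped to 0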

> My gut feeling is that I'm talking from my rear;

Nope, not at all; you have good insight. My email address is in my profile -
come join the Slack group too (email me, let's go from there).

~~~
Capt-RogerOver
Would you care to expand "(Side note: that's how a baby learns vision
"unsupervised" -- the spatial patterns on the retina have temporal proximity,
so the supervisor is "time")" into a couple of paragraphs? It seems like a
very deep and profound thought and would be very interesting to read.

------
zopf
First off: isn't this from June of 2016?

Second: I don't get it. The primary example they use to illustrate the need
has almost nothing to do with model building or selection, and everything to
do with selecting and painstakingly cleaning data. This mirrors my experience
with data science so far.

"A recent exercise conducted by researchers from New York University
illustrated the problem. The goal was to model traffic flows as a function of
time, weather and location for each block in downtown Manhattan, and then use
that model to conduct “what-if” simulations of various ride-sharing scenarios
and project the likely effects of those ride-sharing variants on congestion.
The team managed to make the model, but it required about 30 person-months of
NYU data scientists’ time and more than 60 person-months of preparatory effort
to explore, clean and regularize several urban data sets, including statistics
about local crime, schools, subway systems, parks, noise, taxis, and
restaurants."

So - the meta part isn't such a big deal. But if DARPA has found a way to
properly automate the painstaking process of selecting, cleaning, validating,
and normalizing data, well THEN we'll really have something to be impressed
about.

------
amelius
So after truck drivers, data scientists are next on the list of people who
will lose their jobs to ML.

It seems that this new technology impacts the poorly educated and the highly
educated alike.

~~~
felippee
Not a single truck driver has yet lost his job to ML. Just a side note.

~~~
digler999
Driver jobs have been lost to autonomous container movers at shipping ports. I
don't expect ML was integral to that, but driver jobs were indeed lost to
automation.

~~~
argonaut
You must be thinking of longshoremen, not drivers.

------
zitterbewegung
Should have a 2016 in the title. I know that Mathematica tries to do this with
its machine learning functions, and it's available in Mathematica 10:
[https://www.wolfram.com/mathematica/new-in-10/highly-
automat...](https://www.wolfram.com/mathematica/new-in-10/highly-automated-
machine-learning/)

~~~
Cacti
This isn't about automating model generation, which is an old goal of many,
but about DARPA's involvement in it and the start of a new program to
encourage work in this area.

------
hellofunk
Dah, that's nothing. We have models that we built specifically to determine
the best way to train other models. After 2 years of tweaking these models, we
learned that the best way to optimize our learning was to build new models
that train these models for determining how to train our other models.

~~~
beachbum8029
So it's models all the way down?

~~~
randcraw
Of course. Even Plato knew that. But since most models are wrong... there's
life yet in us 'bags of mostly water'.

~~~
Capt-RogerOver
Models are not wrong. They are just models, i.e. != reality.

------
jgalloway___
I pitched something similar a few years back to a client, and the client's
architect agreed there was value in it, but we couldn't make it fit into our
sprint schedule, so we went with some other user stories.

------
malux85
Topology generation is the new feature engineering, so it's not surprising
that it's the first thing to go. I have been working on advancing HyperNEAT
for this very purpose - similar to, but not quite the same as:
[http://blog.otoro.net/2016/05/07/backprop-neat/](http://blog.otoro.net/2016/05/07/backprop-neat/)

Non-sequential and non-hierarchical topologies are the future, and it makes
sense that machines should generate them.

~~~
aaronsnoswell
Can you expand on what you mean by non-sequential and non-hierarchical?

------
benevol
We're about to _really_ get to know and feel the exponential property of
technology.

------
radarsat1
With recent successes in specific games (Go, arcade games), I'd love to see a
resurgence of interest in "General Game Playing" research, i.e., the idea that
an AI is given a description of a game and learns how to play it.

Doing a search shows that it's a subject that is still being taught, but I
rarely see recent articles about it.

~~~
bluetwo
I'm also very interested in this area of AI. It does seem like it is due for a
resurgence.

I wonder a bit if there is a bias against the gaming aspect of it.

The jumps in poker have been kind of neat lately. Would like to see more
variations played.

~~~
mastazi
This is completely OT, but I just googled the topic of "solving" Texas Hold'em
Poker and other imperfect-information games and discovered that one of the
contributors to the research on Counterfactual Regret Minimization algorithms,
developed at the University of Alberta[0], is Oskari Tammelin[1], who also
created, in the late 90s, my favourite piece of music software, Jeskola
Buzz[2]. His personal website[3] contains pages on both topics.

[0] [http://poker.srv.ualberta.ca/about](http://poker.srv.ualberta.ca/about)

[1] [https://arxiv.org/abs/1407.5042](https://arxiv.org/abs/1407.5042)

[2]
[https://en.wikipedia.org/wiki/Jeskola_Buzz](https://en.wikipedia.org/wiki/Jeskola_Buzz)

[3] [http://jeskola.net/](http://jeskola.net/)

~~~
radarsat1
That's pretty awesome! I used to use Buzz quite a bit too! In fact, his hard
drive mishap is what pushed me over the edge to become an open source advocate
;) I wish I could find the details about that accident, but if I remember
correctly he lost the source code for Buzz, and people spent quite a bit of
time reverse engineering the .exe just to be able to continue writing "gear"
for Buzz.

~~~
mastazi
Yes, he eventually rewrote the whole application in .NET; that is the version
I'm using now, and 99% of my old tracks and plugins from around 1999-2003
still work OK in the current version!

I wish Buzz were open source too. I don't know if you've ever tried any, but
there are a couple of similar open source applications. My favourite used to
be Psycle, even though the project doesn't seem to be very active nowadays
[http://psycle.pastnotecut.org/portal.php](http://psycle.pastnotecut.org/portal.php)
and there is also Buzztrax, which was directly inspired by Buzz
[http://buzztrax.org/](http://buzztrax.org/)

~~~
mastazi
BTW if anyone is looking for buzzmachines.com, you're out of luck; it seems
the website was hacked and the DB user info leaked. More back story here:
[http://forums.jeskola.net/viewtopic.php?f=2&t=2040](http://forums.jeskola.net/viewtopic.php?f=2&t=2040)

The new reference site for Buzz gear is this one:
[http://buzz.robotplanet.dk/](http://buzz.robotplanet.dk/)

------
phaemon
Despite having thought about the dangers of AI, this literally never occurred
to me: humans don't create AI, machines do.

~~~
LesZedCB
Imagine if we had the unambiguous source code for our own bodies? AI will be
able to have that without any work, and can modify and recompile with little
to no effort and all benefit in a matter of seconds, where humans have to wait
years.

This is referred to as the intelligence explosion[0] or AI Foom[1].

[0]
[https://en.wikipedia.org/wiki/Intelligence_explosion](https://en.wikipedia.org/wiki/Intelligence_explosion)

[1] [https://wiki.lesswrong.com/wiki/The_Hanson-Yudkowsky_AI-
Foom...](https://wiki.lesswrong.com/wiki/The_Hanson-Yudkowsky_AI-Foom_Debate)

~~~
mastazi
> Imagine if we had the unambiguous source code for our own bodies?

We sequenced human DNA, so I think we already have something kinda like it.
I'm not a scientist, but my basic understanding is that while we have the
whole sequence (in your source code analogy, we can do git clone or git pull),
we are not really good at tweaking it ("oops, this codebase uses architectural
solutions we're not familiar with; we will have to study it extensively before
we can deploy those proposed changes to the production environment").

~~~
jcranmer
DNA is not really source code. At best, it's kind of like... an algorithm
textbook. Only a very tiny portion of DNA (like, ~1-2% in Homo sapiens) codes
for proteins in the manner that you would have learned in biology class. A
much larger fraction is used somehow, but how much and what its functions are
is still a matter of major debate.

The effort to sequence the human genome has also led us to discover that there
are other ways in which traits can be heritable that don't involve changes to
DNA. The focus now is largely on epigenetics, although explanations invoking
microbiomes of bacteria inside our organs were definitely popular for a while.

In short, we really don't know a whole lot about our own molecular biology,
and a lot of the research in the past 60 years since the discovery of DNA has
tended to show "there's more going on than we thought." Where things involve
just DNA, we have very good tools for reading (sequencing) and writing
(CRISPR/Cas9) it. What we don't have good tools for is modifying our
epigenetics, and we don't have a good handle on what comes from DNA and what
comes from epigenetics.

~~~
mastazi
Thanks, I'm interested in this topic, would you be able to suggest sources
that are understandable by a layperson?

~~~
allenz
The standard way to learn this stuff is a course/textbook followed by reading
the important papers. I recommend
[https://ocw.mit.edu/courses/biology/7-28-molecular-
biology-s...](https://ocw.mit.edu/courses/biology/7-28-molecular-biology-
spring-2005/): it is well-organized and a great use of time, even though it
doesn't get into the latest most exciting topics.

If you just want an overview of what's going on, I curated these links for
you. For each, focus on the main idea and why it is significant for
accomplishing biological goals.

[https://en.wikibooks.org/wiki/Human_Physiology/Homeostasis#O...](https://en.wikibooks.org/wiki/Human_Physiology/Homeostasis#Overview)

[https://en.wikipedia.org/wiki/Molecular_biology](https://en.wikipedia.org/wiki/Molecular_biology)
section 1-2

Skim
[https://en.wikipedia.org/wiki/Central_dogma_of_molecular_bio...](https://en.wikipedia.org/wiki/Central_dogma_of_molecular_biology)

[https://www.khanacademy.org/science/biology/classical-
geneti...](https://www.khanacademy.org/science/biology/classical-
genetics/molecular-basis-of-genetics-tutorial/) watch 2x speed

[https://en.wikipedia.org/wiki/Genetics#Research_methods](https://en.wikipedia.org/wiki/Genetics#Research_methods)

[https://en.wikipedia.org/wiki/Cellular_communication_(biolog...](https://en.wikipedia.org/wiki/Cellular_communication_\(biology\))

[https://www.khanacademy.org/science/biology/cell-
signaling](https://www.khanacademy.org/science/biology/cell-signaling)

[https://mcb.berkeley.edu/courses/mcb110spring/nogales/mcb110...](https://mcb.berkeley.edu/courses/mcb110spring/nogales/mcb110_s2008_4signaling.pdf)

[https://en.wikipedia.org/wiki/Epigenetics](https://en.wikipedia.org/wiki/Epigenetics)
intro and diagram only

[http://www.zymoresearch.com/learning-
center/epigenetics/what...](http://www.zymoresearch.com/learning-
center/epigenetics/what-is-epigenetics) all pages

[https://www.jove.com/science-education-database/2/basic-
meth...](https://www.jove.com/science-education-database/2/basic-methods-in-
cellular-and-molecular-biology) understand basic research methods

[https://www.khanacademy.org/science/biology/human-
biology#im...](https://www.khanacademy.org/science/biology/human-
biology#immunology)

skim
[https://en.wikipedia.org/wiki/Induced_pluripotent_stem_cell](https://en.wikipedia.org/wiki/Induced_pluripotent_stem_cell)

[https://www.cancerquest.org/cancer-
biology/](https://www.cancerquest.org/cancer-biology/) is awesome. I recommend
angiogenesis, metastasis, and tumor-host interactions

~~~
mastazi
Thank you so much, this is awesome! I really appreciate the time you must have
spent to put it together!

------
mark_l_watson
From my limited experience of just working on a couple of DARPA projects
(decades ago), DARPA is interested in new technology focused on specific
problems.

The idea of automating aspects of data science like data cleansing and model
building makes sense. A good goal.

There is obviously a lot of excitement about deep learning success stories,
but I would like to see more effort put into fusing more traditional AI with
deep learning, as well as into better interactive UIs for data science and
machine learning workflows.

A little off topic, but I started using Pharo Smalltalk again this week after
not touching it for a long time. I was thinking about using Pharo-like
environments instead of tools like IPython for organizing workflows. I admit
this is likely not such a good idea, because most of the great libraries for
data science and AI are in C++, Java, Python, and Scala.

------
yellowapple
I'm pretty sure this is _exactly_ how Skynet starts. ;)

More seriously, as someone who's interested in machine learning but doesn't
really know where to start in terms of really understanding it beyond "it
takes input, does some weird statistical magic, and gives you an output":
would one of these D3M tools (should they become readily available and less
theoretical) be a decent starting point? Or would it still be better from a
learning perspective to start with something more fundamental or "basic
building block"? In other words: does it help to know _how_ to build these
models from scratch even if you're using some tool to do it automatically?

~~~
Florin_Andrei
> _doesn't really know where to start_

- Take Andrew Ng's ML class on Coursera.

- Install TensorFlow and TFLearn.

- git clone something based on TF from GitHub and hack it.

- Do some pet project from scratch.

- profit
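
For the "pet project" step, even something this small counts - a rough sketch,
assuming tf.keras and the built-in MNIST dataset:

    import tensorflow as tf

    # Load MNIST and scale pixel values to [0, 1]
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # A tiny fully connected classifier
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    model.fit(x_train, y_train, epochs=3)
    print(model.evaluate(x_test, y_test))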

------
xiphias
The idea is nothing new: everything in machine learning is being automated.
The hard problem is to find the next step that can be easily automated. Just
randomly training new models and adjusting doesn't work, because training time
goes up too fast.

------
adamnemecek
I've been trying to find an answer to this for some time. Is it possible that
ML could be implemented much better on analog computers, since they have
"native support" for differential calculus?

~~~
dnautics
"native support" for differential calculus is not a problem for ML. If you
have a sufficiently powerful programming language, you can trivially do
forward differentiation/automatic differentiation instead of backprop which is
just about as "native support" as it gets (AD can also be done in, say, C, or
python, it's just trickier). Granularity is also not really that much of a
problem, you can reduce your bit precision to ~12 bits and still get good
results with ML.
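
To give a flavor of what forward-mode AD looks like, here's a toy dual-number
sketch in Python (purely illustrative, not how any real framework implements
it):

    # Each Dual carries a value and its derivative; arithmetic propagates both.
    class Dual:
        def __init__(self, value, deriv=0.0):
            self.value, self.deriv = value, deriv

        def __add__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.value + other.value, self.deriv + other.deriv)
        __radd__ = __add__

        def __mul__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            # product rule: (uv)' = u'v + uv'
            return Dual(self.value * other.value,
                        self.deriv * other.value + self.value * other.deriv)
        __rmul__ = __mul__

    def f(x):
        return 3 * x * x + 2 * x + 1    # f(x) = 3x^2 + 2x + 1

    y = f(Dual(4.0, 1.0))               # seed dx/dx = 1
    print(y.value, y.deriv)             # 57.0 and f'(4) = 26.0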

~~~
adamnemecek
This isn't about language support or library support; this is about hardware
support. How many instructions does it take to differentiate/integrate a
function on x86? On an analog platform this is very close to a single
instruction. What if you could solve calculus, idk, 1000x faster? It could be
even more, though.

~~~
dnautics
Yeah, and on analog platforms you have serious problems with accumulation.
Good luck doing a 100x100 matrix times a 100x500 matrix on an analog system,
which is the sort of thing you need for machine learning.

Automatic differentiation usually adds very little overhead. Think "putting a
second set of values through a slightly differently compiled function".

Although there are calculus-related issues like saturation, smoothness of
transfer functions, etc., "solving calculus" is not the major problem with
machine learning efficiency and throughput. Linear algebra is the main
bottleneck.

~~~
p1esk
Actually, matrix-matrix multiplication is where analog computing really
shines.

A multiplication operation can be done with a single transistor: one operand
is the gate voltage, the other is the source-to-drain voltage. The resulting
current through the transistor is proportional to the product of the two. An
addition operation is even simpler: it's just connecting two wires carrying
two different currents - the resulting current on the output wire will be the
sum (due to Kirchhoff's current law).

Of course, you pay for this efficiency with precision, which for some
applications (e.g. neural networks) can be a reasonable trade-off.
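
As a crude software analogy of that precision trade-off (made-up "effective
bits", not a device model), the matrix product itself is unchanged; only the
operand precision drops:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((100, 100))
    B = rng.standard_normal((100, 500))

    exact = A @ B                       # ideal digital result

    def quantize(x, bits=6):
        # pretend each analog operand carries only ~6 bits of effective precision
        scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
        return np.round(x / scale) * scale

    approx = quantize(A) @ quantize(B)
    print(np.abs(approx - exact).mean() / np.abs(exact).mean())  # small but nonzero error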

------
kelvin0
>"...The goal of D3M is to help overcome the data-science expertise gap by
enabling non-experts to construct complex empirical models through automation
of large parts of the model-creation process..."

This makes me think of the premise behind Excel and Access: provide (mostly)
non-programmers with a nice wizard-like tool to analyze data. Of course, the
(painful) shortcomings of such tools are known all too well. I hope this DARPA
project fares better. On the plus side, it democratizes access to a
technology, for both good and bad.

------
lowglow
This is epic, and it's nice to see more people thinking this way. We're
actually tackling something similar at Asteria right now, and we're about to
open source our device client software in a much-needed RFC. Gitter here for
anyone interested:
[https://gitter.im/getasteria/General](https://gitter.im/getasteria/General)

------
randcraw
I suspect the goal here is to automate the task of intelligence analysts. As
the quantity of signal data has gone up exponentially, the DoD needs ever more
subject matter experts who are also schooled in data science techniques. But
that's a rare combo of skills, and given the high demand for these folks in
industry and the better pay there, I suspect Uncle Sam is suffering a dire
unmet need. Ergo the proposed solution: automate them.

But the example in the article of traffic modeling with 30 person-months of
analysis and more than 60 person-months of cleansing illustrates how hard such
analysis actually is. Ain't no way you can automate it.

Useful intel signal is hard to find in noise, especially in what by now must
be zettabytes. Enlisting _anyone_ to do analysis without deep domain
expertise, especially someone as dumb as a computer, is not a promising
strategy for success. But it sounds like a great way for beltway bandits to
get funding for long-term blue-sky R&D contracts...

------
eruditely
Is DARPA just a really good program or something? Over the years I have yet to
hear of any governmental ratchet toward decay; they have seemingly been
involved and doing it well for quite some time.

Any reason for this?

------
slantaclaus
Upvote because title

