
A Convolutional Neural Network for Modelling Sentences (2014) [pdf] - ilyaeck
http://arxiv.org/pdf/1404.2188.pdf
======
nl
If you are interested in this, the list of papers for the 2015 International
Conference on Learning Representations has been released:
[http://www.iclr.cc/doku.php?id=iclr2015:main#submissions](http://www.iclr.cc/doku.php?id=iclr2015:main#submissions)

There's some pretty good stuff there. I really liked _Crypto-Nets: Neural
Networks over Encrypted Data_ [1]:

 _The problem we address is the following: how can a user employ a predictive
model that is held by a third party, without compromising private information.
For example, a hospital may wish to use a cloud service to predict the
readmission risk of a patient. However, due to regulations, the patient's
medical files cannot be revealed. The goal is to make an inference using the
model, without jeopardizing the accuracy of the prediction or the privacy of
the data._

[1] [http://arxiv.org/abs/1412.6181](http://arxiv.org/abs/1412.6181)
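
To make the setup concrete, here's a toy sketch of that workflow in my own
words. It uses Paillier encryption (additively homomorphic only), not the
leveled homomorphic scheme the paper actually relies on, and it only
evaluates a linear score, but it shows how a server can compute on data it
never sees in the clear. The primes, features, and weights below are made up
for illustration (Python 3.9+):

    import math, random

    # Client-side key generation (toy-sized primes; real use needs large moduli).
    p, q = 293, 433
    n, n2 = p * q, (p * q) ** 2
    lam = math.lcm(p - 1, q - 1)
    g = n + 1                                  # standard simplification g = n + 1
    mu = pow((pow(g, lam, n2) - 1) // n, -1, n)

    def encrypt(m):
        r = random.randrange(1, n)
        while math.gcd(r, n) != 1:
            r = random.randrange(1, n)
        return (pow(g, m, n2) * pow(r, n, n2)) % n2

    def decrypt(c):
        return ((pow(c, lam, n2) - 1) // n * mu) % n

    # Server side: evaluate w.x + b using only ciphertexts.
    # Enc(a) * Enc(b) = Enc(a + b)  and  Enc(a) ** w = Enc(w * a)   (mod n^2)
    def encrypted_score(enc_x, weights, bias):
        acc = encrypt(bias)                    # server only needs the public key
        for c, w in zip(enc_x, weights):
            acc = (acc * pow(c, w, n2)) % n2
        return acc

    # Client side: private features are only ever sent encrypted.
    x = [3, 5, 2]                              # hypothetical patient features
    weights, bias = [2, 4, 7], 10              # hypothetical model held by the server
    enc_score = encrypted_score([encrypt(v) for v in x], weights, bias)
    print(decrypt(enc_score))                  # 2*3 + 4*5 + 7*2 + 10 = 50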

------
kastnerkyle
If you see convolution as basically truncated recurrence, this approach ties
in very strongly to recent approaches to machine translation using recurrent
nets. I guess depth _should_ allow you to find long-term dependencies, but
the fact that CNNs were designed for images, which have strong local
structure and much weaker long-range structure, makes me think RNNs are
better for language, where we see a lot of important long-term dependencies.
As an example: "The man with the long brown hair entered the saloon" - I
would tie "saloon" and "man" as the key pieces of that sentence, but that
dependency is pretty long and somewhat different from natural images, where
you don't really expect the corners of an image to have any strong
relationship in general.
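
To make the receptive-field point concrete, here's a rough numpy sketch
(shapes and filter widths are made up, not taken from the paper): a stack of
1D convolutions only mixes words within a fixed window, while even a simple
RNN carries state across the whole sentence.

    import numpy as np

    def conv1d(x, w):
        """Valid 1D convolution over time: x is (T, d), w is (width, d, d_out)."""
        width = w.shape[0]
        return np.array([np.tensordot(x[t:t + width], w, axes=([0, 1], [0, 1]))
                         for t in range(x.shape[0] - width + 1)])

    def rnn(x, W_x, W_h):
        """Simple tanh RNN: each state depends on all earlier words."""
        h, hs = np.zeros(W_h.shape[0]), []
        for x_t in x:
            h = np.tanh(x_t @ W_x + h @ W_h)
            hs.append(h)
        return np.array(hs)

    rng = np.random.default_rng(0)
    T, d = 9, 8                        # a 9-word sentence, 8-dim word vectors
    x = rng.normal(size=(T, d))

    # Two stacked width-3 conv layers: receptive field = 2*(3-1)+1 = 5 words,
    # so "man" and "saloon" in a long sentence may never interact directly.
    h = conv1d(conv1d(x, rng.normal(size=(3, d, d))), rng.normal(size=(3, d, d)))
    print(h.shape)                     # (5, 8): each output sees only 5 words

    # The RNN's final state, in contrast, has (in principle) seen every word.
    print(rnn(x, rng.normal(size=(d, d)), rng.normal(size=(d, d)))[-1].shape)  # (8,)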

~~~
bearzoo
^agreed - no ANN architecture can be treated as a 'jack of all trades' just
because of success found in one domain (it seems most networks are designed
to boost performance on one type of data)

------
skythomas
Would like to actually see these CNNs first-hand, but unless I'm missing
something on a first quick pass, the amazing results of 2013-2014 are
continuing. Major gating barriers that have stymied AGI are disappearing.
This result is cool on its own, but what really excites me is how it provides
a key foundational building block for even cooler ideas. Once you have
sentence structure, it naturally lends itself to many unsupervised NLP and
learning tasks. Just wow!

------
gbachik
How do you guys understand this stuff!? The topic absolutely fascinates me,
but after reading the paper... I just feel dumb. Are there any good resources
I could use to better understand machine learning papers such as this? I
can't even grasp the implications this paper could have. Is anyone open to
doing some mentoring? Gbachik@gmail.com

~~~
chriswarbo
Machine learning can be quite jargon-heavy, and in my experience the core
ideas can often be hidden amongst a bunch of unnecessary maths.

I got frustrated by a paper yesterday which contained function definitions,
summations-of-summations, products of sequences, convolutions, set theory,
switching back-and-forth between unary-functions/vectors and binary-
functions/matrices, converting back-and-forth between {0, 1}, {-1, 1} and
{true, false}, weighting elements of a set by 0/1 instead of taking a sub-set,
linear programming, etc.

What was their result? To speed up pair-wise comparisons of structured data,
only do N% of the comparisons and it will only take N% of the time. To decide
which comparisons to discard, see what works well on a small sample of inputs.
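
For what it's worth, here's a loose sketch of that result as I read the
summary (not the paper's actual algorithm; the comparison functions and data
are invented): calibrate a cheap proxy score on a small random sample of
pairs, then only pay for the expensive comparison on pairs the proxy ranks
highly.

    import random
    from itertools import combinations

    def expensive_compare(a, b):
        # stand-in for a costly structured-data comparison
        return -abs(len(a) - len(b)) - sum(x != y for x, y in zip(a, b))

    def cheap_proxy(a, b):
        # much cheaper signal, hopefully correlated with the real comparison
        return -abs(len(a) - len(b))

    def top_comparisons(items, keep_fraction=0.3, sample_size=10):
        pairs = list(combinations(range(len(items)), 2))
        # 1. Score the cheap proxy on a small random sample of pairs.
        sample = random.sample(pairs, min(sample_size, len(pairs)))
        proxies = sorted(cheap_proxy(items[i], items[j]) for i, j in sample)
        # 2. Pick a threshold so roughly keep_fraction of pairs survive.
        idx = min(int(len(proxies) * (1 - keep_fraction)), len(proxies) - 1)
        cutoff = proxies[idx]
        # 3. Only run the expensive comparison on pairs above the threshold.
        kept = [(i, j) for i, j in pairs if cheap_proxy(items[i], items[j]) >= cutoff]
        return [((i, j), expensive_compare(items[i], items[j])) for i, j in kept]

    words = ["network", "networks", "sentence", "sentences", "model", "models"]
    print(top_comparisons(words))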

------
rmcfeeley
Why don't we reallocate some of this effort to teaching humans how to use
language more effectively? Or is it better to just delegate the responsibility
for truth to a seemingly "pure" logic that we like to think exists outside of
our human condition?

------
gojomo
The progress here is getting spooky, including the success of such networks
for captioning arbitrary images and translating between natural languages. I
read that one researcher predicts live narration of video within 5 years.

A cold-splash-in-the-face intro for laypeople can be found in the TEDx talk
of Jeremy Howard (former president of Kaggle):

 _The wonderful and terrifying implications of computers that can learn_ –
[https://www.youtube.com/watch?v=xx310zM3tLs](https://www.youtube.com/watch?v=xx310zM3tLs)

For cutting-edge research, it seems the "NIPS" conference each December is
where many of the new results appear:

[http://nips.cc/](http://nips.cc/)

------
ilyaeck
Inviting a discussion on the topic.

------
biomimic
I see that Tomas Mikolov, the creator of Google's Word2Vec
([https://www.kaggle.com/c/word2vec-nlp-tutorial/forums/t/1234...](https://www.kaggle.com/c/word2vec-nlp-tutorial/forums/t/12349/word2vec-is-based-on-an-approach-from-lawrence-berkeley-national-lab)),
is referenced in the paper.

~~~
gojomo
Notably, Mikolov is now at Facebook.

My hunch (as a total outsider) is that anything Google publishes is about 2
years behind their current best practices.

~~~
gwern
The back-and-forth on the ImageNet record between Google/Facebook/Baidu
suggests that, unless they're exquisitely coordinating research releases, at
least some of what they're releasing is indeed close to their state of the
art*.

* Obviously, writing it up does mean the published results will be a bit
behind what they can actually do in their labs, but that's true everywhere.

~~~
gojomo
Not sure why that follows. (Can you outline the reasoning a bit more?)

As an example model, imagine all three are years ahead of published material,
and each has policies that encourage publishing only the minimum necessary to
"take the crown" (because any more would risk diluting proprietary
advantages).

That'd result in the observed back-and-forth, and – given lead-times on paper-
writing, internal review, and marquee conferences – it wouldn't necessarily
force an acceleration of the "published state-of-the-art" up to the level of
all of their "internal states-of-the-art". (Perhaps, for a group in
trailing/catch-up position, their published results will be very close to
their best. But the leader could be arbitrarily further ahead.)

By analogy to English auction bidding: outsiders only learn the
second-highest bidder's limit just before the end, and never learn the
winning bidder's true limit. But here there's no end, and there are always
potential competitive reasons for the top-N pack to hide some of their
leading-edge practices.

This doesn't require explicit coordination… but there could be explicit
coordination, too! All the programs are staffed by former
students/colleagues/coworkers of each other.

~~~
gwern
> That'd result in the observed back-and-forth, and – given lead-times on
> paper-writing, internal review, and marquee conferences – it wouldn't
> necessarily force an acceleration of the "published state-of-the-art" up to
> the level of all of their "internal states-of-the-art".

If they aren't coordinating very carefully, a single _defector_ would result
in the entire slack being used up by a single announcement; and the bigger
the slack used up, the bigger the PR win...

~~~
gojomo
OK, but in this sort of race, publishing your very latest internal techniques
lets everyone else instantly catch up.

If these techniques are commercially valuable and in current use – and I
believe they are! – then no leading company, or even member of the leading
pack, would want to do a current-best reveal. They all have competitive
reasons to do only carefully vetted, incremental reveals of somewhat-older
work.

~~~
gwern
You can publish papers without giving away all the details necessary to make
it work. Sad but true.

For example, submitted to HN some time ago was the blog of a dude trying to
make DeepMind's 'Neural Turing Machine' work; he was having a hell of a time
because the published paper seems to be unclear about, or skip over, a number
of crucial points. Or, more historically, the German chemical giants made an
art of filing patents on all their key techniques, to gain IP protection,
while leaving out enough crucial details that when the USA gleefully seized
their IP rights during WWI, the American companies discovered they couldn't
make the processes work.

~~~
gojomo
Indeed, and if that's also happening with the deep-learning papers here, then
it's another mechanism in support of my main point: what's published is
often, by motivated choice, behind what's being done internally. (And not
just "time it takes to write up" behind.)

Given that intent, there's no way to deduce from the pattern of newly claimed
results whether what's being revealed is merely a few months, or many years,
behind.

