
Critical Behavior from Deep Dynamics: A Hidden Dimension in Natural Language - hacker42
http://arxiv.org/abs/1606.06737v2
======
mark_l_watson
This seems like a very important paper, basically showing that Markov models,
in which the influence of a token decays exponentially with distance, are often
a poor model, whereas deep neural networks with LSTMs (long short-term memory)
show a power-law decay of influence, which works better for a variety of
sequential data.
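
To make that concrete: the statistic in question is the mutual information
between symbols a distance d apart, which decays exponentially for a Markov
chain but roughly as a power law for natural text. A naive plug-in estimate is
only a few lines of Python (my own sketch, not code from the paper;
'corpus.txt' is a placeholder for whatever text you want to measure):

      from collections import Counter
      from math import log2
      
      def mutual_information(seq, d):
          """Plug-in estimate of I(X_t; X_{t+d}) from empirical symbol frequencies."""
          pairs = list(zip(seq, seq[d:]))
          joint = Counter(pairs)
          left = Counter(x for x, _ in pairs)
          right = Counter(y for _, y in pairs)
          n = len(pairs)
          return sum((c / n) * log2(c * n / (left[x] * right[y]))
                     for (x, y), c in joint.items())
      
      text = open("corpus.txt").read()            # any long character sequence
      for d in (1, 2, 4, 8, 16, 32, 64, 128):
          # exponential vs. power-law decay shows up on a log-log plot
          print(d, mutual_information(text, d))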

BTW, I went to the NAACL (North American Chapter of the Association for
Computational Linguistics) conference in April and it seemed like half the
papers used LSTMs.

Edit: the NAACL 2016 papers are here:
[http://aclweb.org/anthology/N/N16/](http://aclweb.org/anthology/N/N16/)

~~~
forgotpwtomain
> This seems like a very important paper, basically showing that Markov models
> with exponential decay of influence of tokens by distance are often a poor
> model, where as deep neural networks with LSTM (long short term memory) has
> power law decay of influence decay, which performs better for a variety of
> sequential data.

As someone outside of this field, it seems to me that this kind of result
should have been obviously foreseeable (hindsight bias and all that, of
course); I would never have considered Markov processes an adequate predictive
model for natural language. The formalized results are obviously important,
though.

Could someone with more knowledge comment on what the current working
assumptions were prior to this paper and what the consequences would be?

~~~
thomasahle
> I would never have considered Markov processes to be an adequate
> predictability model for natural language.

Would you consider LSTM an adequate model?

~~~
TheOtherHobbes
Personally, no. I think all of these models are essentially trivial and a long
way from genuine NLP.

That doesn't mean they're not useful in very narrow domains. But language is
pretty much the definition of the ultimate wide domain, and trying to cover it
with statistical correlations makes as much sense as word counting Shakespeare
to try to generate some new plays.

~~~
mark_l_watson
I think you might be pleasantly surprised by recent results using deep learning
and LSTMs to build models of natural language. The next advance I would like to
see is handling anaphora resolution (resolving pronouns to earlier noun phrases,
resolving words like 'there' to a place mentioned elsewhere in the text, etc.).
Progress has been so rapid that I bet I won't have to wait long.

~~~
nschucher
Have you seen the results from Dynamic Memory Networks? [0]

The relevant example from the paper:

    
    
      I: Jane went to the hallway.
      I: Mary walked to the bathroom.
      I: Sandra went to the garden.
      I: Daniel went back to the garden.
      I: Sandra took the milk there.
      Q: Where is the milk?
      A: garden
    

Obviously just a toy task, but as you said, progress is rapid!

[0]: [http://arxiv.org/abs/1506.07285](http://arxiv.org/abs/1506.07285)

~~~
mark_l_watson
Thanks for the link!

------
BenoitP
> [...] A Hidden Dimension in Natural Language

Mmmh

> [...] We show that in many data sequences — from texts in different
> languages to melodies and genomes

Hum, ehrm

> [...] natural languages are poorly approximated by Markov processes.

Alright, alright

> [...] This model class captures the essence of probabilistic context-free
> grammars

Ok, ok

> [...] and cosmological inflation

Wat.

Out of nowhere, Creation of the Universe.

\-------------

I'm always baffled by the ability to draw parallels. Did a colleague take a
peek at the screen and say, "hey, I have the same equations"?

~~~
jakub_h
> Did a colleague take a peek at the screen and say, "hey, I have the same
> equations"?

This would be an interesting thing to try: a computer system that would scan all
the papers for math and find parallels. I think we already have something like
term indexing for deductive systems?

~~~
kough
This is theoretically very possible, and I know that at least a few people
([http://ccimi.maths.cam.ac.uk/projects/create-semantic-search-engine-mathematical-literature/](http://ccimi.maths.cam.ac.uk/projects/create-semantic-search-engine-mathematical-literature/))
are working on it.

~~~
jakub_h
Very yummy, thanks!

------
w_t_payne
I wonder what programming languages look like under this kind of analysis? I'd
guess a lot like natural language?

~~~
xtacy
Not quite, but as the paper says:

    
    
        Corollary: No probabilistic regular grammar
        exhibits criticality.
    
        In the next section, we will show that this statement is
        not true for context-free grammars (CFGs).
    

That is, there exist CFGs that exhibit criticality. Programming languages are
often parsed by CFGs, so it's likely that some programming languages exhibit
the same criticality structure as natural languages.
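
To see why hierarchy matters, here's a toy construction (my own sketch, not the
paper's model): take the leaves of a deep binary tree in which each node copies
its parent's symbol with probability q. Nearby leaves share deep ancestors
while distant leaves share only shallow ones, so correlations fall off slowly
with distance rather than exponentially:

      import random
      
      def leaves(bit, depth, q=0.9):
          """Leaves of a binary tree; each child copies its parent's bit with prob q."""
          if depth == 0:
              return [bit]
          left = bit if random.random() < q else 1 - bit
          right = bit if random.random() < q else 1 - bit
          return leaves(left, depth - 1, q) + leaves(right, depth - 1, q)
      
      # 16384 binary symbols; estimate mutual information vs. distance on this
      # sequence and compare with a Markov chain over the same alphabet
      seq = leaves(random.randint(0, 1), depth=14)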

~~~
rntz
I think that means only that programs written in a language described by a CFG
_could_ exhibit "criticality", not that they will. "Exhibiting criticality" is
a property of a distribution (e.g. a corpus of human-written programs or an
algorithm for generating programs), not of a grammar, IIUC.

------
meeper16
Here's an implemented Markov + word2vec chatbot
[http://lexcognition.com/lexi.html](http://lexcognition.com/lexi.html)
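
(For reference, the Markov half of a bot like this is only a few lines; the
sketch below is a generic bigram text generator of my own, not the linked bot's
actual code, and it ignores the word2vec side entirely.)

      import random
      from collections import defaultdict
      
      def train(tokens):
          """Map each word to the list of words that followed it in the corpus."""
          chain = defaultdict(list)
          for a, b in zip(tokens, tokens[1:]):
              chain[a].append(b)
          return chain
      
      def generate(chain, word, length=20):
          out = [word]
          while len(out) < length and chain[word]:
              word = random.choice(chain[word])
              out.append(word)
          return " ".join(out)
      
      chain = train(open("corpus.txt").read().split())   # any training text
      print(generate(chain, "the"))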

~~~
the_duke
I am VERY impressed:

\-----

you: tell me something

me: Don't speak for me first time I asked it

you: tell me about your mother

me: just like my mother is on that AK47 diet

you: Tell me about artificial intelligence

me: Okay , maybe not intelligence capabilities, etc

you: ask me something

me: _Points bow_ No one would ask this haha

------
hacker43
Hmm... maybe the way we encode data digitally is also "wrong", or should I say
unnatural.

~~~
jakub_h
It seems to me that if we humans are the only ones doing this, it is either
trivially natural or trivially unnatural, depending on whether you count our
creations as part of the natural world.

------
wingcommander
"... which explains why natural languages are poorly approximated by Markov
processes."

Is that a joke? It's 2016 and you think you need to explain that a Markov
process is a poor approximation of natural language? This has been obvious to
computational linguists, and anyone working in the field, from day one.

------
mrcactu5

      The Bach data consists of 5727 notes from Partita No. 2 [11],
      with all notes mapped into a 12-symbol alphabet
      consisting of the 12 half-tones {C, C#, D, D#, E, F, F#, G, G#, A, A#, B}
      with all timing, volume and octave information discarded.

I was good until the last part.
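
(For what it's worth, the mapping they describe amounts to keeping only each
note's pitch class; a sketch of my reading, assuming MIDI-style note numbers:)

      # the paper's 12-symbol alphabet
      NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
      
      def to_symbol(midi_note):
          # octave is discarded: 60 (middle C) and 72 both map to "C"
          return NAMES[midi_note % 12]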

~~~
curiousgal
Can you elaborate?

