
Factorization tricks for LSTM networks - Katydid
https://arxiv.org/abs/1703.10722
======
mgraczyk
Since the authors appear to be listening on here, I have a question about the
method.

From my experience, RNN weights and the recurrent weights Tf in an LSTM tend
to look more like (I + low_rank) than like low_rank. To be more specific, I
gather that with your F-LSTM you do:

    
    
      T1 = W1 * input
      T2 = W2 * T1
      output = T2
      so
      output = W2 * W1 * input
    

Where W1 and W2 are "factorized by design".
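The factorization above can be sketched in a few lines of NumPy to show the parameter saving (the hidden size and rank here are illustrative, not taken from the paper):

```python
import numpy as np

n, r = 1024, 128          # hidden size and factorization rank (illustrative)
rng = np.random.default_rng(0)

# Full transform: n * n parameters
W_full = rng.standard_normal((n, n))

# Factorized "by design": W2 (n x r) and W1 (r x n), only 2 * n * r parameters
W1 = rng.standard_normal((r, n))
W2 = rng.standard_normal((n, r))

x = rng.standard_normal(n)
out = W2 @ (W1 @ x)       # same output shape as W_full @ x

print(W_full.size)        # 1048576 parameters
print(W1.size + W2.size)  # 262144 parameters, a 4x reduction
```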

However, it seems like the recurrent weights (those for f in your paper)
should look more like

    
    
      T1 = W1 * input
      T2 = W2 * T1
      output = input + T2
      so
      output = (I + W2 * W1) * input
    

That way, you are imposing the simplification that the Tf ~= (I + low_rank)
instead of Tf ~= low_rank. Have you considered this?
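A sketch of this (I + low_rank) parameterization, leaving aside where the non-linearity would go (dimensions are illustrative):

```python
import numpy as np

n, r = 1024, 128
rng = np.random.default_rng(0)
W1 = rng.standard_normal((r, n)) / np.sqrt(n)   # scaled for numerical sanity
W2 = rng.standard_normal((n, r)) / np.sqrt(r)
x = rng.standard_normal(n)

# low_rank only:          W2 @ W1 @ x
# (I + low_rank) variant: add the skip connection
out = x + W2 @ (W1 @ x)

# Equivalent to multiplying by (I + W2 @ W1), without forming the n x n matrix
out_explicit = (np.eye(n) + W2 @ W1) @ x
print(np.allclose(out, out_explicit))  # True
```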

~~~
option
>> "RNN weights and the recurrent weights Tf in an LSTM tend to look more like
(I + low_rank)" - Do you have a reference for this? But thanks for the
suggestion, I'll have a look into this.

>> "I gather that with your F-LSTM you do:" - Your understanding of F-LSTM
looks correct.

>> "output = input + T2" - It's not quite clear where to put the
non-linearities. But this looks similar to residual connections, which, in my
experience, are almost always a good idea.

~~~
juxtaposicion
If you've seen tensor trains, it might be interesting to reduce the rank even
further by splitting the weight matrix into many sequential small factors
(instead of just two).
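A hedged sketch of that idea using a plain chain of small matrix factors (a simpler relative of a true tensor-train decomposition, which would factor a reshaped weight tensor into 3-D cores; all dimensions are illustrative):

```python
import numpy as np

n, r = 1024, 32
rng = np.random.default_rng(0)

# Chain of small factors: W is approximated as Wk @ ... @ W2 @ W1
factors = [rng.standard_normal((r, n))]                      # n -> r
factors += [rng.standard_normal((r, r)) for _ in range(3)]   # r -> r
factors.append(rng.standard_normal((n, r)))                  # r -> n

x = rng.standard_normal(n)
for W in factors:
    x = W @ x             # apply the chain, first factor first

params = sum(W.size for W in factors)
print(params, "vs", n * n)  # 68608 vs 1048576
```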

------
xiphias
It's crazy and scary at the same time how fast new approaches are improving
machine learning efficiency. Am I the only one who thinks that we will be able
to simulate the brain with much less computational power than the brain itself
has?

~~~
zenlikethat
The brain is a highly inefficient system, evolved slowly and haphazardly over
a long time. It does seem likely that we could, with far fewer computational
resources, simulate some of the tasks that brains are responsible for or "good
at", such as image classification.

However, "simulating the brain" in general is a much broader problem in scope.
Consider: can you truly say that you have "simulated the brain" without also
including a physical body for the system, since a brain is inherently tied to
a body and to corporeal existence in the world? It's not clear.

It's likely the problem won't be computational power but figuring out how to
extract or isolate only the bits of the system we are interested in from the
rest. Consider the genome, for instance: we have had the human genome mapped
since 2003, but the unfathomable complexity of the system as a whole makes it
difficult (though not impossible) to put that map to practical use.

It's not just about computational power; it's also about defining
computational models that can achieve human-competitive performance for
certain tasks, or, in the case of AGI, for all tasks that might be expected of
a human. That's the fiendishly tricky bit.

~~~
zbyte64
Who you calling inefficient? :-p

Those "inefficiencies" are for handling a messy, noisy world where we can't
just settle on a single solution. There is a lot of overhead to playing the
game of evolution.

------
subhrm
Seems quite interesting. I'll try it this week.

~~~
option
Author here. Let me know (on GitHub) if you encounter any issues.

------
backpropaganda
A much better paper worthy of attention from the HN crowd is this: Using Human
Brain Activity to Guide Machine Learning
([https://arxiv.org/abs/1703.05463](https://arxiv.org/abs/1703.05463))

