
Deep learning with text: Learning when to skim and when to read - alrojo
https://metamind.io/research/learning-when-to-skim-and-when-to-read
======
Frenchgeek
So... We are now automating laziness? There goes my last stronghold...

~~~
r00fus
Isn't automation "laziness incarnate"?

~~~
lechiffre10
Automation doesn't necessarily equate to laziness. You can automate the boring
shit so that you can focus on more productive things.

~~~
taurath
Boring things can be productive things!

------
itschekkers
cool paper, i enjoyed following the post (and really appreciated the effort
put into the bokeh viz!)

i wonder if this would have been improved by being clearer about the
motivation. the authors frame it as though the ~60ms penalty for using the
LSTM for prediction is a huge burden, and i can imagine situations where it
is. however, it seems like if this is the case, we need some real life/"scaled
out" examples of how this solution would work in practice. e.g. how long does
the decision logic take to execute (maybe 5ms?); what proportion of the time
will you have to run the LSTM after the BoW model anyway? note that in those
instances you are worse off than just running the LSTM in the first place
(total time = BoW time + decision time + LSTM time). once you have all these
you can run the math and know (on average) how much time you'll actually save,
and how much performance you sacrifice
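
to make "run the math" concrete, here's a rough sketch -- all the timings and the fallback rate below are made-up placeholders, not numbers from the post:

```python
# Back-of-envelope expected latency for the skim-or-read scheme.
# Every number here is a hypothetical placeholder, not a measurement.
bow_ms = 1.0         # bag-of-words prediction time per sentence
decision_ms = 5.0    # time for the decision logic (the 5ms guess above)
lstm_ms = 60.0       # LSTM prediction time per sentence
fallback_rate = 0.3  # fraction of inputs routed to the LSTM anyway

# Hybrid scheme: always pay BoW + decision, and additionally pay the
# LSTM on the fallback fraction of inputs.
hybrid_ms = bow_ms + decision_ms + fallback_rate * lstm_ms

savings_ms = lstm_ms - hybrid_ms  # vs. always running the LSTM
print(f"hybrid: {hybrid_ms:.1f} ms, saving {savings_ms:.1f} ms per sentence")
```

with these guesses the hybrid averages 24ms per sentence vs 60ms for always running the LSTM -- but plug in real measurements before trusting the trade-off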

~~~
sixhobbits
A 'batch' is how much of the data you put in memory at once while training the
NN. To train even a small language model, you'll go through 1000s of batches,
so the time difference is way bigger than it sounds. I agree a more practical
example would have been nice -- maybe it'll come out in the paper.

~~~
itschekkers
my impression was that this was about the time taken to make each prediction,
not to train the model? and yep, looking forward to the paper!

~~~
alrojo
It was based on test-time prediction: given that you have received a sentence,
how long does it take to compute the prediction with either a bag-of-words
model or an LSTM.
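
For concreteness, the routing at test time looks roughly like this (an illustrative sketch with a made-up threshold and stub models, not the exact code from the post):

```python
# Skim with the cheap bag-of-words model first; only "read" with the slow
# LSTM when the BoW confidence falls below a threshold. The 0.9 threshold
# and the (label, confidence) model interface are illustrative assumptions.
def predict(sentence, bow_model, lstm_model, threshold=0.9):
    label, confidence = bow_model(sentence)   # fast skim
    if confidence >= threshold:
        return label                          # confident enough, stop here
    return lstm_model(sentence)[0]            # fall back to the expensive read

# Tiny stub models just to show the control flow:
def bow_stub(s):
    return ("positive", 0.95 if "great" in s else 0.40)

def lstm_stub(s):
    return ("negative", 0.99)

print(predict("a great movie", bow_stub, lstm_stub))  # BoW is confident
print(predict("hmm, not sure", bow_stub, lstm_stub))  # falls back to the LSTM
```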

When you say practical example, do you mean the scenario where you have an API
server running? I.e. considering costs such as latency, data transfer, API
overhead, etc.?

Thanks for your feedback!

------
everling
It irks me that BoW is referred to as an algorithm. They do refer to it as a
model later on, so why conflate the two?

------
muyun_
toread

