
Transfer learning in low-data environments with FloydHub, fast.ai, and PyTorch - robbiemitchell
https://blog.frame.ai/learning-more-with-less-1e618a5aa160
======
twillmas
Looks like this task focused on binary sentiment analysis (positive or
negative movie reviews) - have you tried this on something with a broader
potential output space? This seems relevant for what you’re calling “neural
tags” on your client’s customer conversations, which seems more open-ended
than simply “positive” or “negative”.

~~~
jessestcharles
Yes, great insight! The choice to focus on sentiment here was mainly to align
with fast.ai’s original research, hopefully maximizing the generalizability
and accessibility of the results.

Internally we have used these results to improve a broad set of language
tasks, and hopefully we’ll be able to publish on those in the coming months as
well.

------
jdavies1618
Can you explain the domain-only model overtaking ULMFiT at 65k(?) unlabeled
examples? Just noise, or is the ULM contribution competing with the domain
model in some way?

~~~
jessestcharles
I think in this case the alignment between the unlabeled domain data and the
language task supports a convergence in language task performance. One
argument to continue to prefer the ULM+domain model is that it is likely more
generally capable if you remain in the same domain but switch to a task that
is less directly related to your unlabeled data. I haven’t seen any research
that directly speaks to that intuition, so it’s a good area for further study.
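
If it helps to see the two regimes side by side, here's a rough sketch against
the fastai v1 text API (the calls are real fastai v1 calls, but the file names
and hyperparameters are placeholders, not the exact pipeline from the post):
one learner starts from the Wikipedia-pretrained AWD-LSTM and fine-tunes it on
unlabeled domain text, the other trains the same architecture from scratch on
domain text only, and both hand their encoder to the same downstream
classifier.

    from fastai.text import (TextLMDataBunch, TextClasDataBunch, AWD_LSTM,
                             language_model_learner, text_classifier_learner)

    path = '.'  # hypothetical project directory

    # (a) ULM + domain: start from the Wikipedia-pretrained AWD-LSTM,
    # then fine-tune the language model on unlabeled domain text.
    data_lm = TextLMDataBunch.from_csv(path, 'unlabeled_domain.csv')
    ulm = language_model_learner(data_lm, AWD_LSTM, pretrained=True)
    ulm.fit_one_cycle(1, 1e-2)
    ulm.save_encoder('ulm_domain_enc')

    # (b) domain-only: identical architecture, no pretrained weights.
    scratch = language_model_learner(data_lm, AWD_LSTM, pretrained=False)
    scratch.fit_one_cycle(10, 1e-2)
    scratch.save_encoder('domain_only_enc')

    # Both encoders plug into the same downstream classifier.
    data_clas = TextClasDataBunch.from_csv(path, 'labeled_reviews.csv',
                                           vocab=data_lm.vocab)
    clf = text_classifier_learner(data_clas, AWD_LSTM)
    clf.load_encoder('ulm_domain_enc')   # or 'domain_only_enc'
    clf.fit_one_cycle(1, 1e-2)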

------
jessestcharles
Author here! Happy to answer any questions that folks have.

------
eps248
Out of curiosity, have you done much work on examining your
misclassifications? I'd be curious to know if there are giveaways for
"negative" sentiment that show up in your task versus, say, reviews of
Spiderman II.

~~~
jessestcharles
In this work we didn't explore classification performance characteristics. I
suspect the misclassifications at lower levels of domain data would revolve
around the ways language usage differs between reviews and common English.
"Blockbuster" may carry generally negative or neutral sentiment in a
Wikipedia-based language model, perhaps most often referring to the failed
rental chain. In the context of movie reviews, "blockbuster" is almost
universally positive.
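
A cheap way to probe that intuition (not something we did in the post; the
embedding matrices and vocab below are hypothetical stand-ins) is to compare
the nearest neighbors of "blockbuster" in the input embeddings of the
Wikipedia-pretrained model versus the review-fine-tuned one:

    import torch
    import torch.nn.functional as F

    def nearest_neighbors(emb_matrix, vocab, word, k=10):
        """Words whose input embeddings are closest (by cosine) to `word`."""
        idx = vocab.index(word)
        sims = F.cosine_similarity(emb_matrix, emb_matrix[idx].unsqueeze(0), dim=1)
        top = sims.topk(k + 1).indices.tolist()  # +1: the word matches itself
        return [vocab[i] for i in top if i != idx][:k]

    # Hypothetical: wiki_emb / domain_emb are the (vocab_size, dim) input-embedding
    # weights pulled from the Wikipedia-pretrained and review-fine-tuned models.
    # print(nearest_neighbors(wiki_emb, vocab, 'blockbuster'))
    # print(nearest_neighbors(domain_emb, vocab, 'blockbuster'))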

------
nateburke
How are you using these tools to solve your business problems?

~~~
jessestcharles
One of the fantastic qualities of embedding-based language models is that they
provide a view on a semantic space that can be used quantitatively in almost
any downstream language task. As a conversational intelligence company, Frame
has many products that are enhanced by having a high-quality, domain-specific
language model to build on: tagging, sentiment, topic extraction, keywords,
summarization, etc. Best of all, these products can be iterated on in
parallel! Improvements in a language model’s representation of a body of text
should improve all downstream tasks without modification.
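
As a rough sketch of what iterating in parallel can look like (the class and
head names are illustrative, not Frame's actual code), a single fine-tuned
domain encoder can feed several lightweight task heads, so any improvement to
the encoder benefits every head at once:

    import torch.nn as nn

    class SharedEncoderHeads(nn.Module):
        """One shared domain LM encoder feeding several lightweight task heads.
        Names and shapes are illustrative; `encoder` is assumed to return one
        pooled vector per document."""
        def __init__(self, encoder, emb_dim, n_tags, n_topics):
            super().__init__()
            self.encoder = encoder
            self.tag_head = nn.Linear(emb_dim, n_tags)      # multi-label tagging
            self.sentiment_head = nn.Linear(emb_dim, 1)     # sentiment score
            self.topic_head = nn.Linear(emb_dim, n_topics)  # topic extraction

        def forward(self, tokens):
            h = self.encoder(tokens)
            return self.tag_head(h), self.sentiment_head(h), self.topic_head(h)

Each head can then be retrained on its own labels without touching the shared
encoder.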

~~~
mlthoughts2018
The same thing is also true for computer vision models. A core deep network is
usually trained either with a dual embedding of associated text against search
ranking, or to predict tags or labels. The resulting network may be of limited
use on the original training task, but extracting the activations from some
deep layer produces an excellent embedding model.

You then automatically encode your entire image collection and incoming images
with that embedding model and rely on it as a lingua franca on which to base
all sorts of companion models: object detection, face recognition,
gender/age/ethnicity prediction, spam detection, aesthetic/composition
appraisal, caption generation, style transfer, etc.
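
A minimal PyTorch illustration of that pattern, assuming a torchvision
ResNet-50 as the trunk (any supervised backbone works the same way): drop the
classification head and keep the pooled activations as the embedding.

    import torch
    import torch.nn as nn
    import torchvision.models as models

    # Take a pretrained classifier, discard its final fc layer, and keep the
    # pooled trunk as a general-purpose image embedding model.
    backbone = models.resnet50(pretrained=True)
    embedder = nn.Sequential(*list(backbone.children())[:-1])
    embedder.eval()

    with torch.no_grad():
        images = torch.randn(4, 3, 224, 224)      # stand-in for a preprocessed batch
        embeddings = embedder(images).flatten(1)  # shape (4, 2048)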

