
TensorFlow Code for Google Research's BERT: Pre-Training Method for NLP Tasks - ArtWomb
https://github.com/google-research/bert
======
mooneater
Nice, I was just starting to contemplate how much work it would be to
train a BERT model from scratch.

Context to understand the importance of this release: [http://ruder.io/nlp-
imagenet/](http://ruder.io/nlp-imagenet/) Though not named in the post, BERT
belongs to the same family of models discussed there.

------
ttul
Let’s say I wanted to classify email subject lines as “likely spam” vs “likely
legitimate”. As input, I have billions of examples of each.

Would BERT help me by first enabling me to transform the input subject lines
into vectors in a high dimensional vector space which could then be the inputs
into a relatively shallow network that does the classification?

~~~
jean-
Absolutely. If your only input is the subject line, then you're dealing with a
single-sentence classification task. You'd need to take the "class label"
vector from the top layer of BERT (labelled "C" in Fig 2b of the paper) and
then feed that to your own classifier.

For the experiments in the paper they actually fine-tuned BERT on the
downstream task, but I reckon you'd get acceptable performance by just keeping
it fixed and using its outputs as features for a shallow classifier.
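A minimal sketch of that second, "keep BERT fixed" approach. Everything here is hypothetical: it assumes you have already run each subject line through a frozen BERT and saved the 768-d "C" vector per example (faked below with random features and synthetic labels), and the shallow classifier is just plain logistic regression in NumPy.

```python
import numpy as np

# Hypothetical setup: pretend each row of X is the fixed [CLS] ("C")
# vector a frozen BERT produced for one subject line. Here we fake the
# features and derive synthetic spam/legit labels from a random rule.
rng = np.random.default_rng(0)
dim, n = 768, 200
X = rng.normal(size=(n, dim))
w_true = rng.normal(size=dim)
y = (X @ w_true > 0).astype(float)  # 1 = "likely spam", 0 = "likely legitimate"

# Shallow classifier on top: logistic regression via gradient descent.
w = np.zeros(dim)
b = 0.0
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted spam probability
    w -= lr * (X.T @ (p - y)) / n           # gradient step on weights
    b -= lr * float(np.mean(p - y))         # gradient step on bias

acc = float(np.mean(((X @ w + b) > 0) == (y > 0.5)))
```

With real data you'd swap the random `X` for the extracted BERT vectors; the classifier itself stays this small.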

------
laichzeit0
Fully replicable code, yay! I really hope this becomes the norm. In fact, I
hope this approach becomes so expected that researchers outright ignore papers
reporting experimental results without fully replicable code and instructions
released with it.

------
LittlePeter
The paper is at
[https://arxiv.org/abs/1810.04805](https://arxiv.org/abs/1810.04805), however
the PDF link gives an error:
[https://arxiv.org/pdf/1810.04805](https://arxiv.org/pdf/1810.04805)

~~~
LittlePeter
[https://arxiv.org/pdf/1810.04805](https://arxiv.org/pdf/1810.04805) works now

------
rawoke083600
Can I get JUST the embeddings out of the model for, let's say, "My cat dances
in the rain", and compare them with the embeddings from "My dog runs in the
rain" to see math-wise (vector difference) how close they are?

~~~
ttul
Pretty sure you can tap the model at any layer.
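Once you've tapped a layer and pulled out a vector per sentence, the comparison itself is straightforward; cosine similarity is the usual choice rather than a raw vector difference. A sketch, assuming the two vectors below stand in for pooled BERT embeddings (the actual numbers here are made up):

```python
import numpy as np

def cosine_similarity(u, v):
    # Cosine of the angle between two embedding vectors:
    # 1.0 = same direction, 0.0 = orthogonal, -1.0 = opposite.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical stand-ins for embeddings of the two sentences;
# real BERT vectors would be 768-dimensional.
cat = np.array([0.9, 0.1, 0.4])
dog = np.array([0.8, 0.2, 0.5])

sim = cosine_similarity(cat, dog)
```

The closer `sim` is to 1.0, the more similar the model considers the two sentences at that layer.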

------
angel_j
TIL LSTMs are out, Transformers are in.

~~~
ttul
LSTMs were “too good to be true”.

~~~
miemo
why?

~~~
ttul
So simple yet so effective - to a point.

