
BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding - liviosoares
https://arxiv.org/abs/1810.04805
======
pacala
Fantastic!

> The code and pre-trained model will be available at
> [https://goo.gl/language/bert](https://goo.gl/language/bert). Will be
> released before the end of October 2018.

------
zimzim
"We demonstrate the importance of bidirectional pre-training for language
representations." Can someone help me understand what "bidirectional" and
"pre-trained" mean here?

~~~
willwill100
* bidirectional - build the representation of the current word by looking at both past and future context, i.e. the words on either side of it.

* pre-trained - first train on lots of language-modelling data (e.g. billions of words of Wikipedia), then train on the task you really care about, starting from the parameters learnt during language modelling rather than from scratch.
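A toy sketch of that pre-train-then-fine-tune pattern may help (everything here is made up for illustration: the "parameters" are just a dict of floats, whereas real fine-tuning updates all the network weights by gradient descent):

```python
# Minimal sketch of pre-training followed by fine-tuning (toy, not BERT).
# The point is only that fine-tuning STARTS FROM the language-model
# parameters instead of random initialisation.

def pretrain_language_model():
    # Stand-in for fitting on billions of words; returns toy weights.
    return {"embedding": 0.5, "encoder": 1.2}

def fine_tune(pretrained, steps=3, lr=0.1):
    # Reuse the pre-trained weights and add a fresh task-specific head.
    params = dict(pretrained, head=0.0)
    for _ in range(steps):
        # Stand-in for gradient updates on the downstream task.
        params["head"] += lr
    return params

lm_params = pretrain_language_model()
classifier = fine_tune(lm_params)
print(classifier["encoder"])  # carried over from pre-training
print(classifier["head"])     # learnt on the downstream task
```

The downstream model keeps the general language knowledge (the reused `encoder` weight) and only has to learn the task-specific part, which is why pre-training helps so much when labelled data is scarce.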

------
wodenokoto
Is this comparable to fast.ai's ULMFiT, which also promises an unsupervised
pre-trained model that can be fine-tuned to beat the state of the art on NLP
tasks?

~~~
jiuren
The big picture is similar, but ULMFiT uses an AWD-LSTM for the language
modelling, while BERT uses a masked LM objective instead. BERT has some other
tricks as well, like next-sentence prediction.
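For concreteness, here is a minimal sketch of the masked-LM corruption the paper describes: roughly 15% of token positions become prediction targets, and of those, 80% are replaced with [MASK], 10% with a random token, and 10% left unchanged. (The function name and toy vocabulary below are made up; this is not the paper's code.)

```python
import random

def mask_tokens(tokens, mask_prob=0.15,
                vocab=("the", "cat", "sat", "mat"), seed=0):
    # Pick ~mask_prob of positions as prediction targets; of those,
    # 80% become [MASK], 10% a random vocab token, 10% stay unchanged.
    rng = random.Random(seed)
    corrupted, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok          # the model must predict the original
            roll = rng.random()
            if roll < 0.8:
                corrupted[i] = "[MASK]"
            elif roll < 0.9:
                corrupted[i] = rng.choice(vocab)
            # else: leave the token as-is (still a prediction target)
    return corrupted, targets

tokens = "the cat sat on the mat".split()
# Use a high mask_prob so the effect shows on a 6-token toy sentence;
# the paper uses 15%.
corrupted, targets = mask_tokens(tokens, mask_prob=0.5)
print(corrupted, targets)
```

Because the model never knows which unmasked tokens are targets, it has to keep a full bidirectional representation of every position, which is the trick that lets BERT condition on both left and right context without the prediction becoming trivial.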

