BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding (arxiv.org)
78 points by liviosoares 7 days ago | 5 comments


> The code and pre-trained model will be available at https://goo.gl/language/bert. Will be released before the end of October 2018.

Is this comparable to fast.ai's ULMFiT, which also promises an unsupervised pre-trained model that can be fine-tuned to beat the state of the art on NLP tasks?

The big picture is similar. But ULMFiT uses an AWD-LSTM for the language modeling, whereas BERT uses a masked LM instead. BERT has some other tricks as well, like next sentence prediction.
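To make "masked LM" a bit more concrete, here's a rough sketch of the input corruption step: hide some tokens and keep the originals as prediction targets. This is a simplification and not the paper's exact recipe. The function name `mask_tokens`, the `seed` parameter, and the always-replace-with-`[MASK]` rule are my own assumptions for illustration; the 15% default matches the masking rate reported in the paper, which also sometimes substitutes a random token or leaves the chosen token unchanged.

```python
import random

MASK = "[MASK]"  # placeholder symbol; the name is an assumption for this sketch


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Hide a random subset of tokens.

    Returns (corrupted, targets). targets[i] holds the original token
    wherever corrupted[i] was masked, and None everywhere else.
    """
    rng = random.Random(seed)  # local RNG so the sketch is reproducible
    corrupted, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            corrupted.append(MASK)  # the model only sees the placeholder
            targets.append(tok)     # ...and is trained to recover this
        else:
            corrupted.append(tok)
            targets.append(None)    # no prediction needed at this position
    return corrupted, targets


if __name__ == "__main__":
    sentence = "the man went to the store to buy milk".split()
    corrupted, targets = mask_tokens(sentence, mask_prob=0.3)
    print(corrupted)
    print(targets)
```

The model is then trained to predict the hidden tokens from everything else in the sentence, so it can use context on both sides of each gap.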

"We demonstrate the importance of bidirectional pre-training for language representations". Can someone help me understand what "bidirectional" and "pre-trained" mean?

* bidirectional - build the representation of the current word by looking at both the words after it (the future) and the words before it (the past)

* pre-trained - train on lots of language modelling data (e.g. billions of words of Wikipedia) and then train on the task you really care about, but starting from the parameters learnt from the language modelling task.
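A toy way to see the "bidirectional" part: compare which words a left-to-right model gets to look at when representing position `i` with what a bidirectional model gets. The two helper functions below are made up for illustration and are not from the paper or its code.

```python
def left_to_right_context(tokens, i):
    """Words a left-to-right model can use for position i: only the past."""
    return tokens[:i]


def bidirectional_context(tokens, i):
    """Words a bidirectional model can use for position i: past and future."""
    return tokens[:i] + tokens[i + 1:]


tokens = "the bank of the river".split()

print(left_to_right_context(tokens, 1))   # ['the']
print(bidirectional_context(tokens, 1))   # ['the', 'of', 'the', 'river']
```

With only `['the']` to go on, "bank" is ambiguous; seeing "river" to the right is what lets the model settle on the right sense.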
