BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding (arxiv.org)
78 points by liviosoares 59 days ago | 5 comments



Fantastic!

> The code and pre-trained model will be available at https://goo.gl/language/bert. Will be released before the end of October 2018.


"We demonstrate the importance of bidirectional pre-training for language representations". can some one help me understand what bidirectional and pre-trained means?


* bidirectional - build representations of the current word by looking into both the future and the past

* pre-trained - first train on lots of language-modelling data (e.g. billions of words of Wikipedia), then train on the task you actually care about, starting from the parameters learnt during the language-modelling stage (see the sketch after this list).
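A minimal sketch of that two-stage idea, assuming a toy PyTorch setup; the module names, sizes, and pooling choice here are all made up for illustration and are not the paper's code:

    import torch
    import torch.nn as nn

    VOCAB_SIZE, HIDDEN = 1000, 64  # toy sizes, purely illustrative

    class Encoder(nn.Module):
        """Shared text encoder whose weights are carried over between stages."""
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
            layer = nn.TransformerEncoderLayer(d_model=HIDDEN, nhead=4, batch_first=True)
            self.transformer = nn.TransformerEncoder(layer, num_layers=2)

        def forward(self, token_ids):                        # (batch, seq)
            return self.transformer(self.embed(token_ids))   # (batch, seq, HIDDEN)

    # Stage 1: pre-training. A language-modelling head predicts tokens from context.
    encoder = Encoder()
    lm_head = nn.Linear(HIDDEN, VOCAB_SIZE)
    tokens = torch.randint(0, VOCAB_SIZE, (8, 16))           # stand-in for unlabelled text
    lm_logits = lm_head(encoder(tokens))                     # train with cross-entropy here

    # Stage 2: fine-tuning. Keep the pre-trained encoder weights, swap in a task head.
    classifier = nn.Linear(HIDDEN, 2)                        # e.g. a 2-class downstream task
    features = encoder(tokens).mean(dim=1)                   # crude pooled sentence representation
    task_logits = classifier(features)                       # train on the small labelled dataset

The point is only that the encoder's parameters are learnt once on cheap unlabelled text and then reused, so the downstream task does not start from random weights.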


Is this comparable to fast.ai's ULMFiT, which also offers an unsupervised pre-trained model that can be fine-tuned to reach state-of-the-art results on NLP tasks?


The big picture is similar, but ULMFiT uses an AWD-LSTM for the language modelling, whereas BERT uses a masked LM instead. BERT also has some other tricks, like next-sentence prediction (rough sketch below).
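A rough sketch of how a masked-LM training example differs from a left-to-right LM target, plus a toy next-sentence-prediction pair. This is my own simplified illustration, not the released code: the 15% mask rate matches the paper, but BERT's real preprocessing also sometimes keeps the selected token or swaps it for a random one.

    import random

    MASK, MASK_RATE = "[MASK]", 0.15   # 15% as in the paper; everything else simplified

    def make_masked_lm_example(tokens):
        """Hide some tokens; the model must predict them from context on BOTH sides."""
        inputs, targets = [], []
        for tok in tokens:
            if random.random() < MASK_RATE:
                inputs.append(MASK)
                targets.append(tok)    # loss is computed only on the hidden positions
            else:
                inputs.append(tok)
                targets.append(None)
        return inputs, targets

    sentence = "the cat sat on the mat".split()
    print(make_masked_lm_example(sentence))

    # A left-to-right LM (ULMFiT-style) instead predicts the next token:
    #   inputs = sentence[:-1], targets = sentence[1:]

    # Next-sentence prediction: given a pair (A, B), label whether B actually
    # followed A in the corpus (1) or was sampled at random (0).
    pair = (["the", "cat", "sat"], ["on", "the", "mat"])
    is_next = 1

Because the masked positions are predicted from both left and right context at once, the encoder has to be bidirectional, which a standard left-to-right LM objective does not allow.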




