The contextualized word embeddings you get out of BERT are still generated from fixed per-word vectors. And while you get one output vector for each input vector, that doesn't mean they correspond to each other. The model could arbitrarily reshuffle information between outputs, so long as the output as a whole reflects the input sufficiently well. So BERT embeddings are not "word embeddings" in the usual sense.
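A minimal NumPy sketch of this point (a toy stand-in, not BERT itself): the input embedding for a token id is a fixed table lookup, but a single self-attention layer makes every output a weighted mix of all inputs, so the output at position i is not tied to token i alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed per-token embedding table (toy analogue of BERT's input embeddings).
VOCAB, DIM = 10, 4
embed_table = rng.normal(size=(VOCAB, DIM))

# Random projection weights for one toy self-attention layer.
Wq = rng.normal(size=(DIM, DIM))
Wk = rng.normal(size=(DIM, DIM))
Wv = rng.normal(size=(DIM, DIM))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def contextualize(token_ids):
    x = embed_table[token_ids]              # same fixed lookup in every context
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(DIM))  # each row mixes ALL positions
    return attn @ v

# Token 3 in two different contexts: identical input vector,
# different "contextualized" output vectors.
out_a = contextualize([3, 1, 2])
out_b = contextualize([3, 5, 7])
print(np.allclose(out_a[0], out_b[0]))  # False
```

With real BERT the mixing happens over many layers and heads, which is why nothing forces output position i to carry information only about input token i.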