
Of course, the problem with these "synthetic" datasets is that the translation models can overfit to the flaws in these multilingual sentence encoders/embedders.

Still, it lets you massively increase the amount of data you're training on, which is generally worth it.



