
Awesome work! Curious how you handle paragraphs and niche language like federal regulations.

What are your favorite ways to do sentence and paragraph embeddings, and is there a framework you like where you can tune to custom data? Do you find fine-tuning your embedding model helpful?



Thanks! The post doesn’t cover fine-tuning of the model, which would be absolutely necessary (but is out of scope for the post). Nils Reimers (the author of SBERT) has been on a speaking circuit covering Generative Pseudo Labeling (GPL) to handle the vocabulary gap of new domains that a pretrained SBERT model hasn’t seen yet.

https://youtu.be/qzQPbIcQu9Q
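
For anyone unfamiliar with GPL, here is a rough sketch of the training signal as I understand it: a cross-encoder teacher scores each (query, positive) and (query, negative) pair, and the bi-encoder student is trained so that its own score margin matches the teacher's margin (a MarginMSE-style loss). The toy vectors, the teacher margins, and the helper names below are all made up for illustration; a real setup would take them from generated queries, mined negatives, and an actual cross-encoder.

```python
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot-product similarity between two embedding vectors."""
    return sum(x * y for x, y in zip(a, b))


def margin_mse_loss(
    queries: Sequence[Sequence[float]],
    positives: Sequence[Sequence[float]],
    negatives: Sequence[Sequence[float]],
    teacher_margins: Sequence[float],
) -> float:
    """Mean squared error between student and teacher score margins.

    Student margin = sim(query, positive) - sim(query, negative).
    Teacher margin = the pseudo label (hypothetical values in this sketch).
    """
    if not queries:
        raise ValueError("need at least one training triple")
    errors = []
    for q, p, n, t in zip(queries, positives, negatives, teacher_margins):
        student_margin = dot(q, p) - dot(q, n)
        errors.append((student_margin - t) ** 2)
    return sum(errors) / len(errors)


# Toy 2-d "embeddings" (illustrative only, not real model output).
queries = [[1.0, 0.0], [0.0, 1.0]]
positives = [[2.0, 0.0], [0.0, 3.0]]
negatives = [[0.0, 1.0], [1.0, 0.0]]
teacher_margins = [2.0, 1.0]  # pretend cross-encoder margins

loss = margin_mse_loss(queries, positives, negatives, teacher_margins)
print(loss)
```

The point of regressing on the margin rather than on a hard positive/negative label is that noisy generated queries or false negatives get a small teacher margin, so they pull on the student less.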



