Hacker News
Show HN: Annotate Images and Scans with BIO-Scheme Using Konfuzio SDK (colab.research.google.com)
2 points by konfuzio on Aug 16, 2021 | 1 comment



Retraining NLP models such as flair often requires the training data to be in the BIO scheme. For scanned documents or images, we convert visual annotations to the BIO scheme: using OCR, we transform the bounding box of each annotation into its start and end offsets in the text, together with its label. In the new release of our SDK, this conversion is done by the get_text_in_bio_scheme() method of the Document class.
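
To make the offset-to-BIO step concrete, here is a minimal sketch in plain Python. It is not the SDK's implementation and does not call the SDK: it assumes OCR has already produced the document text and that each annotation has already been reduced to a (start, end, label) character span. The function name spans_to_bio, the whitespace tokenization, and the sample text and labels are all hypothetical choices made for illustration.

```python
import re
from typing import List, Tuple

# Hypothetical annotation format: (start_offset, end_offset, label), end exclusive.
Span = Tuple[int, int, str]


def spans_to_bio(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Tag whitespace-separated tokens with B-/I-/O labels from character spans."""
    tagged = []
    previous = None  # index of the span that covered the previous token, if any
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.span()
        current = None
        for index, (start, end, _label) in enumerate(spans):
            # A token belongs to the first span it overlaps.
            if tok_start < end and tok_end > start:
                current = index
                break
        if current is None:
            tag = "O"
        else:
            # The first token of an annotation gets B-, the following ones I-.
            prefix = "I-" if current == previous else "B-"
            tag = prefix + spans[current][2]
        tagged.append((match.group(), tag))
        previous = current
    return tagged


# Made-up sample: the offsets stand in for what OCR plus bounding boxes would yield.
text = "Invoice from Acme Corp dated 2021-08-16"
spans = [(13, 22, "ORG"), (29, 39, "DATE")]
for token, tag in spans_to_bio(text, spans):
    print(token, tag)
```

Each output line pairs a token with its tag, which is the token-per-line layout that BIO-based training files usually follow. A real pipeline would use the tokenizer of the model being retrained instead of splitting on whitespace.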

The source code is here: https://github.com/konfuzio-ai/document-ai-python-sdk/blob/b...

Many other file types are supported. Have a look at https://dev.konfuzio.com/web/api.html#supported-file-types



