Training an AI to Extract Information from PDF Files with Varying Section Titles
1 point by Philosophia 6 days ago
How can I train an AI to extract details from PDF files? The sections I want to extract may have different titles for the same content. For example, let's say we have 1000 PDF files of essays. Each essay has a section for "background," but the section might be titled "background" in some PDFs and "my story" in others. The AI needs to identify these varying titles, determine where the section starts and ends, and then copy that content into an .xls file.
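To make the three steps concrete (recognize a varying title, find where the section starts and ends, write it to a spreadsheet), here is a minimal sketch of a non-ML baseline. It assumes the PDF text has already been extracted to plain text by some PDF library, that headings sit on their own short line, and that a hand-written alias table is acceptable as a starting point. `SECTION_ALIASES`, `canonical_title`, `split_sections`, and `sections_to_csv` are hypothetical names, the aliases other than "background" and "my story" are made up for illustration, and the output is CSV (which spreadsheet programs open) rather than a true .xls file.

```python
import csv
import difflib
import io

# Hypothetical alias table: canonical section name -> titles seen in the essays.
SECTION_ALIASES = {
    "background": ["background", "my story", "about me"],
    "conclusion": ["conclusion", "closing thoughts"],
}


def canonical_title(line, cutoff=0.8):
    """Return the canonical section name if `line` looks like a known heading, else None."""
    text = line.strip().lower().rstrip(":")
    # Assumption: headings are short lines of at most five words.
    if not text or len(text.split()) > 5:
        return None
    for canonical, aliases in SECTION_ALIASES.items():
        # Fuzzy match tolerates small typos such as "Backgroud".
        if difflib.get_close_matches(text, aliases, n=1, cutoff=cutoff):
            return canonical
    return None


def split_sections(text):
    """Map each recognized section name to the text between its heading and the next one."""
    sections = {}
    current = None
    for line in text.splitlines():
        name = canonical_title(line)
        if name is not None:
            # A recognized heading ends the previous section and starts a new one.
            current = name
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}


def sections_to_csv(docs):
    """docs maps filename -> extracted plain text. Returns the CSV as a string."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["file", "section", "content"])
    for filename, text in docs.items():
        for name, content in split_sections(text).items():
            writer.writerow([filename, name, content])
    return out.getvalue()
```

A limitation of this sketch: a section only ends at the next heading found in the alias table, so an unrecognized heading is absorbed into the preceding section. Replacing `canonical_title` with a trained classifier or an embedding similarity check would be the place to bring in a learned model.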




