
I've been using Camelot, which builds on the lower-level Python PDF libraries, to extract tables from PDFs. I haven't tried anything exotic, but it seems to work. The tables I parse tend to fill the page or be its dominant element.

https://camelot-py.readthedocs.io/en/master/

I like Camelot because it gives me back pandas DataFrames. I don't want Markdown; I can generate that from a DataFrame if needed.
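For anyone curious, here is roughly what that workflow looks like. Treat it as a minimal sketch rather than my exact setup: the function names `extract_tables` and `to_markdown`, the `pages` and `flavor` defaults, and the file name in the usage note are placeholders I made up for illustration. The Camelot calls themselves (`camelot.read_pdf` and the `.df` attribute on each table) are the documented entry points.

```python
import pandas as pd


def extract_tables(pdf_path, pages="1", flavor="lattice"):
    """Return every table Camelot finds on the given pages as a DataFrame.

    Assumption: "lattice" suits tables with ruled cell borders; "stream"
    is the alternative for whitespace-separated tables.
    """
    # Imported here so the Markdown helper below works without Camelot installed.
    import camelot

    tables = camelot.read_pdf(pdf_path, pages=pages, flavor=flavor)
    return [table.df for table in tables]


def to_markdown(df):
    """Render a DataFrame as a pipe-delimited Markdown table (index omitted)."""
    header = "| " + " | ".join(str(col) for col in df.columns) + " |"
    divider = "| " + " | ".join("---" for _ in df.columns) + " |"
    rows = [
        "| " + " | ".join(str(value) for value in row) + " |"
        for row in df.itertuples(index=False, name=None)
    ]
    return "\n".join([header, divider, *rows])
```

Usage would be something like `frames = extract_tables("report.pdf")` followed by `print(to_markdown(frames[0]))`, where `report.pdf` is a hypothetical file. pandas also ships `DataFrame.to_markdown()`, but that needs the optional `tabulate` package, so the hand-rolled helper keeps the sketch dependency-free.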


