
How to Convert a PDF to Excel - ingve
https://tomassetti.me/how-to-convert-a-pdf-to-excel/
======
KSS42
Maybe I am missing something here, but Excel can already import a PDF and
convert it into a table.

Also, Excel recently added the ability to scan an image and convert it to a table.

[https://www.zdnet.com/article/microsoft-brings-ability-to-tu...](https://www.zdnet.com/article/microsoft-brings-ability-to-turn-photos-of-table-data-into-excel-spreadsheets-to-ios/)

------
ACow_Adonis
As someone who has now done this exact thing a couple of times for work, I'm
going to say that I disagree with the article about one point: you probably do
need developers/data scientists unless your extraction/standardisation
exercise is trivial in size and complexity. Obviously that depends on your
subject matter a bit, but generally people won't approach you if it's small
scale and already in a good structure.

It's all well and good sucking out tables from pdfs in something roughly akin
to the format they were entered in, but that doesn't make the tables magically
analysable by machine, even under the assumption of perfect extraction (which
rarely holds in real life, at least in my experience).

So fundamentally, you need someone that can define the problem, parse the text
and information from the pdf, munge the data from its original form into
something useful, while writing some tests and procedures to verify and
quantify the quality/success of doing so.
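
A minimal sketch of what that munge-and-verify step can look like, in Python
with only the standard library. The sample rows, the `parse_amount` and `munge`
helpers, and the formatting rules (thousands commas, a dollar sign, parentheses
for negatives) are all made-up assumptions for illustration; real extracted
tables will need their own rules.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical rows, as a PDF table extractor might return them: all strings,
# with inconsistent number formatting and a couple of unusable cells.
RAW_ROWS = [
    ["Region", "Revenue", "Cost"],
    ["North", "1,200.50", "(300.00)"],
    ["South", "$980", "n/a"],
    ["East", "1 O50", "125"],  # letter O instead of zero, a typical extraction error
]


def parse_amount(cell):
    """Return a Decimal for a numeric-looking cell, or None if it cannot be parsed."""
    text = cell.strip().replace(",", "").replace("$", "")
    # Assumed convention: accounting-style parentheses mean a negative number.
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    try:
        value = Decimal(text)
    except InvalidOperation:
        return None
    return -value if negative else value


def munge(rows):
    """Turn string rows into typed records and collect every cell that failed."""
    header, *body = rows
    records, failures = [], []
    for row_number, row in enumerate(body, start=1):
        record = {header[0]: row[0]}
        for name, cell in zip(header[1:], row[1:]):
            value = parse_amount(cell)
            if value is None:
                failures.append((row_number, name, cell))
            record[name] = value
        records.append(record)
    return records, failures


records, failures = munge(RAW_ROWS)
total_cells = sum(len(row) - 1 for row in RAW_ROWS[1:])
print(f"parsed {total_cells - len(failures)} of {total_cells} numeric cells")
for failure in failures:
    print("needs review:", failure)
```

The point of returning `failures` alongside the records is that the success
rate becomes a number you can track and test against, rather than something
you eyeball in the spreadsheet.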

I'm not saying that's sysadmin work or not (since job titles don't really mean
anything much to me), but it definitely sounds like developer/data science
work, and it's beyond the abilities of the non-tech crew...ymmv

------
crishoj
For the subset of PDFs that are invoices, there's the data extractor project
called invoice2data, which implements many of the ideas presented in the
article.

[https://github.com/invoice-x/invoice2data](https://github.com/invoice-x/invoice2data)

------
aphextim
This line confused me, "We are going to see that you do need developers for
this, but sysadmins."

I'm guessing it meant to read, "We are going to see that you do not need
developers for this, but sysadmins."

------
sonofgod
If you're looking for quick and easy, rather than an OSS solution,
[https://pdftables.com/](https://pdftables.com/) might be of interest.

(Disclaimer: I was working at the company when they were developing this)

------
abhishekjha
This is exactly how we do it for a few of our clients. OpenCV + Python
provides a pretty successful solution for this.

------
dheera
Is Excel the new religion?

~~~
saagarjha
This joke would make more sense if the title was something like “How to
convert _from_ PDF to Excel”.

~~~
analogyexpert
That depends on whether PDF is the heathen religion, or the heathen.

