

Automating Microsoft Office with Python - mataug
http://nbviewer.ipython.org/urls/raw.github.com/sanand0/ipython-notebooks/master/Office.ipynb

======
unlimitedbacon
Years ago, at my first internship, we were assigned to copy fields of
information out of hundreds of PDF files and compile them into a single
spreadsheet. While my coworker began dutifully trudging away, I started
googling and found out about the COM API and the Python library. I spent the
rest of my time writing a script to automate the task.

The silly and wonderful thing about the Office API is that it doesn't work
silently, but instead performs the operations on screen as if they were being
done by a human being. The article explains this. There's something very
satisfying about seeing the reaction of your boss as he watches the computer
do the work on its own, as if its being operated by a ghost. Now I know what
it must have been like to see a player piano a hundred years ago.

~~~
agumonkey
In VBA there's a command to toggle screen redrawing, isn't it exposed through
COM ? very impactful on performance, although I too like to witness high speed
visual feedback of automated processing. The social implications are
surprising too.

~~~
pge
One of the first things i learned about writing VBA macros was to start every
one with application.screenupdating = false. It was amazing how much of a
performance hit the screen updating was.

~~~
agumonkey
I remember at least three "mandatory" statement to wrap any heavy script.
Actually I don't remember them, just some fuzzy count.

ps: I don't know how Excel is structured these days but up until 2003 there
was zero multicore code, and VBA is not the fastest system around, so any way
you can trim the workload is good.

------
qq66
At LiveLoop we've gone insanely deep into COM to build real-time collaboration
for PowerPoint, and while the interface is powerful there are mysteries at
every level. These presentations (especially the last) are very impressive ion
light of that.

~~~
josteink
_At LiveLoop we 've gone insanely deep into COM_

I can say this too and we do stuff not even Microsoft thought was possible.

 _while the interface is powerful there are mysteries at every level._

The interfaces are indeed powerful (not to leave out annoyingly inconsistent).
That said, there are only mysteries when you go oh so far. If play by a few
safe rules, you should be mostly safe unless you have very long lived
processes which starts mixing COM code with other code digging beneath the API
for things which isn't otherwise possible.

It can be argued that both the best and the worst part of the .NET Framework
is how it makes COM interop seem so utterly seamless. It's bound to land you
in a trap sooner or later.

------
bryogenic
I'd be really interested in this if there were a way to get Python to render a
ppt into a pdf. Basically, the goal would be to programmatically add a unique
watermark to each ppt, render the pdf, and save it. Doing this for 100+ things
is not worth the time but if Python can script it up it might be worth
investigating. Looks like I need to go read the COM api a bit.

~~~
harrytuttle

       wordInstance.ActiveDocument.SaveAs(filename + ".pdf", 17)
    

17 is the magic number for PDF output. There is a constant in VSTO and COM for
it but I can't remember what it is. Need Office 2007 with save as pdf addon
installed or vanilla 2010/2013.

People always say that APIs are the future and Microsoft are dead etc but
Microsoft stuff is usually chock full of extensibility and knee deep in
documentation. To give you an idea, it took me merely two days to extend word
2010 via VSTO with a fully GUI and ribbon integrated semantic template engine
that allows documents to be bound to our public API and rendered server side
or client side online and offline.

Do that in Google Apps? Yeah right...

also my solution will still be supported until 2023!

~~~
bliti
I did a simple application for a client that was a simple JSON-to-Excel dump.
It was a breeze. Took me about a day to do it. From zero to finished.

------
scorpion032
Just as interesting are some of the visualisations, the author (of this ipnb)
has done for some of the media houses in India. Done using powerpoint, but for
a print publication, as good as it could be.

(which are linked within this ipynb.)

------
nhebb
In case anyone is interested, pg wrote the SpamBayes Outlook add-in in Python
about 10 years ago:
[http://spambayes.sourceforge.net/](http://spambayes.sourceforge.net/)

------
regularfry
Tangentially, I'm convinced there's a worthwhile (small) business opportunity
in hooking up py.test to Excel to Fix The Spreadsheet Problem(tm). Not
something I can iterate on right now, though.

~~~
clebio
What is The Spreadsheet Problem?

~~~
regularfry
Big, important, expensive problems being "solved" by non-programmers without a
validation safety-net.

------
noisy_boy
For PowerPoint 2007, you need to add the below line (like in the example for
Excel) for the window to be visible:

    
    
      Application.Visible = 1

------
mataug
This was a talk at PyconIndia 2013

~~~
ndefinite
Do you know where to find the recording? Converting macros to python sounds
extremely useful

~~~
mataug
The recording should be available in a few days, check the mailing list, But
the author didn't go into the details of how he converted macros to python.

------
vardhanvarma
There is a open sourced screen reader called NVDA, which really puts this to
good use.

------
redman25
Unfortunately win32 API is Windows specific.

------
woodchuck64
Misread this as "Automating Microsoft with Python"

Knew Ballmer let the company down but didn't think it was that far.

