
Open-source Libraries for Working with Open XML Documents (docx, xlsx, pptx) - chx
https://github.com/OfficeDev/Open-Xml-Sdk
======
aquark
The actual docs for the sdk are at: [http://msdn.microsoft.com/en-
us/library/office/bb448854(v=of...](http://msdn.microsoft.com/en-
us/library/office/bb448854\(v=office.15\).aspx)

A link from the github page might be helpful!

~~~
spira
The content also looks to be open sourced on Github, too:
[https://github.com/OfficeDev/office-
content](https://github.com/OfficeDev/office-content)

------
FigBug
So far this looks Windows only. I'm currently using LibXL
([http://www.libxl.com/](http://www.libxl.com/)) to generate Excel documents.
Anybody know of any other cross platform alternatives?
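
On the cross-platform question: .docx, .xlsx and .pptx files are ZIP packages of XML parts, so any platform with a ZIP library can at least inspect them. The following is a minimal Python sketch under that assumption, not a replacement for the SDK or LibXL. The helper names `list_package_parts` and `build_demo_package` are hypothetical, and the demo archive only mimics a package layout; it is not a valid Word document.

```python
import io
import zipfile


def list_package_parts(data: bytes) -> list[str]:
    """Return the sorted part names stored inside a ZIP-based package."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        return sorted(package.namelist())


def build_demo_package() -> bytes:
    """Build a tiny in-memory ZIP that mimics a package layout.

    Assumption for illustration only: two placeholder parts, not a
    document that Word would open.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("word/document.xml", "<document/>")
    return buffer.getvalue()


if __name__ == "__main__":
    for name in list_package_parts(build_demo_package()):
        print(name)
```

To look inside a real file, pass its bytes instead, for example `list_package_parts(open("report.docx", "rb").read())`, where `report.docx` is a placeholder name.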

~~~
kevingadd
I don't see any reason why it would be Windows only. All the code appears to
be spec-compliant C# that doesn't use P/Invoke to call native modules or rely
on C++/CLI. The only sticking point I see is the PowerShell script, and that's
pretty trivial to replace since it's just a build script (you could use
xbuild/msbuild, since mono supports that, or just use make)

~~~
clxl
[https://github.com/OfficeDev/Open-XML-
SDK/pull/3](https://github.com/OfficeDev/Open-XML-SDK/pull/3) suggests that
there are issues with use in mono

------
bithush
This is obviously great but I find it annoying the instructions ask you to
Set-ExecutionPolicy Unrestricted in order to run the PowerShell script. That
is just lazy.

~~~
Someone1234
The default is Restricted, so no scripts can run at all. So they need to tell
the user to change it to something.

Their choices are AllSigned (or RemoteSigned which is the same thing in this
context) or Unrestricted.

If they chose the AllSigned route and then sign it with a CA certificate (e.g.
Microsoft's Code Signing cert) that would work, but if the user ever made even
a one character change to the script the thing will break and it might be
unclear as to WHY. Plus you have to be careful not to lose your signature when
sending files via certain technologies.

Alternatively they could teach the user how to set up a local CA, make a key
pair set, and then install. But that too is going to break a lot as it needs
to be re-signed each time the script is updated, and obviously inter-machine
coding will be a PITA.

It is fairly standard practice on non-server developer machines to set it to
Unrestricted. It isn't really any more dangerous than for example BAT or VBS
scripts which have no such default restrictions.

You call it lazy, but the alternatives seem impractical and painful. There are
scenarios where you want to enforce code signing, I just don't really feel
like a developer machine is one of them.

~~~
toyg
MS screwed up with PowerShell policies. Because the default is so restrictive,
corporate sysadmins are worried about altering it to Unrestricted, and signing
each script is just impractical, so in practice lots of customers will just
refuse to enable it.

This is like a Linux distribution shipping OpenSSH disabled by default, and if
you enabled it, you'd have to sign every script you run or allow everyone to
run (almost) as root.

I'm not a fan of PS (coming from Python, PS syntax is just nuts), but it's the
most useful tool in the Windows world to automate tasks and deployments of all
sorts... or rather it would be, if the security model weren't so inane that
few people dare touch it.

~~~
Someone1234
I don't work for MS so consider this pure speculation:

In the past "bad guys" have used VBS to leverage a minor exploit into a full
compromise. For example, they might have the ability to write arbitrary
strings to the filesystem but not execute or run code. So they write out a
*.VBS into the StartUp folder, and when the user reboots they have gone from a
relatively minor entry into full remote control.

While VBS and BAT remain on Windows based PCs, having PS be locked down is a
little irrational. However I suspect Microsoft might be looking to the future
when one day they can remove both BAT and VBS (or force the user to
enable/install them manually) so that it becomes harder to use such things to
escalate your compromise.

Essentially they're trying to limit attack surface on 90%+ of consumer-grade
machines, as your average consumer will never run a PS script or even care
that they cannot. A lot of medium to large enterprises have an internal CA
(with the CA cert already on the machines for authenticating with AD) so
internally signing their network scripts is of lower cost (i.e. the
infrastructure is all in place, just a single command to sign which you can
automate with PS, unlike a solo developer who might not be aware of
certificate issues anyway and certainly won't be running their own CA).

Your post assumes that PS hasn't been hugely successful, which is definitely
not my experience. SysAdmins in the Windows world are all over PowerShell,
they love it, and if you go read something like SpiceWorks or any other
similar community 90% of the new automations are written in PS rather than
VBS/BAT (it helps that all of the MS tools in Server 2012 natively support
PS). I don't really think the default security restrictions are going to limit
PS's popularity, they might limit it for certain niche deployments (e.g. MAKE
scripts distributed over the internet) but not for internal use.

As far as you not liking PS, I understand, I really do. Most other scripting
languages use strings as their fundamental unit of work (e.g. pass strings
from process A to B, even over pipes it is all strings or file names which are
also strings). In PS the System.Object is the base unit, and everything above
that is also an object (it is classic OOP), so it really takes some getting
used to. There's a lot of inheritance in there.
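
The string-versus-object distinction above can be sketched by analogy in Python. This is not PowerShell and not how its pipeline is implemented; `ProcessInfo`, `heavy_from_text` and `heavy_from_objects` are hypothetical names chosen for illustration. The first function re-parses text at every stage, the second reads typed attributes directly.

```python
from dataclasses import dataclass


@dataclass
class ProcessInfo:
    """Hypothetical record standing in for an object passed along a pipeline."""
    name: str
    memory_mb: int


def heavy_from_text(lines: list[str], limit_mb: int) -> list[str]:
    """String-style stage: split each 'name memory' line and convert by hand."""
    result = []
    for line in lines:
        name, memory = line.split()
        if int(memory) > limit_mb:
            result.append(name)
    return result


def heavy_from_objects(processes: list[ProcessInfo], limit_mb: int) -> list[str]:
    """Object-style stage: filter on a typed attribute, no parsing needed."""
    return [p.name for p in processes if p.memory_mb > limit_mb]


if __name__ == "__main__":
    text_rows = ["excel 300", "notepad 12"]
    object_rows = [ProcessInfo("excel", 300), ProcessInfo("notepad", 12)]
    print(heavy_from_text(text_rows, 100))
    print(heavy_from_objects(object_rows, 100))
```

Both calls select the same process; the difference is that the text version breaks as soon as the column layout changes, while the object version depends only on attribute names.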

But once you understand the underlying concepts involved, it is quite
demystified. However it really helps if you've come from a Java/.Net programming
background since even things like overloading aren't really something you'd run
across in a scripting language based largely on string transportation.

If you want to get started watch this workshop (you don't need to watch the
full 4 hours(!)):
[https://www.youtube.com/watch?v=-Ya1dQ1Igkc](https://www.youtube.com/watch?v=-Ya1dQ1Igkc)

~~~
toyg
This is the most common explanation, but tbh, it doesn't cut it: the consumer
market runs on Windows 7/8, whereas PowerShell is most useful on Server
2008/2012; so why can't they enable it by default on the latter?

 _> A lot of medium to large enterprises have an internal CA (with the CA cert
already on the machines for authenticating with AD)_

That's not my experience, but I guess we're both going by anecdotes here. (to
be precise, in the few cases where an internal CA is actually present, the
process to have anything signed is usually a bureaucratic nightmare)

My particular point of view is a vendor who comes in to deploy stuff on
customer servers and is told that Powershell should remain disabled (or be
enabled for installation purposes and then disabled again, which removes the
possibility of automated maintenance via PS). I guess it's a political choice
as much as a technical one, but that's what I've experienced; and the larger
the company, the more likely it is that this policy is not negotiable.

I don't disagree that PS has been successful; the Windows world was
_screaming_ for a half-decent shell and PS is exactly that. I just think that
it could have been _much_ more successful had MS chosen their defaults a bit
more carefully (and/or implemented a better security model).

On the syntax, what I dislike is not that it is object-based -- I don't
particularly like the "string everywhere" approach -- but rather the
over-reliance on special characters (that tends to make a PS script fairly
unreadable compared to Python). It also feels more of a bash-style world than
a scripting-language world, which I think is a step back from VBS/JS.

------
jwildeboer
I wonder why they call it Open XML. I thought it was called Office Open XML
(OOXML) and it was standardized as ISO 29500.

As it seems, this project does NOT implement the ISO standard, instead it
works with the non-standard compliant versions that Microsoft Office produces.
Le Sigh.

------
fafner
Michael Meeks' (LibreOffice developer) response
[https://people.gnome.org/~michael/blog/2014-06-25-openxmlsdk...](https://people.gnome.org/~michael/blog/2014-06-25-openxmlsdk.html)

------
whitehat2k9
For Python there's also PyExcelerate for creating XLSX files:
[https://github.com/kz26/PyExcelerate](https://github.com/kz26/PyExcelerate)

Supposedly it's the fastest, even beating out xlsxwriter.

------
holloway
Alternative is my Docvert [http://github.com/holloway/docvert-
python3](http://github.com/holloway/docvert-python3) software... it works on
Linux/Windows/OSX though it's mostly for reading DOCX/DOC/ODT and not for
writing.

------
mirchiseth
Really no mention of docx4j
[http://www.docx4java.org/](http://www.docx4java.org/)

------
shmerl
C# only?

~~~
fleetfox
Can this be used with Mono?

~~~
shmerl
Not sure. I guess it's useful as a reference implementation (if it conforms to
the specification that is), but not beyond that. Actual editors and office
suites which could potentially use the code aren't written in C#.

------
jameskilton
Microsoft is making it _really_ hard to continue to blatantly dislike them.

~~~
pling
Spend some time with VSTO on the client and the OOXML SDK on the server and I
assure you that you'll still want to kill yourself. I just spent three months
doing this and it was akin to gnawing your own limbs off. The Office VSTO API
is poorly designed, a mishmash of shims that barely work and are impossible to
debug. The documentation is awful and their support teams don't even know how
to fix problems. It's also absolutely painful making something work on Office
2097, 2010 and 2013. It's bad news when after a week you start wrapping all
entry points to stop managed exceptions crashing Office.

Open sourcing stuff doesn't make it magically entirely pain free nor does it
make a quality product. This isn't a quality product and neither is the thing
that generates the documents you will need to parse (and clean all the
incongruous crap) from.

~~~
toyg
I know it's just a typo, but I hope our children will never have to suffer
"Office 2097"...

~~~
pling
Well spotted. I'll leave it in for posterity :)

However I expect our Surface 2095 will run Office 2097 and it'll still have
VBA jammed in due to some legacy clients moaning that they can't run their
financial app in it any more :)

~~~
jameshart
If it survives the VBA-pocalypse in 2030, that is, because it interprets dates
with two digit years ending in '/30' - '/99' as referring to 1930-1999...
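
A minimal Python sketch of the two-digit-year window described above. The 30-99 mapping to 1930-1999 comes from the comment; the mapping of 00-29 to 2000-2029 is an assumption added here to complete the window, and `expand_two_digit_year` is a hypothetical name, not a VBA function.

```python
def expand_two_digit_year(yy: int) -> int:
    """Expand a two-digit year using a pivot at 30.

    30-99 -> 1930-1999 (as described in the comment above).
    00-29 -> 2000-2029 (assumption, not stated in the thread).
    """
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year between 0 and 99")
    return 1900 + yy if yy >= 30 else 2000 + yy


if __name__ == "__main__":
    for yy in (29, 30, 99):
        print(yy, expand_two_digit_year(yy))
```

Under this rule a date typed with the year '30' lands a full century earlier than one typed with '29', which is the cliff the comment is joking about.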

~~~
pling
Ah yes I'd forgotten about that. Good point!

