

Microsoft releases specifications for binary formats - mqt
http://www.microsoft.com/interop/docs/OfficeBinaryFormats.mspx

======
juanpablo
" _Microsoft may have patents, patent applications, trademarks, copyrights, or
other intellectual property rights covering subject matter in these materials.
Except as expressly provided in the Microsoft Open Specification Promise and
this notice, the furnishing of these materials does not give you any license
to these patents, trademarks, copyrights, or other intellectual property._ "

It's better than nothing but it's still a dangerous format to use.

~~~
DocSavage
If you look at the Microsoft Open Specification Promise it says:

"Microsoft irrevocably promises not to assert any Microsoft Necessary Claims
against you for making, using, selling, offering for sale, importing or
distributing any implementation to the extent it conforms to a Covered
Specification (“Covered Implementation”), subject to the following..."
[Doesn't cover non-MSoft patents. Not surprising.]

Office Binary formats are under Covered Specifications.

I wouldn't use the format, but I'm not sure it's "dangerous" to write apps that
read or generate the format.

------
ctkrohn
Awesome. I'm a junior trader at a major investment bank and I have to deal
with Excel all the time... it would be great to be able to programmatically
generate well-formed Excel files without having to deal with VBA, COM
automation, or anything else. Now someone just has to write a nice Haskell
library...

~~~
henning
VB.NET (which now has static metaprogramming and closures) + COM automation
isn't too bad in my experience. You could also use Python for COM.

(I've found that when you set the application to be visible and each command
is carried out visually, it's very impressive to non-programmers, especially
ones who don't use macros.)

Other than that, <http://poi.apache.org/> and a marginal language that targets
the JVM is probably your best bet for now.

~~~
wallflower
POI has worked well for our projects where we have to export customer data to
XLS (a commonly requested feature that is almost a checklist item).

I find the file system within a file (OLE2 compound document) fascinating. I
wonder who at Microsoft came up with that idea (or was it really an idea by a
technical committee?).
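
To make the "file system within a file" idea a bit more concrete, here is a minimal Python sketch that reads a few fixed fields from the start of a compound document header. It assumes the commonly documented layout (an 8-byte signature, a 16-byte CLSID, then little-endian 16-bit version, byte-order and sector-shift fields); the function name `parse_ole2_header` is made up for illustration, and the sketch stops well short of walking the sector tables or the directory.

```python
import struct

# Signature found in the first 8 bytes of an OLE2 compound document.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def parse_ole2_header(data: bytes) -> dict:
    """Read a few fixed header fields (hypothetical helper, not a full parser)."""
    if len(data) < 34 or data[:8] != OLE2_MAGIC:
        raise ValueError("not an OLE2 compound document")
    # Bytes 8..23 hold a CLSID; five little-endian uint16 fields follow at offset 24.
    minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from(
        "<5H", data, 24
    )
    return {
        "minor_version": minor,
        "major_version": major,
        "byte_order": byte_order,
        # Sector sizes are stored as powers of two.
        "sector_size": 1 << sector_shift,
        "mini_sector_size": 1 << mini_shift,
    }
```

Something like `parse_ole2_header(open("report.xls", "rb").read(512))` would then report the sector size the rest of the file is carved into ("report.xls" is just a placeholder name). The sectors play roughly the role of disk blocks, which is where the file system analogy comes from.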

------
vegashacker
The Word spec is 210 pages. Yikes! I wonder what kind of open-source tools
these specs will spawn.

Related: anyone know how Scribd and folks like that read/display Microsoft-
formatted documents?

~~~
marcus
It's a wild guess but I would have used COM access to the word.dll to convert
it to a more reasonable format.

Again, a wild guess, but that is how I would have done it. Trying to reverse
engineer formats as bloated as the Office formats is generally not a good idea
if it can be avoided.

~~~
llimllib
"Antiword is a free MS Word reader for Linux and RISC OS. There are ports to
FreeBSD, BeOS, OS/2, Mac OS X, Amiga, VMS, NetWare, Plan9, EPOC, Zaurus PDA,
MorphOS, Tru64/OSF and DOS. Antiword converts the binary files from Word 2, 6,
7, 97, 2000, 2002 and 2003 to plain text and to PostScript."

- <http://www.winfield.demon.nl/>

~~~
marcus
Yeah, that'll work too. The point is that you need to leverage someone else's
work to do it. Focus on your core, find shortcuts for everything else.

~~~
boucher
Well, given the release of these documents, as well as the existence of the
Office Open XML format, there's nothing left to reverse engineer.

Granted, it's no picnic implementing the specs these documents outline, but it's
certainly better than having to figure it all out from a binary file.

~~~
marcus
Well, it is a picnic: a picnic in the park, Jurassic Park that is.

------
llimllib
Has anybody with Word parsing experience read these who can speak to their
level of detail?

Was this release prompted by any legal decision that anybody knows about?

~~~
marcus
EU antitrust litigation involving MS Office's lack of interoperability. They
were already fined 613M USD.

