I remember Joel himself mentioning in some article about how Excel suddenly beca...

jcoffland · on Sept 11, 2016

> Isn't it true that even today, there is no viable competitor to MS Office even on non-Windows OS

I'm not sure what you mean by a viable competitor but LibreOffice works very well for my needs and it runs on Windows, OSX and Linux. It also does a sufficient job of reading and writing MS file formats. For that matter, Google Docs is a pretty good competitor as well.

I think a lot of people have come to accept MS dominance as a given and then put their heads in the sand. It's quite easy to get along just fine in business these days without Microsoft or Apple for that matter.

sz4kerto · on Sept 11, 2016

Try opening a large CSV file in Excel, GDocs, LibreOffice. GDocs dies around 50k rows, LO around a few 100k, Excel can cope with a million easily.

And yes, this is normal use, I don't want to use a database for doing a job that's 2 mins in Excel. We use GDocs at the company, but it's a toy (a nice toy though).

fn1 · on Sept 11, 2016

Try creating a UTF8 CSV-file that will open correctly in Excel.

Hint: It's not possible. My favourite chart about excel insufficiencies:

    Encoding  BOM      Win                            Mac
    --------  ---      ----------------------------   ------------
    utf-8              scrambled                      scrambled
    utf-8     BOM      WORKS                          scrambled
    utf-16             file not recognized            file not recognized
    utf-16    BOM      file not recognized            Chinese gibberish
    utf-16LE           file not recognized            file not recognized
    utf-16LE  BOM      characters OK,                 same as Win
                       row data all in first field

(https://stackoverflow.com/questions/6588068/which-encoding-o...)

glaberficken · on Sept 11, 2016

I found this unbelievable when I first came across it a few years ago.

More frustrating is when you need produce an UTF-8 CSV from excel. I always used open office calc for this as a workaround.

To import an UTF-8 csv to excel you just run the text import wizzard and specify the encoding.

fn1 · on Sept 11, 2016

Yes.

My relief in this area was discovering that most (western) non-software companies dealing will Excel-CSV know that encodings are a problem and many have settled on using CP-1252 instead of UTF-8.

jhanschoo · on Sept 11, 2016

Hi, you should check out my shell script gist that helps you with it. Remember to change the path to your LibreOffice executable https://gist.github.com/jhanschoo/bcd6d08d1e0f4b5d52735287f3...

user5994461 · on Sept 11, 2016

CSV is a poorly defined format. It doesn't have a standard. No wonder it screws up when you start varying character encoding.

The first practical lesson of CSV is CSV WILL fail miserably when changing character encoding, using non ASCII characters, having quoted string, varying end of line characters...

wolfgke · on Sept 12, 2016

> CSV is a poorly defined format. It doesn't have a standard.

Wrong: CSV is standardized in RFC 4180:

> https://tools.ietf.org/html/rfc4180

chinpokomon · on Sept 13, 2016

Ex post facto. It's been a "standard" long before the RFC, so there are many CSV documents which need to be read but don't confirm to the RFC spec. You should create a CSV writer that conforms, but your reader needs to support much more. Then if you save back to the RFC spec, the document might break other downstream readers which were written long ago and which don't follow the new RFC spec. It's an imperfect file format.

hermitdev · on Sept 12, 2016

Indeed. Differences in encoding or presence/lack of a BOM in UTF8 are a small set of issues. I've dealt with financial data from very large public banks and vendors that are horrible about vending proper CSV data. Either impromper or unescaped quotes (or delimiters). Takes weeks to get the data corrected, ao you manually correct the data or write a custom parser to try and navigate around it. One of my faves was a vendor that didn't escape the legal name of a company, so "Microsoft,Inc" would be 2 fields, rending most of that recors gibberish.

I generally agree with the idiom "If you write your own cav parser, you're doing it wrong", but sometimes you have no choice to work around the idiocy of others.

douche · on Sept 11, 2016

It is amazing how badly Excel handles CSV files. My favorite thing is where it will decide for some goddamn reason that a numeric or string column is actually supposed to be a date, or a currency, or some other random thing, and automatically format the column. Cue the calls from customers that don't know how to turn this "feature" off, when they export something as CSV and think it is corrupted.

xorcist · on Sept 11, 2016

This misfeature is also locale dependent. It's impossible to know how your customer's Excel will behave, unless you test every possible localization.

crispyambulance · on Sept 11, 2016

I don't know when it became "easy" to handle a million rows in excel, but I do know that some time ago that was intractable. As late as a few months ago, I recall trying to work with ~6 million rows in excel and it flaked out miserably.

For quick, ad-hoc tasks on huge files I recommend R/RStudio.

I also use Notepad++, of all things, for cleaning-up csv data because its "operating envelope" in terms of file-size and sheer number of files is unlike anything else. The regexp search/replace combined with zoomed-out views allow for easy clean-up of text data.

hermitdev · on Sept 12, 2016

Excel 2007. Prior it was impossible due to file format limits. Row limit was 64k, i think. Not sure what the Col limit was. Not sure what or if there limits are since the move to the xlsx format.

Despite doing Excel addin development in finance, i never hit a limit in post 2007 excel.

NetStrikeForce · on Sept 11, 2016

No joking, I find Sublime Text brilliant to work with humongous CSV files :-)

winteriscoming · on Sept 12, 2016

Which version? The latest 3.x version, that I use, almost stalls trying to load a file of relatively small size of 20 odd MBs.

NetStrikeForce · on Sept 12, 2016

With 2.x I've opened and edited with multiple cursors (my favourite feature) files around a GB in size. On Windows, if that makes a difference.

merb · on Sept 11, 2016

Basically thats a lie. Yes Office is good in many ways and especially ease of use. But opening a CSV file in Office > 2013 vs LibreOffice that contains more than 100k is ways more smooth in the newer LibreOffice version than in newer Office versions. If you would've written anything about UX or usages or Format interoperbility, rendering their own format etc than yes Office is a clear winner, but when it comes to big files and their resource usage the newer office versions are just bad.

sz4kerto · on Sept 11, 2016

> Basically thats a lie.

Well, that is how it worked for me, I used Excel 2016, the most recent LO (5?), and Google Spreadsheets. Around two months ago. I wanted to create a chart (histogram) from data in a CSV, with around 900k lines. Calling this a 'lie' is just bad manners, sorry.

reitanqild · on Sept 11, 2016

And for many of my use cases (although not all) LO/OO was way ahead of Word at times (formatting using styles comes to mind as a place where I used to prefer old OO for years and I also think LO/OO is less prone to misread basically anything as an American formatted date.)

merb · on Sept 11, 2016

Yeah if you have a document written in LO/OO, that's true however loading still some XLS files is just akward. I mean probably due to the spec and even microsoft sometimes had bugs in their program so that at some things the spec didn't helped. like some table bugs etc. But still I don't wanted to says XYZ is better overall or so. I just wanted to point out that resource usage and loading bigger files is definitly not the thing were office shines at least the newer versions and also don't expect office for mac to be good, it sucks.

reitanqild · on Sept 11, 2016

Back in 2005 ish I used to have access to both but still import to Writer to clean up after three other editors who didn't get (or care maybe) about styles, the export to Word again because Writer was missing some page numbering feature or something.

kweinber · on Sept 11, 2016

It is a common case in helpdesks I've collaborated with to fix corrupted Excel files by opening them in LibreOffice and resaving them to Excel. Also, Excel's practice of munging CSVs requires an initial read in LibreOffice sometimes and then saving to Excel when Excel simply can't handle the file format.

LibreOffice is slow but it is often more correct.

Update: Over the years, it seems to me that the MS formats are incented to be as incompatible as possible with other formats to lock folks into their format. It enables them to say that they are the most "compatible". . . but it really means the most "ability to read their own format" even though compatibility is better across the board with alternatives.

paldepind2 · on Sept 11, 2016

Have you tried Gnumeric? It's supposed to be fast at handling large files.

aw3c2 · on Sept 11, 2016

It is! Just imported a 3 million line csv with zero problems.

jenscow · on Sept 11, 2016

I agree. GD/LO spreadsheets are an appalling database engine.

I don't want to tell you how to do your own job, but if it's normal to handle millions of lines in a spreadsheet, you might be doing it wrong. (2 mins in excel; 2 seconds in a db)

tatoalo · on Sept 11, 2016

Excel can(at this day) handle 1,048,576 rows which is 2^20. If you really need to work on bigger DBs than that I'd recommend using Power Pivot(natively included in Excel 2016) which is supposed to handle way more than 2^20 rows.

pstuart · on Sept 11, 2016

It's not clear what you're doing with your data, but sqlite may be a viable option as well.

Using a million row sample from here:

https://blog.majestic.com/development/majestic-million-csv-d...

Taking a summary of the data:

  > cat csvtest.sql
  .mode csv
  .import majestic_million.csv rankings
  select TLD,count(TLD) as cnt from rankings group by TLD order by cnt desc limit 10;

  > time sqlite3 < csvtest.sql
  com,402802
  top,96734
  cn,72029
  net,55131
  org,45948
  gdn,40246
  ru,29746
  cc,27117
  de,19291
  uk,14971

  real	0m6.295s
  user	0m6.089s
  sys	0m0.156s

philipov · on Sept 11, 2016

Funny, I've always considered Excel (and all the others) equally bad at scaling because none of them can cope with csv files over 1 million rows. 50k or 1 million isn't a big difference. Where's the vi of spreedsheets for handling those 300 million row db dumps with a load error on line 253,452,624?

shakna · on Sept 11, 2016

CSV is just plain text, so vi can work... But as others have said, try it with Gnumeric.

fletchowns · on Sept 11, 2016

Million row Excel doc that you only need to do 2 mins worth of stuff to? I'm curious what your use case is, can it not be completely automated?

sz4kerto · on Sept 11, 2016

Taking a log file, grep, sed to extract some numbers and then create a graph that shows the distribution of those numbers (histogram). It was one-off, I could have used some command-line tools to do all of those but then I'd have spend an hour learning how gnuplot works, which I didn't have time for. I'm a developer, and that's exactly why my time is expensive -- automating everything doesn't always make sense.

xapata · on Sept 11, 2016

Depends on how often you do the task (and similar), of course. If it's frequently, then not learning a faster technique is getting stuck in a local optima.

hendzen · on Sept 12, 2016

For this type of task I usually spin up an ipython/jupyter notebook and hammer something out with Pandas very quickly. It's an incredibly productive toolkit for quickly performing this type of task.

continuational · on Sept 11, 2016

The task at hand could just be to answer a quick one-off question about the data. Not worth bothering the developers for that.

neverminder · on Sept 11, 2016

Indeed Libreoffice cannot handle large CSVs which is a shame, because otherwise it's an excellent tool.

jcoffland · on Sept 12, 2016

Gnumeric works better for large files. Also, Open-Source.

DigitalJack · on Sept 11, 2016

I've run into excel problems at 64k rows. It was a while ago, but this century at least.

ygra · on Sept 11, 2016

I think 2007 was the first version to support two million rows.

cm2187 · on Sept 11, 2016

For basic things yes. For more advanced features, I can't think of any competitor.

As a result office has pretty much stopped evolving the last 10 years. The only thing they are working on is making it more cross platform / web friendly. But it's pretty much a dead ecosystem.

geezerjay · on Sept 11, 2016

> For basic things yes. For more advanced features, I can't think of any competitor.

For features above the basic level, Microsoft Office is far from being an alternative as well. For example, try to reformat the figure numbering scheme, or add an appendix to an Microsoft Word document. Very basic stuff, and yet Microsoft Word fails to offer a sane option to do so.

Microsoft's offerings are only perceived as being above adequate because people don't know better.

Roritharr · on Sept 11, 2016

And what would be better?

ptaipale · on Sept 11, 2016

On one front, LaTeX. On another, e.g. Framemaker.

But as an overall documentation tool MS Word is hard to beat because you can still do most things, and it is ubiquitous.

In my experience, the worst problems with Word come out of company templates. Plain Word out of the box causes less surprises.

cm2187 · on Sept 11, 2016

Most of my troubles with word have to do with most of the properties of the items on the document being invisible. You are dealing with invisible page breaks, invisible margins, invisible objects, etc.

What would be really powerfull is a dual markup/WYSIWYG UI, like WPF in visual studio. Editing xml attributes on a markup is way more efficient than trying to guess on what you clicked and then going to menus and nested sub menus. But typing raw xml is not fun, you want to be able to type in the document directly. And you don't want to expose the markup to unsophisticated users. The markup should be way more defensive than a text editor. It should not allow you to place a node where that type of node is illegal. It should make autocomplete adds the mandatory features when typing a new node (like the closing tag, the mandatory attributes, etc). Otherwise we will spend our time dealing with corrupted documents.

Same for powerpoint, which in my opinion has pretty much converged with word. Most decks created are not on screen presentation but rather rich visual text documents, just text document with a page orientation that fits modern screens.

adrianratnapala · on Sept 11, 2016

What would be really powerfull is a dual markup/WYSIWYG UI, like WPF in visual studio

WordPerfect had this in the '80s. It's a conventional word processor, but you could press the "Reveal Codes" function key and get a split-screen. The lower half would show all the markup. You could see and delete a tag explictly, and its matching one would also go away. Though of course, the markup was not indented.

My mother, -- a secretary -- used it all the time. But when she met other secretaries who came from an MS-Word backround, they hated the very existence of the feature. I think the last few editions of WP removed it (in order to cargo cult Word?).

Then they went out of business.

ptaipale · on Sept 11, 2016

Word has had that for a long time. In modern ribbon-interface Word versions, it's in the Home-Paragraph section, top right icon ("paragraph mark"). Additionally you can/should toggle field code display with Alt-F9.

This has been in Word since, I think, Word for Windows 2.0 (circa 1992, for Windows 2.11). It might have also been in the previous DOS-based Word versions that co-existed with DOS-based WP (but since I used both, I can't always tell which features were in which).

adrianratnapala · on Sept 12, 2016

Do you mean those strange cryptic characters at the end of the paragraphs?

That is not a lot like reveal codes, or like a markup language. Though I agree they are handy.

ptaipale · on Sept 12, 2016

Maybe not markup language, but the idea is to show the markup. To get a paragraph break, you still hit Return and don't try to type that end-of-paragraph marker.

The field codes are more like an actual language; you can insert a field and then type the content in it.

wolfgke · on Sept 12, 2016

> Most of my troubles with word have to do with most of the properties of the items on the document being invisible. You are dealing with invisible page breaks, invisible margins, invisible objects, etc.

WordPerfect doesn't have this problem:

> https://en.wikipedia.org/w/index.php?title=WordPerfect&oldid...

"The Reveal Codes feature, imperfectly imitated much later in Microsoft Word, is a second editing screen that can be toggled open and closed at the bottom of the main editing screen. It was especially beloved of those who were faced with Microsoft Word, which had at the time no similar feature. The codes for formatting and locating text are displayed, interspersed with tags and the occasional objects, with the tags and objects represented by named tokens. The scheme makes it far easier to untangle problems than with styles-based word processors, and object tokens can be clicked with a pointing device to directly open the configuration editor for the particular object type, e.g. clicking on a style token brings up the style editor with the particular style type displayed."

pjmlp · on Sept 11, 2016

This is what commercial editors for Docbook and DITA allow for, yet you don't see many keen in embracing them.

cnvogel · on Sept 11, 2016

> In my experience, the worst problems with Word come out of company templates.

Oh, yes, brother. Especially as they tend to be created by the people with the least experience in using (even modestly) advanced features.

cm2187 · on Sept 11, 2016

I work for a major international corporation which official powerpoint template has hardcoded numbers on each slide, no auto-page numbering...

geezerjay · on Sept 11, 2016

> because you can still do most things, and it is ubiquitous.

That's also true for LibreOffice or other FLOSS office projects.

In fact, a copy of LibreOffice doesn't cost anyone between 100€ and 300€, or requires any subscription.

ptaipale · on Sept 11, 2016

Still, in the corporate world most people do not have LibreOffice, but they do have Microsoft Word.

And in very many cases they can't even install software themselves, as the computer is managed by the IT/IM organisation. The cost of license is negligible in this context; the actual cost comes from managing something non-standard.

Yes, you or me don't like it but it's the fact in many corporations, government organisations and even educational institutions.

tajen · on Sept 11, 2016

OTOH, I used to work as a govt contractor where they routinely give 1000-page software specifications with screenshots and non-unified titles. There was no safe way to save them, let alone work concurrently, so let's just say they're not the right tool at all for most routine tasks in large companies.

chris_wot · on Sept 11, 2016

What advanced things can Office do that LibreOffice can't?

mikekchar · on Sept 11, 2016

I don't know for sure, but I'm willing to bet that there are sophisticated mail merge features in Office that don't exist in LibreOffice. The other major killer feature that I'm aware of in Office is the track changes feature. There are work flows that absolutely depend on that for cooperative editing and sign-off.

You can do these things other ways, but once the workflow is entrenched it would be really hard to move people to something else. If you've never seen some of the insane mail merges that people do, you might think it would be trivial to implement, but alas it is not.

Having said those things, it's been quite a while since I touched a word processor (5 years or so). When I last used Libre Office there were some other obscure bugs/features that made me want to throw it out the window. The main one was the way it formats multi-lingual text. If you try to write text that switches between roman and asian fonts, good luck in keeping the baseline in place. I made an attempt to convince people to change the idiotic way it is done (by shrinking the font size!) but had no luck.

I'm not a fan of Office either for the record, but I can see why some people do not want to change.

chris_wot · on Sept 11, 2016

The mail merge functionality is constantly being enhanced. In 5.1 mail merge embedding was released - see here: https://archive.fosdem.org/2016/schedule/event/libreoffice_m...

You can track changes in LO also - starting in 4.4 it has been progressively getting better - see here:

https://wiki.documentfoundation.org/Track_changes

Curious about the font issue... I've never tried to use Asian and Roman glyphs on one line before. But multilayout is a bit of a mess and is constantly being worked on.

type0 · on Sept 11, 2016

> The other major killer feature that I'm aware of in Office is the track changes feature.

This is by far the worst "feature", it's a plague that management enforces on its employees mainly not knowing any better or just being plain ignorant. As you said this comes down to unwillingness to change and learn something new. In most cases using plain text format like asciidoc/asciidoctor with some version control system would be enormous improvement.

massysett · on Sept 11, 2016

I am a government attorney by day and a hobby programmer at night. There is absoloutely no way you are going to get any significant number of office workers to use asciidoc and a VCS.

It's hard to even get a lot of people trained in Word. The agency has training classes when a new version is installed--which is once every few years. People struggle with stuff like the ribbon. To use asciidoc and a VCS people would have to start from scratch training wise. Not gonna happen.

I routinely format sizable documents and have to collaborate on them. Word is an abomination and I hate it. From a technical standpoint other tools are far superior. But a bunch of office workers are not going to learn anything else. It would not be worth the effort. Since MS has no incentive to improve Word I grit my teeth and accept that I will have to deal with its garbage.

digi_owl · on Sept 11, 2016

My personal take is that people have difficulty with change in computing because computers are so damn good at irrecoverably destroying data.

coldtea · on Sept 11, 2016

Track changes is not just for "version control", and they don't want plain ascii text anyway.

newscracker · on Sept 11, 2016

I like LibreOffice because it's FLOSS and supports Open Document (formats). I use Calc for several of my personal spreadsheets. However, I personally do not like a few things with LibreOffice, which I think help the notion among people that MS dominance is a given and that solutions like LibreOffice are not good enough:

1. It does not support MS Office keybindings and its menu structure by default (not referring to the ribbon interface here). That's not a great thing for an application that would like more people from the MS Office camp to use it. As someone who's used MS Excel for a long time and learned several keyboard shortcuts, I still take more time to accomplish even simple things in LibreOffice and am learning different keyboard shortcuts to use Calc. Ideally, it shouldn't be this way. I don't understand what LibreOffice gains by adopting different keyboard shortcuts.

2. The UI (icons, menu fonts, colors, etc.) looks primitive and needs a redesign to keep up with the times (I've mentioned this in some other comments on HN).

3. It lags behind in feature set and ease of use compared to MS Excel (as one example, try creating, editing and changing charts in both). This is not really as big a deal for most users compared to the two points above.

wongarsu · on Sept 11, 2016

> It also does a sufficient job of reading and writing MS file formats.

Except for powerpoint presentations.

ue_ · on Sept 11, 2016

This may seem a minor gripe, but last time I used LibreOffice it didn't have a way to visually crop an image just like Word or Google Docs can, which is disappointing for the sort of work I do, especially as I'd like to only use Free Software to do it.

It means I need to spend more time cropping the image in an external program, and I can't adjust the dimensions later. It's a hassle.

davidgerard · on Sept 12, 2016

https://help.libreoffice.org/Common/Crop

blihp · on Sept 11, 2016

It's not that the file format was any better, it was just much simpler because in the DOS days you couldn't arbitrarily embed content from one application into another. Lotus' file format was just as proprietary as Microsoft's was. However, Excel had very good file importers, but also things like keystroke migration wizards etc. Microsoft invested a lot of time and money into doing everything they could to lower the transition barriers into Excel specifically for Lotus users. (and Office apps in general. They had similar efforts for WordPerfect->MS Word migration and others)

It was a fairly one-way trip in that even in the first versions of Excel, getting data back out was never as easy as getting it in was (i.e. the exporters weren't as complete as the importers... probably by design) and throughout the 90's Microsoft definitely built on and expanded the one-way nature of migration to their proprietary file formats as a key to lock-in. It wasn't so much that it was all that difficult as it was undocumented and always changing (there were multiple backwards incompatible file formats in each of the main MS Office apps in the 1990's). So if you were a customer or competitor, it wasn't so much that you couldn't reverse engineer the file format, it's that they were relatively undocumented and constantly changing so by the time you had done it, they were on to the next revision. I think it was a couple of years after the DoJ case, and a result of the settlement terms, that things really started improving on this front... but by then Microsoft had long since won. By that time most of their competitors had already figured out that the only way they could hope to compete with Microsoft was to bypass them entirely... hence the focus on web apps.

You mention web apps so I'll close with this: unless you're taking great pains to only use the open and cross-platform standards for them, in the not too distant future you're likely to find yourself in the exact same spot with your web apps that people were with desktop apps not too long ago. Different platform, same game...

mikekchar · on Sept 11, 2016

I wrote a bit about this in another place, but the key to understanding the situation is to understand that the file format does not actually tell you how to format the text. Especially in the early days when rendering a font would work completely differently in each application because you never knew what font metrics they were using.

It's easier to be second to market because you can copy the leader. In this case, if you write your formatter with the intention of importing documents from the leader, then you always have a target. If you then add features which the leader ignores, suddenly they have a much more difficult time to catch up -- their formatter is not purposely built to render your file format.

As I said in my other message, I've completely forgotten the Excel file format, but Word's format was originally just a binary dump of all their data structures. That's why it is insane. It's hard to read unless your code happens to be structured that way.

Anyway, I'm +1000 with you on the importance of open and cross-platform standards. Just think about all those stupid rails apps -- your data model looks exactly like your sql tables. Then you dump those same stupid tables into json and splat them over the wire for someone to try to make sense of. Deja vu all over again ;-) I suppose I should be happy it's not XML...

digi_owl · on Sept 11, 2016

Seems to pretty much sum up the MS MO ever since they got the DOS contract with IBM.

You could observe something similar with how they dislodged Novell from the office network, and how they responded to Netscape trying the same.

I worry that we are seeing similar tactics being deployed by other entities in an effort to grab control of the Linux ecosystem.

mwcampbell · on Sept 11, 2016

> I worry that we are seeing similar tactics being deployed by other entities in an effort to grab control of the Linux ecosystem.

Can you explain what you're talking about here?

digi_owl · on Sept 11, 2016

Typing everything into a single core middleware that changes too often to have its interfaces reliably cloned.

Thus whoever controls the agenda for the middleware controls the ecosystem (or have effectively forked it).

douche · on Sept 11, 2016

Lotus also got bought out by IBM, and quickly moved their product lines to life-support mode.

thr0waway1239 · on Sept 11, 2016

So I guess we both agree that the resulting complexity of the file format itself had a lot to do with an intention of lock-in. Which directly contradicts Joel's first statement: "are deliberately obfuscated". :-)

Also, regarding web apps, I like to think of it as the building, not of "perfect systems", but "self-correcting systems". The web, with all its chaos, it still more self-correcting than MS Office has been - in fact, even from your comment it does seem that the intention was only lock-in and not improvement of file formats (other than perhaps an improvement in MS bottom line).

Oddly, even the benefits to the MS bottom line now seems more short sighted. E.g. the ongoing fiasco that is the Windows 10 auto-upgrade. [1] Someday I expect to read post-mortems on the failure of Windows 10's so called perpetual upgrade policy (i.e the notion that there will be no more version numbers after 10) which link the failure to their refusal to build self-correcting systems.

[1] Personally, I don't think we have seen the worst of it just yet.

mikekchar · on Sept 11, 2016

I'm posting too much on this topic :-) Just a quick word from someone whose job it was to write filters for these file formats. They are not deliberately obfuscated. They are just the product of long lived legacy projects with short deadlines and poor process. It is absolutely obvious when you work with the formats that this is the case. Word in particular was clearly just a dump of the internal data structures in the app. It's not even that complicated -- the complicated bits come later and they are all for either backwards compatibility or for the fast save feature (or whatever it was called). The fast save, in particular, is complicated and is that way because it is append only. It is essentially a series of patches on the original data structures. Yes, it will make you want to rip your hair out, but it is very, very obvious why it's that way.

jimmaswell · on Sept 11, 2016

In the majority of cases, LibreOffice is a viable free replacement. There are onky a few missing features that you probably won't need. The onlu time that happened to me was trying to do a certain kind of regression curve that was only in Mictosoft Excel, for a physics class.

gbog · on Sept 11, 2016

I would really love to agree because I hate ms so much, but sadly it's false: every time I use libreoffice, it's because I need to open a doc or ppt sent to me, and the layout is messed it's barely readable

rockooooo · on Sept 11, 2016

Key features including a decent UI? I hate using LibreOffice for this reason alone.

blahi · on Sept 11, 2016

If libre office/google docs fits your needs, you are artificially limiting your needs*. Probably not even realizing it.

Unless what you do can be done with a calculator (barebones basic stuff).

sirtaj · on Sept 11, 2016

This argument extends easily - you're artificially limiting your needs by using Office, when you could be using ScipPy/Jupyter/IPython/Pandas.

wolfgke · on Sept 11, 2016

> when you could be using ScipPy/Jupyter/IPython/Pandas

Who says that the HN audience doesn't?

sirtaj · on Sept 11, 2016

I'm sure there are plenty of people here who use them. I'm responding to the person who claims libreoffice et al are artificially limited. My implicit argument is that the vast majority of people who use spreadsheet programs use them for list-keeping and simple analytics/reporting. For more complex uses, Excel is beaten by dedicated analytics tools.

blahi · on Sept 11, 2016

Spreadsheets serve different needs than the tools you mentioned.

sirtaj · on Sept 11, 2016

What needs are those? Which ones can't be served by the tools I mentioned?

blahi · on Sept 11, 2016

Planning. Oftentimes all you need is to calculate some columns and play with numbers and see how the calculations change. The same things that make Excel bad for robust statistical modeling, make it good for all sorts of planning activities.

jimmaswell · on Sept 11, 2016

What more features am I supposed to be using when I write essays and plot small line charts?

thr0waway1239 · on Sept 11, 2016

The discussions about the suitability of Google Docs and LibreOffice makes me wonder - is there a benchmark website for file format compatibility verification? If not it may be a very useful to create one and just host it on, say, GitHub pages.

We could upload complex, but reasonably common, documents in various file formats for corresponding softwares, and create easy to read checklists of the different failure modes when trying to open the file in a different software from one in which it was authored.

mike · on Sept 11, 2016

The post you're thinking of:

Strategy Letter III: Let Me Go Back! http://www.joelonsoftware.com/articles/fog0000000052.html