I remember Joel himself mentioning in some article about how Excel suddenly became dominant over a competitor (Lotus spreadsheets I think) because you could not only open Lotus files in Excel, but you could also save the Excel back into Lotus format (two way compatibility). I am supposing this is because Lotus file format was not complete crap.
If that is true, then isn't it also very possible that MS had an incentive to keep building on top of its cruddy file format without rethinking the design at any point? Isn't it true that even today, there is no viable competitor to MS Office even on non-Windows OS because MS has made file format compatibility extremely difficult?
I am very glad that the latest and greatest features of Office automation after (and including) Office 2007 were mostly still-born and we could stop worrying about this issue and use MS Office as a tool and not as a "platform". It became too unwieldy even for hardcore MS supporters, and even though no good alternatives emerged, people generally decided that the programmability of MS Office had reached its limits - and moved away to other things like replacing them with browser based apps [1] I remember creating an Infopath "app" as a contractor around 2010, and wondering "what is this crap? Have these people never seen browser based apps?"
[1] This is probably a sweeping statement. I would love to hear from someone who is certain that Office automation was a superior technology for a particular use case.
> Isn't it true that even today, there is no viable competitor to MS Office even on non-Windows OS
I'm not sure what you mean by a viable competitor but LibreOffice works very well for my needs and it runs on Windows, OSX and Linux. It also does a sufficient job of reading and writing MS file formats. For that matter, Google Docs is a pretty good competitor as well.
I think a lot of people have come to accept MS dominance as a given and then put their heads in the sand. It's quite easy to get along just fine in business these days without Microsoft or Apple for that matter.
Try opening a large CSV file in Excel, GDocs, LibreOffice. GDocs dies around 50k rows, LO around a few 100k, Excel can cope with a million easily.
And yes, this is normal use, I don't want to use a database for doing a job that's 2 mins in Excel. We use GDocs at the company, but it's a toy (a nice toy though).
Try creating a UTF8 CSV-file that will open correctly in Excel.
Hint: It's not possible. My favourite chart about excel insufficiencies:
Encoding BOM Win Mac
-------- --- ---------------------------- ------------
utf-8 scrambled scrambled
utf-8 BOM WORKS scrambled
utf-16 file not recognized file not recognized
utf-16 BOM file not recognized Chinese gibberish
utf-16LE file not recognized file not recognized
utf-16LE BOM characters OK, same as Win
row data all in first field
My relief in this area was discovering that most (western) non-software companies dealing will Excel-CSV know that encodings are a problem and many have settled on using CP-1252 instead of UTF-8.
CSV is a poorly defined format. It doesn't have a standard. No wonder it screws up when you start varying character encoding.
The first practical lesson of CSV is CSV WILL fail miserably when changing character encoding, using non ASCII characters, having quoted string, varying end of line characters...
Ex post facto. It's been a "standard" long before the RFC, so there are many CSV documents which need to be read but don't confirm to the RFC spec. You should create a CSV writer that conforms, but your reader needs to support much more. Then if you save back to the RFC spec, the document might break other downstream readers which were written long ago and which don't follow the new RFC spec. It's an imperfect file format.
Indeed. Differences in encoding or presence/lack of a BOM in UTF8 are a small set of issues. I've dealt with financial data from very large public banks and vendors that are horrible about vending proper CSV data. Either impromper or unescaped quotes (or delimiters). Takes weeks to get the data corrected, ao you manually correct the data or write a custom parser to try and navigate around it. One of my faves was a vendor that didn't escape the legal name of a company, so "Microsoft,Inc" would be 2 fields, rending most of that recors gibberish.
I generally agree with the idiom "If you write your own cav parser, you're doing it wrong", but sometimes you have no choice to work around the idiocy of others.
It is amazing how badly Excel handles CSV files. My favorite thing is where it will decide for some goddamn reason that a numeric or string column is actually supposed to be a date, or a currency, or some other random thing, and automatically format the column. Cue the calls from customers that don't know how to turn this "feature" off, when they export something as CSV and think it is corrupted.
I don't know when it became "easy" to handle a million rows in excel, but I do know that some time ago that was intractable. As late as a few months ago, I recall trying to work with ~6 million rows in excel and it flaked out miserably.
For quick, ad-hoc tasks on huge files I recommend R/RStudio.
I also use Notepad++, of all things, for cleaning-up csv data because its "operating envelope" in terms of file-size and sheer number of files is unlike anything else. The regexp search/replace combined with zoomed-out views allow for easy clean-up of text data.
Excel 2007. Prior it was impossible due to file format limits. Row limit was 64k, i think. Not sure what the Col limit was. Not sure what or if there limits are since the move to the xlsx format.
Despite doing Excel addin development in finance, i never hit a limit in post 2007 excel.
Basically thats a lie.
Yes Office is good in many ways and especially ease of use.
But opening a CSV file in Office > 2013 vs LibreOffice that contains more than 100k is ways more smooth in the newer LibreOffice version than in newer Office versions.
If you would've written anything about UX or usages or Format interoperbility, rendering their own format etc than yes Office is a clear winner, but when it comes to big files and their resource usage the newer office versions are just bad.
Well, that is how it worked for me, I used Excel 2016, the most recent LO (5?), and Google Spreadsheets. Around two months ago. I wanted to create a chart (histogram) from data in a CSV, with around 900k lines. Calling this a 'lie' is just bad manners, sorry.
And for many of my use cases (although not all) LO/OO was way ahead of Word at times (formatting using styles comes to mind as a place where I used to prefer old OO for years and I also think LO/OO is less prone to misread basically anything as an American formatted date.)
Yeah if you have a document written in LO/OO, that's true however loading still some XLS files is just akward. I mean probably due to the spec and even microsoft sometimes had bugs in their program so that at some things the spec didn't helped. like some table bugs etc.
But still I don't wanted to says XYZ is better overall or so.
I just wanted to point out that resource usage and loading bigger files is definitly not the thing were office shines at least the newer versions and also don't expect office for mac to be good, it sucks.
Back in 2005 ish I used to have access to both but still import to Writer to clean up after three other editors who didn't get (or care maybe) about styles, the export to Word again because Writer was missing some page numbering feature or something.
It is a common case in helpdesks I've collaborated with to fix corrupted Excel files by opening them in LibreOffice and resaving them to Excel. Also, Excel's practice of munging CSVs requires an initial read in LibreOffice sometimes and then saving to Excel when Excel simply can't handle the file format.
LibreOffice is slow but it is often more correct.
Update: Over the years, it seems to me that the MS formats are incented to be as incompatible as possible with other formats to lock folks into their format. It enables them to say that they are the most "compatible". . . but it really means the most "ability to read their own format" even though compatibility is better across the board with alternatives.
I agree. GD/LO spreadsheets are an appalling database engine.
I don't want to tell you how to do your own job, but if it's normal to handle millions of lines in a spreadsheet, you might be doing it wrong. (2 mins in excel; 2 seconds in a db)
Excel can(at this day) handle 1,048,576 rows which is 2^20.
If you really need to work on bigger DBs than that I'd recommend using Power Pivot(natively included in Excel 2016) which is supposed to handle way more than 2^20 rows.
> cat csvtest.sql
.mode csv
.import majestic_million.csv rankings
select TLD,count(TLD) as cnt from rankings group by TLD order by cnt desc limit 10;
> time sqlite3 < csvtest.sql
com,402802
top,96734
cn,72029
net,55131
org,45948
gdn,40246
ru,29746
cc,27117
de,19291
uk,14971
real 0m6.295s
user 0m6.089s
sys 0m0.156s
Funny, I've always considered Excel (and all the others) equally bad at scaling because none of them can cope with csv files over 1 million rows. 50k or 1 million isn't a big difference. Where's the vi of spreedsheets for handling those 300 million row db dumps with a load error on line 253,452,624?
Taking a log file, grep, sed to extract some numbers and then create a graph that shows the distribution of those numbers (histogram). It was one-off, I could have used some command-line tools to do all of those but then I'd have spend an hour learning how gnuplot works, which I didn't have time for. I'm a developer, and that's exactly why my time is expensive -- automating everything doesn't always make sense.
Depends on how often you do the task (and similar), of course. If it's frequently, then not learning a faster technique is getting stuck in a local optima.
For this type of task I usually spin up an ipython/jupyter notebook and hammer something out with Pandas very quickly. It's an incredibly productive toolkit for quickly performing this type of task.
For basic things yes. For more advanced features, I can't think of any competitor.
As a result office has pretty much stopped evolving the last 10 years. The only thing they are working on is making it more cross platform / web friendly. But it's pretty much a dead ecosystem.
> For basic things yes. For more advanced features, I can't think of any competitor.
For features above the basic level, Microsoft Office is far from being an alternative as well. For example, try to reformat the figure numbering scheme, or add an appendix to an Microsoft Word document. Very basic stuff, and yet Microsoft Word fails to offer a sane option to do so.
Microsoft's offerings are only perceived as being above adequate because people don't know better.
Most of my troubles with word have to do with most of the properties of the items on the document being invisible. You are dealing with invisible page breaks, invisible margins, invisible objects, etc.
What would be really powerfull is a dual markup/WYSIWYG UI, like WPF in visual studio. Editing xml attributes on a markup is way more efficient than trying to guess on what you clicked and then going to menus and nested sub menus. But typing raw xml is not fun, you want to be able to type in the document directly. And you don't want to expose the markup to unsophisticated users. The markup should be way more defensive than a text editor. It should not allow you to place a node where that type of node is illegal. It should make autocomplete adds the mandatory features when typing a new node (like the closing tag, the mandatory attributes, etc). Otherwise we will spend our time dealing with corrupted documents.
Same for powerpoint, which in my opinion has pretty much converged with word. Most decks created are not on screen presentation but rather rich visual text documents, just text document with a page orientation that fits modern screens.
What would be really powerfull is a dual markup/WYSIWYG UI, like WPF in visual studio
WordPerfect had this in the '80s. It's a conventional word processor, but you could press the "Reveal Codes" function key and get a split-screen. The lower half would show all the markup. You could see and delete a tag explictly, and its matching one would also go away. Though of course, the markup was not indented.
My mother, -- a secretary -- used it all the time. But when she met other secretaries who came from an MS-Word backround, they hated the very existence of the feature. I think the last few editions of WP removed it (in order to cargo cult Word?).
Word has had that for a long time. In modern ribbon-interface Word versions, it's in the Home-Paragraph section, top right icon ("paragraph mark"). Additionally you can/should toggle field code display with Alt-F9.
This has been in Word since, I think, Word for Windows 2.0 (circa 1992, for Windows 2.11). It might have also been in the previous DOS-based Word versions that co-existed with DOS-based WP (but since I used both, I can't always tell which features were in which).
Maybe not markup language, but the idea is to show the markup. To get a paragraph break, you still hit Return and don't try to type that end-of-paragraph marker.
The field codes are more like an actual language; you can insert a field and then type the content in it.
> Most of my troubles with word have to do with most of the properties of the items on the document being invisible. You are dealing with invisible page breaks, invisible margins, invisible objects, etc.
"The Reveal Codes feature, imperfectly imitated much later in Microsoft Word, is a second editing screen that can be toggled open and closed at the bottom of the main editing screen. It was especially beloved of those who were faced with Microsoft Word, which had at the time no similar feature. The codes for formatting and locating text are displayed, interspersed with tags and the occasional objects, with the tags and objects represented by named tokens. The scheme makes it far easier to untangle problems than with styles-based word processors, and object tokens can be clicked with a pointing device to directly open the configuration editor for the particular object type, e.g. clicking on a style token brings up the style editor with the particular style type displayed."
Still, in the corporate world most people do not have LibreOffice, but they do have Microsoft Word.
And in very many cases they can't even install software themselves, as the computer is managed by the IT/IM organisation. The cost of license is negligible in this context; the actual cost comes from managing something non-standard.
Yes, you or me don't like it but it's the fact in many corporations, government organisations and even educational institutions.
OTOH, I used to work as a govt contractor where they routinely give 1000-page software specifications with screenshots and non-unified titles. There was no safe way to save them, let alone work concurrently, so let's just say they're not the right tool at all for most routine tasks in large companies.
I don't know for sure, but I'm willing to bet that there are sophisticated mail merge features in Office that don't exist in LibreOffice. The other major killer feature that I'm aware of in Office is the track changes feature. There are work flows that absolutely depend on that for cooperative editing and sign-off.
You can do these things other ways, but once the workflow is entrenched it would be really hard to move people to something else. If you've never seen some of the insane mail merges that people do, you might think it would be trivial to implement, but alas it is not.
Having said those things, it's been quite a while since I touched a word processor (5 years or so). When I last used Libre Office there were some other obscure bugs/features that made me want to throw it out the window. The main one was the way it formats multi-lingual text. If you try to write text that switches between roman and asian fonts, good luck in keeping the baseline in place. I made an attempt to convince people to change the idiotic way it is done (by shrinking the font size!) but had no luck.
I'm not a fan of Office either for the record, but I can see why some people do not want to change.
Curious about the font issue... I've never tried to use Asian and Roman glyphs on one line before. But multilayout is a bit of a mess and is constantly being worked on.
> The other major killer feature that I'm aware of in Office is the track changes feature.
This is by far the worst "feature", it's a plague that management enforces on its employees mainly not knowing any better or just being plain ignorant. As you said this comes down to unwillingness to change and learn something new. In most cases using plain text format like asciidoc/asciidoctor with some version control system would be enormous improvement.
I am a government attorney by day and a hobby programmer at night. There is absoloutely no way you are going to get any significant number of office workers to use asciidoc and a VCS.
It's hard to even get a lot of people trained in Word. The agency has training classes when a new version is installed--which is once every few years. People struggle with stuff like the ribbon. To use asciidoc and a VCS people would have to start from scratch training wise. Not gonna happen.
I routinely format sizable documents and have to collaborate on them. Word is an abomination and I hate it. From a technical standpoint other tools are far superior. But a bunch of office workers are not going to learn anything else. It would not be worth the effort. Since MS has no incentive to improve Word I grit my teeth and accept that I will have to deal with its garbage.
I like LibreOffice because it's FLOSS and supports Open Document (formats). I use Calc for several of my personal spreadsheets. However, I personally do not like a few things with LibreOffice, which I think help the notion among people that MS dominance is a given and that solutions like LibreOffice are not good enough:
1. It does not support MS Office keybindings and its menu structure by default (not referring to the ribbon interface here). That's not a great thing for an application that would like more people from the MS Office camp to use it. As someone who's used MS Excel for a long time and learned several keyboard shortcuts, I still take more time to accomplish even simple things in LibreOffice and am learning different keyboard shortcuts to use Calc. Ideally, it shouldn't be this way. I don't understand what LibreOffice gains by adopting different keyboard shortcuts.
2. The UI (icons, menu fonts, colors, etc.) looks primitive and needs a redesign to keep up with the times (I've mentioned this in some other comments on HN).
3. It lags behind in feature set and ease of use compared to MS Excel (as one example, try creating, editing and changing charts in both). This is not really as big a deal for most users compared to the two points above.
This may seem a minor gripe, but last time I used LibreOffice it didn't have a way to visually crop an image just like Word or Google Docs can, which is disappointing for the sort of work I do, especially as I'd like to only use Free Software to do it.
It means I need to spend more time cropping the image in an external program, and I can't adjust the dimensions later. It's a hassle.
It's not that the file format was any better, it was just much simpler because in the DOS days you couldn't arbitrarily embed content from one application into another. Lotus' file format was just as proprietary as Microsoft's was. However, Excel had very good file importers, but also things like keystroke migration wizards etc. Microsoft invested a lot of time and money into doing everything they could to lower the transition barriers into Excel specifically for Lotus users. (and Office apps in general. They had similar efforts for WordPerfect->MS Word migration and others)
It was a fairly one-way trip in that even in the first versions of Excel, getting data back out was never as easy as getting it in was (i.e. the exporters weren't as complete as the importers... probably by design) and throughout the 90's Microsoft definitely built on and expanded the one-way nature of migration to their proprietary file formats as a key to lock-in. It wasn't so much that it was all that difficult as it was undocumented and always changing (there were multiple backwards incompatible file formats in each of the main MS Office apps in the 1990's). So if you were a customer or competitor, it wasn't so much that you couldn't reverse engineer the file format, it's that they were relatively undocumented and constantly changing so by the time you had done it, they were on to the next revision. I think it was a couple of years after the DoJ case, and a result of the settlement terms, that things really started improving on this front... but by then Microsoft had long since won. By that time most of their competitors had already figured out that the only way they could hope to compete with Microsoft was to bypass them entirely... hence the focus on web apps.
You mention web apps so I'll close with this: unless you're taking great pains to only use the open and cross-platform standards for them, in the not too distant future you're likely to find yourself in the exact same spot with your web apps that people were with desktop apps not too long ago. Different platform, same game...
I wrote a bit about this in another place, but the key to understanding the situation is to understand that the file format does not actually tell you how to format the text. Especially in the early days when rendering a font would work completely differently in each application because you never knew what font metrics they were using.
It's easier to be second to market because you can copy the leader. In this case, if you write your formatter with the intention of importing documents from the leader, then you always have a target. If you then add features which the leader ignores, suddenly they have a much more difficult time to catch up -- their formatter is not purposely built to render your file format.
As I said in my other message, I've completely forgotten the Excel file format, but Word's format was originally just a binary dump of all their data structures. That's why it is insane. It's hard to read unless your code happens to be structured that way.
Anyway, I'm +1000 with you on the importance of open and cross-platform standards. Just think about all those stupid rails apps -- your data model looks exactly like your sql tables. Then you dump those same stupid tables into json and splat them over the wire for someone to try to make sense of. Deja vu all over again ;-) I suppose I should be happy it's not XML...
So I guess we both agree that the resulting complexity of the file format itself had a lot to do with an intention of lock-in. Which directly contradicts Joel's first statement: "are deliberately obfuscated". :-)
Also, regarding web apps, I like to think of it as the building, not of "perfect systems", but "self-correcting systems". The web, with all its chaos, it still more self-correcting than MS Office has been - in fact, even from your comment it does seem that the intention was only lock-in and not improvement of file formats (other than perhaps an improvement in MS bottom line).
Oddly, even the benefits to the MS bottom line now seems more short sighted. E.g. the ongoing fiasco that is the Windows 10 auto-upgrade. [1] Someday I expect to read post-mortems on the failure of Windows 10's so called perpetual upgrade policy (i.e the notion that there will be no more version numbers after 10) which link the failure to their refusal to build self-correcting systems.
[1] Personally, I don't think we have seen the worst of it just yet.
I'm posting too much on this topic :-) Just a quick word from someone whose job it was to write filters for these file formats. They are not deliberately obfuscated. They are just the product of long lived legacy projects with short deadlines and poor process. It is absolutely obvious when you work with the formats that this is the case. Word in particular was clearly just a dump of the internal data structures in the app. It's not even that complicated -- the complicated bits come later and they are all for either backwards compatibility or for the fast save feature (or whatever it was called). The fast save, in particular, is complicated and is that way because it is append only. It is essentially a series of patches on the original data structures. Yes, it will make you want to rip your hair out, but it is very, very obvious why it's that way.
In the majority of cases, LibreOffice is a viable free replacement. There are onky a few missing features that you probably won't need. The onlu time that happened to me was trying to do a certain kind of regression curve that was only in Mictosoft Excel, for a physics class.
I would really love to agree because I hate ms so much, but sadly it's false: every time I use libreoffice, it's because I need to open a doc or ppt sent to me, and the layout is messed it's barely readable
I'm sure there are plenty of people here who use them. I'm responding to the person who claims libreoffice et al are artificially limited. My implicit argument is that the vast majority of people who use spreadsheet programs use them for list-keeping and simple analytics/reporting. For more complex uses, Excel is beaten by dedicated analytics tools.
Planning. Oftentimes all you need is to calculate some columns and play with numbers and see how the calculations change. The same things that make Excel bad for robust statistical modeling, make it good for all sorts of planning activities.
The discussions about the suitability of Google Docs and LibreOffice makes me wonder - is there a benchmark website for file format compatibility verification? If not it may be a very useful to create one and just host it on, say, GitHub pages.
We could upload complex, but reasonably common, documents in various file formats for corresponding softwares, and create easy to read checklists of the different failure modes when trying to open the file in a different software from one in which it was authored.
If that is true, then isn't it also very possible that MS had an incentive to keep building on top of its cruddy file format without rethinking the design at any point? Isn't it true that even today, there is no viable competitor to MS Office even on non-Windows OS because MS has made file format compatibility extremely difficult?
I am very glad that the latest and greatest features of Office automation after (and including) Office 2007 were mostly still-born and we could stop worrying about this issue and use MS Office as a tool and not as a "platform". It became too unwieldy even for hardcore MS supporters, and even though no good alternatives emerged, people generally decided that the programmability of MS Office had reached its limits - and moved away to other things like replacing them with browser based apps [1] I remember creating an Infopath "app" as a contractor around 2010, and wondering "what is this crap? Have these people never seen browser based apps?"
[1] This is probably a sweeping statement. I would love to hear from someone who is certain that Office automation was a superior technology for a particular use case.