Hacker News new | past | comments | ask | show | jobs | submit login
I was wrong about spreadsheets (2017) (reifyworks.com)
589 points by mooreds 8 days ago | hide | past | web | favorite | 369 comments





I work in critical infrastructure planning. My organization builds software in R, Python, and other programming languages customized for these major organizations.

So many critical infrastructures, billions of dollars in planning, and just systems are built out of Excel. It's amazing. You'd assume something that services millions of people a day would have some more sophisticated and customized solution, but you're wrong.

The reason is because most people know how to use Excel. Most people know how to handle it, use it, modify it, build up from it. You don't need to get a regular support contract from a company for your custom-built spreadsheets. If something critical in that "Excel software stack" breaks (aka Microsoft Excel), most of the time you just need to reset Excel. It's amazing. You don't need a new server, you don't need support contracts, it's just like riding a bike.

In my opinion, the limitation of Excel is actually when it comes to big-data analytics. The way Excel handles large quantities of data is slow (due to the nature of it's software structure). That's where we come in and build out these models and systems. However, in the end, our data outputs will be fed back into Excel, because that's what most people are used to.

I have deep respect for Excel. After all, Excel empowers so many users who don't know how to code to provide amazing plots and perform major calculations with ease.


Yeah it works and you don't need support contracts and then:

- Fidelity's "Minus Sign Mistake": loss of $1.3 billion - TransAlta "Clerical Error": loss of $24 million - Fannie Mae "Honest mistake": loss of $1.3 billion

Then you get employee turn over where new employees don't get "arcane" knowledge passed down by people who left and took their spreadsheet foo with them.

Excel does not have "access control", "auditing", "change tracking". Just getting work done is not enough.


> - Fidelity's "Minus Sign Mistake": loss of $1.3 billion - TransAlta "Clerical Error": loss of $24 million - Fannie Mae "Honest mistake": loss of $1.3 billion

Weigh that against <insert any company> makes $X Billion due to correct usage of Excel multiplied thousands and thousands of times over. Survivorship bias at it's finest.

Excel isn't perfect and has notable downsides, but work gets done all of the time on them for the aforementioned reasons.


I’m not convinced there’s a such thing as “correct usage of excel.”

What a lot of companies do is just scare entry level employees into being very careful. My friend’s girlfriend works at a place like this and people just accept all the problems with manually editing large spreadsheets because “that’s just life.”

Anything in excel is a hack and most of its users don’t know any better.


What's your solution? That the manager who wanted some report comes to you and begs you to do it and in 6 months it's finally delivered and all wrong?

The sweet spot where I work is online forms that pre-populate excel playgrounds and allow download. Then, you can tweak till your heart's content.

The manager goes to the people, who really know how to make a report (maybe that is "accounting"?) and asks them to make the report taking into account some single report specific things that they talk about. Then those people, who know how to do it deliver. They might have someone in their team, who knows how to work with dataframes, which would come in handy. Those people store the program written to generate the report from the data and are ready to modify when more is requested. Instead of doing code in Excel, they simply write their code in whatever language they use to process spreadsheets or other data sources.

If they have knowledgeable people there, it will take no longer than coding something up in Excel and will be more reliable.


I guess no one has used Power BI for reporting... Link the data source, save the view that "Manager A" wants every week or month, setup on premesis gateway, and your new reports are there before your are asked for them again. Always up to date. Then share your dashboard. Also, you guys should be ashamed of yourselves. "not sure who does the reports, maybe finance?" Eceryone should know how to make their own reports.

If you regularly produce reports one would suppose that you would have an existing project template and know how that would allow a reasonable individual to deliver a result that isn't all wrong in a reasonable length of time not THAT dissimilar from the time to throw up a spreadsheet.

This also wouldn't silo the data in the users laptop, subject it to being destroyed because the user dropped it, and limit frequency of updates and access to how fast the user can respond to emails.


And meanwhile the manager that broke the rules, connected their personal copy of excel straight into the database and made the company 8 figures in profit over the course of the last year just outdid you and your team of report programmers.

And meanwhile an actual data scientist looked at the data and outdid the manager by a factor of 10, by creating a more meaningful report, that is based on an actual statistically sound basis, preventing a disaster.

See, we can all conjure up imaginary people. :)


Logically most of the profit making activities may be enabled by tech it usually isn't created by tech. The managers rule breaking activities are probably unrelated to his success.

In my experience, business success is often hinged on the reports and the insights they provide. Without accurate numbers, you've got nothing to go on.

I can't not remember now the Salesforce for developers article :)

I think it will be easier to solve those problems in the Excel development space than it will be to export the Excel solutions to another solution domain.

Also the kind of mistakes you've described can (and do) happen in other programming domains that are deemed more respectable.


Unfortunately no.

Excel is hamstrung by having to maintain backwards compatibility for an endless number of hacks.

For instance, it is notorious that

0.1 + 0.2 != 0.3

in binary exponent floating point math. Excel does funny stuff with number formatting that hides this, but it is like having a bubble under a plastic sheet that moves someplace else when you push on it -- numeric strangeness appears in different places.

The right answer is to go to decimal exponent floating point math, but that is only HW accelerated on IBM Mainframes, maybe on RISC-V at some point. You'll probably crash Excel if you have enough numbers in it for performance to matter, but Microsoft would be afraid of any performance regression and it would break people's sheets so it won't happen.

On a technical basis we could use an Excel replacement that has some characteristics of Excel, and other characteristics of programming languages; one old software package to look to for inspiration is

https://en.wikipedia.org/wiki/TK_Solver

What makes it almost impossible to do on a marketing basis is that Excel is bundled into Microsoft Office so if you have an Office subscription you have Word, Powerpoint, Excel, Access, etc.


GP was about how Excel lacks "access control", "auditing", "change tracking". Then OP says it'll be easier to add those things to Excel than to make non-Excel suit the purpose.

Then you say no, backwards compatibility problems. But how does Excel's need for backward compatibility make it hard to add "access control", "auditing", or "change tracking"?


"access control" by definition means "make things not work some of the time"

Excel has change tracking, but like Jupyter notebooks and similar products it doesn't make the clean distinction between code and data that is necessary for it to be useful. (e.g. if I develop an analysis pipeline and use it for May 2019 it should be as easy as falling off a log to run it for June 2019)


To be fair in Javascript 0.1 + 0.2 === 0.3 also is false :). Not saying that Javascript is a great programming language.

It's a problem in all mainstream programming languages that Excel has a strange answer to.

The trouble with it is that it is a big distraction to the "non-professional programmer" who uses tools like Excel.


Yeah, that's the case for anything using IEEE 754 floating points.

"At its heart, VisiCalc is about numbers. One of the early decisions we made was to use decimal arithmetic so that the errors would be the same one that an accountant would see using a decimal calculator. In retrospect this was a bad decision because people turn out to not care and it made calculations much slower than they would have been in binary."

http://rmf.vc/implementingvisicalc


That sounds like you want to rebuild the whole paradigm of how 99% of software world treats numbers (which may be, at the end, a right thing to do but unlikely to happen at least in the near term) and you are laying the fault of it not happening at Excel's door. That would be very misplaced blame.

Why would anyone working with amounts in the millions or billions care about 0.1 + 0.2 != 0.3? Splitting hairs here I think.

Not so sure about that. We are possibly also talking about reports as in some other comments. Those might need to be exact. Very exact.

See that's the thing.

You don't need Access Control because it's just an excel file and whoever has the file has the planning model. In most organizations like this, there's only like 5 or 8 people who all work together on a team, so it's not like 50-some people who are separated. Excel is a tool, a tool that you use to get things done. I mean even programming solutions have issues such as Knight Capital Group's $460 million loss due to mistakes in their code and deployment processes. Planning modules does not mean Operational modules. It's a big difference/jump between the two.

Again, these are all tools. What's important is how you use it and how do you properly validate these calculations. the thing you also need to understand is that no critical infrastructure planning is perfectly to-the-dot numbers. Our systems are far too complex, built off of human operational decisions, and have so many unknown losses that planning modules are based on scenarios. Also we're talking about thousands of these agencies around the world and their critical infrastructure planning teams are mostly 5 to 8 people. The current agency I'm working with has hired around 3000 employees to operate the systems but only 5 or so are really in-the-deep running these models and building these plans.

These people are not programmers. They spent their time learning about resource planning, mathematics, operation theory, physics, and engineering. Excel is an incredibly user-friendly tool that lets them automate a tremendous amount of their tasks. They're all very intelligent and can definitely learn how to code, but that takes their time away from more critical skills and tasks they need to accomplish.

The most important solution to the problems you've stated is the workflow. You setup a proper workflow, and risks of these problems should be minimized.


>Then you get employee turn over where new employees don't get "arcane" knowledge passed down by people who left and took their spreadsheet foo with them.

Yeah, this never happens with a proprietary codebase!


Who needs historical knowledge when you can just rewrite the program every few years in a different environment with all new bugs? (and a smattering of the old ones, too)

Few work things have bemused me more in the last couple of years than watching someone re-implement a slow SQL process in an HDFS big data platform forgetting all of the things that went wrong the first time, fixing them again, and then running into weeks worth of new bugs just to get not time critical data faster.

Fidelity didn't lose $1.3B, they miscalculated by $1.3B. The same with the other numbers you quote. Spreadsheets are error prone, but they still make sense for a lot of use cases.

https://www.cio.com/article/2438188/eight-of-the-worst-sprea...



Or tests suites for checking formulae or how the said formulae are applied across the spreadsheet?

The Fidelity "minus sign mistake" didn't create a loss. The mistake was in relaying the information to the end user. It didn't actually cause a loss of that magnitude.

That is like saying if I mistyped in a word doc, that word created the loss.


Ahn, so sorry, but that's a very flawed argument.

Arcane knowledge will be an issue with magical spreadsheets, as it is with any software (and I could argue that with custom software you have this legacy knowledge over both code AND ui, which is worse than just a spreadsheet).

The last paragraph is key, though - auditing and access control are tough ones for excel, and there are many tasks that require this functionality.

The trade-off is customizability - any excel can be changed to your specific need whereas that nice and shiny enterprise software you build may lack one or two things which you will never be able to change.


> Excel does not have "access control", "auditing", "change tracking". Just getting work done is not enough.

Doesn’t the web version of Excel already solve that? I’m not familiar with it but Google Sheets does these things, at least to some extent - I’d expect Office 365 to do so as well.


Uh... You have the same problem with Cobol written 20 years ago without comments.

I just hate one single thing about Excel - all function names are localized - and in case of my native language - they are horrible and inconsistent.

While I haven't encountered localized names, localized formats make Excel an absolute pain for me.

My language uses the "European" number format of a comma for decimals and periods for thousands separators. Excel tries to adjust to that by using semicolons for argument separators (i.e. ADD(1.5, 3.5) -> ADD(1,5; 3,5)).

The problem is that their locale detection is wildly inconsistent and there isn't a good way to override it without changing system settings. When moving a file between computers, this is an utter disaster.


Excel goes one step further, and assumes that users in some locales (e.g., Dutch) want semicolons as separators in every CSV file you open in Excel.

I don't care (I just use LibreOffice Calc, which accepts any delimited values file just fine, and just asks which separator to assume), but it means that when you develop an option for users to download some statistical data as comma-separated values file (which is easy to generate), that you now have a problem if that user wants to open it in Excel (which makes sense for CSV) and that user is using one of Excel's broken locales (e.g., Dutch) where comma-separated values are not understood (again, it wants semicolons in Dutch).

I.e., the expected file format changes because of the locale Excel uses! So where someone in the US can just double click the CSV file (not to mention anyone using LibreOffice/OpenOffice Calc anywhere in the world), someone using Excel in the Netherlands will just get a spreadsheet where every line consists of that whole line of values stuffed into column A.


And that's not even the worst. You can teach people the few buttons necessary to read in the text, but once they save it again, the file is changed! Now you have a different separator and chances are good, excel also mangled date columns, removed leading zeros from numbers and switched out decimal separators! So you may not be able to further process the file. And its also sad, because if they just made some of these changes optional, excel would be a good tool to quickly work on an csv file and pass it on. But as is, I am always hesitant to work with csv in excel, because it mostly breaks and becomes unusable for further processing.

I advise people to use the Import Data From Text/CSV feature rather than opening CSV files directly. It spares everyone a headache.

Oh I didn't know that is a localization thing. I use the german version and got used to open csv's in notepad first (search and replace ; with ,). I always thought that's some kind of the usual MS vs. the rest of the world thing.

But this a whole other level of stupidity.


You can specify what separator to use as the first line in the CSV file:

sep=,

My experience is that Excel 2007 and later will correctly parse the file and use the specified separator. However, other software, such as Google Sheets, will simply render the declaration as-is.

Then there's the issue of what character encoding to use to encode the file, whether to include a byte-order-mark with UTF-8 to make Excel recognize the file as UTF-8 and the effect that has on whether the separator line is recognized (spoiler: it isn't).

Here's part of the documentation I wrote for Calcapp's CSV exporter, which digs into these issues in more detail (Javadoc):

  /**
   * The prologue of files containing comma-separated values (CSV). This
   * prologue contains an instruction detailing the separator character that is
   * used in the file. This instruction is known to be understood by Microsoft
   * Excel 2007 and later versions, but is rendered as-is by other spreadsheets,
   * including Google Sheets.
   * <p>
   * Microsoft Excel expects either a comma or a semicolon to separate values in
   * CSV files, depending on the Windows locale. The only way to produce a CSV
   * file that can be read by Excel regardless of what locale Windows is set to
   * use is to use a prologue similar to this one, which explicitly tells Excel
   * which separator is used.
   */
  private static final String PROLOGUE = "sep=" + SEPARATOR_CHARACTER + "\n";

  /**
   * The character set used to encode files containing comma-separated values
   * (CSV): UTF-16LE (UTF-16 for little-endian systems). Using UTF-16LE allows
   * characters that cannot be represented by the ASCII character encoding to be
   * correctly read by Microsoft Excel and other spreadsheets.
   * <p>
   * There is no way to formally specify the character set used by a CSV file.
   * With one exception, Excel assumes that CSV files use the ASCII character
   * encoding, unless the first three bytes consist of a byte-order mark, in
   * which case the UTF-8 encoding is used. (A byte-order mark is redundant for
   * UTF-8, as it does not depend on endianness, but is traditionally used by
   * Microsoft Windows applications to detect whether a text file uses the UTF-8
   * encoding.)
   * <p>
   * Excel only recognizes a file as being encoded with UTF-8 if a byte-order
   * mark is included, but doing so prevents Excel from recognizing the
   * information of the {@linkplain #PROLOGUE prologue}, which in turn prevents
   * CSV files from being produced which work regardless of the locale Windows
   * is set to use. This is likely due to a bug, present in Excel 2007 and
   * likely later versions as well (based on anecdotal evidence).
   * <p>
   * Fortunately, Excel does recognize another character set which can encode
   * all of Unicode: UTF-16. Excel likely uses heuristics to determine that a
   * file is encoded using UTF-16. (Text written using Western languages and
   * encoded using UTF-16 tend to include many null bytes for various reasons,
   * making the detection of UTF-16 trivial, but only for Western languages.)
   * Unfortunately, UTF-16 is dependent on endianness, meaning that it would be
   * desirable to include a byte-order mark at the beginning of the file.
   * However, that does not work due to the aforementioned bug.
   * <p>
   * In other words, using UTF-16LE should work well for CSV files containing
   * mostly Western text and parsed on little-endian systems. It is probable,
   * though, that files produced using this converter will not work if the text
   * mostly contains Chinese, Japanese or Korean characters or if the file is
   * parsed on a big-endian system.
   */
  public static final Charset CHARACTER_SET;

That separator preamble seems Excel-specific though:

https://tools.ietf.org/html/rfc4180

If I'm going to mangle CSV into a non-standard form to conform to whatever Excel expects, I might as well go the extra mile and just provide an XLSX file.

That was my solution to a data-export function in an API (it switches on the HTTP Accept header). It now offers a choice of CSV (RFC 4180), ODS, and XLSX, and people can just pick XLSX if their Excel is using one of its broken locales.

It's not too hard to generate the XML for ODS (OpenOffice/LibreOffice etc.) and XLSX (Excel) from a bunch of tabular records once you've set up minimal empty template ODS/XLSX files to inject it in. It's certainly less work than explaining to Excel users that their software is broken and how to work around it.

That said, ODS was much easier to get working.


+1000. This is ridiculous. Couldn't they at least make this optional so you'd be able to switch this off somewhere in the configuration dialog, registry or command line? I hate localized apps altogether (who even needs localized Visual Studio srsly? I can't believe anybody can be competent in a .Net programming language and relevant frameworks and practices without being able to read English) but localizning some apps makes some sense for some people, nevertheless localizing functions feels beyond reason. I feel thankful they didn't localize C# and command line commands :-)

> who even needs localized Visual Studio srsly?

What's not to like on Oracle's localized error messages that you can't google because all the documentation is in English?

Even more because some on my language omit useless words like the equivalents of "not" or "can't".


Sounds like this is being done on purpose to sell more support services and to hinder international migration of specialists.

The worst of it is that there are some very specific (and undocumented) edge cases where the translation won't be done properly. It create very 'fun' to debug issues where a spreadsheet would work in a French Excel and not in an English one.

(For instance if you use the "Row-Column" cell reference, in English it is `R1C1` while in French `L1C1` - and it won't translate)


Even worse, the token used to separate arguments is localized. So you may have to use semicolons instead of commas. But in some places you need to pass a string representation of a formula and there Excel will only understand the non-localized commas. Fun all around!

I once had a contracting gig where I was tasked with updating a bunch of Word BASIC macros (this was Office 95 days...) used to maintain the ISO 9001 documentation for a major company.

Problem was the original version was written in their Danish office, and whoever wrote it had handed it to the Norwegian office exported as text before he left, and they'd imported it and started munging it and adding lots of stuff before someone bothered to try running it and realized it was completely broken. I wish I'd dug more into how they'd managed to get themselves into a situation where they'd significantly modified the code before trying to run it even once...

Version control? What is version control?

So they had this broken version that had some Danish keywords with a bunch of Norwegian updates, that they didn't want to just re-import into a Danish version and copy over properly in a tokenized form and having to try to identify and re-apply the Norwegian updates. So I got the fun job of properly translating it and figuring out what the new code was meant to do and make it do it in the process, with no access to any of the people who had worked on it.

[it is worth pointing out that this was an office for several hundred staff of a major international company that developed large scale information systems for things like police departments; while version control wasn't everywhere in the mid 90's, a large systems integrator certainly ought to be using it... Particularly amusing that it was their ISO 9001 documentation that was being handled in such a haphazard manner; happy to have never been a customer of theirs, though]

The task was extra "fun" for certain values of fun, because unlike, say, English vs Norwegian, or English vs Danish, Danish and Norwegian are close enough that 80% of the time a term from the Danish version might look right and be valid Norwegian, but the odds seemed to be (and maybe my memory is exaggerating it due to my lasting memory of extended pain) about 50/50 that the Norwegian translator of Word BASIC had chosen a different function name to the Danish translator for no good reason. Or there were slight, hard to spot, spelling differences. And sometimes what looked like a mistake was a function defined locally.

I spent a couple of weeks reading through the whole thing and mostly rewriting code that could have been written from scratch in a couple of days if they'd just told me what the end result was meant to be instead of dumping a pile of non-functioning code on me. But I guess I was cheap compared to their regular staff back then.

It was how I learned Word BASIC (I'd taken the contract, figuring it'd be easy to pick up, so I told the recruiter that sure I knew it, on the basis that I certainly knew a couple of variants of BASIC). Never to use it again (at it was replaced by a Visual Basic variant a few years later anyway)


Don't use the localized version of Excel in general. Problem solved.

In a corporate environment you normally don’t get to choose that.

You should.

tell that to enterprise IT.

Make a new locale :)

On the same hardware, with the same table, Excel in Windows 7 does a cross-tab faster than MySQL in Ubuntu does. Indeed, MySQL chokes if there are more than a few hundred columns, but Excel just keeps going. And it uses all CPU cores.

Edit: In case anyone is wondering why I did that, I wanted a simple visualization of ping-location results for thousands of IPv4 from several hundred ping-probe locations. So that meant aggregating (getting minimum rtt for) millions of observations, and displaying min(rtt) in a IPv4 by probe location cross-tab. MySQL did a great job at the aggregation, but choked (as in, errored out) with several hundred numeric columns. Even if I converted them all to SMALLINT.


The other thing people don't realize about Excel is that it has a compressed, in-memory, columnstore analytics database built into it. It was called Power Pivot then Power Query, not sure if it changed names again, but it's actually a SQL Server Analysis Services Tabular Model, the same thing that's in Power BI as well. It's going to perform much better than a row-oriented RDBMS at aggregating columns because that's all it was designed to do.

It was a separate data processing engine within Excel developed by another team (SQL server) for the purposes of self service BI (Power Pivot/Query). Most probably, this engine is not used for normal (traditional) calculations.

You mean if you create the pivot table from data in a sheet? It's not needed for that because you'll run out of memory trying to fit any more than about a couple million rows in a sheet first. Meanwhile you can easily load 10s and possibly 100s of millions of rows into Power Query.

You're blowing my mind here a little bit. I don't use MySQL but do use Excel and the R/Python data science stacks quite a bit. I regularly open data with tens or even hundreds of thousands of columns in the latter by using lazy computations. Is this not the case with database approaches? What kind of hardware are we talking? Presumably Excel has to load everything into memory at once since it's immediately viewable, right?

In MySQL, maximum row size for non-text columns is 65,535 bytes. MySQL is limited to 4096 columns per table, and 1017 columns per table if the InnoDB engine is being used.

Excel's limits are 16384 columns and 1048576 rows.

I'm talking wimpy hardware here, I admit. Basically, VirtualBox VMs on a quad-core i5 box with SSD and 8GB RAM. With the VM having three cores and 6GB RAM. But it was the same wimpy hardware for Windows 7 and Ubuntu.


Oh, the day Excel got rid of the 64k line limit! Was a very sweet day!

Notwithstanding the row length and column count limits, did you also give MySQL more memory? Setting InnoDB buffer pool size to 3 or 4 GB might help; its default is about 134 MB.

I think so, but don't remember for sure.

> Presumably Excel has to load everything into memory at once since it's immediately viewable, right?

Depends on the file format. An XLS or XLSB file can contain special markers for where each logical row starts, so it can randomly access rows; Both also can persist "calculation chains, which are a simplified dependency graph. The binary formats also store formulae in a parsed representation allowing easy scans to see what cells have to be inspected if a file needs to be recalculated.


Aha! As far as persisting calculations, this makes sense. I guess in my head I was thinking "pure" data where everything opened was precomputed. Forgive my ignorance about database solutions, but do they not implement something similar?

But now you've got me thinking, it would be nice if libraries like Dask could allow for flagging of symbolic operations like this to be written to disk for quickly saving metadata where intermediate steps don't explicitly need to be saved.


Might be worth a try to look into Postgres. In case of the column problem, PG has a limit of 8 Kilobytes for Column Width (instead of a count of columns). You can circumvent some of the limits by using arrays, types, json or hstore. It also has pl/Python which lets you code up queries containing python fragments for data processing.

Meanwhile <side rant> open any document in Photoshop with a few layers and effects and the computer grinds to a halt, no matter the specs, no matter the year, through the ages. New hardware comes out? Booya, new Photoshop XYZ -> let's put your fancy hardware on its knees, begging for air.

>Meanwhile <side rant> open any document in Photoshop with a few layers and effects and the computer grinds to a halt, no matter the specs, no matter the year, through the ages.

Hasn't been my experience ever. Photoshop is one of the speediest image manipulation programs out there...


I'd agree, at least when compared to GIMP. It's also like 100 times more intuitive to use Photoshop than GIMP for a beginner.

while paying a monthly fee...

The monthly fee for Photoshop is around 1/100th of the old retail price. That works out at about 8 years of use before you have to 'buy' the software again. Seems fair to me.

And for those who don't like paying for Photoshop - which, given it's an astonishingly powerful piece of software, probably means "people who don't actually need Photoshop" - there's always cheap or free alternatives that provide about half the functionality.


>The monthly fee for Photoshop is around 1/100th of the old retail price. That works out at about 8 years of use before you have to 'buy' the software again. Seems fair to me.

That's because you only think of yourself. Creatives are one of the most struggling professions, and can have widely different returns per year, and salaries in different parts of the world can be much lower.

People who could afford to buy Photoshop at some point (perhaps even a used copy), later down the road might not be able to pay the subscription for a few months, but in the new scheme they lose access to the program altogether.

8 years? There are creatives that use 10 and 20 year old versions of Photoshop on Windows.


You pay per year. $120 gets you access to the photo bundle for one year. That's 30x cups of Starbucks drip coffee a year. Creative people who are 'struggling' blow more than that a month on weed, beer, coffee or Tinder. We all make choices that show what we prioritize.

Anyway, the subscription model for this kind of software works really well. Software keeps becoming better and so far Photoshop has been running circles around the limpy thing known as Gimp

P.S. I just got hit with the next year charge. I was annoyed so I tried installing CS 6. It's OK. It still beats Gimp but I would rather not drink coffee or smoke weed for a month to have money to pay for the yearly subscription for the photo bundle.

P.P.S. And if you are really really really struggling "creative" you are probably young and either go to school or have a friend that goes to school which can get you it under the EDU discount, which is peanuts.


I see many versions of Adobe Photoshop CC on The Pirate Bay, versions 2017-2019. At least one on them must work.

> And for those who don't like paying for Photoshop - which, given it's an astonishingly powerful piece of software, probably means "people who don't actually need Photoshop" - there's always cheap or free alternatives that provide about half the functionality.

Now? Absolutely! 20 years ago this was much harder. Adobe used to dominate the market so that you used Photoshop even if you often didn't really need all its complexity. But these days, there is so much competition that you can probably find something that works well enough. I would assume that hurts their bottom line, but as a consumer, I am happy.


> Adobe used to dominate the market so that you used Photoshop even if you often didn't really need all its complexity.

Still does. Part of the reason is that in the past, they turned a blind eye on people pirating Photoshop. Every kid with even passing interest in graphics or photography got themselves a bootleg copy of Adobe tools to work on; it was an easy guess what they'll be using later, as adult professionals.


Haven't noticed this problem with GIMP on my Thinkpad X100e.

I wonder how something like python/pandas would do on that.

So true, I used to run European cross-docking ops, planning, inventory, forecasting, scheduling, in Excel back the day. Successfully scaled by 100% YoY for three years. Was quite a lot of fun.

Issue I have with that approach is not using excel per se. It is using Eycel in addition to what ever system is being used the first place (SAP for example is a pretty popular thing to circumvent with Excel sheets). That and over engineering spreadsheets to the point where nobody but the creator can use them anymore. At that point you data just gets suspect. And circumventing existing systems with local offline spreadsheets just screws up everything. Now combine these two.

But that has less to do with spreadsheets themselves, see my fist point regarding cross docking, and more with the application. Used correctly spreadsheets can be incredibly powerful. And at least for my purposes I have yet to encounter an analysis issue I failed to dig through using Excel. Sure, something like Python might have worked better but consider me to be an empowered user (read: I have no clue about Python or SQL or...). And at least I knew I could trust the way the analysis was done as intended.


> It is using Eycel in addition to what ever system is being used the first place (SAP for example is a pretty popular thing to circumvent with Excel sheets). (...) And circumventing existing systems with local offline spreadsheets just screws up everything.

I usually side with the circumventers on this one. This misbehavior happens for a reason, which usually is that it's impossible or infeasible to do the work with "proper" systems. Excel sheets on pendrives is what you get when people can't exchange data using "correct" software, or when people need to continuously iterate on the shape of the data set, while getting the schema changed in the "correct" system would involve a ticket to IT and 2 months of waiting.

I think instead of paying people for doing a job and then doing everything possible to make that job difficult, companies should embrace that people will work around any deficiencies of top-down systems, whether software or procedural. But I guess this tug-of-war exists since the time of first corporations.


True that. I used to both back then, first a user and then a SAP system guy. For me it is always a death spiral: people don't trust the system or it is too cumbersome, they use Excel, the system gets even less reliable, the use Excel even more,... and so on. I always saw it as my role a system guy to give users a system they can use and then force them to actually use it. Because, when providing the systemnI talked to them first so in the end it was ind of their system too.

And then there are examples like the production planning done in Excel because SAP PP isn't just good enough. Then the guy who created said Excel tool left. Years later when production had to switch to weekend shifts zhey couldn't because no one could adopt the Excel tool. Once you reach that point you are screwed.


Agree but slight modification - goal should always be for the users to be able to do their main functional tasks in-application and maintain it as the system of record.

But there will always be a use for Excel as a tool for informal data-monkeying and analysis - exporting a table and doing some pivots to summarize or use a different view. Non-technical users will always expect this capability and flock back to it. Not everyone's a programmer or has the interest to be - you'll have to pry excel out of most accountant's cold, dead hands for example


You describe the ideal world. Tools like SAP or Business Warehouse suck at data crunching and nor everyone can use Python for example. Even ops can under the right circumstances be okish offline in Excel as long as results are imported into e.g. SAP to assure data integrity.

I feel like PowerBI is a good solution to overcome Excel's deficiencies while working well with Excel. It forces you to structure your data and define measures while being dyanamic, efficient and friendly to less structured inputs.

I have to give PowerBI a serious shot. We have it at my current place but nibody is really using so I need some spare time to dive into it. Now, the more I think about it, the combination of MS Office, Power BI and MS Dynamics with a shopfront from, say Shopify would make up a quite nice e-commerce set up.

> I have deep respect for Excel.

I do something around the same line as you describe (build better tools in R/Python for larger data problems) but I have a deep aversion to Excel. It's proprietary, has horrible standards (it does not support native UTF8 csv files for example), and is stuck in the 80's in terms of paradigm. And this is precisely the tool that prevents people from doing things more efficiently because "they can do it manually in Excel even if it takes a long time". It makes people take really, really bad habits.


Excel is a tool. It empowers those who don't know how to code to do their jobs more effectively than before without "learning to code". It's something you keep in your garage in case you need it. It depends on what their current workflow is. Sometimes I'll use Excel, sometimes I'll use R/Python.

What's important is that Excel gives you access to the computing power of modern technology with a lower barrier-of-entry/knowledge. That's one of the most valuable things that many people fail to understand. Not everyone is a programmer and for a lack of a better term, "doesn't really care about UTF-8 and just want the calculations to work". Also, in many cases Enterprise IT in critical infrastructures are resistant to supporting R or Python due to their security models (firewall rules prevents install.packages('tidyverse') for example).

This is what makes us specialists and someone they bring on board to help them out and explain to them about what the "best-use-case" is, and then watch as people try to cram a square peg into a round hole.


Well if you're using Python, you're following an imperative/oop paradigm which goes back to the sixties, so I don't know why you'd criticize Excel for using a dataflow model. I can understand not wanting to use Excel because it's proprietary (but Libreoffice isn't) but I can can understand not wanting to deal with a lot of it's legacy cruft, but for the stuff that Excel does well, it does it a lot more quickly, more interactively, and more intuitively for most people than a similar solution in Python. In fact, one of the key benefits of an Excel solution is that you can see every step of your calculation which creates a degree of self documentation which you don't get with Python, for example.

> that you can see every step of your calculation which creates a degree of self documentation which you don't get with Python, for example.

If you want to do that in Python then Jupyter Lab/notebook is a good solution to document your code and see the result of each operation. And way faster than Excel for large data tables.


Yep. My first job had a lot of Excel. I hated it. Once I learned pandas, I rarely needed Excel again for my work.

Think of all the open, modern, nicely standards compliant tools that you use. Now think of the fact that Excel lets people solve their problems more easily than all the things you like.

I'm not saying it's good, but it's better than everything else (for the people who use it.)


> it does not support native UTF8 csv files

It does though? You can export as a UTF-8 csv file...check the export options.


I am talking about importing CSV files in UTF8. If I use Japanese characters for example it fails utterly to display properly unless I import it as "Text" first.

The secret trick for importing data into Excel (without actually writing a file in the Excel file format): don't try CSV - as you have already noticed that will fail in all kind of fragile ways depending on your character encoding and operating system locale. Instead, convert your data into an html table and then paste or import that into Excel.

But not any html table, you need to have the appropriate proprietary css properties in there, such as the mso-data-placement:same-cell. Those are actually documented ...in the Office HTML and XML Reference published in 1999.


I frequently go from SQL query results to HTML table to excel in a similar manner, but I haven't had issues w/ any proprietary CSS properties. Using Paste Special -> Match Destination Formatting it always comes out without any issues

No, you should use the Data import tools--that's what they're there for!

Yes, it takes a little longer, but you should use Data > From Text/CSV for more complex text/delimited imports.

In what way is the paradigm stuck in the 80s?

But how do you handle sharing? The main issue with Excel, and its main downfall, is the sharing of data and versioning, with person X has an older version of person Y by a few hours and needs an email of person X to keep doing her job.

With a few tricks, you can actually use a professional version control system like Git with Excel, see the slides of my recent webinar: https://www.slideshare.net/xlwings/git-for-excel-files-webin...

If it's O365 Excel, then you can share and work on the same spreadsheet in a similar way to Google Sheets (seeing other peoples real time cursor in your spreadsheet etc).

Be very, very careful with this. There are some update/race condition bugs in Excel Online that result in multiple users causing bulk updates to incorrectly update the filtered/sorted view of other users.

This race condition happens often. In fact about every time I try to rely on it with my team. Such a shame Google Drive isn't seem as a "trusted platform" in my client environment.

Douglass Engelbart rolls over in his grave.

Sharing Excel sheets is extremely easy - you just mail them, or host them in some online place, like SharePoint if you've bought into the whole MS ecosystem like many companies do. Not sure what your point is there.

Versioning is of course more difficult, though Excel does support diffing in principle. I expect though that what most people end up doing is simply keeping track of versions manually, same as they would have in the days before excel.


> Sharing Excel sheets is extremely easy - you just mail them

Unless the other computer is set to a different locale and your data contains formatted numbers and dates. Then everything breaks.


Are you sure? I'm pretty sure Excel doesn't store numbers or dates in localized format.

They may have finally fixed it in the later versions. I remember it being a pain in an office that had multiple locales on various PCs.

Diffing in principle is not diffing in reality. It's terrible at best. I 'm not sure what your point is.

OneDrive now supports versioning of documents, and the latest Office 365 uses this to offer “autosave” and version history.

I’ve not really dug into the feature but it’s very visible on the latest builds.


Honestly, people handle sharing via Google Drive/Dropbox/Enterprise solution plus file naming convention.

The current enterprise network I'm on? They block Box, Dropbox, and a lot of other solutions. They instead use Citrix's ShareFile and a NAS setup with file naming.

Also remember, surprisingly most of these critical infrastructure planning isn't handled by hundreds of people, at most it's a team of 5 people who then present their findings and suggestions to the C-suite decision makers. It's not as big of a problem as many people think.


We tend to keep ours on Sharepoint - which means we could use Excel Online to work collaboratively, but usually we don't need them all at the same time, so open in Excel desktop and "check it out". If someone else needs it, they can just ping on Skype and see if you're done.

Sharepoint or Office 365

Thanks I actually didn't know that.

> It's amazing

It is amazing, just not in a good way. Maybe I should try to find and post a link to that paper where it was pointed out that Excel was munging things like gene names because they get interpreted as dates in the 'untyped' cell input, and these things were showing up in published research.

Excel is a disaster, it will seem to work until it turns out it's doing something weird behind your back, or maybe you've made a careless mistake somewhere and it has zero tools to help you catch it. And you won't know until it's far too late. Please say no, use anything else - Python, JavaScript or heck even QBASIC or whatever - but just don't use Excel.


A friend of mine works converting spreadsheets to Python/Java programs. Basically people at the company solve the problem with Excel, and when that gets to slow, he codes it up to read from a CSV and they're off to the races.

It's amazing how far these business types can get before needing a programmer to optimize things. As a programmer, I tend to think in terms of logic, and then I tried to build a complicated budget tool in Python and the turnaround was just atrocious. So, I did it in a spreadsheet instead and was far more productive.

Spreadsheets are awesome, and whoever came up with the concept should get some kind of award. I can't think of a single software tool that has enabled more productivity than a spreadsheet.


> I can't think of a single software tool that has enabled more productivity than a spreadsheet.

I submit to you: a clock, and a calculator. :)


Excel is good, but the problem is data storage. I always wanted to try and solve that problem, but it is very hard.

I do similar work but a different industry. My solution has been also to have Excel downloads, which work really well, but there are drawbacks.


"just like riding a bike" - best summary of Excel I've read!

And is it just me, or hopping on OpenOffice Calc or Google Sheet felt like a different device altogether: They all look like bikes, but accelerate differently, change gears differently etc?

same here. like coming home but someone has moved things around in a way thats subtle enough its not obvious but enough to make it feel unfamiliar

I love working with spreadsheets! In a past life, I was all about taking data from various sources, processing it, and spitting out CSV's that I would then work on with a data analytics person using Excel. Curious if y'all use a similar approach of generating CSV's or do you go even deeper and actually create excel sheets with excel native stuff in it (like formulas).

Can I ask, what is "critical infrastructure planning"? Googled it, but wasn't sure as it appears to transcend many contexts.

What I specifically handle is Water. Starting from the natural system (weather generator + Hydrology) through to the human system (Demand model (showing increase over time and temperature increase) + system operations (dams, reservoirs, treatment facilities, pipes, etc.) to delivery to customers.

The big problem we answer is "How can we continue delivering water to people while under climate change uncertainty? What policies or new infrastructure can we build so we remain robust and resilient under these new conditions?"


I would guess something like transport (train network) or maybe utilities (power grid)?

My issue with spreadsheets is that they could be improved a lot with minimal changes, not nobody seems to do it in popular office packages. Making them more database-like and making table data first-class (at least you can make named tables in excel on windows) could be used to push people a bit more towards organised data, without changing how anything works.

Half of the mess that makes excel hell comes from the fact it's too easy to put two tables of data + some random constants on a single sheet and refer to them by H3. Now it's hard to add more data, hard to move anything, and hard to create space which expands to the next row with existing formulas.

Airtable (and Access) implements this idea, but unfortunately sacrifices the generic, free-form spreadsheet along the way.

I'd be even happy with clippy popping up with "It looks like you're adding a new table in the same sheet. Would you like to learn about using multiple sheets?"


I'd recommend watching this YouTube video; it's only one person's take on the right way to use Excel, but Excel has many ways to handle these issues:

https://m.youtube.com/watch?v=0nbkaYsR94c


The link is to "You Suck at Excel" by Joel Spolsky, which is an excellent tutorial.

I sincerely wish it had a different name, though!

I would love to send this video to some of my business colleagues inside a large enterprise. They need this information and they would enjou everything about this video. However, it would not be acceptable to send them a video entitled "You Suck at Excel."

If it had a more enterprise-friendly name I would even have a link and description to it in my email footer inside the enterprise. It would really help a lot of people.


You can embed it in a more professional page, and hope that no one will notice the title at the overlay ;-)

Genius idea. I have a bunch of domains, combine it with the right subdomain... and we're onto something. Adding this to my list of things to do. Cheers!

Update: okay check out https://excel.secretgeek.net/

I've re-badged it as "Secrets of Mastering Excel" and used absolute positioning to put a label to that effect over the video's title.

Ideally I'd detect when the video starts and remove the label. Hmm. Not sure of the right approach.

I recommend (at the foot of the page) that the viewer watch it in full screen (which will remove my dodgy title)


That's a classic, definitely to be watched!

Yup, love it, keep recommending it to people.

I'd be over the moon to see one change to Excel. Native support for a language other than VBA. Perhaps the CLR. Perhaps Java/Typescript. Just something (optional) for programmers who want to use Excel and not want to deal with the garbage that VBA is.

Some workarounds exist, but afaik they require collaborators to also have the tool install, which is dead in the water.

I know MS has considered it, I'm still pretty surprised they've haven't followed through.


MS hasn't just considered it. They have done it. This is called Visual Studio Tools for Office. https://en.wikipedia.org/wiki/Visual_Studio_Tools_for_Office...

I've built plugins with VSTO (for Outlook) when I was at MS, but my memory is that it’s dependent on a plugin being distributed and not available for native use embedded into the excel workbooks.

You have to install Visual Studio, compile a plugin, register the plugin, and then from VBA you can call out to that COM interface if you'd like. Even if Excel users could overcome those hurdles, you still break the collaboration flow (i.e. just sharing an excel file).

Looking a little deeper, it looks like they're starting to support Javascript for the newer cross-platform add-in system (and also VSTO), but it looks like you still have to distribute your JS add-in via a web-service as opposed to being fully integrated into Excel and XLSM files.

The bit of just being able to share the XLSM file and users using Excel with no other installs or special network access, is really the make or break for me and non-professional programmer user scenarios I've seen.


Microsoft would also like to see that change, that’s why they tried to kill VBA. The reason for not following through (in killing VBA, because they definitely support and prefer other extension mechanisms) was pushback from users.

Wait, really? I've been working in a thoroughly MS ecosystem for the first time this year, and I can't seem to find any evidence that they've tried to kill VBA based on how often I run into situations where VBA (or worse, VBscript) is the ONLY option, especially when working with excel's developer functions or writing expressions in SSRS utilities. Did the users push back because the options they were presented with were use VBA or have no scripting capabilities?

To be fair, they denied that VBA was at risk: https://blogs.msdn.microsoft.com/architectsrule/2008/01/23/v...

But the fact is that it was missing from macos Excel for a while and they wanted people to migrate to VSTO: https://searchwindevelopment.techtarget.com/tip/On-migrating...


Google Sheet can run function and macro written in Javascript. That is what I use.

Although I don't like JS, I use that too, but it's pretty half-baked tooling compared to what Excel has. I would rather like to see just being able to point out some Google Cloud Functions and use those as spreadsheet commands. That would be powerful. Now you can but you have to do plumbing and you cannot reuse libs over sheets unless you make them public (last I checked, I could be wrong on this one but I read posts from as far back as 2011 where 'they are working on it'); you have to copy/paste. Yuck.

I wish someone (MS, ...) would do something, but I work on Linux so no Excel for me. And I do like very large datasets to be in the cloud anyway and not killing my laptop.

There should be more competitors in the space. I know there are a few, but they are not really competitors, just more niche / boutique products that attack a specific case, so you need to go back to Sheets or Excel anyway.


> I know MS has considered it, I'm still pretty surprised they've haven't followed through.

Didn't they add Javascript support in 2018?

I agree though. They brought up that they were considering Python3 integration like 2 years ago and haven't said a word about it since.


You've been able to control all the Microsoft Office and other apps like Internet Explorer through "out of process" OLE automation for ages, and Python has a great win32com module (part of pywin32) for controlling "in process" and "out of process" OLE controls / ActiveX components.

https://pypi.org/project/pywin32/

http://timgolden.me.uk/pywin32-docs/html/com/win32com/HTML/d...

Intgrating COM into Python is one approach, but another approach is integrating Python into Active Scripting. (The age old extending/embedding debate.)

https://docs.python.org/3/extending/index.html

And Active Scripting (1996) let you plug different "in process" interpreters into the web browser and other multi-lingually scriptable applications, and call back and forth between (many but not all) ActiveX components and OLE automation interfaces more directly, without using slow "out of process" remote procedure calls. (Some components still require running in separate process, like Word, Excel, etc, which work, but are just slower to call).

https://en.wikipedia.org/wiki/Active_Scripting

https://ipfs.io/ipfs/QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1m...

>The Microsoft Windows Script Host (WSH) (formerly named Windows Scripting Host) is an automation technology for Microsoft Windows operating systems that provides scripting abilities comparable to batch files, but with a wider range of supported features.

>It is language-independent in that it can make use of different Active Scripting language engines. By default, it interprets and runs plain-text JScript (.JS and .JSE files) and VBScript (.VBS and .VBE files).

>Users can install different scripting engines to enable them to script in other languages, for instance PerlScript. The language independent filename extension WSF can also be used. The advantage of the Windows Script File (.WSF) is that it allows the user to use a combination of scripting languages within a single file.

>WSH engines include various implementations for the Rexx, BASIC, Perl, Ruby, Tcl, PHP, JavaScript, Delphi, Python, XSLT, and other languages.

>Windows Script Host is distributed and installed by default on Windows 98 and later versions of Windows. It is also installed if Internet Explorer 5 (or a later version) is installed. Beginning with Windows 2000, the Windows Script Host became available for use with user login scripts.


I dislike VBA too, but it would be easier to stomach if Excel at least had a decent IDE. Compare the VBA editor in Excel with Visual Studio and the difference is stark.


I agree. Both https://airtable.com/ and https://www.smartsheet.com/ in my option add some value on top. Unfortunately they are not omnipresent as Google Sheets and Excel.

> Half of the mess that makes excel hell comes from the fact it's too easy to put two tables of data + some random constants on a single sheet and refer to them by H3.

That is a feature. If you can keep in a screen all the data you need, you reduce the cognitive overhead of having to switch back and forth between tabs. Of course if you start needing to add more data you need to "refactor". But it does not seem a lot more different that starting with a prototype and having to change everything to support the features of a real product.


> it does not seem a lot more different that starting with a prototype

People underestimate this point. Spreadsheets are one of the best possible approaches for prototyping data-driven applications, thanks to its dual data/code nature and the ease to build and extend data types (just add more columns to a table); it's like building the application inside a debugger.

Non-developers don't have anything else that resembles a debugger, anywhere in the common approaches of software for end-users in industry.

Combine this with spreadsheet applications working as integrated environments -a single tool that handles all the computing needs of a project, without having to build a toolchain-, and it's no wonder than it's the preferred method for non-programmers to build custom automation workflows when their needs aren't supported by any specific software.


> Spreadsheets are one of the best possible approaches for prototyping data-driven applications,

Bingo. Back when I worked in big corp Excel was the prototyping tool for the masses. If someone in a remote office built something in Excel that solved a problem that was useful to other offices my team would come in and use that work to build a real system. It was great because a lot of the requirements discovery work happened organically before we would even hear about the process/Excel.

Admittedly, we were a small team and could only take on so much work. This led to Excel being used beyond its capabilities which leads to a host of other problems.


> It was great because a lot of the requirements discovery work happened organically before we would even hear about the process/Excel.

How I wish there existed software designed specifically for that organic discovery, instead of having one which was grown from a tool for accountants.

Spreadsheets is the best end-user development model we know, but it is limited by not having a concept of instantiating new objects.


> That is a feature. If you can keep in a screen all the data you need, you reduce the cognitive overhead of having to switch back and forth between tabs.

That's true, but it doesn't necessitate different tables living in the same row/column grid.


You can also see two different tabs at the same time side by side, no need to switch back and forth.

My main issue with them is that they often aren't reproducible as the data transformations aren't logged. As a computational biologist I often deal with spreadsheets that experimentalists give me. I have no idea how the data were transformed from the raw data, and often the experimentalists themselves can't remember either. That's why I infinitely prefer real programming languages for use in data manipulation -- you can look at the code.

PowerQuery is integrated into excel, gives a visual editing platform (not purely code), and is quite powerful in terms of linking up to any datasource, manipulating data, and giving you the output you need. All the steps are listed on the right side, and you can click back to any previous step in the transformation.

It's not super popular, but it does serve as a nice middle ground for people in the space who aren't strong in coding.


> Half of the mess that makes excel hell comes from the fact it's too easy to put two tables of data + some random constants on a single sheet and refer to them by H3

This is certainly true, but that is also where almost all the power comes from, its generic nature. I had a startup that tried to reinvent Excel for 6 months and we kept moving closer and closer to the excel "sack of undifferentiated data" paradigm...


That's what Excel Tables are for. You get a table within your spreadsheet and instead of using cell references like =Sum(C2:C7), you can use structured reference like =SUM(DeptSales[Sales Amount])

I have quite actively tried to avoid excel lately (which means also avoiding to learn new tricks...), and I am not that familiar with tables. Do the tables:

1. Enforce same data type per column. I.e., if you have a number in C4, you can't enter text into C5?

2. Enforce that if there is a formula, it is applied identically to each row?

(By enforcing I mean that there is _no_ way around that short of copying the data to new sheet)

Give me those two things (preferably within sheet context instead of table context) and a decent version control and I can reconsider that I do anything but disposable ad-hoc in excel again.


Unfortunately the answer is "no" to both #1 and #2.

It does do something close to those. Column data types are automatically carried over to the next row when inserting or appending rows to an existing table. But you're free to override it if you want for a particular cell

Same with formulas. When you enter a formula in one cell, Excel will automatically copy across the whole column. But you're able to undo that auto copy if you really just want the formula in that one cell. It'll also flag cells that have inconsistent (compared to rest of table) formulas.

So, no, it doesn't enforce. But it does encourage.


The warning you receive when a formula is not consistent is a nice feature, and an advanced user will quickly learn to be very sensitive to those warnings, will spot them quickly, and avoid causing them when possible.

To strictly enforce that the formula must be the same in each cell of the column would not be very excel-like, I can't see it being done.


> advanced user

Yep, the problem is that I have this conspiracy theory that Microsoft has designed Excel to be the ultimate booster of Dunning-Krueger effect so that most people think they are advanced users [1] while they actually have no clue what they are doing. All the while giving no protection whatsoever against those Dunning-Krueger cases.

[1] I am very much afraid I belong to this group.


1. Not enforced as far as I can tell. But I believe the formatting can be set per-column only, not per-cell inside the table.

2. It's a default formula that gets repeated, but you may be able to override.


Totally. I know about it. The problem is that approximately nobody using excel does. (Yes, finance, data processing, analysts etc. do. Compared to the numbers with excel installed - still ~nobody.)

We (https://sheetjs.com/) almost exclusively deal with those types of complex applications built in spreadsheets

> they could be improved a lot with minimal changes

They can be improved for very specific use cases at the expense of others.

> Making them more database-like and making table data first-class (at least you can make named tables in excel on windows) could be used to push people a bit more towards organised data, without changing how anything works.

There are "database functions" like DAVERAGE, PivotTables and other systems for dealing with more structured data. Getting users to use them is a challenge outside of the scope of the tool. "You can lead a horse to water but you can't make it drink"

> Half of the mess that makes excel hell comes from the fact it's too easy to put two tables of data + some random constants on a single sheet and refer to them by H3.

Half of the reason Excel is so popular is the loose structure and the power that it enables.

> Now it's hard to add more data, hard to move anything, and hard to create space which expands to the next row with existing formulas.

Most of the hairy workbooks start as a solution to a problem at hand and eventually accumulate cruft after people try expanding it, not too dissimilar to other forms of software development.

> Airtable (and Access) implements this idea, but unfortunately sacrifices the generic, free-form spreadsheet along the way.

You either enforce the constraints and weaken the platform, or you give users the power to do what they want.


> You either enforce the constraints and weaken the platform, or you give users the power to do what they want.

I disagree with that. All the pieces are already there. There's nothing to weaken by education. But nothing tells the new users about them. They see the main screen and rarely ever know about multiple sheets. I've seen people using excel at work for years without knowing that. You don't have to take anything away from them, just go: hey, did you know there's a better way?

I know it's possible because I introduced a few people to named ranges, sheets, and data tables. All were happy to apply them later. (On their own)


You're both right. From experience, some people are happy to learn new things. Others, as sheetsjs pointed out, are not interested.

Clippy was a "Microsoft Agent" ActiveX component, that (or rather, who) you could script!

https://en.wikipedia.org/wiki/Microsoft_Agent

https://docs.microsoft.com/en-us/windows/win32/lwef/programm...

Clippy, Genie, Plany, Peedy the Parrot, Merlin the Wizard, Milton the Bear, Oscar the Cat, Max the Search Doggie, and all of their other happy friends were cheerfully retired in 2009, and are now merrily frolicking on a nice farm in the country side with a very loving family. Or so we are told.

http://bellcraft.com/mash/chars.aspx


My endgame is build a Excel+Access alike dev tool. I'm working in a relational language as the foundation (http://tablam.org) because Excel have another problem: The excel UI paradigm (reactive) is not the VBA macros one (almost OO). So make new functions not follow the way of excel.

I think too a spreadsheet is an interface for a database/source and must not have issues handling gigabytes of data. So I think internally it must a sqlite db for example and put on top the control. Making it virtual and reactive and is done.


> at least you can make named tables in excel on windows

I built a spreadsheet that loaded external data sources (filtered log files from a production system) into Excel tables. I then combined two log files into another table, and created a pivot table from this, which I then filtered (date ranges etc) and analysed.

That worked great at the start - until I came to update the external data and found the pivot table didn't update. Then when I force an update it lost my groupings, filtering etc.

Any idea where I went wrong?


I suggest you use Power Query to load external data. If you do, pivot tables and tables connected to those queries should all update automatically on refresh.

Power Query is really the best feature in Excel to me, eventually users will stop copy pasting from Access to Excel and I'll be happy.


It had its problems, but Apple's Numbers product got this right I feel - it was a kind of blend between desktop publishing and spreadsheets, where you could scatter arbitrary tables of data around a page and then have them refer to each other. I suspect a lot of people who are throwing multiple tables of data in the same sheet are doing so because they want both of them to be visible at once.

In my opinion - "No Code" tools will fix this gap to build real world apps yet being as simple as excel.

While airtable is just a DB as a service, a real no code tool would have ideally all the basic building blocks:

a. DB as a service (like Airtable) b. BPM capability -> to add workflows to the data c. Ui Building to abstract data d. API integration (may not advanced learning for power users)


Excel is the closest we have come to no code tools and its been around for years.

Ms Access actually has a fairly decent way to build apps without really touching code, but it requires understanding relational databases enough to use it properly. Most (non technical) people end up using excel as a database instead.


I haven't looked at Access lately, but when I was forced to use it extensively, there were limits tonhownfar you could push input validation without writing VBA. So I suspect that most Access databases have severe issues with data consistency. But at least the data is structured.

Airtable has blocks for c) and exposes an API for d)

Not sure what you specifically want for b)


> the fact it's too easy to

I'm not sure you fully appreciate why Excel is so successful.


No, I fully understand that this is great for many users and that it cannot be made harder in any way to make whatever mess you want on a single sheet - or excel would die. I only want it easier to not do that. (There's nothing guiding people to better design right now)

That's by design. People don't want to be "guided to better design," they want to get their math done and over with, leaving some trace (the Excel sheet) for the next time they or somebody else needs to revise that math.

> I only want it easier to not do that.

I'd rather want it to stay easy, but offer better refactor tools to clean up the model after the fact.

My main complaint against spreadsheets is that, to get a robust and clean model, you must plan from it from the start, negating the main value of using a spreadsheet to begin with. When moving data and formulas around, it's too easy to break the formulas and not even notice.


named ranges and DGET "sql-like" queries are in google sheets as well. (or perhaps what you meant is that few people know about these things, and don't use them, so you end up with unreadable messy code...)

Yeah, I meant that they're not right in your face and ready to be used. I know it's hard to force people to use new methods, but if anyone could do it, MS / Excel could.

It's all possible though, if you know where to look (I think excel even has table sheets now?)


I see what you're saying, but this is actually a deliberate usability principle: make the basic features easy to access (visible without clicking into a menu), and put advanced features inside menus. Then basic users who don't need more advanced features can do what they need without getting lost in a super complex UI, and power users can still get to the advanced stuff.

If people want to do advanced stuff with software, they really do need to invest the time to learn where the advanced features of the software are. I think this is reasonable.


There's nothing WRONG with the spreadsheet model, and never has been. It's immensely powerful and useful, and the growth in capability over the life of a product like Excel is remarkable and enviable. Excel is perhaps Microsoft's greatest product.

The problem with it -- and there IS a problem -- is really a problem of applicability. Excel, like Lotus before it, is the first place many people encounter the ability to create their own logical conditions.

For a huge subset of these people, it's also the LAST such tool they learn. And now they have a hammer, so everything is a nail. I don't just mean finance people who live and die by spreadsheets; I've met engineers who would solve problems in Excel macros that would've been better attacked in perl or Python.

The other "problem" is the degree to which a horrifying number of organizations end up depending on very, very complex spreadsheets full of arcane and undocumented formulas and macros, and for which change control is "save-as."

But, again, none of this is a problem with the product itself, or with the idea of spreadsheets.


Yeah i agree with you. Excel is very powerful to some degree but like all things it can be abused.

somewhere in a cube farm i can hear the shouts...

"Hey Karen did you get the latest budget sheet?"

"Is it budget_final_v2_2019.xlsx?"

"Damnit Karen that is from last week dear god dont tell me you sent that to corporate!! We are on budget_final_FINAL_v4.xlsx"


I've been showing a team of economists working on some economic modeling how to use git and R. They were amazed that they could keep working on a file without creating backup copies, or worrying about team mates overwriting their work. It's a work in progress, but they are eager to learn and understand the value there.

We see this A LOT in our implementations.

We sell & implement a project management financial metrics tool (supporting earned value analyss; google if curious). A key input is always the actual costs of work done, which has to come from the financial system of record.

A horrifying amount of the time, the actual path is through goofy undocumented Excel sheets, because nobody knows how to extract it natively from $FinSys.


I think there is a tipping point in complexity where Excel needs to be replaced by something else. In the IT world I've seen businesses that have essentially created bespoke ERP applications for themselves by evolving a web of interlinked Excel files over years.

A food distributor I worked for was handling their upstream vendors, warehouse inventories, truck inventories, customer list, order history, deliveries, invoicing, and more this way. These businesses suffer because the "Excel ERP" is constantly breaking or losing data, and no one knows how it really works anymore.

The food distributor eventually migrated to an actual ERP made for their industry, and the amount of effort and stress saved at all levels of the company was massive.


I worked with excel for some years early in my career. I did not know how to program back then beyond if else and for loops.

We used it for budget estimation at multi million / up to 100 m. I have seen other uses in Corp world in many departments.

IMHO I think it’s good for specific use cases. But it is routinely abused beyond that.

The key shortcomings for excel uses by non-programmers for key business applications:

- How do you test that your calculation is correct after update. The same argument holds for business applications without tests?

- how do you maintain the knowledge of the inner workings without having to follow the arrows around cells. Probably there are good practices but how likely are you to ensure them in your product / team.

- the data / computation is spread in user space. In case your app is useful many people would like to use it. How do you manage the updates and bug fixes beyond it’s the user’s Responsibility

- how do you avoid the black box effect: mission critical software that no one knows how to touch inside without a full rewrite?

- how do you convince stakeholders that they need to migrate to an adapted solution while they have a working one now and often are oblivious to the hidden costs of users copy pasting data / reformatting for hours sometimes to fit an existing tool that is no longer adapted.

The data frame structure in R or python solves a large portion of these issues. Yes you need training. It’s the same for excel if you want to avoid in inferno machine case you need the same concepts.

Why not train for programming with data frames and have the option to gradually extend to web app without a full it project that needs 3 levels of validation.


Data frames are going to be huge. This year I started using them for a personal project -- python, jupyter notebooks, and pandas -- and was astonished at how easy it was to get running, and how powerful.

I feel like lots of number people, once exposed to it, will realize that there's a place to go after Excel that's worthwhile.


The thing that's missing from "Google Docs" is a decent collaborative outliner called "Google Trees", that does to "NLS" and "Frontier" what "Google Sheets" did to "VisiCalc" and "Excel".

And I don't mean "Google Wave", I mean a truly collaborative extensible visually programmable spreadsheet-like outliner with expressions, constraints, absolute and relative xpath-like addressing, and scripting like Google Sheets, but with a tree instead of a grid. That eats drinks scripts and shits JSON and XML or any other structured data.

Of course you should be able to link and embed outlines in spreadsheets, and spreadsheets in outlines, but "Google Maps" should also be invited to the party (along with its plus-one, "Google Mind Maps").

It should be like the collaborative outliner Douglass Englebart envisioned and implemented in his epic demo of NLS:

https://www.youtube.com/watch?v=yJDv-zdhzMY&t=8m49s

Engelbart also showed how to embed lists and outlines in maps:

https://www.youtube.com/watch?v=yJDv-zdhzMY&t=15m39s

Dave Winer, the inventor of RSS and founder of UserLand Software, originally developed a wonderful outliner on the Mac originally called "ThinkTank" and then "MORE", which later evolved into the "Frontier" programming language, and ultimately the "Radio Free Userland" desktop blogging and RSS syndication tool.

https://en.wikipedia.org/wiki/Dave_Winer

https://en.wikipedia.org/wiki/UserLand_Software

More was great because it had a well designed user interface and feature set with fluid "fahrvergnügen" that made it really easy to use with the keyboard as well as the mouse. It could also render your outlines as all kinds of nicely formatted and stylized charts and presentations. And it had a lot of powerful features you usually don't see in today's generic outliners.

https://en.wikipedia.org/wiki/MORE_(application)

>MORE is an outline processor application that was created for the Macintosh in 1986 by software developer Dave Winer and that was not ported to any other platforms. An earlier outliner, ThinkTank, was developed by Winer, his brother Peter, and Doug Baron. The outlines could be formatted with different layouts, colors, and shapes. Outline "nodes" could include pictures and graphics.

>Functions in these outliners included:

>Appending notes, comments, rough drafts of sentences and paragraphs under some topics

>Assembling various low-level topics and creating a new topic to group them under

>Deleting duplicate topics

>Demoting a topic to become a subtopic under some other topic

>Disassembling a grouping that does not work, parceling its subtopics out among various other topics

>Dividing one topic into its component subtopics

>Dragging to rearrange the order of topics

>Making a hierarchical list of topics

>Merging related topics

>Promoting a subtopic to the level of a topic

After the success of MORE, he went on to develop a scripting language whose syntax (for both code and data) was an outline. Kind of like Lisp with open/close triangles instead of parens! It had one of the most comprehensive implementation of Apple Events client and server support of any Mac application, and was really useful for automating other Mac apps, earlier and in many ways better than AppleScript.

https://en.wikipedia.org/wiki/UserLand_Software#Frontier

http://frontier.userland.com/

Then XML came along, and he integrated support for XML into the outliner and programming language, and used Frontier to build "Aretha", "Manila", and "Radio Userland".

http://manila.userland.com/

http://radio.userland.com/

He used Frontier to build a fully programmable blogging and podcasting platform, with a dynamic HTTP server, a static HTML generator, structured XML editing, RSS publication and syndication, XML-RPC client and server, OPML import and export, and much more.

He basically invented and pioneered outliners, RSS, OPML, XML-RPC, blogging and podcasting along the way.

>UserLand's first product release of April 1989 was UserLand IPC, a developer tool for interprocess communication that was intended to evolve into a cross-platform RPC tool. In January 1992 UserLand released version 1.0 of Frontier, a scripting environment for the Macintosh which included an object database and a scripting language named UserTalk. At the time of its original release, Frontier was the only system-level scripting environment for the Macintosh, but Apple was working on its own scripting language, AppleScript, and started bundling it with the MacOS 7 system software. As a consequence, most Macintosh scripting work came to be done in the less powerful, but free, scripting language provided by Apple.

>UserLand responded to Applescript by re-positioning Frontier as a Web development environment, distributing the software free of charge with the "Aretha" release of May 1995. In late 1996, Frontier 4.1 had become "an integrated development environment that lends itself to the creation and maintenance of Web sites and management of Web pages sans much busywork," and by the time Frontier 4.2 was released in January 1997, the software was firmly established in the realms of website management and CGI scripting, allowing users to "taste the power of large-scale database publishing with free software."

https://en.wikipedia.org/wiki/RSS

https://en.wikipedia.org/wiki/OPML

https://en.wikipedia.org/wiki/XML-RPC


The sheer number of everyday business use cases that can be solved quickly and efficiently with spreadsheets is pretty astounding. Of course, there comes a point when heavily-used spreadsheets become intricate behemoths on which core business functions are run, and then things can get bad in a hurry. The challenge many programmers face is thinking that every business problem is best solved with code, when in reality the majority are solved faster, more efficiently, and more maintainably with a spreadsheet. Then the rest are worth developing "real" solutions for.

one of my first 'real' programming tasks involved replacing excel sheet with simple application.

Bcause excel sheet was doing a lookup in HUGE(i am talking about few million rows of raw data) dataset - basically a fulltext search in a database with unstructured address data.

The idea was to fuzzy join two datasets, and to make manual joining a bit easier with search functionality.

It worked fine on dev's machine, but on user's toasters it took an hour to load.


> The challenge many programmers face is thinking that every business problem is best solved with code, when in reality the majority are solved faster, more efficiently, and more maintainably with a spreadsheet.

It's not like you can't do both. Excel supported VBA macros for a long time. Now it supports JavaScript.


It still supports VBA. I've seen multiple manufacturing facilities which are absolutely reliant on VBA macros to be able to produce and ship their products.

For the last few years I've believed Excel is the most important piece of software written, no hyperbole. The reason is that it's so damned accessible. Software is eating the world, and letting people who don't write "true" code still kinda write code opens up software's productivity and automation benefits for a bunch of use cases and an insane amount of people.

For programmers, the analogy is python. It's almost never the best tool for any specific thing, but it's a really solid tool for a ton of things. It's easy enough to learn and has enough depth to keep learning. It's easy to prototype a quick answer, and can not-to-painfully grow to a large complex system. I think that's why you get people from all backgrounds using python. It enables people who don't know much about software to start writing software.

Excel's the same way, but with an even lower barrier (and probably lower ceiling). It enables people who don't know anything about software to get many of software's benefits. That's huge.

Just like python can great for a biologist whose focus is biology not software, Excel is great for the accountant whose focus is accounting, not software. It makes them a programmer, or at least close enough for many many purposes.


My biggest problem with Excel as a "solution" is that document control is kind of an afterthought. You've got X number of people in your organisation and they each have these adhoc collections of Excel spreadsheets. Usually the devil is in the details and you'll have one person who has calculated gross revenue one way and another person who has done it another way. The two approaches are never reconciled, but their output will invariably be compared as if they are discussing apples and apples. I've seen so much analysis done that was really just garbage because nobody understands where the data came from and what it means.

It's not really fair to blame Excel -- it's a calculation tool. However your solution space needs to address this problem and I very, very rarely see it happen.


It’s absolutely fair to blame excel. It obfuscates the code in favor of displaying the calculated values - it’s designed to hide the code - which, along with a number of other design decisions, makes it really hard to build an error free spreadsheet beyond trivial sized cases.

Further, its design makes reproducible data practices difficult - in contrast to R or Python which do a lot to separate code from data - and let you re-run the same code on new/updated data. Python and R (and other non-spreadsheet tools) encourage practices that make keeping raw data pristine with work being done on copies of the data. In contrast, it’s really easy to make mistakes with Excel in ways you’ll never catch. Sorting within filtered columns is a good example. Did you add another column after creating the auto filter? Surprise, data in that column won’t sort with all the other data when you use re-sort one of the original columns. Just like that, poof, silent data corruption with no easy way of reverting if the error isn’t caught quickly.


What I hate about excel is if you now need to do a very similar task, again, it can involve a lot of messing about with the GUI.

Versus changing a few parameters in the Python script.


Ironically org mode in Emacs would be better in that respect :-)

I almost cracked the org-mode code. Almost. Used it for some sheets but now I only use it for my journalling. Hopefully it will keep the embers hot until I decide to pick it up for real again.

I feel the same way about Access, too. It doesn't matter whether they're "real programming", it matters that they let real people solve real world problems for themselves, who otherwise would have to pay real programmers real money to develop real applications.

I'm in that category of people the regularly have to unwind the web of Excel or deconstruct the Access databases that grow to rival SAP in terms of intricacy...

... I actually agree with the overall premise that these tools do let non-tech types do real work and I think department power user types should have these things.

But we must also recognize that these tools are a bit like a black hole, too. Not unlike many applications created by professional developers, these users will tend to continually add features and bells and whistles until they pass the event horizon that exists between good judgement and bending these tools to be what they aren't (the most common is turning Excel into a database). On one side of that horizon you can pull back and reasonably find better technology to implement sophisticated capabilities and on the other you start to reduce efficiency as results as the errors and issues with the misuse of these systems overwhelming the benefits. Worst part is... you never really know when you've passed through that horizon... until you're spaghettified that is...

So in-house technologists need to be aware of these things and give good support, and then also provide the technical judgement on when these approaches start to break.


This is a good take.

I once build a Rails app that translated a giant Google Sheets document into a web application.

What I found really fascinating was that:

95% of my development time was spent building CRUD, access control, UI that was already more-or-less provided for free in Sheets

5% of my development time was spent writing unit tests and services that implemented the actual logic and formulae in the spreadsheet. Even though this was the tricky, business critical "thinking carefully" part, it was also the least time consuming.

It just goes to show, even compared with Rails, applications like Excel and Sheets give you a LOT for free, out of the box, accessible to everyone.


Has Access solved the corruption issue that plagued it for tens of years? While you can connect it to a "real" database engine trivially, a common use-case is via a single MDB file that could become damaged.

You could damage the MDB a number of ways including:

- Leave Access running before shutdown (or power loss)

- Intermittent network outage to the file server

- Multiple users trying to access the same MDB (or anti-virus scans/locks, even on another user's machine)

- JET inconsistent versions / Access inconsistent versions / Patch Levels

The biggest headache though with Microsoft Access was never the product itself. It was that the product didn't really have a natural evolution. You'd start with a single MDB/single employee, but one day you'd need two employees or more (and security, and more tables, and this and that), and while you could migrate the MDB into a real database engine ($$$) and use Access as the front end, the record locking was funky and scaled poorly (plus control was limited).

The whole product felt a bit like a mouse-trap. A nice shiny piece of cheese, that genuinely tasted good/worked well, but as soon as you tried to move it SNAP. I don't dislike Access, but it was always painful when a business outgrew it (whereas Fortune 500 companies live and die on Excel).


I used to build Access applications and never had data corruption issues while supporting 20 concurrent users.

The key is separating the database and front end in different files.

While I have many gripes with Access, it is a highly underrated tool.

Using a real database engine requires infrastructure and lots of red tape to set up and it might not even be approved after waiting for months.

With Access, you just put a file in a shared directory and you are done. No need to set up a web server for a front end either.


Access and FileMaker made sense before the advent of open source, specifically Rails. Django, Postgresql, and MySQL. Both products make little sense now.

> Rails. Django, Postgresql, and MySQL

These are much harder for non-programmers to use than Access. With Access, almost completely non-technical people can set up their own database and make the queries they need to answer their own questions. With Postgres accessed from a general-purpose programming language, non-technical people need to hire someone to help them with every basic task.

As far as I can tell Access has nothing to do with making websites, so it’s unclear what Django or Rails has to do with anything.


> As far as I can tell Access has nothing to do with making websites, so it’s unclear what Django or Rails has to do with anything.

You can make apps with Django and Rails. They're not just "websites". Web apps are also still useful even when they're not public facing.

> These are much harder for non-programmers to use than Access. With Access, almost completely non-technical people can set up their own database and make the queries they need to answer their own questions.

That's the thing. Access and FileMaker aren't typically run by non-technical people. Yes, they aren't initially programmers, but they're usually technical people. FileMaker & Access users, whether they realize it or not, become programmers. I feel that you're confusing Access & FileMaker users with users of Excel.

> With Postgres accessed from a general-purpose programming language, non-technical people need to hire someone to help them with every basic task.

A relational database is not a big leap from either Access or FileMaker.

I know all of this because I used to work closely with a team of these people, and at some points I've even helped maintain their code.


> I feel that you're confusing Access & FileMaker users with users of Excel.

I know several people who use or used Access / Filemaker who were non-technical with previous experience mostly consisting of Word / light Excel use.

For example, my anthropologist parents used Access for analyzing their manually gathered census data for a small rural village.

The volunteer docents at a local museum in my hometown used Filemaker for managing the museum collection.

> whether they realize it or not, become programmers

Using the graphical tool in Access to construct queries does not require becoming a proficient programmer.

> A relational database is not a big leap from either Access or FileMaker

These are relational databases. They just have user interface affordances intended for non-technical users. Postgres does not.


> For example, my anthropologist parents used Access for analyzing their manually gathered census data for a small rural village.

That's surprising. Most of the time, this is what Excel & Wordpress are used for by non-techies, since both FileMaker and Access feel daunting to most of them. Maybe this is exclusive to museums? This is anecdotal, but I've worked in a lot of different industries, and in all of them everyone maintaining FileMaker or Access were also knowledgeable enough to code in those platforms ie. they were techies before they started using FM or Access


Yeah, I hear this a lot, but that really does not make a lot of sense. I hear the same for wysiwyg builders, CMS and 'codeless/lowcode' platforms, but they are all alive and kicking despite programmers thinking that it is much more easier to whip up something with Rails/Django.

Those platforms are very far removed from 'the normal user'; you cannot just install ONE (!) installer file and then open a visual drag & drop editor and whip up what you want. In the worst case you have to install several things and some of this need to be configured; you already lost 99% of the business people there.

You forget 'little things' like; they have to know html/css/js as well, or at least know how to add and work with Bootstrap. For business use you need Bootstrap controls; you need to know how to add and integrate them (people are not going to understand what it means to just 'bind to the onclick event') while in the before mentioned packages that is jut drag & drop. Installing new controls is too.

Then, if they got as far that they run into having to securely deploy it somewhere. It is all just too hard.

The web made them less used, but not because of Rails/Django, but because they are, legacy wise, less well fit for the web, but there is still a huge market for them and there are more and more appearing that solve some of these issues for web and definitely quite a substantial group will choose them over hiring 'real coders' to make things quickly. There are plenty of web shops (with big clients) doing their db work with Filemaker (and Filemaker does support web).


Citizen developers in an organization are not a bad thing. But it is important to properly support them, with the right tools and training, but especially when the harder problems need to be addressed by specialists.

I second the "right training" part. I think any organization working with any kind of data should invest in making all their office employees Excel power users.

Wrt. right tools, I think Excel is the right tool. It may lack safety affordances, but it does its job cheaply and efficiently (and you already paid for it since you probably need to open Word documents anyway).

As for "harder problems addressed by specialists" part, it happens, but I'd argue not that often. Are they using Excel to half-ass CFD sims of their rocket engine nozzles? Sure, they probably could use a proper software for that. Are they doing their own specific munging of customer or process data? Their workflow is probably too specific and too fluid for it to be shoehorned into a "serious" application without a tremendous loss of productivity.


Yeah, Access got me started learning SQL. And Jet is actually not a bad database engine.

It's interesting how often people's response to the manifest problems caused by people using Excel in unfortunate ways is to either suggest tools with much steeper learning curves or to suggest additional Excel features. 99% of these problems can be solved by proper training and standards appropriate to the problem and context at hand.

For instance: Always separate and label inputs, calculations, and outputs.

Document where source data has come from, where one cell has had an ad-hoc adjustment made, what formulas do.

Use some type of version control and don't keep loads of concurrent versions around floating on email and local hard drives.

If the spreadsheet loads data from external sources, try and make that load automatic and live to prevent staleness.

Consistent formatting rules.

If data is tabular, put it in an Excel table. If data is tabular and we are always doing the same queries on it, and it is large then we move it to a database but that rarely happens.

Make it clear who owns spreadsheets and is responsible for keeping either/or data & functions/formulas to work.

Do all of this first, only then start thinking of replacing Excel with something else.


Hardest part of everything I do in my life is making other people follow rules, or any kind of guide lines. If you succeed with making only 1 rule of those you described consistently followed by organization of 10 people for 6 months ... please send me how you did that, then I am going to be a millionaire.

I am team lead of small dev team, if you push people by force to follow rules they will start maliciously follow them up to the point where no work is done and your spreadsheets are perfect.


Don't try to fix the user.

If we lived in a world where your suggestions were followed Excel would indeed be an OK tool. Here in the real world however Excel has just the right combination of power and usability to shoot off every left foot in a five cube radius, and frequently does.


Note that the reason why Excel is popular is precisely because people "don't try to fix the user", but instead develop software solutions tailored to address (author's misunderstanding of) a specific use case users have. The problem is, the specifics of that use case changes much faster than the dedicated software can be updated.

Processes that are thoroughly understood are easy to automate. Those that aren't, require competent people with flexible tools. Excel is a flexible tool. Competence can be gained through training.


Sure, in general, don't try to fix the user.

In our context, we use Excel as a team in a work context where our rules make our specific tasks easier. It is much easier for people joining who are already proficient with Excel to follow some formatting guidelines than to invent a custom tool and learn to use that.


People here still don’t get it. It’s not that spreadsheet are great for non programmers, they are great for programmers too.

The combination of UI and live calculation engine is unique.

How long does it take to make a pivot table with conditional formatting in Python?

How long does it take to have input validation in JavaScript?

Sure maybe a couple hours. But it takes literally 2 seconds in Excel.

Couple Excel with a fonctional programming addin and you’ll beat a Python programmer on 90% of data oriented tasks you may want to do.


Hmm, I'm not sure about that! Formatting in Python, sure - because it's not designed for outputting colours and type face. But the popular Python package, Pandas, makes Excel-like tasks a breeze. I'll often help colleagues who can't code to do something with their data that is nothing more than a single "groupby" or "apply" expression in Python+Pandas.

Implementation time vs maintainability

I will spend a couple of hours to do a good job on something that will inevitably be in use and under continued development for many years.


On prototypes? Yes.

On code meant to be shipped and/or maintained? Not even close!


Why would this take two hours to do in python? Excel is a nightmare for production tasks.

I always like these threads where programmers talk about tools like Excel and say they’re fine for small stuff but grow out of hand, as if 90% of codebases built by professional programmers aren’t exactly the same.

But professional programmers (emphasis on professional) are well aware of how codebases can grow out of hand and employ entire categories of theories, strategies and tools to keep their software maintainable, even when under delivery/time pressure. Excel doesn't really have much in this category of theories and tools.

You can actually unit test your Spreadsheets surprisingly easily with Python: https://www.slideshare.net/xlwings/automated-testing-of-exce...

You really should look at some of the links I provide above to Jocelyn Ireson-Paine's work.

The usual out of hand Excel file has a completely different size from the usual out of hand codebase.

This video by Joel Spolsky taught me a lot of new Excel tricks/QoL improvements:

https://www.youtube.com/watch?v=0nbkaYsR94c

It is amazing how rich the ecosystem is. I didn't know about pivot tables. Excel has always impressed me, and continues to do so the more I learn.


The other classic example is the disgraced Martin schkreli:

https://youtu.be/jFSf5YhYQbw


It's unfortunately not available in the Mac client, but Excel also has an amazingly powerful and useful tool called PowerPivot/Get & Transform[1].

You can connect to external data sources such as CSV files, Excel files, any database with an ODBC connector, APIs, all kinds of neat things. Ingest that data into your Excel file, create an enforce constraints and relationships within your data model, gives you incredibly robust data munging and analysis functionality[2], and then expose all of that as a PivotTable. And the functionality itself bypasses the limitations of Excel such as max data size or computationally inefficient formula implementations, as it uses a separate data storage and computational engine that's a highly compressed columnar data store.

The PowerPivot work is also mostly transferable to PowerBI and Analysis Services. Taken together, you've got all the tools to apply progressive enhancements for end users. Let them create their Excel-based stuff. When it starts to become more mission critical, non-performant, or error-prone, provide them with support to clean it up in the ways that video from Joel Spolsky mentions. When it hits growing pains from that, refactor it further to leverage the built-in data modeling capabilities to enforce some integrity, automation, and potential data volume scaling. And when you hit growing pains with that, or the underly process/usage finally matures to a state of stability, or you need to address security/access/audit-ability concerns, transfer that data model and everything to either PowerBI or an Analysis Server deployment and migrate the management to IT.

I don't see it in practice very often, but it's an incredibly effective and frictionless way to both enable your business users to innovate their work processes via the tools they know, while also providing a non-disruptive way to mitigate your business becoming reliant on apocryphal spreadsheets being passed around to support critical business functions. And by design alleviates many of the causes of "automation" projects failing.

[1] https://support.office.com/en-us/article/get-transform-and-p...

[2] It doesn't rely on the same functions exposed for Excel formulas, but rather a language called M for ETL-like needs and DAX for calculations. https://support.office.com/en-us/article/how-power-query-and...

[3] https://en.wikipedia.org/wiki/Power_Pivot#Product_history_an...


I used to work at a bank that built a huge integrated application development and deployment stack on Python. One aspect of it was a custom library implementing a Directed Acyclic Graph, basically a way to ink up properties so that when the results of one property changed, any dependent properties on the DAG would automatically recalculate. This was explicitly implemented so that we could directly translate the logic from Excel sheets into Python code.

A lot of our projects were built to accept an excel sheet as input, then generate another Excel sheet as the output. The idea was the logic and data manipulation and transformation should be in source control, but the Quants would still have Excel available to build graphs, pivot tables and do statistical analysis on the results.

The DAG also turned out to be pretty handy for implementing web applications and all sorts of other apps though.


We also found using Excel as an output to be a powerful design pattern. Excel is a great language for explaining the flow of calculations. We used to have WPF screens where a portfolio manager / trader / analyst could right-click on a number and select "show details" and he/she would get an Excel workbook where that same number would be selected and they could see the derivation by following the cell's precedents. It was not always feasible to show a complete derivation, but we'd try to show a useful amount of elaboration. This was a heavily used and loved feature, often leading to the users making edits or additions to these detail sheets and giving it back to us as a spec. Building the 'details' Excel books programmatically was a lot of work, but part of the payoff was that we also used them as a nightly check on our primary system, i.e., we'd automatically run 'show details' on everything and make sure the 'detail' result matches the primary system result. That comparison caught many bugs. Kent Beck: "More important than tests vs types is the principle of double checking. If you say something twice in independently derived ways, you're more likely to be correct than if you just say it once. Tests are a form of double checking. So are types."

Sounds like Athena at JP, or Quartz at BAML. Those off graph errors drove me nuts.

I love spreadsheets. When I'm working out a method for analyzing some dataset, I just hack it out in Calc. Or if it gets too big, Excel. Once I've blocked it out, I move to MySQL. But I can do any database operation in Excel, or even Calc. Maybe not efficiently, but well enough.

But I've also learned that basic Linux tools (grep, sed, tr, awk, sort, uniq, etc) are far more efficient for cleaning and preparing data for spreadsheets or databases.

And then I use spreadsheets for final analytic steps, stuff that SQL doesn't do efficiently, and for charting. I could learn Python and R, I suppose, but SQL and Calc/Excel have always been enough. And Gumeric, sometimes, because it can do some amazing charts that the others don't.


What Gnumeric charts are you referring to? The ones I could find [1] seem pretty standard.

[1] https://help.gnome.org/users/gnumeric/stable/sect-graphs-ove...


I was thinking of min-max. You can select an area of cells as a series, and so display numerous min-max lines. Also matrix and contour. Those aren't in Calc 5. Maybe they're in Excel now. I have an old Windows 7 version.

Excel is powerful. It does so many things really well. Sharing data between sheets in real-time isn’t one of them though (maybe I just haven’t found the right tool, so we went ahead and built one).

I worked at a large fintech company a few years ago. During my interview Excel popped up as a topic, the interviewer quipped “Excel is terrible!”, he was referencing the heavy reliance of his customers on spreadsheets rather than the “better” functionality they offered via their platform. A few years before I would have agreed, but Excel really is amazing, it allows almost anyone to just-get-things-done.

There are plenty of cases where Excel projects had grown to the point where specialised software would be better for the business and the users... and if you’re in an industry with heavy and advanced usage of excel (like the fintech space), it’s a great place to mine solid ideas for a startup. Just don’t try to recreate everything Excel does! Focus on areas it doesn’t excel in.


> Sharing data between sheets in real-time

Do you mean between sheets in two Excel files which are open in different computers? Or did you encounter performance issues in a single computer?


Two (or more) excel sheets running on different computers. Afaik it’s impossible to link cells in real-time between different PCs natively.

Having worked a decent amount in Excel, here are my key issues:

- VBA Macros and Sheets are two distinct paradigms. Within an Excel file, it is not always clear how the two interact and requires meaningful digging.

- Once a numerical model has been calculated in a sheet, it is difficult to scale it. Yes, it is possible to copy sheets but if you make a change or want to do something 100's+ of times, it's a pain.

- Data integrity is a problem. Opps I pressed the wrong key and I deleted some data. Oh shit, I don't have Git to compare what was changed.

PS article dated 2017


For a long while, my dad's job was as a consultant, and his job was to convert Lotus spreadsheets to Excel spreadsheets (and sometimes small databases). That was it. The spreadsheets he worked on were responsible for moving millions and millions of dollars around.

One of his jobs was for a major movie studio updating their sheets that calculated royalty payments. Every actor that ever worked on a show distributed by that studio relied on the accuracy of that single spreadsheet for their "money mailers".


Funnily enough, excel is a programming language. I don't mean VB Script or macros. I mean with the equations, ranges, and constraints - that's a programming languages there.

A cell is simply a computational variable (as opposed to the notion of variables in lambda calculus). A named range is a data structure (a struct). The rest are term rewriting


Isn’t the most relevant distinction when comparing a spreadsheet to programming is that it’s non-procedural? Writing spreadsheets reminds of writing Makefiles. There’s no start or end. Just a bunch of declarations about relationships. And it’s just...happens.

There's a term that's gaining popularity in the last 15 years: reactive paradigm of programming.

Is what we're talking about any different from declarative programming? That term has been in use for several decades

Reactive programs don't have to be declarative. You can express a reactive program imperatively.

The defining feature of a reactive program is that relationships between inputs and outputs are automatically tracked, and changing an input will automatically update the dependent outputs.

Facebook had a whole experimental language (now abandoned) for writing reactive programs with imperative code: http://skiplang.com/


The portion of comment I was replying to was "no start or end" . This is reactive programming.

Having said that, other than Excel and some cute UI things, I've not seen much utility in reactive paradigms. Hence I am bearish on reactive programming. Someone please correct me.


It’s the world’s most successful reactive programming platform.

Yes, GUI assisted reactive programming.

And that's why when you start to use use VBA or js in Excel, it's a fail.

Excel power users (I am not one of them) know how to solve almost every problem they are presented to, using only Excel. Like SQL with relational data. With orders of magnitude better performance.

Main issue yet is scalability, when the dataset gets too big.


This is exactly why we're building https://glideapps.com!

Hey, cool idea.

At my company we've built a slack bot that sits in front of a google sheet to manage transactions (eg. borrows) in our office library.[1]

I'd love to know whether this could be a Glide app, or whether we'd run into a technical limitation where we can't have users scan a book's ISBN and have that do a lookup in the sheet.

1. https://github.com/thundergolfer/library-management-slack-bo...


We don't have barcode scanning yet — it's a bit tricky to pull off as an iOS PWA.

Looks great! Wishing you all the best!

Few minor notes: - You've featured an app "Tournament of Books" but there are no links there. Both the title and the mock text conversation image can be links to that app - Interspersed fixed width font is jarring


Thank you for the notes — I will update!

(OP concludes ‘programming should be more like Excel’. Glide’s mission is to apply spreadsheet programming to software in general. This is the most relevant comment, downvoters be darned!)

Yes, looks really cool!

Thanks!

Registration is open for Startup School 2019. Classes start July 22nd.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: