It's a brilliant reactive data programming model that makes intuitive sense to non-technical users. They feel empowered to use it to solve problems right now with a computer. They experiment with it, try things, Google how to do more things- just like any programmer does. And they feel capable of doing this because they don't know they're programming.
Within the Amazon warehouse world, I have seen incredible innovation using it. An acquaintance of mine got into development by using it to help save soldiers' lives while serving in Afghanistan.
I agree that mission critical data needs to get out of it and into a centralized system, but I still feel Excel is an incredible tool in any business.
Notes still let you see data as rows and columns and had very spreadsheet-like functions, many of which could be carried over directly from Lotus 1-2-3 - but it was a full client-server system that supported decentralized replication, built-in messaging and email, and a solid security model (yes, yes, I know it was also a crazy confused mess and a UI nightmare, but that's a different topic).
Back in the day, organizations had teams going around finding the most useful ad-hoc spreadsheets and converting them into simple Lotus Notes databases. Other organizations would sometimes also give staff training in building simple Notes databases - not a whole lot more difficult than doing a spreadsheet - and then have expert teams come in and polish them up as needed. This made business software bottom-up rather than top-down, which turned out to be a whole lot more useful in a lot of cases.
In many ways, even after I left the Notes world for PHP and Rails and JS, my career in business software has still mostly been about converting spreadsheets into more "proper" applications.
Spreadsheets are the 'blue-green algae' of the software world: the simple base level that's everywhere and surprisingly essential.
Still true for many people in 2017. @patio11 probably said it best:
> Every spreadsheet shared in a business is an angel announcing another SaaS app still needs to be built.
FWIW we ended up in business software after one of our open source libraries for reading and writing spreadsheets (https://github.com/sheetjs/js-xlsx, demo http://oss.sheetjs.com/) garnered lots of demand from businesses looking to build those "proper" applications.
It's a pithy quote, but you'll get a very different reaction from a finance or small-to-medium-sized business professional.
Excel is the ultimate maker studio that actually lives up to the promise of "build your own mini-app". I'm not saying it can't be improved on. It's just that the core model of flexibility is so compelling, pushing everyone into a SaaS app is invariably going to take away substantial power from the user.
One computer: once you decide you want to use your smartphone, there are problems with Excel's mobile preview for all but the most basic sheets, even with Microsoft's official apps.
One location: Excel's lack of a universal time (it shows different times to users in different timezones) has led to gnarly workarounds. There was an attempt to correct for it in the XLSX format (cell type 'd'), but it has its own problems.
One person: Excel has no inherent sync strategy. You are forced to farm out to email (and the inherent filename versioning nonsense) or use a system like SharePoint (which has its own problems you notice when dealing with people in different timezones across the world, like accidental file locking and data loss)
Any solution, SaaS or otherwise, will have flaws. It might solve Excel's weaknesses really well, but will likely lack its strengths.
Excel is still going strong in 2017, to a large degree because its strengths continue to outweigh its weaknesses, and because people have found reasonable ways, or add-on tools, to work around those weaknesses.
It will be interesting to see whether, in the coming decade, niche industry solutions or an even more incredible Swiss Army knife comes along to dethrone the incumbent.
What I mean is that Excel is essentially a write-only language. It's very quick to prototype something for a semi-technical person, but auditing / code-reviewing an Excel sheet later is essentially impossible. The ubiquitous manual loop unrolling doesn't help readability either.
It also scaled fine to thousands of employees.
Having worked with a number of finance teams, the biggest problem is not when Excel is used as an analytics tool but as a financial database. Excel as a source of truth for financial data is truly terrifying. Even financial audits are still often handled with custom data pulls exchanged over Excel spreadsheets.
Shameless plug: that's why we ended up developing one of those SaaS tools (http://modfin.io). It lets finance teams map their source data to a proper, easily auditable accounting ledger so that they don't need to do their "magic" in Excel.
I am a programmer and I use csv/Excel as the single-entry financial database for a small healthcare business ($500k revenue). The accountant is fine with it. Filtering and pivot tables are a joy in Excel, and I can still run Python scripts whenever I need (mainly to auto-categorize bank transactions). Most other businesses around here use QuickBooks, which has a proper system of accounts, but then I can't play with the data.
Excel as the source of truth for a financial system is particularly scary to me b/c it's so easy for someone/anyone to change entries or miscalculate so it's very difficult for me to get confidence in its completeness and correctness.
Completely agree about analytics and scripting, but I'd utilize the CSV/Excel reports as point-in-time analytics that can be tied back to a structured source of truth (or a source of truth I can hold liable, e.g. a bank or credit card statement).
Validation: I'm confident that I can catch duplicated rows and mistyped amounts because Excel formulas verify that the sum of all transactions for each account equals the account balance, and that transfers between accounts add up to $0. Do I need anything more?
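The checks described above can be sketched in Python (a toy illustration of the same logic, not the commenter's actual workbook; account names, amounts, and balances are invented):

```python
# Sketch of the three reconciliation checks: no duplicate rows, per-account
# sums match known balances, and inter-account transfers net to zero.
from collections import defaultdict

transactions = [
    # (id, account, amount, category) -- sample data for illustration
    (1, "checking", -120.00, "supplies"),
    (2, "checking", -500.00, "transfer:savings"),
    (3, "savings",   500.00, "transfer:checking"),
    (4, "checking", 1000.00, "revenue"),
]

# Check 1: no duplicated transaction ids.
ids = [t[0] for t in transactions]
assert len(ids) == len(set(ids)), "duplicate rows detected"

# Check 2: per-account sums equal the independently known account balances.
balances = {"checking": 380.00, "savings": 500.00}
sums = defaultdict(float)
for _, account, amount, _ in transactions:
    sums[account] += amount
for account, expected in balances.items():
    assert abs(sums[account] - expected) < 0.005, f"{account} does not reconcile"

# Check 3: transfers between accounts add up to $0.
transfer_total = sum(a for _, _, a, c in transactions if c.startswith("transfer:"))
assert abs(transfer_total) < 0.005, "transfers do not cancel out"
print("all checks passed")
```

In the spreadsheet these would just be SUMIF-style formulas, which is part of the appeal: the validation lives next to the data.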
Point-in-time doesn't work because I need to keep track of info that doesn't show up in the bank statements. I run a Python script that pretty-prints names and adds categories, and I also manually enter check details, category adjustments, the business purpose of meals and flight tickets, etc.
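A script like the one described might look roughly like this (a hypothetical sketch; the rule keywords, categories, and sample rows are invented for illustration):

```python
# Auto-categorize bank-statement rows by matching descriptions against
# simple keyword rules; anything unmatched is flagged for manual review.
RULES = [
    ("STAPLES", "office supplies"),
    ("DELTA",   "travel"),
    ("AWS",     "hosting"),
]

def categorize(description):
    """Return the first matching category, or flag the row for manual entry."""
    upper = description.upper()
    for keyword, category in RULES:
        if keyword in upper:
            return category
    return "UNCATEGORIZED"

rows = ["Staples #1123 Purchase", "AWS EMEA invoice", "Joe's Diner"]
print([categorize(r) for r in rows])
# → ['office supplies', 'hosting', 'UNCATEGORIZED']
```

The output then goes back into the spreadsheet, where the manual adjustments happen.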
I don't use a proper database because to do so I'd have to give up Excel's nice features (or worse, switch back and forth between Excel and the database).
I'm sure that the people who made these billion dollar mistakes were pretty confident too... https://www.cio.com/article/2438188/enterprise-software/eigh...
"The error occurred when the accountant omitted the minus sign on a net capital loss of $1.3 billion" (Fidelity)
"There were honest mistakes made in a spreadsheet used in the implementation of a new accounting standard." (Fannie Mae, $1.36 billion loss)
There's a bunch of lower (million dollar) losses listed in the linked page too.
I think that you need to show that Excel requires more data entry, or riskier data entry, or that Excel can't catch as many errors, etc. For now I'm not convinced.
You can try to institute the same controls over Excel, either directly in code or just implicitly in terms of how it's used, but they tend to break down pretty quickly with scale.
Flip side, my feeling is that, if an analysis was worth doing, it's probably worth doing right. And to do it right, it should probably be done in a way that one can show their work. I like where the R community in particular has gone in dropping the barrier to reproducible analysis to roughly the same place as Excel.
I also wonder about Google Sheets' capabilities. It works well for many people collaborating on one spreadsheet, with one major flaw I've seen: undo. In Google Sheets, undo was global undo, not personal undo.
Since it was a very powerful tool that struggled to gain traction, they are using it as the core query tool for Power BI (alongside DAX for dashboarding).
Well done Microsoft.
With new (or prospective) clients that are SMBs, one of the first things I ask is, "so, what are you still using spreadsheets for?"
Really? I work at a company that is not handling super-sensitive data (i.e. not a bank, for example), but I can assure you that if I went around proposing to adopt a SaaS for what Excel is used for internally, I would not get very far.
No matter how good you are as a SaaS provider, our data stays in-house. End of story.
It's no value judgement if you want to continue using Excel for privacy, compliance, or security reasons. You do you. But SaaS companies get very good at developing privacy, compliance, and security stories, particularly as they move into enterprise. (Below enterprise it's less of a big deal, partially because customers care about them less and partially because small businesses are, as a judgement-free statement of engineering reality, abominably bad at data security.)
("Off I go" can often mean something like: upload the result to the same or a different legacy system, again in some custom format.)
Maybe I can pester my own IT dept. to add an extra option to the report so that it spouts off a .csv directly. That's all, the rest still lives in Excel.
Then I am free to play with the data as much as I like. (Keep in mind that in some cases you want the data in Excel precisely to manipulate it more easily: where the old app works record by record, you can make changes across a thousand records using the Excel interface. Also keep in mind that this may be something you do only once or twice a year, when you renew your catalog prices or whatever.)
Now, what is the cost of:
- going out on the market looking for a SaaS that can take my data in exactly the format I use internally, apply the required transformations, and send the result back in a format I can use
- assessing whether the SaaS vendor is indeed trustworthy for my desired level of security, SOX compliance, etc.
- adding one more vendor to my portfolio of vendors/licenses/purchasing orders
So the reason not to go to SaaS is actually a combination of security/external resource dependency/cost/bureaucracy.
Where "cost" is mostly inertia + sunk-cost fallacy. I am sure that a SaaS may cost less than the hours spent manually doing all the stuff I mentioned above (including maintaining the Excel spreadsheet), but the latter is a "hidden cost" because it's something that happens infrequently and is part of the normal chores of whoever uses the Excel sheet itself.
Introducing a new SaaS app, by contrast, shows up as a visible IT cost (identifying and approving it, adding a recurring subscription, etc.).
But we have to give Dan Bricklin and Bob Frankston their due, because VisiCalc is the original killer app in the literal sense: people used to say they didn't know whether VisiCalc sold more Apples or Apple sold more VisiCalc. It truly was the program for which people bought the platform.
Many people seem to say the same thing but the only problem I could ever see (in occasional use) was a non-standard UI which non-technical users had trouble grasping. What made it so awful?
Notes itself has some great functionality. The UI feels a little clunky - but generally you can do what you need to do.
The main problem I encounter is the way our IT department has deployed it. Many functions are not enabled or are only partially working. As far as I am aware, we have no Lotus/Domino expert in our country; head office (on the other side of the world) has quite a few, though. Any requests made through our regional helpdesk for fixes/improvements mostly go nowhere.
I like the idea of the replicated databases for email and other business functions. It should mean staff could switch to a different machine and quickly, painlessly get access to all their email and other data. However, the way it is currently configured requires a considerable amount of convoluted setup that appears to need local admin access, making it impossible without helpdesk assistance.
I suspect many companies deployed Lotus/Domino in similarly semi-functional ways that made it less useful than it could be.
The reason many companies migrated away from Lotus/Domino, though, was the risk of vendor lock-in. My employer is stuck with a vast amount of business process captured in Lotus/Domino. They have been slowly migrating core functionality out to other web or networked database applications over the last 12 years. Even so, I don't see any move away from Lotus Notes for email for many years to come.
In the mid-90's, I thought I was helping by bringing Notes in to the company I worked at, and people seemed to agree. A ticket tracking database we set up seemed to help. Many years later, after I left, I saw a presentation where Notes was cast as the problem, with a website proposed as the solution.
It's common to tell kids (and everyone) that they need to "learn to code." That is what Excel already is! And it's incredibly powerful, especially if you don't have any other coding skills.
But, of course, it's messy. When tens of millions of people can each write their own programs and easily modify them and change them to suit their own needs without needing to recompile or anything, then you get the mess which is Excel.
...and, in fact, that's also what you'd get if you taught everyone how to program Python in high school instead of Excel. You'd get tens of millions of non-interoperable messes of programs (which still get the job done!).
But that's fine. Give humans tools, and they'll find clever ways of using them that you don't intend. And it'll continue to drive people crazy. In some ways, the Excel hate (though obviously not unfounded) seems almost like bourgeois elitism. THIS is what it looks like when you give everyone the ability to code!
I disagree. Python leaves a readable instruction set, while Excel buries the logic in an opaque two dimensional grid of cells (plus or minus atrocious VBA). I think this gives a better ladder for people to improve their skills to the point of not producing unmaintainable garbage.
Which you prefer depends to some extent on which end of the problem you're coming from; if you're starting out with a lot of domain knowledge, a spreadsheet is attractive because you can look at the data, see whether the way it's being stored/processed is basically correct by spot checking it, and then automate. I'll still often reach for a spreadsheet if I don't know off the top of my head what the rules or relationships are and I need the equivalent of a whiteboard with logic.
If you have a top-down perspective and you want to automate a process whose fundamentals you know well, code is more comfortable. And you can go back and expand, modify, or adjust it later. But two things I don't like about code for handling large volumes of data are that I have to construct some sort of viewer to format the output (requiring more coding, or slower bug discovery if I press ahead without one), and the endless typing. I like selecting things from menus and tool palettes because, while I don't mind holding an instruction set in my head, I don't like doing the same for every library I might want to use, especially remembering all the parameters. I hate, hate, hate writing glue code and want the computer to do that stuff for me.
Fortunately a lot of these problems seem to be receding thanks to tools like Jupyter and some of the promising Flow-based programming tools that let you easily visualize structure, code, and visualization without getting bogged down in details.
I should add that I prefer hacking to software engineering; that is, I'm only interested in programming as a means to the end of having a customizable tool for myself rather than building a product for other people, so I want to spend as little of my time as possible on constructing the tool.
Not everyone structures programs in the same way, or puts functions in easily found locations, or names things usefully, or even approaches the problem in a way that makes sense to the majority of other people. For a concrete example, see this.
> Excel buries the logic in an opaque two dimensional grid of cells
It would be trivial to represent Excel grid formulas as a set of predefined functions, or a special data structure that you register functions in, and then it could be in textual form. It wouldn't make it any easier to reason about, but then again control flow isn't always easy to reason about when a program is split into many different objects and functions, until you've internalized a fair amount of the program. Spreadsheets give people a very limited but structured output and a very limited but easily reasoned-about way of processing. It's different from "normal" programming, but that difference allows for very intuitive use.
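The "register functions over cells" idea can be sketched as a toy in Python (purely illustrative; this is not how Excel's recalculation engine actually works):

```python
# Toy reactive spreadsheet: a cell holds either a constant or a formula
# (a function over the sheet), and reads recompute through dependencies.
class Sheet:
    def __init__(self):
        self.cells = {}

    def set(self, name, value):
        """value is a constant, or a callable taking the sheet (a formula)."""
        self.cells[name] = value

    def get(self, name):
        value = self.cells[name]
        return value(self) if callable(value) else value

s = Sheet()
s.set("A1", 10)
s.set("A2", 32)
s.set("A3", lambda sh: sh.get("A1") + sh.get("A2"))  # =A1+A2
print(s.get("A3"))  # → 42
s.set("A1", 100)    # change an input; the formula re-evaluates on read
print(s.get("A3"))  # → 132
```

Written out textually like this, the logic is no longer "buried in the grid" - but it's also a lot less approachable to a non-programmer, which is rather the point.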
> I think this gives a better ladder for people to improve their skills to the point of not producing unmaintainable garbage.
How much work has been put into making spreadsheets easier to share with other people? I'm not sure the same culture has existed for spreadsheets as for programming, where people have needed to share and reason about complex algorithms and data structures. If it had, I'm sure we'd have much more in the way of how to structure your excel well so it can easily be consumed by others, or yourself in the future (if it doesn't already exist, which it may).
Yes, mixing the good old Excel dependency tree together with VBA macros creates truly dreadful and opaque spreadsheets.
That's exactly what happened with Perl when everyone learned to use, modify, and develop CGI scripts in the early days of web-based application development. Dynamically-generated website content worked and evolved into the web we know today, but those scripts were awful and did a lot of harm to Perl's reputation.
These kinds of expressions are what ultimately caused the derisive terming of Perl as "line noise". It was really unfair, to say the least, because there were plenty of ways to code in Perl and do so cleanly, with a minimal usage of regex.
Honestly, it seemed more like people were coding in regular expressions, and simply using Perl as a "run-time" wrapper for those expressions (and maybe those expressions existed first; I don't know). I'm not the greatest fan of regular expressions (to me - especially on more complex operations - it is almost like programming the tape of a Turing machine), but I do know they have their place in the world. However, because the "language" (if you will) of regex allows for an extreme number of operations and such, it almost succumbs to an "inner platform effect", allowing those with familiarity to roam free, to the detriment of future maintainers.
That said - none of this is new. If you go back to the beginning of microcomputing, you'll find the same thing happened with the various dialects of BASIC, as people taught themselves programming in that language, then spread the knowledge through various means (magazines, books, and later BBS systems) - it became a horrible mess, even though it all pretty much "worked".
The problem was that Perl's performance on builtins (sorting and regular expressions in particular) was stunningly good, while its performance on anything else was stunningly bad.
The idea that you would Decorate-Sort-Undecorate an enormous array because the overhead of passing a function to sort() was so much slower was ridiculous.
In the case of regexes, breaking apart the regex into readable chunks and processing the data multiple times was so much slower that everybody piled everything into the regex. Named captures didn't appear until 2007.
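Decorate-Sort-Undecorate (better known from Perl as the "Schwartzian transform") can be sketched in Python; the idea is to compute the sort key once per element rather than once per comparison:

```python
# Decorate-Sort-Undecorate: pair each element with its precomputed key,
# sort the pairs, then strip the keys back off. Sample data is invented.
words = ["spreadsheet", "excel", "perl", "vba"]

# Decorate: attach the (potentially expensive) key to each element.
decorated = [(len(w), w) for w in words]
# Sort: tuples compare by the precomputed key first.
decorated.sort()
# Undecorate: keep only the original elements.
result = [w for _, w in decorated]
print(result)  # → ['vba', 'perl', 'excel', 'spreadsheet']
```

In modern Python the same effect is `sorted(words, key=len)`, which computes each key exactly once internally - the pattern survived by being absorbed into the language.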
The key, as you mentioned, is to use builtins, or more generally to use canonical expressions. There's always more than one way to write an expression, but Perl definitely has opinions about the correct way(s), and the Perl compiler does pattern-matching against the code to recognize many of the canonical expressions. They then get compiled to higher-performing IL code. This is why using things like the spaceship operator, <=>, will run much faster than equivalent verbose comparison code.
This is over-simplified, but I always thought about it this way: the Perl compiler generates "assembly code" for the "Perl CPU" to run. It can generate verbose assembly code using lots of low-level assembly commands, but it can also recognize commonly-used expressions and compile them to a single high-level assembly command, which can then run in a single "CPU cycle" instead of stepping through verbose code. Therefore, it was always worthwhile to learn how to write the commonly-used expressions in the common way, so that the compiler can recognize and optimize them.
I don't know if other languages have this kind of optimization. JIT compiling, as I understand it, does a quick and rough compile without optimizing on the first pass in order to start execution quickly, and then recompiles with more optimization on code blocks that are consuming a lot of runtime. I don't think any of the JIT compilers look for common expressions and apply pre-determined, hand-coded optimized function calls to them.
It's obviously an exaggeration, but not by much.
Which I guess makes that test more about dedication and attitude than about raw skill.
We recommended they implement a PDM (product data management) system - it would have saved them millions of dollars (considering they had a staff of 20 whose sole job was to update the claims processing system by hand based on these spreadsheets.) Unfortunately when the 2008 financial crisis happened, management balked at the initial outlay. It's quite possible they're still using Excel to this day.
The health insurance people of various sorts who use this setup are the ones with domain knowledge. With an excel program, they can (1) understand how it works (2) add to it, modify it (3) create a copy and break it (4) debug it (5) have ideas about how it could work differently…
I’m not saying that excel is the tool for the job (sounds like it is definitely not), and those advantages erode as the setup gets more complex and/or has more people using it. But, there are reasons excel gets used this way.
The alternative is often rough. Software projects like this have high failure rates. Getting “requirements” out of users’/clients’ heads and into programmer’s is difficult. It means the standard (natural?) gap between domain knowledge and programming knowledge is restored. That has costs.
I’m going to repeat again that I am not advocating health insurance companies run policies and claims in excel. It is a bad idea, almost certainly. I am saying that we should be curious at why they are doing it this way. I think it’s a “good” question in that it leads to interesting places.
If the Excel team was blessed with a plague of inspiration tomorrow morning, is there a way they could make it so that Excel is the right way to manage an insurance business?
Yes, the performance will generally be better (1 minute to run a report, not 1 hour), but I'm not sure things will generally be better. As you said, the business people can at least modify the Excel. These systems? Not so much, without an usually massive support contract (Yaaaay! for us IT people, I guess).
People sometimes make a false equivalence and imagine that the Excel jungle will be replaced by a Google-developed, 90%+ unit-tested bunch of code, refactored and cleaned periodically.
I'm not convinced that's a good thing. If they can change the Excel then they can also fuck up the system, and if you fuck up a system like this then it's horrendously difficult to unfuck. I think systems like this look great when they're working, but the second they go tits up you realise why people are horrified by them.
Unfortunately, many businesses then find they have no way to know the extent of the data lost or who did it. And there are no proper backups. This happens over and over throughout the world.
Excel’s a great tool and there are lots of terrific uses for it. But what the article describes isn’t one of them. If we’re being honest, this isn’t the first time the issue’s been discussed and the companies likely aren’t actually switching away from excel, but rather supplementing it.
In fact, the article seems to be more a “hit job” / “marketing piece” for Anaplan Inc., Workiva Inc. and Adaptive Insights.
The alternative doesn't have to have a high failure rate. I was brought in to help reboot their in-house implementation of a product data management system - the core issue was they didn't have any real competency in software development (though they believed they did). Ultimately I recommended they abandon the in-house effort and instead proposed using an off-the-shelf PDM system; we evaluated quite a few decent options over the course of six months and found several that met the business's requirements. It would have been the right thing to do, and I believe it would have been successful. Culturally, the IT department staff was plagued with not-invented-here syndrome. It was a generally dysfunctional IT group, and it's quite likely that Excel got its start as the system of record because the business needed to get the job done, and the IT department was unable or unwilling to deliver.
Alternatively you were brought in by management to make cost cutting recommendations - and your solution wasn't cheap enough.
Alternatively the company accounted for the enormous risk of the project succeeding but actually being worse than what they had in place.
You really don't know the politics of this.
Another alternative is that your plan would have resulted in sufficient job losses that the company was worried that it would demotivate other employees and damage the company more than the cost savings.
It's entirely possible that the OP was exactly correct, or close to correct. It's also quite likely that at least one of the factors you mentioned (or others) was also at play.
But... I do think that if we're thinking in terms of "users" and "developers" then we're no longer talking about Excel. The whole reason Excel is used this way is the continuous spectrum of sophistication.
It's no good if you fix excel by making it not excel. The whole reason excel is so popular is the intuitive, unintimidating but terribly wrong way it mixes data, code and GUI. Functions in the data, constants repeated in every row, hard coded variables all over the place. If you're just supplying a GUI and hiding the code, the user can't change, add or debug anymore. The point is that they can code, even if they don't know they are coding.
The trick is to think up a way of shipping VisualStudio, but making it less terrifying to engineers, managers and such.
Excel upgrades don't seem to break previous files as severely and have the backing of Microsoft.
I can't believe I'm defending Excel.
What a professional software developer's workflow has and Excel lacks:
* principled way of sharing code across a team with distributed source control (or similarly powerful tool),
* automated regression tests,
* input validation,
* continuous deployment tools to automatically deploy working code (default mode for Excel I guess), and a mechanism to roll back to a previous version (not so much),
* build and dependency management tools for bringing in libraries from a vast open source ecosystem,
* systems for reporting bugs and feature requests and tracking their progress,
* networked database that is the shared "source of truth" for all users,
* security and auditing,
* hardware capacity planning (run locally or in AWS? what are the cost trade offs?)
So sure, get the health insurance people doing all of those things, and have Microsoft figure out how to do all of them well in Excel, and congratulations, they are now software developers. None of these require using a "real programming language", at least not directly.
OK, I don't know who wants to work in health insurance but the basic point is Excel is a good tool for people who have specialist knowledge in a particular domain and want to automate their work. Programmers know the best way of programming but frequently overestimate their understanding of other people's problems.
Even absent that, it is difficult to explain what users need to coders who have zero idea about the domain. And if they turn out to be the kind who decide the user is an idiot the moment she has a different opinion about her own job, expect the needed time to triple.
The homegrown Excel is an awesome analysis artifact once the company decides to move on. It is concrete, you can generate test cases from it, and it has features.
No... but I think the converse might be possible. If those with the programming knowledge sought to replace the Excel system but in a way which actually preserved the useful features of the Excel solution. Dirty, filthy things that programmers don't like. No referential integrity, data duplication, deferring decisions to human beings rather than lists of rigid criteria, etc. If they could swallow their bile for just a bit, they could build a system just as useful and flexible and introduce tools to add in the things like data consistency and referential integrity - tools that the user would initiate, thus accepting responsibility for the restrictions such things impose on themselves and the data.
That is not, unfortunately, how it is usually done. (I am currently literally working on a program to replace a 40K row Excel sheet that crashes multiple times a day which has a workflow centered around it that involves slicing and dicing the sheet, passing it off to multiple different groups for separate editing, etc... not sure I will be able to sell the team on this approach, however I do believe it is sound.)
If you hand a caveman a machine gun, he will club you to death with it. IOW, the casual ability for domain experts to screw up a spreadsheet is enough of a reason to decide that excel is a terrible solution to this problem.
Cavemen were better than you at a lot of things. They knew how to find squirrels.
I've seen this a lot where a consultant will swoop in and tell them that they need to use Peoplesoft or something and the organization spends millions of dollars and years of retraining to get a system that has a lot more friction in their actual workflow and is never quite as good for getting the job done.
Of course there are often side benefits that can't be overlooked, like a much more sane backup solution, interoperability with other providers, etc... But in the end most of the employees will tell you straight up that the old system was better. Too much of the domain knowledge was baked into the old system and doesn't make the transition to the new system.
>We recommended they get a system that manages their product data
Yep, I believe you are a consultant.
When faced with a decision b/w using a RAD tool to create something outside of IT, or spending $500,000 and waiting 6 months so real devs can deliver something Finance needed yesterday, that may not be needed tomorrow, the choice is an easy one.
Your generalization isn't entirely unfair; IT is often called on to fix a mess, but some Access solutions are built by people who understand database design/normalization and can put together an easily-maintainable application that never hits storage limits or requires upgrading to a SQL Server-managed back end.
Corporate IT never lays eyes on the many Access projects that aren't crap.
It seems to me that many programmers don't appreciate that industry practices or regulatory requirements impose quirky requirements on domain specialists that everyone knows are far from optimal but which need to be complied with anyway if the business is to turn a profit or retain its licensure.
Someone mentioned a health insurance example above: there are 50 states with 50 different sets of insurance regulations, plus additional territories, federal regulations, and the weird business practices of healthcare providers who are subject to their own 57 varieties of regulation, and there are powerful economic and political interests that get in the way of burning it all down and replacing it with a single system designed by a committee of one patient, one doctor, one nurse, one pharmacist, one lawyer, and one economist.
And even that highly simplistic requirements committee went through 3 versions while I was writing it up.
The domain experts already have lots of complex knowledge and experience they need to automate. They don't want to learn about databases. Unfortunately, the reverse is equally true.
I think only those who move into consulting understand it properly, because they aren't confined to a cubicle coding all day long without direct feedback on how the real world actually looks.
At most there is some feedback coming from the ticket management system or the product manager.
I have no idea how watertight this is though, it’s Microsoft quality (and some of the most ancient code they have) so I wouldn’t bet the farm on it.
A spreadsheet optimized for integration into software build processes would probably create more than a billion dollars of real value every year with even limited adoption.
While it may be true that Excel is overused, I believe that the spreadsheet is chronically underused as a model for software development.
I'm working on a plugin for Excel for devs and other technical people. It lets the user use SQL and C# in Excel, connect to various types of databases and move data in either direction, and provides a runtime for automation.
The idea is basically to let tech users use their existing skills to make life easier. Main areas of use are:
- process (live) data in Excel via SQL and C#/linq
- move data between Excel and databases
- fetch data from various sources using .NET (e.g. get data from AD, the file system, REST APIs, etc. in a few lines of code)
- replace VBA with C# and .NET for automation
It's also very useful for prototyping. Business logic goes inside C# scripts, and Excel takes care of user input, displaying results, data binding (formulas and graphs) and data storage.
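As an illustration of the idea (table and column names are invented here, and the exact syntax depends on the tool), a query might join a table in the workbook against a live database table:

```sql
-- "Orders" is a table in the workbook; "customers" lives in the database
SELECT o.OrderId, c.Name, o.Amount
FROM Orders o
JOIN customers c ON c.Id = o.CustomerId
WHERE o.Amount > 1000;
```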
It's commercial but freemium, the url is http://www.querystorm.com.
I posted about it a few times:
Here's an intro video: https://vimeo.com/242216594
My first heavy use of Excel was in a thermodynamics class in college. We modeled heat gradients for various HVAC systems with Excel. After that, I used it to calculate stresses and strains of different structures using each cell as a node in a finite element analysis model.
It is brave of you to integrate directly with Excel, and I commend you for that. Maybe some day I will work up the courage to do something other than basic one-way import.
This could allow for more interesting things like gathering fields from records into a meta-table from a relational DB, that would definitely be a valuable next step.
Spreadsheets have a lot of similarities to the sort of code bases that as an industry we've been moving away from. E.g., no version control, no tests, opaque code, bugs are hard to find, write-only code, bit rot, random antique scripts used in critical spots, poor reusability, latest version is on somebody's hard drive somewhere, nobody knows which version is the right one.
I also think it's deeply unfortunate that there's no good path from spreadsheet master to real developer. As great as spreadsheets are, they're a cul de sac. I hope this new generation of tools fixes that. If not, somebody sure should.
"A spreadsheet is an aggregate of concurrently active objects, usually organized into a rectangular array of cells similar to the paper spreadsheet used by an accountant. […] A spreadsheet is a simulated pocket universe that continuously maintains its fabric; it is a kit for a surprising range of applications.”
– Alan Kay, “Computer Software”, Scientific American (1984). pp. 56–57. PDF: https://frameworker.files.wordpress.com/2008/05/alan-kay-com...
Republished as “Computer Software” (1984). pp. 6–7. PDF: http://home.cc.gatech.edu/guzdial/uploads/130/Alan%20Kay%20-...
Version control for Excel is terrible at best. Code review is basically non-existent, and how could it be when logic is split across several cells and variables are named $A17 or B23?
I think the win with excel is its super simple interface, that said there is plenty of room to improve on the programming language interface.
Excel lets you name tables, cell ranges, or individual cells. I have seen spreadsheets put together by people who take the time to do this, and they are remarkably readable!
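For illustration (range names invented), compare an anonymous-reference formula with the same logic written over named ranges:

```
=SUMIFS($C$2:$C$500, $A$2:$A$500, $F17, $B$2:$B$500, G$1)

=SUMIFS(Amount, Region, ThisRegion, Month, ThisMonth)
```

The second version reads almost like a sentence, and it survives rows being inserted or the sheet being reorganized.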
Anyway, it basically worked like a stack, with the currently selected cell being the instruction pointer. So when debugging you could either look at your data and watch blocks of numbers flashing in and out of existence or changing state, or look at the macro sheet and watch the pointer bounce around as it went to different branches or sat in loops or whatever. There were commands to turn screen updates on and off (which sped things up considerably) so as you got more and more of it working it would go from looking like some confusing nightmare to a smooth list of progress bars and passed sanity checks.
They moved to VBA because programmers were finding the cell model too restrictive/weird, and for integration with Access, but doing it all in cells was remarkably and pleasantly like assembly-level debugging. You'd hack something up quickly, figure out your algorithm, then name the ranges and use an index value, and you could lock ranges and what-all else. It wasn't as easy to read as a page of code (e.g. no whitespace), but the crudity of it pushed you towards that sort of one-operation-per-line discipline that assembler demands and really forced you to think about semantic efficiency. I had macros consisting of thousands of cells of code with every single operation commented. You could, in turn, name the ranges that had macro code in them and do code profiling to find out where your performance bottlenecks were, and so on.
Some HN comments here: https://news.ycombinator.com/item?id=12448545
Because of this "bit more work", it's also much less flexible in the long run.
Excel is not a great database, but it's a highly flexible one.
An idea I've been toying with for a few months now is an automation app (read: window manager) whose UI is somewhere between Excel and GameMaker Studio's drag & drop UI. Like where you just have a giant grid/canvas, and where you can define "routines" and "variables" as groups of cells. If I could figure it out, I think it would open up the idea of an automation app to a broader scope, to people who aren't programmers, and I might be able to make some money off it. Especially considering I made like $500 in donations for Hydra back when I was doing more WM stuff for Mac, so there seems to be some excitement for the idea of programming your Mac, but perhaps it could be a living if I can figure out how to reach out to power users who aren't programmers. And it was your comment that gave me the idea of making it more like a grid instead of more like GameMaker. So, thanks!
Every single department in any company can benefit from knowing it well.
The problem is getting people comfortable and competent to know it well.
This is where I think google sheets fucking failed: they should have made an effort to make sure every single gmail user had a tutorial to know how to empower their curiosities and interests via google sheets.
Why do we readers take these sorts of things at face value? You know the whole article is a sales pitch.
Or you get irritated by it, while waiting for the "inevitable" collapse of this promotional/propaganda model...
Honestly I don't know what is happening, but it seemed like the first option, which I've followed for a couple of decades is a good enough cure for disillusionment and depression. (Caused by investing in the second option)
Edit: add quotes to inevitable.
Implicit & unexplained business rules, no tests, no version control, unlabeled cell references, hardwired variables, everything is global, no encapsulation, no logs and no documentation.
Even assembly code is easier to read and maintain than a complicated Excel workbook.
I think that's just as true today, at least as far as business software goes.
0 - https://mitpress.mit.edu/books/small-matter-programming
Edit: UDF is user defined functions. What's LPT?
I do think this is a tooling issue; tooling could generate proper code from Excel workbooks.
You would have thought this would be obvious- it seems analogous to perl/python or other such Swiss army knife tools we use for quick one-off solutions that occasionally get pressed into permanent formal use where they shouldn't.
Anytime another programmer asks me what language they should learn I tell them Excel and they think I'm joking. Besides being a quick and dirty way to solve a variety of problems it helps you speak in the language of your users (if you work in B2B)
Indeed, I'd also argue that the most commonly used database is Excel. Although few people using it realize they're designing schema.
Can we say "one way dataflow constraint system"? They're even called "spreadsheet constraints"...
Also known as functional programming
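A minimal sketch of that model (class and cell names invented): each cell is either a constant or a pure function of other cells, and every read recomputes downstream values from the sources, which is the one-way dataflow at the heart of a spreadsheet:

```python
# Toy one-way dataflow "sheet": cells hold either a constant
# or a pure function of the sheet; reads re-derive everything,
# so changing an input automatically changes dependent cells.

class Sheet:
    def __init__(self):
        self.cells = {}  # name -> constant or callable(sheet)

    def set(self, name, value):
        self.cells[name] = value

    def get(self, name):
        v = self.cells[name]
        return v(self) if callable(v) else v

s = Sheet()
s.set("A1", 2)
s.set("A2", 3)
s.set("A3", lambda sh: sh.get("A1") + sh.get("A2"))  # =A1+A2
print(s.get("A3"))  # 5
s.set("A1", 10)     # change an input; the "formula" recalculates
print(s.get("A3"))  # 13
```

Real spreadsheets add caching and incremental recomputation, but the programming model the user sees is exactly this: pure functions over named values.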
It can be a really powerful one, I agree. It's simple to use, and a lot of people know how to handle it. And it's probably one of the best tools for doing some quick data analysis.
However, like every tool, it has its pros & cons. One of the main cons, IMHO, is that data tends to get mixed with processing a lot (but that also simplifies usage, so...).
Overall, spreadsheets have been really helpful. Retrospectively, there is a good reason why they triggered the personal computer boom in the '80s. They are good ad-hoc tools that can help a lot. And Excel is the dominant one.
However, there are tons of horror stories around Excel; in the worst cases, it has led to wrong decisions affecting the world economy.
I personally have a few horror stories (fortunately, not affecting the world economy).
At my last job, we used an internal spreadsheet to evaluate costs and margins when answering contract bids. This spreadsheet was a monstrosity taking ages to compute and it was a nightmare to manage, sync, and share between all the participants.
I've also seen the traditional "on what did you work last week" spreadsheet, quite annoying to update, even more given that my workstation is under Linux...
I've also seen spreadsheets as specification documents, generating an xml file, which in turn, through some xslt, was used to generate C++ source/header files describing an interface between two sub-systems. The concept (automating spec-to-code translation) was really interesting. However, I don't know about you, but I personally hate having MS Office in the tool chain used to compile a piece of software, especially when it targets Linux and only Linux...
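The spec-to-code idea itself doesn't need Office in the toolchain; a hedged sketch (spec columns and output format invented) using a plain CSV export instead of the xml/xslt detour:

```python
# Generate a tiny C header from a CSV export of the spec sheet.
# The CSV is inlined here; in practice it would be a file exported
# from the spreadsheet and checked into the repo.
import csv, io

spec = io.StringIO("name,id\nMSG_HELLO,1\nMSG_BYE,2\n")

lines = ["#pragma once", "// generated from the interface spec -- do not edit"]
for row in csv.DictReader(spec):
    lines.append(f"#define {row['name']} {row['id']}")
header = "\n".join(lines)
print(header)
```

The build then depends only on a text file and a small script, both of which diff cleanly in version control.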
But those are not the most terrifying spreadsheets I've seen. The most terrifying one was a giant spreadsheet basically managing every aspect of a huge project (thousands of pieces of equipment, hundreds of sites). This document was so big it took several hours just to open. It was like a giant CMDB of all the configuration items, and, trust me, there were a lot. But it also did more, a lot more. It generated configuration files for applications, background images for monitoring maps, firewall rules... It was also a tool that helped architect and size the amount of equipment required for a given site. I probably don't know the full extent of its functionality. Just to give you an idea of its complexity, at one point it even hit the maximum number of cells allowed in a whole Excel document (with Office 2003, however).
All the auto-generated stuff was always kind of wrong and had to be fixed by hand, which made for a repetitive, annoying, and error-prone process. Also, there was no clear naming convention, and it was a mess to take the generated stuff and load it into our automation scripts.
It was, de-facto, a huge piece of software with no versioning, tests or QA and it showed.
There was only one guy behind it, basically managing the beast full time. When he was sick or on PTO, the whole project was impacted. And when he left, it was a huge mess, nobody was able to take over.
Spreadsheets are great, but past ad-hoc usage, please, please, please implement something more serious. And also, keep track of Excel usage to avoid these kinds of monsters. Often there are real needs behind these beasts, but Excel is a really horrible choice in the long run.
I wouldn't say Excel is a programming language - unless you are referring to VBA attached to excel. Excel is a spreadsheet application. It's no more a programming language than Microsoft Word is a programming language.
I would agree with OP that Excel is the most-used programming environment of all time
Excel endures because it's the non-programmer's REPL for data. Like a Lisp REPL, using Excel is nimble and has an instant feedback loop. Click "A-Z" to sort ascending, click "Z-A" to sort descending. Move/hide/freeze columns. If one has intermediate-to-advanced skills, one can pivot the data on multiple axes. None of the centralized systems with even the slickest web GUIs can match Excel's capabilities.
That's why the centralized systems like SAP ERP, Microsoft Team Foundation, mainframe systems, etc. offer functionality to export to Excel spreadsheets. (Or they offer export to csv files -- which is 1 step removed from Excel.) With MS TFS, you can even re-import from Excel, so in that case Excel acts as a sort of offline data-entry client for the project management tool.
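That CSV escape hatch is trivial to script, which is part of why it's everywhere; a minimal sketch (table and query invented) using sqlite3 as a stand-in for whatever the central system runs on:

```python
# Run a query against the "central" database and dump the result
# as CSV: one step removed from an Excel sheet.
import csv, io, sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (region TEXT, amount REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?)",
               [("East", 100.0), ("West", 250.0)])

out = io.StringIO()
w = csv.writer(out)
cur = db.execute("SELECT region, SUM(amount) FROM orders GROUP BY region")
w.writerow([c[0] for c in cur.description])  # header row from the cursor
w.writerows(cur)                             # data rows
print(out.getvalue())                        # paste-ready for Excel
```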
One middle ground between Excel data silos and the utopia of a centralized system is a shared document store like Sharepoint or Dropbox. The hierarchy would be something like:
1) Excel spreadsheets emailed around. Lots of redundancy; uncertainty of which version is canonical.
2) Excel xls file stored on Sharepoint with automatic version control. Employees must have discipline to only email links to the "one true" spreadsheet and never attach the actual xls file in the emails.
3) The data that was in Excel is migrated to centralized system and all data entry and reporting is done there. However, because the central systems are so cumbersome, this abstraction "leaks" back out to Excel because of the highly desired "export to Excel" functions! (This restarts the dreaded Excel "data silos" complaint cycle all over again.)
There isn't an IT department in the world that can develop centralized IT solutions fast enough to keep up with the ways employees slice & dice data. That's why Excel is "overused".
It wasn't very long ago that I realized that my two decades in IT could be summed up as: making proper applications out of Excel workbooks with feeble macros. I hate Microsoft as much as the next Linux zealot, but dang if they didn't create a world in which I've made a career. Where I'm at now, I'm drowning in opportunities to ameliorate horrific Excel-based workflows, and I'm fairly confident that there's enough work out there to last me the 20 years until retirement.
If we're being uncharitable, we can spin Greenspun's 10th rule of programming as:
>"Any sufficiently complicated Excel spreadsheet contains an ad-hoc, informally-specified, bug-ridden, slow implementation of a proper centralized database."
Or, if we're being charitable, we frame it as an internal MVP (Minimum Viable Product):
>"Spreadsheets are the internal 'MVP' that proves the business value before you build the centralized systems. When the spreadsheet becomes unmaintainable spaghetti formulas and the xls email workflow crushes under its own weight, that will give the company the evidence and the confidence to spend $1 million and migrate the spreadsheet to a proper centralized database."
But then people keep asking “how can I export to Excel?”. One user was copying an HTML table split across 20 pages, one page at a time, into Excel.
So I found an elegant and simple solution: an Excel spreadsheet linked to the PostgreSQL db of the app via ODBC.
Now users can just click Refresh All in Excel and get the latest data. I’ve also added a “click here to open” link in Excel which opens the form with the row’s id in the web app.
Now everyone is happy: the data is kept centralized and available to everyone in real time. And users can make all the pivot tables and pivot charts that they want natively in excel.
Do not underestimate the power of ODBC with excel!
(may not work for big data, though most business apps don't contain more than a few thousand rows)
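One way to keep such a link safe is to give Excel its own read-only role and a stable view to refresh from; a hedged PostgreSQL sketch (all names invented):

```sql
-- read-only credentials used only by the spreadsheet link
CREATE ROLE excel_reader LOGIN PASSWORD 'change-me';

-- a stable view, so app schema changes don't break every workbook
CREATE SCHEMA reporting;
CREATE VIEW reporting.orders_live AS
    SELECT id, customer, amount, updated_at
    FROM orders;

GRANT USAGE ON SCHEMA reporting TO excel_reader;
GRANT SELECT ON reporting.orders_live TO excel_reader;
```

With this setup, Refresh All can never write anything back, and the app is free to refactor its tables behind the view.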
...along with understanding that migrating at earlier stage would cost only $100K.
You can very easily use Excel as a client to a central database and get the best of both worlds.
Aside from that — what about MS cloud services? I mean, maybe that's not appropriate for healthcare or certain businesses, but for others it allows there to be a single source of truth if the workflow is correctly tuned, no? I can produce a doc in my local instace, push to 365, edit remotely on the web, have somebody else edit remotely, and sync it back to my local instance without a hitch.
I use Office365 for that purpose regularly at my job. I'm a developer, but I have regular communication with editors, producers, project managers, division managers, operations managers, art directors, and so on. A lot of the information we share is tabled and needs to stay updated, and editable by each party involved. It's worked well for me in that regard.
Maybe others have had different experiences?
Well, more accurately, it can write back to the database, but normal usage of Excel won't afford that result. With normal usage of Excel you have an independent copy of the value.
It is also a dubious proposition as to whether we want people to actually write back to the central database; you'd be one stray macro away from disaster. You sorta want it if someone manually corrects something, but not if they operate on it programmatically, and now we're making things really complicated.
Another company/product that I know about (though it's been over a decade since I last touched it) that attempted to fix this issue was Actuate:
Hmm - and now that I've started digging into it, I see that this company has radically changed from what I knew of it. I had known it as a company which created software for designing and running reports from various data sources (whatever could feed it, honestly - excel, odbc, csv files, etc), which internally could be configured for more advanced usage using a language that had similarities to Visual Basic (it was OO long before VB, and geared for the reporting system, not general purpose usage). Before I left my employer who used it, they had just announced a Java-based system of "business objects" and some kind of tool to interact with data in an Excel-like manner (the tool essentially had all of the functionality of Excel at the time - including pivot tables), where changes would propagate thru the system. Plus interaction could only be done thru those objects, which were defined as "views" into the dataset.
So - after researching this (just now!) - imagine my surprise at finding that this company is now named:
...and they also did something called "BIRT" - which was open source and part of Eclipse:
Apparently this started in 2004 (not long after I left my employer - and that it is Java-based also fits).
I'm not sure what is what with the company any longer? If I had to guess (probably wrongly), they spun off their Actuate reporting stuff and the other tools as open source (?), and are now concentrating on other things, possibly in the same vein?
Weird. But very interesting.
Yes it seems that way but many spreadsheets exist because the centralized system is an incomplete solution.
For example, a company might have a "human resources tracking" centralized system. (E.g. Ariba, Peoplesoft, etc). It has some standard features like ingesting resumes and extracting "skills" from parsing MS Word docs, etc.
However, the company happens to use an interviewing process where the candidate must pass (1) a phone interview, (2) an onsite with 5 peer programmers, and (3) the hiring manager.
The hiring manager wants to "capture" all of those states (phone_passed=True/False, peer interviews with 5 different numerical scores, manager_pass=True/False). The centralized system doesn't have fields that capture all that so the hiring manager creates a spreadsheet that has those extra columns. Many spreadsheets out in the wild "fill the gap" between incomplete centralized systems and custom IT programming. An adhoc xls file can be created far faster than submitting a change request to the corporate IT department to customize Ariba -- which could take months (or never) to implement.
Or spreadsheets act as the "intersection" of data between multiple systems. Again, one purpose of centralized "data warehouses" was to eliminate Excel "data silos" but again, no IT department can create data warehouse solutions (which also includes all the custom reports) as fast as someone creating spreadsheets and just emailing them around.
Having a hybrid process where Excel acts as a round-trip offline client to a central db doesn't really eliminate the fundamental reason Excel (over)use keeps multiplying like rabbits.
When it doesn't, Excel will work fine (and will only cause disaster in individual cases). A "proper" unit-tested solution will either not work at all or just won't support half the use cases.
And without someone specifying the domain knowledge, there's nothing any programmer, no matter how well-versed in unit testing, agile, or ... can do about it.
That's why the principal way you sell centralized systems to users isn't in the ease of manipulating results but in the ease / automation of data entry. The whole pain point and problem with Excel is in manual data entry, and having a single source of truth, even if you succeed in enforcing its property as a single source of truth, does not automatically solve disagreements between different human sources of data.
That's why tools like Grafana have interesting futures for business users, because they enforce a clear separation between data sources and data visualization, while offering a lot of control to the end user over statistical manipulations and visualizations that they can create, and keeping those visualizations updated in real time, so that they don't have to be re-prepared every time period for some new report.
What about Business Intelligence products such as Hyperion? They can provide reports from a defined data model which gets data directly from the database. People can create their pivots and other transformations on the data fairly easily. Is there something that Excel offers that makes it much better than one of these Business Intelligence products in certain situations?
A simple one: if I've been asked to do a report but it turns out I don't have the correct access rights to some data, my manager can simply send it to me in an Excel sheet, or he can log an issue with IT to change the access rights to a specific key/dimension/hierarchy.
The 'one true data source' move often corresponds to tighter access controls when good auditing would really have been the correct solution, hence the above situation is fairly common.
Got a complex work stream?
Got a thousand small problems that need to be slowly figured out across 50 different main topics?
Got a team of 2, 50, or 1,000? No problem.
1. Just set up a bunch of Jira entries, tagging them with the same labels.
2. Set up filters on those labels.
3. Set up dashboard(s) that give any inquiring user a complete, real-time view.
Need to report status?
No problem: a screenshot of the dashboard, and done.
It's really great tbh.
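For concreteness, the label-plus-filter setup above boils down to a saved JQL filter along these lines (label name invented):

```
labels = workstream-alpha AND statusCategory != Done ORDER BY updated DESC
```

Each dashboard gadget then just points at a saved filter like this one.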
Also Google Sheets has an API for stuff like Python.
Only the very advanced users turn to VBA programming.
Yep. Even if they get centralized data via an RDBMS like SQL Server or, god forbid, Microsoft Access, office workers are going to want the front-end of that data to be Excel.
There is too much knowledge investment in excel for many office workers to want to switch.
Then I believe it should say stop abusing Excel.
Excel can be an incredibly useful tool, but it's definitely not universally useful. If you're using it to manually collate data from 400+ sources, you're doing it wrong.
> A year ago, Mr. Bell’s team spent hours distributing hundreds of Excel spreadsheets to regional and unit leaders each month for planning and performance tracking of the company’s 415 U.S. restaurants, he said. Now the same process takes minutes.
Reads to me like, "last year Mr. Bell's team spent hours hammering in nails by smacking them with the wooden part of the hammer's handle"
And the premise that things have changed enough to warrant reevaluating worn out business processes like Mr. Bell's here is true enough.
But dumping a tool because some out of touch C-level tool says so is a recipe for huge traction loss.
I mean ... people that use nail-guns for a living still have to use hammers.
Edit: I just noticed that I'm the third person to reply to you and we all referenced hammers.
edit: my favorite thing about using excel is that instead of spitting out an answer, you can spit out an Excel sheet that shows a lot of the work itself, therefore partially explaining the answer to the people that have to consume it. The product of your program can often be another program, giving the person receiving the report the opportunity to tweak or add new parameters to your output. Basically currying accounting calculations.
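A minimal sketch of that "the output is itself a program" trick (item data invented): emit a CSV where the totals are Excel formulas rather than precomputed numbers, so the recipient can tweak the inputs and watch the answer update:

```python
# Write a CSV whose subtotal/total cells are Excel formulas.
# Excel evaluates cells beginning with "=" when it opens the file,
# so the report shows its work instead of just an answer.
import csv, io

rows = [("item", "qty", "price", "subtotal"),
        ("widget", 4, 2.50, "=B2*C2"),
        ("gadget", 2, 9.99, "=B3*C3"),
        ("", "", "total", "=SUM(D2:D3)")]

out = io.StringIO()
csv.writer(out).writerows(rows)
print(out.getvalue())
```

(The same property is why untrusted CSV exports need sanitizing: anything starting with "=" becomes live in Excel.)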
Many moons ago large bank I worked at purchased a company and soon it became apparent that their entire business ran on a Lotus 1-2-3 spreadsheet with 30,000 lines of really dense, uncommented, unindented, spaghetti LotusScript code spread over 2 or 3 functions. The kicker is it would only run in the 16-bit edition of the product for whatever reason. So we had to have a VM to run it in because Windows 2000 had a problem with either Lotus 1-2-3 16 bit edition or the spreadsheet itself.
So yeah, stop abusing Excel - Lotus 1-2-3 in this case.
Maybe I'm a bit biased, having used Excel my entire career and being very dependent on it, but every time I've seen companies steer away from the desktop application toward O365/Sheets/Tableau, it's simply not as useful for ad-hoc data analysis and reporting.
This is one of those solved problems that companies inexplicably keep working to solve again.
I mean, this is the problem that distributed version control was designed to solve. Emailing a spreadsheet around and calling one a source of truth doesn’t solve the simultaneous edits problem.
You wanna know what's even more useful? Code notebooks.
I store everything from my SQL queries to my processing pipeline to my models and my visualizations. Most notebooks have slide-show modes now, so your presentation to business stakeholders is simple to prepare as well, and with the right plotting packages you can draw interactive charts that do all the fancy stuff. And when it's time to update the analysis, you just re-run the notebook.
But business analysts are too freaking lazy to just learn to write some python or R, and they're perfectly happy to get paid to do the same task over and over again when it comes time to making updates.
And don't get me started on source control.
In practice what I've seen is that IT and finance departments don't even let business analysts write their own SQL queries for fear that they'll write bad ones that bring the database server to its knees.
I'll take data warehouses for 600, Alex.
I know there are "automatically check stuff into git" sorts of models, but I would really like an integrated Jupyter + GitHub + Google Drive system so that it's easier to track what's happening and to find older versions of stuff.
Installing a notebook server is an option though. IBM will be happy to let you pay to have them help, or you can do it the sane way with continuum analytics.
I happen to use and advocate for R/RStudio for ad-hoc work (I spin it as "excel on steroids" when I am trying to convince someone). But it is hard to pull people away from something that has worked for them.
The alternatives to excel in the past have been really, really unpleasant to use. A now EOL product from Oracle comes to mind. It is called "Discoverer"-- that's right, "Discover" + "er". It is a turgid POS requiring extreme suffering and handholding/permission granting from your local friendly DBA-- all this to produce 90's era bar charts that are all but impossible to tweak.
The presentation mentioned that the company is/was working on reporting add-ons, I think.
The trick is to learn what node does what, because there are lots of them. You can also download community nodes for additional behaviors. Once you learn the basics, it's pretty straightforward.
I learnt by trial and error, looking for examples and info about nodes, and now I have some workflows with data from DBs, files, APIs and all at once.
In big biz, the "finance chiefs" (or other persons with purchase authority) usually get pampered by vendors with expensive business trips or whatever... especially at the scales that require SAP or Oracle solutions. So this likely means: the CFOs bought some expensive software that promises to replace Excel without any due diligence, the staff looks at the software and decides it's crap and continues to use Excel => CFO gets mad because no one uses the expensive software.
If you don't believe me, just ask yourselves how Oracle, IBM and Accidenture are still in business despite numerous highly expensive and publicized failures. It's all due to CxO pampering (in addition to a bit of vendor lock in).
See also http://exposingevilempire.com/.
Not sure if the misspelling is "accidental" or on purpose, in which case both it and this sentence are puns.