Her PhD dissertation on the subject "Analyzing and Visualizing Spreadsheets" (http://www.felienne.com/archives/2534) is particularly detailed.
Much though some people might think that spreadsheets are not for serious work, they're being used for that right now, and that is unlikely to change. Better to figure out how to improve the tooling, rather than undertake the quixotic quest to get business people to abandon their perfectly viable programming tool.
There is an excellent video with her on InfoQ, explaining her spreadsheet formula refactoring tool Bumblebee (and the F# code behind it): http://www.infoq.com/presentations/spreadsheet-refactoring
It's not clear that there should be either; spreadsheets work really well, efforts to move them into databases or whatever be damned. A spreadsheet is not a database table, except maybe in the most abstract possible sense in some cases. They really are different things and I agree that issues with spreadsheets should be solved by improving the tools not trying to paradigm shift spreadsheet users into an inappropriate environment that also happens to have a very long and highly skilled environment setup requirement.
1. They let novices build fairly effective user interfaces, just by entering headers for rows and columns.
2. If I understand what "data flow" programming is (based on using LabVIEW), then the formula cells in spreadsheets are such a thing. Each formula is recomputed whenever its inputs change. That creates an incredibly quick testing cycle. Displaying intermediate results by necessity creates a built-in debugger.
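Something like this toy Python sketch (hypothetical Cell class, no cycle detection, purely illustrative and not how Excel is actually implemented) captures that recompute-on-change behavior:

    class Cell:
        def __init__(self, value=None, formula=None, inputs=()):
            self.value = value
            self.formula = formula      # a callable over the input cells' values
            self.inputs = list(inputs)
            self.dependents = []        # cells whose formulas read this one
            for c in self.inputs:
                c.dependents.append(self)
            self.recalc()

        def set(self, value):           # like typing a new number into a cell
            self.value = value
            for d in self.dependents:
                d.recalc()

        def recalc(self):               # like Excel's automatic recalculation
            if self.formula:
                self.value = self.formula(*(c.value for c in self.inputs))
                for d in self.dependents:
                    d.recalc()

    a, b = Cell(2), Cell(3)
    total = Cell(formula=lambda x, y: x + y, inputs=[a, b])
    print(total.value)  # 5
    a.set(10)
    print(total.value)  # 13 -- updated automatically, no "run" step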
In addition, the lack of "complete" programming features, such as loops, may make spreadsheets less forbidding for novices to create and debug.
If these things are, as I suspect, compelling enough on their own, then like you say, the rest of the stuff in the latest Excel would just be icing on the cake -- not that I'd give up any of it.
1) spreadsheets are a purely functional programming language. When you point this out to management users (after explaining what "purely functional" means), they say they wouldn't want to live without it.
So in reality, the most widely used programming language in the world is a purely functional one.
2) they are NOT Turing complete (assuming you stay away from VBA)
Of course, they're worse than BASIC when it comes to naming things, which is what everyone here is complaining about. But these are major advantages, to be fair. The only thing that comes even vaguely close to how spreadsheets work is IPython notebooks.
Felienne Hermans actually implemented a Turing machine in Excel, without using scripting: http://www.felienne.com/archives/2974
There was an HN post about this in September 2013: https://news.ycombinator.com/item?id=6416631
You could also argue that spreadsheets are kind of homoiconic, since code and data occupy the same structure. But that's a bit of a stretch.
The problem is that very often, spreadsheets are used as databases because they are the tool the user is familiar with, not because the application is inherently more suited to a spreadsheet.
> They really are different things and I agree that issues with spreadsheets should be solved by improving the tools not trying to paradigm shift spreadsheet users into an inappropriate environment that also happens to have a very long and highly skilled environment setup requirement.
There is no reason a database using a relational data model (but not the multiuser/concurrency features that are also part of the relational model) needs to have a "long and highly skilled environment setup requirement", or, in fact, be any harder to set up than installing an app, just as a spreadsheet app would require.
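As a proof of the point, SQLite is a relational store with essentially zero setup: it ships in Python's standard library and the "database" is just a file (the table and values below are made up):

    import sqlite3

    # No server, no admin rights, no configuration: the database is a file.
    con = sqlite3.connect("contacts.db")
    con.execute("CREATE TABLE IF NOT EXISTS contacts (name TEXT, email TEXT)")
    con.execute("INSERT INTO contacts VALUES ('Ada', 'ada@example.org')")
    con.commit()
    print(con.execute("SELECT name, email FROM contacts").fetchall())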
This is a very salient point to me. A lot of people take the route of talking about why spreadsheets are so bad. In most programming language discussions, I see the more sane folks arguing that most languages aren't bad, but some have terrible idioms, or reams of ancient legacy code from before best practice was a best practice, or maybe they make it too easy to do the wrong thing, and inconvenient to do the right thing (I'm thinking of a lot of stuff Rich Hickey has said).
Instead, I like to think about how the spreadsheet environment or tooling could be altered to make it more natural to do the "right" thing, or to prevent certain classes of errors (maybe in similar ways to how type systems can). I'm not proposing a specific solution, but more a mindset for what I think is interesting (and probably not a terribly original mindset at that).
As someone who has worked rather extensively with both, no, they give a lot less than 90%, and for even fairly simple uses they are often more complex to use than SQL.
If you don't like the Pivot Table layout, you can easily construct formulae to pull the relevant values from a Pivot Table into whatever format you want. You still get the speed of calculation/refresh and, if you label your fields well, have formulae which you can copy-paste across a large area, making the sheet easy to inspect and reason about.
It's largely used by business people because they (1) have inadequate IT support to have things developed using proper tooling, and (2) have been forbidden from using better tooling themselves by IT. Since both of these are direct products of IT policy, I don't think it's a quixotic quest to fix it -- however, trying to fix it by convincing them to just give it up is the wrong approach. The people who need to be convinced to change the situation are CIOs.
"Excel is inherently observable since it doesn't have any hidden state and all values are there for you to see and manipulate. It's also direct. You change values in the grid, drag drop things, do calculations on selections, and so on. And it manages to sidestep a lot of incidental complexity; spreadsheets are timeless, without setup, and don't even have a notion of being run."
This hints at one of the sources of problems with Excel. One good approach to using Excel is to always be in one of two modes: changing the structure (equivalent to coding) and entering values (equivalent to using software). Of course, at the early stages of designing something you will be entering lots of dummy values (e.g. ARPU) to check that your formulae work.
However, many people don't distinguish between collections (rows, columns or other contiguous areas) of cells which _should_ contain values, and those which should contain formulae. I, like many, prefer to indicate input cells with a yellow or orange background, so that I know everything else is a formula.
You can do even better:
- If you're done designing a spreadsheet, and expect to use it for a production process, then mark the input cells as unlocked, and protect all the sheets. Then the end user won't be able to mess up the formulae.
- Always write formulae in a way that they can be copied across or down whilst pointing to the right places. This can be achieved through use of one or more $ signs to fix a reference. I've seen a fair number of spreadsheets where there are a large number of similar formulae, but they have been entered/adjusted manually. This is fine only if you never make mistakes, and if no one else needs to change the formulae later or verify they are correct.
My point is that "formula results that got turned into static data" can be avoided with just a few easy rules. However, most people aren't taught these rules :(
Except, if you are using a modern version of Excel, don't generally use $ references for this; go one step further and use names (for fixed individual datapoints, where you would use two $s) or named tables with named columns (for the most common use of single-$ references). This is more self-explanatory and less error-prone.
Except that it does -- even in terms of simple numerical values what you see is not necessarily the actual value, but instead the result of passing the value through a format string that can vary by cell.
They actually do have such a notion - in Excel you can set calculation to Manual and then hit F9 to refresh everything in one go. Useful when you deal with the current date and time.
What I was going to say before unexpectedly finding that, is that the gap between Excel and Access is huge. I've made a living off of that gap at various points, and at some point I stopped thinking they were "doing it wrong" and started thinking, "whatever makes you happy."
The secret to success in contract software development.
I've talked to hundreds of customer prospects for Treasure Data, and by far the biggest surprise I encountered is how few people know their way around SQL. I am not talking about just "business" people but also programmers.
Contrast this dearth of SQL-proficient population with the massive, massive Excel user base: I jokingly tell my friends that Excel is by far the most popular programming language =p
I use SQL extensively, and never run into all these problems other people run into.
I also don't use an ORM. There's so much you can do well and fast with SQL.
(I've also seen dreadful abominations in sql, like stored procedures that dynamically generate code for pivot tables... wait, that was me who did that)
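(In fairness, you only need the dynamic-SQL tricks when you want values spread across columns; a plain long-format summary, which is what most ad-hoc pivot tables are doing, is a single GROUP BY. A toy sqlite3 sketch with a made-up sales table:)

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
        ("EU", "widget", 120.0), ("EU", "gadget", 80.0),
        ("US", "widget", 200.0), ("US", "widget", 50.0),
    ])
    # One GROUP BY replaces a hand-maintained pivot-style summary.
    for row in con.execute("SELECT region, product, SUM(amount) "
                           "FROM sales GROUP BY region, product"):
        print(row)  # ('EU', 'gadget', 80.0) ... ('US', 'widget', 250.0)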
That's very simple, if you know SQL.
edit: Based on your other comments downthread, I realized that you meant the IT department themselves should develop the application using "proper tooling", not the business people. That means that the business people will have to hand over requirements to the IT people, who will write the software. I think turn-around time may be too much for business people who want to do their own data exploration.
Not necessarily, though that really depends on the technical competencies in the organization and a number of other factors.
> That means that the business people will have to hand over requirements to the IT people, who will write the software.
Well, I kind of think that if you are going to have an organization that effectively uses technology, you need technology-aware process/system analysis embedded in operational organizations at a fairly low level facilitating process optimization including, as necessary, technology acquisition and software development. That's a pretty big culture change for most organizations that aren't technology organizations, I will admit.
> I think turn-around time may be too much for business people who want to do their own data exploration.
I don't think the problem area in Excel use is "data exploration", but, yes, I think that overly bureaucratic, insufficiently agile processes which separate software development from operations do create friction, which leads to the use of suboptimal tools -- including, but not limited to, the use of Excel where it's not appropriate.
If you just need to sum a list of numbers, or quickly calculate payroll or something, Excel does a fine job. For certain types of engineering calculations (the horror), Excel does a fine job.
The overhead of having a full developer implement the tool is absurd, because many of these problems can and should be solved in less than 10 minutes of spreadsheet jockeying.
Sure, it's great for lots of one-time quick calculations. I was really referring to its use in ongoing operations with changing requirements, where maintenance is necessary.
The problem comes in when something that is suitable as a one-off tool -- and perhaps, as such, might make a decent prototype for a proper ongoing, maintainable tool -- instead gets pressed into service as a production tool and becomes a sinkhole of technical debt.
I can only imagine the amount of work involved in trying to extract any sort of useful information from the raw data in that format. I run into stuff like this all the time.
All I could think was that given a day and a Postgres+PostGIS server someone could have probably saved them entire man-months of their time...
Indeed. Here's a great quote from 2005: "databases are rocks, spreadsheets are water": http://www.propylon.com/news/ctoarticles/051115_master_foo.h...
In a typical non-IT company, someone who needs to automate something has to wait for the IT department to purchase software to do the task. And very often it turns out that in the purchasing process the purpose of the software is lost, requirements come from the wrong person, and in the end the software is pretty useless.
It is not that someone can just download Python or Ruby or Java and start coding. No, no, company IT would never allow this, because they worry about security, patents, licences, etc. (and it does not matter that these doubts are not justified; very often it is just forbidden and that's all).
But chances are that such a company has MS Office. Excel is just an interface to a better or worse programming language. That's better than nothing.
An unnamed overworked staffer in London copied and pasted the
wrong figures into an Excel spreadsheet, throwing the firm's
risk models out of whack.
Spreadsheet errors are reaching epidemic proportions globally. What we need is a transnational organization to contain the threat with research, best practices and conferences!
Love their compendium of horror stories. Did you know the US Securities and Exchange Commission has weak accounting because they rely on spreadsheets?
As we have reported in prior audits, SEC's general ledger system
and certain software applications and configurations are not
designed to provide the accurate, complete, and timely
transaction-level data needed to accumulate and readily report
reliable financial information...
Many of the agency’s financial reporting processes are still
manual in nature and reliant on spreadsheets and databases to both
initiate transactions and perform key control functions.
But why is this madness? It works for them. The iBanking world is the madness. You can put in place effective change controls and auditing with any system. However, if people just want it done and don't care how, regardless of whether these calculations were in Python, Mathematica or Fortran, garbage in will still be garbage out. So, let's not wantonly scapegoat Excel for a process problem here.
I agree with you 100%. iBankers use Excel because it does what they want. One of the main reasons I left was because I got tired of trying to re-invent the excel wheel with every piece of software. There was always an inherent lack of trust that any UI was "doing it properly," so we always had to add the failsafe of export to CSV so people could do it themselves in Excel. Mind-boggling inefficiency.
If they can export it to Excel, then they can do whatever simple data analysis they just made up. No reason to wait for IT to implement whatever stats they are curious about, order it in some new way, or whatever. Doing those things in Excel is very fast and they get complete control over it.
Even if they only color cells and rows depending on some ad-hoc rule that will never be used again, Excel still added a lot of value for them.
Plus, I would pay gold for users who actually test a system before relying on it. There is always a shortage of testers, and if the customer does some of it, he deserves a discount.
Knight Capital Group?
Users make mistakes, and users given the power to work with computers that can cause billions of $ to be traded can cause billions of $ in losses. The solution would seem to be to limit the exposure: put a cap on the amount of $ that a user could trade without going through some form of verification.
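A sketch of that cap-plus-verification idea (the names and the limit are entirely hypothetical):

    MAX_UNVERIFIED_NOTIONAL = 1_000_000  # hypothetical cap

    def submit_trade(notional, verified=False):
        # Anything over the cap must have gone through a second sign-off.
        if notional > MAX_UNVERIFIED_NOTIONAL and not verified:
            raise PermissionError("over the cap: route through verification")
        print(f"trade for {notional} submitted")

    submit_trade(250_000)                   # fine
    submit_trade(5_000_000, verified=True)  # fine, was verified first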
Spreadsheets may not be ideal, but they're the poor man's gateway to programming, and I suspect they don't have a bug incidence much greater than regular software developed by programmers.
Of course the programmers would love to believe otherwise.
I get that programmers hate the formulae being distributed all over the place, but this is precisely the advantage for business people; they want to be able to 'walk the heap' and follow the provenance of individual values back to the source, and they want to see things in parallel at all times. CS people lean towards theoretical provability, business people are inclined towards using statistical sampling. So I would write unit tests that would lock most of a spreadsheet while running a batch of checks on the contents of individual sub-sheets, then lock the validated sheet, unlock the next sub-sheet and copy-by-value the data into that for the next round; but I would also load up random historical batches that were known-good and make sure all the final totals matched.
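The final-totals check, at least, is easy to script outside the spreadsheet. A rough sketch using openpyxl (the file, sheet, and cell references are made up; data_only=True reads the cached formula results rather than the formulas):

    from openpyxl import load_workbook

    # Validated totals from a historical batch that is known to be good.
    KNOWN_GOOD = {"B10": 1250.00, "C10": 980.50}

    wb = load_workbook("model.xlsx", data_only=True)  # cached formula results
    ws = wb["Totals"]
    for ref, expected in KNOWN_GOOD.items():
        actual = ws[ref].value
        assert abs(actual - expected) < 0.01, f"{ref}: {actual} != {expected}"
    print("final totals match the known-good batch")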
Sure, spreadsheets are inefficient, but a mistake in a very efficient process can easily lead to monstrous results. And from the domain specialist's point of view (i.e. the accountants/business managers), the inefficiency cost of doing things in spreadsheets is far preferable to the cost of auditing a black-box process at a later date if queries arise about its integrity.
Oh wait, a user can enter a bad number in any tool, regardless of it being Excel or some custom cobbled together monstrosity that does less than the same work, but costs 100x as much.
Another website people interested in spreadsheet errors might like is Ray Panko's spreadsheet research.
The author has well-intentioned advice about avoiding MS Excel but it's misguided. The criticism fails to accommodate the reason why MS Excel was used. MS Excel is the lingua franca of non-programmers. Thomas Piketty is a trained economist, not a programmer. It's not realistic to expect Mr. Piketty to set aside months (years) of his time to master C++/Python/Scala/Fortran/etc to avoid using Excel. It's more realistic for an economist to use MS Excel to back his thesis than for a Python programmer to write a bestselling book about economics.
If we then tweak the advice to be, "if Piketty is not a programmer, he should have hired a compsci graduate student as a programmer", well... you've only shifted the (human source of) errors somewhere else. There are plenty of examples where software not written in Excel had fatal errors: Therac-25, the Mars Climate Orbiter.
Lastly, some of Piketty's errors were transcription errors. In other words, GIGO (Garbage In Garbage Out). Therefore, using Python while there were GIGO issues isn't going to solve the data problem.
The "GIGO" issue you note is irrelevant, but even if it weren't GIGO is compounded and amplified by the copy-and-paste paradigm of spreadsheet "programming", so if anything bringing up GIGO is even more damning of the use of spreadsheets for analytical work.
On the point about transcription errors: Using Python could have avoided some transcription errors. If the error was in copying a single constant from a source into the program, then in Python it might be buried as an unlabeled magic number deep in the source tree, but it might also be labeled as a top-level constant identifier. In the former case it would have the same problem as in a spreadsheet, but in the latter, a typo would likely be caught by someone reading over the code at some point. Or if some of the data came in a file format that Excel couldn't automatically import, maybe he could have used a Python library instead of manually transcribing 50 data points. It's hard to think of a situation where using Python would have led to more transcription errors.
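To illustrate the two cases (the numbers and names are invented for the example):

    # Buried: a typo in this digit soup looks like any other digit.
    def adjusted(income):
        return income * 1.0217

    # Labeled: a top-level constant a reviewer can check against the source.
    INFLATION_ADJUSTMENT_2012 = 1.0217  # transcribed from the source table

    def adjusted_clearly(income):
        return income * INFLATION_ADJUSTMENT_2012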
Both Harvard and Stanford economics degrees do not list Computer Science 101 as a core requirement to graduate.
Compare Stanford's requirements for Economics with Electrical Engineering. The engineering major has core classes including computing and programming. The economics major does not. (Whether or not economics studies should include it (and therefore Stanford is "wrong" in leaving it out) is a separate conversation.)
Since Thomas Piketty is from France, is there a tradition of European schools requiring computer programming language courses in their training?
As for the Python vs Excel comparisons, I'd say people are overestimating how much Python's syntax avoids errors and underestimating how much Excel's visual spatial grid of cells and near-universal availability for collaboration also avoid certain classes of errors. For non-programmers, the rows-and-columns grid, displayed visually at all times, is almost a perfect 1-to-1 correspondence to their mental model. For Python loops, it takes programmer training to mentally "unroll" loops and see how data is munged. This visual "blindness" in programming languages ends up creating its own class of errors.
I've done programming in MS Excel + Excel macros/VBA, MS Access, Oracle PL/SQL, MS Transact-SQL, C++ Qt, C# ADO.NET/LINQ, Python, etc., and I'm not convinced the syntax of programming languages leads to reduced errors. It just leads to a different class of errors.
I tend to disagree. Mr Piketty graduated in Mathematics (from a highly selective institution). It is realistic to expect him to be able to use those tools if needed.
More importantly, the people claiming there are errors in his work are making an even clearer error than Piketty in claiming that inequality hasn't risen. It has, and it is shown on dozens of measures. That they are trying to fight something as well established in mainstream economics as evolution is in biology leads me to believe that in this case it is ideological windmill-fighting more than honest critique.
Spreadsheets are a different kind of software with advantages and disadvantages compared to compiled/scripted languages.
How do you treat a spreadsheet as real software:
* One operation per cell
* Inputs, processing, and display separated into distinct sheets
* Every formula checked twice (or three times)
* Write row and column check sums (see the sketch after this list)
* Treat row and column locking ($F4 or F$4) as the footguns they are
* Name constants
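The check-sum rule is the easiest one to automate. A minimal pure-Python sketch with toy data: the grand total computed across rows must equal the grand total computed across columns.

    grid = [
        [10.0, 20.0, 30.0],
        [ 5.0, 15.0, 25.0],
    ]
    row_sums = [sum(row) for row in grid]
    col_sums = [sum(col) for col in zip(*grid)]
    # The grand total must agree whichever way you compute it.
    assert abs(sum(row_sums) - sum(col_sums)) < 1e-9, "check sums disagree"
    print(sum(row_sums))  # 105.0, cross-checked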
Testing spreadsheets requires a different set of tools and behaviors, but to say that "quick and dirty" spreadsheets are less durable than "well engineered software" is really to say that "quick and dirty analysis" is less durable than "well engineered analysis."
Spreadsheets are remarkably easy to learn to use, and to use correctly, compared to scripted/compiled software. Not the least benefit is that they force abstraction in a way that is extremely intuitive to most people -- spatial context, with labels and state simultaneously displayed, leads to a much, much lower context-loading cost (for the human brain) than keeping all the abstract variables in your head when writing code.
Mr. Lemire seems to treat the tool as a toy and then claim it is only good as a toy. Blub and non-Blub languages alike can be run quite easily without unit tests, in which case they are no more durable to errors than Excel (and arguably less, as they don't display intermediate state).
I don't think this is true. IME, people who aren't primarily programmers/developers and have been using Excel for years tend to have worse practices developing Excel tools than the same people have with much less time learning, e.g., Python. Excel is much easier to learn to use to get some kind of result, perhaps, but not easier to learn to use correctly. (Part of it, I think, is because the cost of incorrect use is somewhat more deferred, which is also part of what makes it more attractive to use initially in many cases.)
Perhaps it is a difference in semantics.
It is very easy to learn to use correctly, but if you aren't taught by someone who uses it correctly (either self taught or otherwise) you're not likely to discover the best practices.
Easy to learn correctly != obvious way is correct != commonly learned correctly.
Does the author realize that Excel is not used to run banks or to track the Space Shuttle? Sure, some traders might base their models on Excel. But the actual exchanges, trade routing, and account general ledgers are all on very tightly controlled systems, usually mainframes in the case of ledgers. To boot, NASA does not use a VBA Excel plugin to schedule burns and execute reentry. (Excel 2007 won't even run on an 8086 chip.)
I'm sure he does. He's exaggerating for effect. (Well, at least he is regarding the Space Shuttle.)
Sure, it may be stupid, but it's democratized programming. That's why spreadsheets work.
Human beings make mistakes.
A lot of humans use Excel.
Hence a lot of errors in Excel.
If all those humans programmed their logic in Python, we'd have a lot of wrong calculations in Python code and an article saying not to use Python.
There are some excellent tools used by civil engineers where the formulas all have to be published when working on anything. If they miss a decimal point, bridges fall down. My brother-in-law is working on automating some of the drudge work, but essentially they produce a falsifiable sequence of claims so that anyone can assure themselves they got the correct answer, and if the formulae are wrong it is immediately apparent.
I don't think it works that way. The surface area where you are permitted to make mistakes is completely different in language X compared to Excel. In Excel, `="a"+4` is a cell error but a valid spreadsheet. In Python it is an invalid program, throwing a runtime error. In Haskell, it doesn't even compile.
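For concreteness, here's the middle case in Python - the same expression that Excel would quietly render as a cell error in an otherwise valid sheet:

    try:
        result = "a" + 4  # Excel: an error cell in a valid sheet; Python: stops here
    except TypeError as e:
        print("caught:", e)  # can only concatenate str (not "int") to str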
For example, an inexperienced programmer in Python might use floating point instead of the Decimal data type to add currency amounts. His loop to sum the amounts would be subtly incorrect. In MS Excel, adding currency in cells correctly down to the penny is a no-brainer. Sure, the Python programmer can be taught "best practices" to avoid this type of error, but the point is that while Python helps eliminate some errors, it also creates new ones.
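The footgun in question, for anyone who hasn't been bitten by it (a minimal demo, amounts invented):

    from decimal import Decimal

    amounts = ["0.10"] * 3  # three ten-cent line items

    float_total = sum(float(a) for a in amounts)
    print(float_total)          # 0.30000000000000004
    print(float_total == 0.30)  # False -- binary floats drift off the penny

    decimal_total = sum(Decimal(a) for a in amounts)
    print(decimal_total)        # 0.30
    print(decimal_total == Decimal("0.30"))  # True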
Another example is data munging. A programmer might write some Python to slurp a data file (exported from a mainframe or whatever) and do some financial calculation on it, loading the values into an array or some other memory structure. The problem is that the memory contents are "hidden" from sight unless the programmer uses a visual debugger or prints out all the values to inspect. The Python programmer may not notice that some values are misaligned or garbage. With Excel, the non-programmer imports the data file and immediately scrolls through the worksheet as a sanity check. His eyeballs notice that the source data is dirty. Again, the Python "problem" can be corrected with best practices, but the point still remains: different tools create different problems.
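That said, the "scroll through and eyeball it" check translates to a couple of lines of code if you know to do it - a sketch with pandas (the export.csv file is hypothetical):

    import pandas as pd

    df = pd.read_csv("export.csv")  # hypothetical mainframe export
    print(df.head(20))      # roughly what you'd eyeball in the Excel grid
    print(df.dtypes)        # a numeric column parsed as 'object' means dirty data
    print(df.isna().sum())  # missing values per column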
Lastly, Excel spreadsheets are easily emailed among dozens of people. You can't send ???.py programs to everyone, because you can't expect all the Windows users to have the Python runtime. Spreadsheets are even sent to iPhones and tablets, and you definitely can't expect easy Python deployment there. With fewer coworkers examining the ???.py file, it has the potential for more errors compared to a scenario where everybody can participate in questioning the spreadsheet's numbers.
Our intuition says that, on balance, the hypothetical Python/Haskell solution should have fewer errors than MS Excel, but I can't confidently say I've seen any definitive proof of that.
I think that's what most people do when they are programming.
I suppose a good example of this would be all the affordances built into a modern highway. Softer turns, curves prior to intersections (they increase visibility and encourage the stopping traffic to slow down), embankments, etc.
Excel (and any other spreadsheet) is a nice and convenient tool to get easy tabular inputs and some results fast. Spreadsheets were one of the first useful programs on PCs.
I loved this article from 2008, which argues that Excel and VBA were responsible for the credit crunch, which later possibly led to Reinhart/Rogoff's flawed research using Excel... it's spreadsheets all the way down!
http://www.resolversystems.com/products/resolver-one/ (but the link appears to be dead)
Pyspread expects Python expressions in its grid cells, which makes a spreadsheet-specific language obsolete. Each cell returns a Python object that can be accessed from other cells. These objects can represent anything, including lists or matrices.
I can believe "flawed products" were constructed with spreadsheets, but that seems a kind of shallow analysis.
Clearly, the housing bubble involved a rush to buy whatever - the spreadsheet might seem the "immediate cause", but "wishful thinking" would clearly be the larger cause, with various inherent tendencies of the financial system behind that.
That's a rather aggressive phrasing of what the article says, that many of the calculations involved in the credit crunch were done in Excel/VBA.
The author was (is?) a recruiter for quants and financial developers so he certainly had a unique perspective.
My dad put together his architectural plans for a new house in Excel because it is a tool with which he felt comfortable. Some people do calendars in Excel while others try and recreate a full General Ledger system. Because it works well enough, people don't see the need to invest the time in learning a new application and instead invest their time in pushing the limits of the tool.
As someone who builds financial models and who audits those built by others (and is a competent programmer by night), I think that the key source of risk in Excel models is that the tool has no knowledge of intentions. What I mean by this is that if I'm building a cash flow model or if I'm doing a pixel drawing, Excel doesn't care; as users, we are forced to create our own structure and build in our own checks and balances. If I make a balance sheet in Excel that doesn't balance, Clippy won't show up and let me know that things are broken.
I've often thought that it would be really amazing if a semantic layer could be built that uses Excel as the calculation backend. This sort of tool could understand the sorts of concepts of financial statements, projections, time-series and other concepts that often show up in financial models. It would have a built-in understanding of the domain-specific models that would let it leverage that understanding to reduce risk in the building of financial models. If I told it that I wanted to add a revenue stream that is tied to the output of production, the tool would connect the dots between the production schedule, any inflation and/or foreign exchange assumptions and would feed changes in working capital according to the associated collection terms, etc...
Before I get too carried away, the point is that this type of semantic layer would be much better at preventing and detecting anomalies and potential errors in the development of a high-risk financial model. Does anyone have experience with any such tools?
"Microsoft Visual Basic for Applications (VBA) uses unmanaged code that is tightly integrated with Office applications. Microsoft Office projects created by using Visual Studio enable you to take advantage of the .NET Framework and Visual Studio design tools."
Probably worth a look. I worked at an iBank previously and we were investigating when/how we could start using VSTO more.
Bespoke tools will be bought for tasks too complex or important for excel - finite element analysis, problems involving non-linear springs etc. Some time-consuming, repetitive tasks may also be deemed worthy of more automated tools, but on the whole the engineering industry is very much in the dark ages when it comes to modern software approaches. I often wonder what you'd end up with if you introduced a team of computer scientists into a civil/structural engineering company and told them to assist with analysis. I imagine you'd get some pretty innovative approaches to concept screening/design/cost-optimising etc.
It's just a little off-putting when people talk about the tools they have to use -- when I first started, I would never have expected Excel to be so prominent, and Matlab so absent.
Yes, they've been exposed to programming at some level, but they are not programmers. Even as a late-20s mech engineer, I am far above most of my peers when it comes to slinging code and I'm a rank amateur. Many wouldn't even want to touch it.
You're mistakenly assuming that the path of least resistance is "teach the non-CS/EE engineer to program because they are technically minded and can easily pick it up" rather than "adapt the mathematical process to suit Excel".
This is an absurd claim. Any good spreadsheet will contain multiple self-tests, either in live formulas or via macros.
More often than not, the fact that a spreadsheet was used means the business was doing an end-run around a lack of IT support: they wanted the quickest thing that appeared to work, without any concern for (or even knowledge of) software development process. And it's almost certain that the history of changes (even if IT was later forced to adopt the spreadsheet) has been to do the minimum required to meet each new requirement without really understanding what went before (often because the one person who understood what the original was doing has left).
Which isn't really a problem with the tool, but with the social conditions which make the tool attractive to use in the first place.
I think the biggest problem is the small intersection of these two terms.
In the past, a company that I worked for went down, and one of the factors was some wacky spreadsheet in our finance workflow :-(
My problem is that Excel is terrible. Its formula system is painful and the formulas are invisible.
It's not that a light user DB/spreadsheet program is a bad idea, it's that Excel made a lot of terrible decisions 20 years ago and now they're married to them.
I can't really think of any off the top of my head, would be interested if you could share one or two that they're permanently married to.
However, I could make a very long list of easy-to-implement fixes and features they could add that would offer different ways of doing things than the braindead way you have to do things now, while still maintaining backwards compatibility. My conspiratorial theory is that MS is very well aware of all these things, and they will be released only if a legitimate competitor appears on the scene, or slowly over time to encourage upgrades.
EDIT: Apparently Wikipedia has a list of "quirks". I haven't vetted any of the citations, but the page claims issues with statistical function accuracy, failures with modulo operations, confusing use of numeric precision, and an inability to open two identically-named files.
Now, replacing it with Python isn't better - any language that uses double-equals-signs is not fit for a layman-programming-platform.
Basically, high school math introduced a lot of syntax that users will be familiar with - we're universally okay with the typical + - / * operators, for example, and ^ isn't a stretch; neither is f(). AND/OR/NOT are introductory Boolean algebra, and while they're not universally taught in high school, I find many scientists are acquainted with them (although languages using || && ! syntax are a good way to lose them). Excel's lack of boolean operators (boolean operators are functions in Excel) is disappointing.
I've always found that SQL gives a good minimal arithmetic/boolean-logic toolkit for laymen. It's a good model to follow.
f(x) is different; you're right about that one.
It's basically the first thing they have to teach you.
Have you tried Ctrl-` ?
Do you know a better alternative? I have used LibreOffice and Numbers and I will take Excel any day. Saying that it is terrible is a very strong statement about a piece of software that is arguably what keeps a lot of people in the Windows world.
But sure, it's a GTK app and so it looks more like Office 2003 than Office 2010.
A re-think of the spreadsheet to be a little closer to a SQL database (but still layman-friendly) would be far more sensible. I wish something like Lotus Improv had won.
You could probably get there by cutting features out of Excel -- Excel has a lot of database-like features that can mitigate some of the problems of using it. The problem is that most of the people using it do what is easiest and most discoverable in the UI -- or what they learned as power users 20 years ago, or learned cargo-cult fashion (perhaps through intermediaries) from someone who learned then -- so a lot of the features that are more clear and maintainable are rarely used.
i.e., the next time I encounter a spreadsheet in my work that I didn't design that uses column names in named tables for references rather than row/column references will be the first.
Custom software is, to a non-hacker, the definition of "difficult to review, test, and maintain." You may as well suggest they input their data into a mystery box with some pixies inside who will crunch the numbers for them. :)
Remember, Excel (and spreadsheets in general) ARE custom software, written for this exact need, and with decades of polish and bug-fixing so that even non-wizards can use them.
I agree that we should not fly the space shuttle from an Excel macro. But I feel that suggesting economists should write software instead of use spreadsheets is not only throwing out the baby with the bathwater, but also throwing out the bathtub and the whole bathroom too. Specialty tools exist for this very use case. Let's discuss how to improve them instead of indulging our hacker instinct and reinventing a hammer every time we need to drive a nail.
Well, maybe before we attempt to send men with guns to further deprive people of their property and liberty (which is, of course, the end goal of Piketty's work and advocacy of a global wealth tax) we should determine whether or not his results are, y'know, true?
I know, I know, truth is such an old-fashioned concept.
And yes, when writing software we should use the appropriate tools for the task. If we're about to advocate the confiscation under force of a significant proportion of the world's wealth, then maybe--just maybe--we should use more complex, more expensive, but more rigorous software development methods than 'bung it into Excel and fiddle with the formulæ until they feel right.'
edit: found it:
"My favorite symptom of an unmet need for software is any Excel spreadsheet which is ever updated by one employee, sent to a second employee, updated, and then sent back. Every time that happens a SaaS angel gets its wings."
If Excel code is hard to audit, that means someone could write a tool to show all the calculations being performed to get to a result. I'm giving this idea away for free because I'm pretty sure someone has already done it.
Writing a new program from scratch has issues, too.
Even the built in features like cell naming are little known and little used by many spreadsheet creators, leading to avoidable mistakes.
I do financial modelling, and our firm has developed its own formula-auditing (and Excel Swiss Army knife) tool. For example, we can analyze a block of cells and get a visual overlay of which cells contain the same formulas and which cells contain formulas of different types (external file, external sheet, embedded constant, constant, etc.).
This lets us really optimize our time while giving us more confidence that we have properly reviewed an entire spreadsheet.
Maybe there are some tools that can take an xlsx file and make it secure, interactive and loggable and are also free (or nearly so?) Am I missing something?
If anything, it's more a matter of using Excel correctly. Tools like Slate for Excel make it easier to audit spreadsheets, especially other people's.
Being explicit really reduces the number of errors you run into - there's a lot of implicit behaviour going on in a spreadsheet.
As soon as the structure of your lookup table changes, your VLOOKUP formulas risk being invalidated. After being burned a bunch of times, I've switched to INDEX or an INDEX/MATCH combination (e.g. something like =INDEX(C2:C100, MATCH(key, A2:A100, 0))) to accomplish the same sort of thing. With INDEX/MATCH, you are required to be explicit in selecting both the range of the data and of the key.
Other people have praised the statistical work done on the book:
So given the controversial nature of the subject, I wonder if we could point to an objective analysis of the work.
I knew someone who used the spreadsheet supplied with MS Works (a truly hateful piece of software) to create fax cover sheets, notice signs, and printed letters.
The spreadsheet was also used to create bills of materials for electronic subcontracting. These would be printed out and then only the paper copy was relevant. (This worked better than you might think - that paperwork system had been refined over years and all the kinks ironed out. Everyone knew if a part had been ordered, arrived, booked in, kitted, issued to the shop floor, paid for by us, and paid for as a final product by the customer. Moving from that paperwork system to Sage Line 100 was painful and produced a lot of confusion and kludges.)
So, people like spreadsheets because they can get stuff done. To you it might seem like they're hammering in nails with the handle of a screwdriver, but the answer is to make better documentation and better software.
(Also, I see people using screwdrivers and other hand tools and I wonder why no one ever told them how to do it properly.)
I cannot easily swap out one set of data for another, making it hard to confirm my models are running correctly.
How do I diff a spreadsheet?
How do I refactor a spreadsheet?
Also regarding the Rogoff data. Sure, those findings really were crap but surprise, surprise, it was crap that policy makers were oh so eager to hear. One might blame the falsity on the dumb use of spreadsheet but it seems more likely it was a case of "let's fudge data in a crap fashion till we come up with what people want to hear". Then when someone points out how it's crap, we always blame the spreadsheets.
That is a pretty bold claim.
The problem is I've mostly observed his chameleon act in radio interviews pre- and post-crisis. In the pre-crisis interview he claimed there was no problem, and in the post-crisis interview he talked about how angry he was at the folks who claimed there was no problem. Unfortunately I can't provide a link (the files probably wouldn't be online anymore even if I took the trouble to find them - smart shills do things by voice, it seems).
I have seen a lot of abuse/overuse of Excel/VBA.
I once had to write a CRUD front-end with Excel/VBA ActiveX components retrieving data from Sybase / SQL Server databases.
Which could have been easily engineered as a simple web-app. "But NO .. IT HAD TO BE A FUCKING SPREADSHEET"
and @seansickle Thanks for linking to my stuff!
One flipside is that, if you know what you're doing, it's very easy to create orthogonal calculations and comparisons in order to check your work. And, if something doesn't compare or look right, to track back step by step through all the precedents to find the fault(s).
Databases can also create and hide problems, particularly if they are not understood and/or designed and set up properly. And some problems thusly created can be rather opaque, particularly to those without a good understanding of or access to the underlying design and the principles upon which it rests.
As with many things, ultimately I found that the problems lay not with spreadsheets, per se, but with the people using them and with the organizations that tasked those people without consideration for their capabilities (and limits thereof) and without adequate resources to do the job right.
Personally, there has been a time or three where -- in significant measure due to such resource constraints -- I would have been sunk without the flexibility that a spreadsheet -- often in combination with some programming and database work -- provided me. For example, I could quickly and programmatically deal with the 95% that was clean enough, and then manually go through and figure out and adjust the crap.
Ideal? Perhaps not. But then, "business" seldom is. (Nor "life", for that matter.)
Admittedly, one reason for this was that all student machines throughout the university had Excel installed and all the students had Excel on their laptops, while Stata was only available in a couple of computer labs. And it's pretty much the same once you get out into the working world. No matter where you go in the world or what office you walk into, if you sit down in front of a Windows machine, it will have Excel. Convincing your boss to buy you Stata or SPSS just so you can do a couple of quick statistical tests that you are perfectly capable of doing in Excel is just too much of a hassle most of the time.
One solution is to recognize when a given spreadsheet's usage has increased to the point where it - or some portion of its functionality - would be better embedded in an application. Another is to have some users adopt an environment/language like R, which addresses the shortcomings listed above at the expense of being more complicated and less user-friendly. But there is no simple solution that is going to result in spreadsheets disappearing from use.
Obviously you never worked in a big bank. Most traders use Excel a lot, if not in every moment of their work. It depends on the bank, of course, but in some banks even the trades are made from Excel spreadsheets.
Those millions and billions of pounds and dollars and euros swishing back and forth in the financial centres? Yeah, Excel macros. BE AFRAID.
Matlab Simulink is a very similar environment used professionally to a much higher degree.
There are extensions in some toolbox for graphical diffs, but who uses it? Who can review the diffs in a version control system?
And those systems are pretty hard to debug also.
In a 10 kHz loop you cannot step through or set breakpoints, and can hardly do printf debugging.
1) Perl does have some good libraries for reading Excel files without exporting.
Non-technical users can't tell the difference between 2, 2.0, and ' 2.0'. Usually Excel will display the same thing to them.
I recently replaced one of their Excel sheets with a web form. "Ughh, una mierda" ("ugh, it's shit").
So I asked what the Excel sheet did that the web form didn't. Basically, a bit of JS to give some copy-paste functionality, and she is now a lot happier with that solution than she was with the Excel sheet.
Import and export. I get a lot of love for exporting to Excel from people who use what I make. I assume this feature makes it seem like they are doing a lot of work when they can quickly produce large spreadsheets for the boss. The boss? He usually doesn't know nor care. He knows he approved a check for some computer stuff some months ago...
The only thing worse than being able to see all these convoluted spreadsheets, is not being able to see the code behind non-spreadsheet systems.
Anyone want to teach "MATLAB for Bankers"?
It's idiocy, but I've seen the poor spreadsheet program warped into even worse uses - though not for business-critical stuff.
Most scientists do not make their code and resources easily available. I'd prefer an open excel sheet to a closed python analysis which is only slightly less likely to be wrong.
Also, you should put some contact info in your profile, or provide it in your comment, if you want to be contacted for further discussion.
I've already successfully done this. The sales cycle can admittedly be long, but it's not impossible.
You have to create a product that addresses pain points for business users. It has to be a compelling product and it has to solve problems for business decision makers. Initially, you want to sell to the business, not to the IT department. It turns out that spreadsheets are mainly used by business users and subject matter experts, instead of IT.
The next issue is price. If you are trying to sell something for (say) $1 million to a big corporation, then it is quite a long sales cycle (but still achievable). My personal take on this is that you do want to eventually be charging several million dollars per corporate client, but you also should have a low-cost "entry level" version of the product. The cost has to be low enough that a business power user could charge it to his corporate American Express without being noticed (say $1,000) and download and install your application (he will need admin rights, which he might in fact have).
> The world of finance and investment banking is a world of networking and extreme resistance to change.
Yes there is a lot of networking that goes on. This can really work in your favor, because there is a sort of herd mentality. Once people start talking to each other about your product, customers will want to talk to you. If you close a large deal, news will quickly spread and other large deals can follow.
I would not agree that there is "extreme resistance to change." In my experience, banks, hedge funds, etc, are often early adopters and one of the largest users of high technology.
> There's a reason Bloomberg's suite of products sits as pretty as it does
I have Bloomberg in my cross-hairs, next to Excel.
> and unless some great upheaval occurs will continue to do so for many years
You are correct that change often requires a catalyst. But I think great upheavals are happening every day; they are just hidden behind closed doors, occurring deep inside companies and departments. In my experience, it is precisely during those times of upheaval that you have the best chance of introducing a new, disruptive technology into an organization. There are many kinds of upheavals (mergers, divestitures, running into Excel's maximum row limit, etc.) and they can all be catalysts for change.
I've put my email address in profile now, thanks for the tip.
Yes, spreadsheets have errors, but the only method I know of that is able to get error rates down to near zero is due process. There are good processes that reduce error rates to near zero, but there are no tools that do. The problem is the process, not the tool.
But the tool constrains the process, which is why you should choose tools by appropriateness to the process they are going to be used in.
If it is not advisable, why is it accepted for software engineering professionals to manage their source code this way?
The (E)BNF provides a syntactical schema for the code, and the rest of the specification provides the semantic schema.
I think we can handle spreadsheets. While I'm a programmer and often find it easy to whip up a program to do what many would do in a spreadsheet, I don't have the arrogance to tell people to not use spreadsheets for serious work.
On the contrary, I encourage it. It's a wonderful tool, and, in the end, superior to pen and paper.