Hacker News new | past | comments | ask | show | jobs | submit login
The Future of Excel (datanitro.com)
139 points by karamazov on Feb 12, 2013 | hide | past | favorite | 113 comments



I think an important aspect of what makes Excel formulas (relatively) useable to people who don't think they're programmers is their functional nature. You don't have to simulate the program's state in your mind because there is no state. You hit return and it shows you what the result is, and if your formula gets too long you can break it up into multiple cells. Giving users a procedural programming language is only going to help a small number of Excels users - the ones who like thinking in a procedural way.

For me, the real pain point in Excel's formula language is in how the path of least resistance leads you towards unmaintainable code. "SUM(A3:K754)" is not very meaningful. If only named ranges were more discoverable and useable, we might end up with more examples like "SUM(ProductSales2012)".

Shameless plug: I'm working on a visual, functional programming tool for querying data that's aimed at people who currently only use Excel: http://querytreeapp.com. I think the advance in browser technology is going to bring a number of tools like this to market to challenge Excel's dominance and hopefully improve to quality of "shadow IT" in business.


'"SUM(A3:K754)" is not very meaningful.'

That's not true. You are looking at that formula in the context of a sheet, and excel takes great care to preserve the formula under meaningful transformations. For example, if you insert a row between rows 100 and 101, the formula will transform to "SUM(A3:K755)" without intervention.


Another thing that Excel does for the user is actively show what is being pointed to. If you open the formula [F2], Excel will highlight all the data references within the formula, making for an intuitive programming language within a data sheet.


What happens when you insert a row at the beginning or end? That's a bit less "certain", for most people, and can lead to incorrect sums.

Re the grandparent, Excel does have named ranges. Some it constructs on its own based upon column headers and whatnot. Others can be explicitly defined.

Unfortunately (again), the tools/widget Excel provided -- back a decade or more ago, when I lasted used it intensively -- were "out of the way" and fairly limited in terms of their interface and behavior/functionality.

Still, if you have e.g. a column headed REVENUE and a column headed COST, you can insert "= REVENUE - COST" into an adjacent column in the same row, and Excel will know what you mean. And... in this case, it is less likely to become "confused" (do what it's programmed to, versus "do what I mean" -- DWIM) if/when you move individual cells around.

P.S. What I do wish, or did, was that Microsoft would better document and explain publicly some of Excel's underlying behavior and edge cases. It can take some time, exposure, and mental effort to trick some of these out.

Excel is a powerful tool that does its job very well. A good part of the problem is that with its ubiquitous distribution and use, much of the time this is like handing a toddler a loaded handgun to play with.

The handgun is functioning just fine. Its deployment and use in this case is, however, less than optimal.


"what happens when you insert a row at the beginning or end?"

Excel's documentation clearly explains the rules: "You can insert blank cells above or to the left of the active cell on a worksheet. When you insert blank cells, Excel shifts other cells in the same column down or cells in the same row to the right to accommodate the new cells. Similarly, you can insert rows above a selected row and columns to the left of a selected column. "

So lets say you are looking at the formula =SUM(A3:A5). If you click A3 and hit insert, A3:A5 are shifted down to A4:A6 and the sum formula is =SUM(A4:A6) If you click A5 and hit insert, A5 is shifted down to A6 and the new row is inserted, so the formula does include the new cell and you see that: =SUM(A3:A6).

The rule is fairly defininitve, and the BIFF8 Rk transformation is fairly straightforward


"for most people"

I've seen a lot of incorrect ranges, often the result of subsequent grid manipulations.

With hopefully honest and sufficient humility, I submit that we are not "most people" -- in this respect.


> "SUM(A3:K754)" is not very meaningful. If only named ranges were more discoverable and useable, we might end up with more examples like "SUM(ProductSales2012)".

Convert your unstructured cells into an excel table, and besides all the eye-candy, ability to filter and sort your data, you already have that.


This. Say what you will about the Ribbon interface introduced in 2007, but Tables are really the reason to use Excel.


In iWork Numbers, you can refer to cells by the column and/or row name, if they have been provided. I believe this works with ranges, as well.


I'd call Excel more declarative than functional.


you have to replace Excel's every possible use case. Actually start with the top 25, and then.. you might have a "decent" Excel alternative.


It strikes me that excel appeals to people for exactly the same reasons that lighttable appeals to developers. You put numbers and equations in and instantly see the results. If you need to do something complicated you can build it up, one step at a time and see how the numbers flow through the system. You can even try different scenarios by just tweaking the numbers.


Yes, for me at least. (IAMA non-coder, Excel power user)

SQL is the same way- it suits my "power user" style by allowing me to type in a piece of script, hit the checkbox to find errors (or run it in a test environment), debug, rinse, repeat. Then I can stitch together all these pieces into one giant operation and let it run.

I would love to learn Python but as OP says I also have a job to get done. Every time I sit down with a Python tut, it seems (maybe just my perception) that there's so much fundamentals, coding, etc before I see the output what I'm actually doing. That there is a larger upfront investment than there is when learning Excel or SQL.


Since you are an Excel power user who also knows SQL, I would highly recommend that you check out the O'Reilly book "Python for Data Analysis." Pretty much the first chapter just has you very quickly run through examples of using a library of Python called PANDAS. Python is an intuitive enough coding language that you can focus on using it first, understanding it later, since large segments of it will make sense to you off the bat.

I could go on and on, but the short order is this: Pandas has an object called a DataFrame which maps 1 to 1 with a sheet from an Excel file. You can pull in two sheets of data from Excel, do merges, joins, iterations, you name it. I am a professional who also "has a job to do", and after two 1 hour sessions in the evening with Pandas, I immediately began using it at work to manage the huge quantity of spreadsheets my team is generating, combined with output from various databases. Anyone who knows SQL will love Pandas, because it allows you to very simply and easily conduct SQL style set logic on data without having any of it in a database.


...you know that there's a significant group of, uhm... "programmers", that are really afraid of SQL and do everything to wrap it in things like ORMs and hope they'll never have to write it or think about it, right? As a programmer, I only started to like and "understand" SQL after learning more about the underlying relational algebra and seeing its beauty, but it was just a stream of WTFs for me before that... and some kids that I recognize as "thinking the way I do" can learn programming in Python starting from 0 in less than week but look at SQL with a complete "wtf" face ...I guess different people really think differently ...sometimes it's scary how differently


It's almost as if a query language is a good way to query a relational database... Hmm.


indeed ...but: try explaining to someone who has just went through the effort of picking up a general purpose programming language why does he need to learn "a retarded query language that doesn't even have functions, loops and assignments" :) ...of course, you can answer him "well... but actually it does" and expose him mercilessly to the world of sql extensions before he even learned basic sql.


I don't think of SQL as programming -- programming is telling the computer what to do, SQL is describing the data that you want and letting the computer figure out how to get it to you. Regular expressions work the same way.

The worst SQL is done by people who treat it like a programming language.

I've taught SQL to non-developers and most pick it up pretty easily as long they're taught the right way.


Small correction: The worst SQL is done by people who treat it like an imperative programming language.

SQL is a declarative programming language. It's probably much easier to learn for people with experience in functional programming (where you also specify what you want, not how to get it), and tools like Excel, than people who are used to thinking about telling the computer what to do.


"SQL is describing the data that you want and letting the computer figure out how to get it to you."

This is known as declarative programming, and is the basis for functional programming languages like Lisp and Haskell. http://en.wikipedia.org/wiki/Declarative_programming

It's a different paradigm, but one worth exploring if you're only familiar with imperative OO.


You also might want to look into R (I would recommend using R with RStudio, a fairly powerful, but easy to setup IDE). I have a feeling that if you know SQL, than learning a programming language is well within your abilities.

The biggest difference that I have seen regarding the Excel vs programming issue is the level of transparency that is provided in each. In excel, you get immediate validation of your functions and operations because you can see the results at each step. However, the formulas/functions themselves are 'hidden' and as the number of operations gets larger, it gets increasingly difficult to track and control the process. This makes debugging a nightmare. With programming, the results of your operations are much less transparent, but as you get familiar with the language you learn tricks to quickly summarize your data in ways that provide meaningful feedback for what you are trying to accomplish. More importantly, your script contains a concise record of all of the data manipulations and operations that you are performing, making debugging a much simpler task (compared to a monstrous spreadsheet).

An additional benefit to programming is that, after a certain level of familiarity with the language, your productivity increases dramatically. People using excel tend to be much more deliberate and cautious when creating the spreadsheet because debugging is such a pain, which means they tend to operate at a much slower pace. With programming, you can quickly pound out a script (often without even testing the various pieces of it) because you know you can easily debug the thing afterwards. From my experience, the difference in working styles results in almost an order of magnitude difference in productivity (what would take me an hour or two to create with a programming language could take a day or two to whip up in excel) and even with the extra deliberation/time, errors are still just as prevalent in excel. (And this doesn't even take into account the fact that it is much, much easier to reuse code than it is to reuse a spreadsheet).


RExcel is also very nice for the transition steps.


If SQL is the non-coder, power-user database access, what is the formal programming access method? Genuinely curious.


Bingo. This and the ease-of-use/approachability for non-developers are why Excel is still absolutely huge in the finance world, and why it's not something you just "grow out of" when you start hiring developers.


I agree, but the easy evolvability of a spreadsheet is a weakness as well as a strength. When you lower the barriers to entry this much, you get "developers" who don't have much experience with development. This is a good thing when you are creating simple spreadsheets that deal with minor matters. It vastly lowers the cost and speeds up your ability to get answers to questions that aren't worth a lot of resources.

It's a bad thing, though, when it is used for a very consequential analysis. If the right answer is very important, you have to do what it takes to make sure it's the right answer, regardless of the tool you use. It's okay to use Excel (I love it), but you really ought to have an "expensive" developer using it.

I was doing a contract for a financial firm, and I prototyped my solution with Excel. They were so pleased, they started using the prototype immediately and begged for additional features. I complied, but told them I should do some serious testing first. They were confused by this. "What is this 'testing' of which you speak? Are you saying it doesn't work?"

I tried to explain "testing" to financial people who had been using Excel for years. They weren't buying it. "If it works, it works," they informed me. I had to sneak my testing in to the new feature dev schedule.

At a later point, I told them I couldn't add anymore features until I had refactored what we already had. Of course, if they didn't understand testing, they really didn't understand refactoring, but I had noticed similar functionality emerge in multiple locations, and I wanted to put it in one place, test it, make it unmodifiable by users, call it from wherever it was (or would later be) needed, and use my testing system to make sure the spreadsheet still produced the same tested outputs from the same inputs after each refactoring.

These experienced Excel users were quite sure that this "refactoring" nonsense was all gobbledygook that had nothing to do with Excel. "You don't have to do all that stuff if you use a spreadsheet; that's why people use spreadsheets."

Again, I think Excel is a terrific tool, but as self-serving as it is to say so, I think that if you are using it for very consequential analyses, you really need to treat it like a "real" software development platform and use "real" software dev people and practices for your development.


You can already embed Python in Excel. There's a few different projects that do this; here's one: http://www.bnikolic.co.uk/expy/expy.html

The article doesn't do any close human-factor analysis of why Excel is so much more accessible than programming languages; it's the lower abstraction bar. No need to name variables in functions when you can just click on the cell you want. No need to create loops when you can drag an already entered formula across a number of cells. Excel is a direct-manipulation programming environment, not a language, and a language can't compete here.

But I wish them luck; they diagnose the problem reasonably well, if not the solution.


How does adding Python to the spreadsheet paradigm improve the fragility problem at all? The major problem with spreadsheets is a dangerous mix of ad-hoc model definition and data. An author can build ad-hoc structure, but has no easy or default way to ensure that the structure is consistent or valid, and it's incredibly to accidentally change both data and model with bad keystrokes.

What really needs to replace spreadsheets is new ways to define models that are as easy-to-use as ad-hoc spreadsheets. For instance:

  * A single table should only contain one set of data adhering to the same schema.
  * Formulas, formatting and validation should be defined per-column, not per-cell.
  * Typing and pasting into ranges should be disabled by default.
  * Auto-import from other sources into a table needs to be much easier.
Excel actually contains some of these features, but until it removes or disables the ad-hoc nature it'll still be dangerous. Maybe it should add a "safe mode" so that financial institutions can have an easy migration path from such high risk.

I've spent a lot of time migrating financial apps away from spreadsheets. Some of the errors I've seen could have caused many millions in unnecessary losses, maybe did before I got there. It's insane, but this doesn't seem to fix it in any meaningful way.


I'm one of the (10) original developers of Excel.

I would say people love spreadsheets (Excel just happened to be a particularly well done implementation of a spreadsheet when the GUI became the most popular desktop platform) because:

a) Easy visual data entry - everyone understands a table of data.

b) Non-procedural programming model.

c) Simple analysis tools (batteries included): charts and pivot tables.

There just has not been much imagination applied to improving on this model - there are great gobs of ground for innovation for someone willing to take a stab.

1) Make spreadsheets "functional" - i.e., allow a whole spreadsheet model to be used as a function definition (I was amazed when I first learned spreadsheets and found they lacked this capability).

2) Make it easier to sling large data sets around - and especially share and work with dataset repositories.

3) Better hooks for developers to integrate spreadsheets with more advanced functionality and features. There is too much locked up in the spreadsheet as a monolithic application. I can publish my Python program easily - why can't I do the same for an application developer in "spreadsheet language"?

There has been relatively little thinking about how to organize and manage collections of spreadsheets and models. Most people still deal with one spreadsheet at a time; it's its own little island - largely cut off from the rest of the world. In the era of cloud infrastructure and hosted data, why aren't there better systems in place to manage data and models in the aggregate? What if you married the best of GitHub and Excel together, for example?


This comment is amazing. It deserves a lot more attention in this thread. Thank you!


Nice of you to say so...


"Excel is like a user-friendly nuclear power plant. An amateur operator might be able to keep it running most of the time, and most of his mistakes won't be serious. But the wrong mistake can blow up your business."

Yeah, running huge Excel sheets is a risk. Running huge Excel sheets with VBA is a risk and a pain. Powering Excel with python is a step in definitely a step in the right direction and would save me personally a lot of pain and boost my productivity right now.

Unfortunately it is a step I can not take. Excel is supported by Microsoft and Excel Bloomberg plugin is supported by Bloomberg. DataNitro is supported by a company founded last year that doesn't have single person's name in their About Us or Contact Us pages. If I was to build a sheet depending on their technology, would a wrong misstep two years down the line blow up my business?


We should not confuse procurement risks with usage risks. In general, the risk with Excel, as indicated in all these articles, does not come from the vendor but from the user. Your systemic risk is always bounded below by the highest risk you are facing, which is user incompetence. Even with power users you still get errors. The main negative differentiator for Excel is that these errors are sometimes hard to identify.


Hi Aleyan, I'm one of the founders of DataNitro. We went through YC last summer, and the company's stable and funded. While I can't guarantee that we'll be around forever, we will be here for at least a few years. If you use DataNitro for your business, we'll work with you to make sure that our technology works for as long as you need it.

I'll add some personal info to the "About Us" page soon, too.


I've used Excel Bloomberg plugins before and I can vouch for their power and frequent use in the financial industry.


Many developers think they know the solution to a problem before understanding the problem. They jump to the solution, so to speak. This is exactly what HN has been doing about Excel the last few days... if you want to replace Excel, spend a year building models in it, then you'll see how it's harder to replace than teaching someone development.


Agreed. I'm a non-programmer, Excel power user for finance. The HN community seems to think that Excel is the primary cause of all these financial catastrophes and that some new software solution will prevent them going forward. In reality, the problem is user error, missing checks and balances, and oftentimes, complete lack of common sense.

An organization that doesn't have the discipline to incorporate some checks into its influential financial models or doesn't have quants who can "back of the envelope" gut check the model is going to make errors regardless of what number crunching software it uses.

I could see a consequence of taking the calculations normally in Excel formulas and making them even more blackbox so only the software engineers can check the calculations to lead to MORE errors, not fewer because its less accessible to the people who know the math.

I could go on and on about how mid-sized private equity shops like mine would never adopt some proprietary solution over Excel because then they'd be too dependent on the developer to make changes, but that's not the point.


Guess it depends on the individual use case. Escel is great to use for data stratification and basic data analysis. One of the biggest advantages of Office 2010 was, at least from my poijnt of view, the abolishment of the 64k line limit. I never got on terms with Access and having to look at huge amounts of data right out of SAP got me in trouble with this limit before.

What Excel should never ever be used for is a real application, e.g. for production planning, if there is such a system in place. I've seen myself what damage, even if it's hard to quantify the amaount, this can do. But this is not a problem of Excel per se and more a use / user issue.

Agree that this is dependant on the size of the operation. For a small machine shop doing everthing in Excel is absolutly OK. For bigger operations, well, not so much.


> the problem is user error, missing checks and balances, and oftentimes, complete lack of common sense.

You are somewhat right, but I don't think you get what we are saying. My background: I was a developer tasked with integrating cost models that were implemented completely in Excel. The problems we had were not because the organization didn't have checks, but because Excel makes it Very Hard to do such checks on a spreadsheet with sufficient complexity -- or perhaps makes it Very Easy to make it hard to do that.

It's easy to say "The model sucked because they built it poorly", but honestly that's disingenuous. The models sucked because the tool was pushed too far. There were VB functions with it which were well-written - it was the mess of calculation logic that was hard to deal with. Here are a laundry list of the types of problems I encountered, and I'll go into more detail later on what programmers want instead.

- Debugging an error (bad value, NaN, etc) meant tracing the parent formulas through several sheets, with no way to unit tests smaller calculations that were used as part of larger ones. Large amounts of stuff were done with table transforms, because Excel makes that Easy, but it was hard to understand what it was doing because of the dependencies.

- Moving input cells, or copy/pasting output tables, often meant that later versions of the sheet had an ever-changing interface of where you write your inputs and read your outputs (its API, essentially), which had to be constantly updated, and then debugged.

- Formulas that had any decision logic in them nearly never had humanly readable names, but rather were cell addresses -- does 'intermediate_calcs'!C42 refer to the Frobulated Foo, or is it the un-normalized Foo before we frobulate it?

- Results tables had frequent copy and paste errors, where they referred to the wrong sub-range of some more-detailed results page. These are really easy to do wrong, because it's hard to tell by looking at the formulas what you're referring to -- since regions and the like are quite often left un-named.

Why is this bad? What do programmers see as missing in Excel? For me, it's all about testability and source control.

In contrast, developers are accustomed to building things by aggregating smaller pieces of logic, each of which has a unique name, and code comments to describe its purpose and intended use, and with well-defined interfaces for communication between the pieces. We write tests which ensure that each little piece works right, tests that verify that we are passing them the right data, and tests that verify that the results we get match what we expect. (This sort of testability is Very Hard to do with excel.)

The code we work in are stored in text files, which means that everything we do is traceable in Version Control -- you can ensure that you have the correct version of a file by more than its checksum or modified date, and can see the logic changes which were added to fix a problem. Variables have names like "foo_frobulated" and "foo_normalized", so it's much harder to accidentally use the wrong variable for a calculation.

Many times formulas in Excel start off simple, and later become very complex in order to account for exceptional data. Code does too, but it tends to be more readable, so you can more easily notice when you make a mistake.

[edit: finished the paragraph that I prematurely posted.]

edit 2: You make a VERY good point about the accessibility to people who know the math. The math's correctness is sometimes hard to validate when there's a mistake in the inputs -- mistakes which can be hard to find. You're correct that describing parts of it in code might make it more complicated, but I feel that the ability to more easily sanity-check the inputs, outputs, and intermediate calculations of The Maths will make for a more robust calculation tool.

Programming isn't wizardry, any more than SQL or creating complex Excel tools are. Excel presents data well, as complexity scales it's easy to introduce human errors -- whether in inputs, calculations, or outputs. Software development has mostly solved that through tests and source control, which is why we rail so hard about Excel. It would be like watching someone struggle with talking with someone remotely by spooling a Very Long wire, when you're standing there holding a cellular phone.


the problem is that excel is a very versatile swiss army knife and that any replacement has to be flexible.

It's actually not particularly hard to build the individual components (I put this together in a weekend:

http://niggler.github.com/js-xls/

http://niggler.github.com/js-xlsx/

(pure-JS in-browser parsing of simple xls and xlsx files)

but it is hard to match the workflows.


Maybe our definitions of "not particularly hard" differ, having looked through some of your source embedded in the page...


I think that's absolutely true. Many of the most highly compensated people I know are in the finance industry and work all day every day in excel. The rest of them are programmers.

If you had to hire people who had the domain talent and expertise of the former, and the programming skills of the later, you'd be looking at seven figures per in employee and there'd only be so many to go around at any price.

But that's what you'd need. A bespoke system that could only be effectively modified by programmers would be useless in short order no matter how great it was at launch.


Excel is a pretty amazing piece of software. Previous place i worked at -- a medium-size insurance company -- did all of their monthly expense calculations in uber-complicated spreadsheets and then exported the results to AS/400. Part of their issue was that training staff became an increasingly complex task because of the structure of the data in the sheets some of which had to be imported from a 3rd party source. I have never seen Excel used to this extent until i was brought in to convert these sheets to a web app. We ended up building a custom DSL based on mathematical expressions, an in-browser IDE for accountants to configure formulas in the system and a full-blown custom report builder. It was by far the most complex project I've ever designed and it had an immense impact on how i look at advanced Excel users today.


It's interesting to see this discussion of the future of Excel and complaints about its language. The fact of the matter is that the language is an evolved artifact with an incredible history. Spreadsheets started with VisiCalc. Then came Lotus 1-2-3, and Quattro. Finally, we have Excel and Numbers.

The language that Excel uses is odd because it was forged in the fire of competition between vendors. Each of them had to support their predecessors' idiosyncrasies to get adopters. No one was able to make a clean cut at spreadsheet languages and make it stick. I know there have been some attempts. Back in the days of the spreadsheet Cambrian explosion there were even some oddities like 3D and nD spreadsheets. None of them could ditch "the language" that we are left with today.

It's a shame. I think a good "ground up" redesign of the spreadsheet would be incredibly useful if it was done and people bothered to use it.

The thing that makes it hard to revisit the spreadsheet is that it is "the programming model" for non-programmers. Non-programmers just want to get their work done and they bias toward familiarity.


The beginning is silly: "Why do people keep using Excel to build giant, error-prone spreadsheets?"

This could just as easily be: "Why do people keep using programming languages to build giant, error-prone applications?"

I can't even read the rest of it. Excel is so successful because it lowers the bar for creating applications. They might not be pretty, and they may have expensive bugs, but its like everything else in the world.

I'd bet the amount of value Excel has created in the world far outweighs the consequences of any bugs in the worksheets.


There already is (and has been for some time) a "better Excell" that'a also a "better Ms Access" at the same time: Quantrix (http://www.quantrix.com/), it can replace brittle spreadsheets with much easier to manage documents and do much more

...but there's a problem with it: it's just too much for 99% of people, both as features (It's probably easier to learn a general purpose programming language) and as price :)


If someone would: (a) develop a web-based version of it, (b) make it easy to use (hard part 1), (c) make excell (ab)users realize that they really need it (hard part 2 - maybe impossible? :) ), (d) make it extendable with multiple general purpose programming languages (python, ruby, ocaml or f#...) and (e) give a "personal/student version" for free to get people hooked on it, it would really shake things up in this area...

P.S. If you get filthy rich from my 5 point plan above, please remember to give me and my company a life-long full-support license for your software :)

P.S.2. ...or open-source it and become the Linus of financial analysis software ;)


Yeah - I recall looking at them a few years ago when they allowed for "multi-dimensional" spreadsheets. One of their other big selling points was that they were less error-prone than spreadsheets.

http://www.quantrix.com/Quantrix_vs_Spreadsheets.htm


Excel is excellent for simple formulas in a single spot. It sucks in two points: complex/ugly formulas (anything that requires a switch/case statement or nested IF's) and heavily repeated(copypasta) formulas.

Here's what I can imagine as a real improvement: Allow me to write a piece of code that does the job of a function, but allows to touch multiple cells - and integrate it into excel as deeply as 'normal' formulas.

This means the following things that current macros don't really do:

a) Show the "formula" in the formula bar when selecting a result cell (it will usually be only a couple or dozen lines long anyway), and allow editing it in the sheet, not in a separate editor;

b) When a result cell is selected, highlight the data source cells (as is done for 'normal' formulas), and the other result cells;

c) When I change a source cell, recalculate all result cells automagically as for normal formulas - this likely imposes a 'no-side-effect' restriction on the code language.

d) For the "stdlib" use the same function names as current excel formulas, so you don't have to learn a new set of them.

Simple, and very useful - could LibreOffice handle that?


Perhaps in the pop-up editor, the names of the parameters could also show their current values on hover (where possible), so that one could debug the function live, in a similar way to the sheet showing live numbers most of the rest of the time (with show-me formulae).

Also, by doing functions like this, one could have some documentation too...


Matrix functions and named ranges are the most underused functionality of Excel. Collectively they can replace a large amount of the conventional spagetti VBA code, emphasizing Excel's true strength.

Simple example: CDO pricer using Gaussian Copula and base correlation https://dl.dropbox.com/u/10755342/120626_Simple_CDO.xlsx


You mean array formula (the {=....} jazz). Matrix functions refer to functions like MINVERSE


Not only that, regular algebraic operations also work with vectors/matrices as inputs, as long as you press CTRL-SHIFT-ENTER after editing. This means that most functions with complex MATCH and OFFSET can be replaced as well (i.e. SUM( (column_vector = cvalue) * (row vector = vvalue) * (matrix))


Does anyone remember Resolver One? The proprietary Excel-with-python clone? I wish they'd release the source.


Loved that tool. For those who didn't know, they EOLd it. Sigh. http://www.resolversystems.com/eol.php


The same company also did a cloud-based version, called Project Dirigible. You can still sign up for beta access, but that also now has an End of Life notice up. Maybe this time the source will appear.


Yeah it was a pretty cool tool. It's a bummer that it got EOLed.


A couple of problems with Excel [edit: actually just spreadsheets in general], which I think fundamentally restrain it:

1. Difficult change control. It is very difficult to diff spreadsheets. So changes to a model are very hard to verify for correctness.

2. Massive code duplication by design (eg. the fill function). So it is easy to introduce an error by failing to update all relevant cells.


1. The review functionality goes to a very granular monitoring of changes in a spreadsheet. It is indispensable, especially when many people have access to the file. It can be circumvented, of course, but we are talking about unintentional errors. 2. In Excel you get very evident visual indicators when a formula is dissimilar to the formulas around it. Duplication is necessary at the most ad-hoc level. When you use Tables, things are more safe.

The major issues with Excel are not of functionality but of user training on the functionality available.


> The only way to prevent people from running important systems with giant Excel spreadsheets is to present a better alternative.

I think the spreadsheet can still be improved. Macros can help a lot with this, but there are a myriad of other ways you could improve upon excel that gives a smooth transition from "casual user" to "power user". Read "A Small Matter of Programming"[0] for more!

[0] http://www.amazon.com/Small-Matter-Programming-Perspectives-...


This is one of the beauties of Matlab and similar languages. They look and act like excel but more explicit and powerful. (and harder to use) Matlab documentaiton is very similar to excel documentation and the ability to run m-files with the touch of a button gives a lot of 'reactivity' that other languages lack. Also a live, easily accessible and manipulable variable editor makes it much easier to work with data.

I would really like a program that wrapped something like Matlab around excel. Basically something that let you work in excel then let you hit a button to 'vomit' the variable space into Matlab code. I often find it much easier to manipulate data in Excel and often find myself working in Excel until it becomes too burdensome then swapping over and converting into Matlab or Python. I think seeing a language 'live coded' and being able to alter the live code would allow people to bridge the gap more easily. It would also fix the debugging issues as each spreadsheet would also contain a executable code that could be implanted into other spreadsheets easily, (it would also contain a number of assumptions made explicit like what variables it takes in and what it kicks out.) Of course how strings etc are handled is a messy question.

A similar tool exists for the open source scientific image altering toolbox-ImageJ that allows it to 'record' your actions and easily loop/automate your code by turning your calls into code as you perform them. I've found non-programmers can easily learn how to write 'code-like' actions into this macro language.


One of the problems is that Excel is used for tasks that it isn't designed for.

I'm currently plotting a lot of graphs in Excel (2007) and the experience is just dead awful, wanting to quickly change parameters and ranges and sheets to plot is a nightmare. So naturally, I went to find a good specialized tool for this but haven't come up with any. Is Excel cannibalizing the market for such tools despite it being so bad at it?


I found R to be the solution for me. I faced the exact same problem when I wanted to quickly plot lots and lots of X-ray spectra. It was slow and tedious to do in Excel because it involved repetitive plotting of similar data.

I put my data into Access, then accessed it using R via ODBC. The cost was the time to validate my data and get it into a database. The benefit was that subsetting and plotting the data became very straightforward, and the source of error was the query and plotting script. Transposition, highlighting the wrong cells, not pasting the correct data, etc were eliminated as sources of error.

Plotting via a script is to plotting by hand in excel what using a computer is to photocopying, cutting and pasting physical paper.

Repetitive plotting and storing data: tasks for which excel is not really designed.


You could code this with Python 2.x (may also be able to get it working with 3.0-3.2), matplotlib, and xlrd. You get professional, highly configurable plots in vector format, but of course give up the interactivity of Excel charts.


I've used gnuplot; though it requires you feed it plain text files.


Excel really is programming. It is classified as cell-oriented dataflow programming. Functional Reactive Programming is a close cousin to such paradigm.


Classifying such languages is actually part of my PhD thesis, so I can authoritatively :) say that while merely adding the concept of time gets you to functional reactive programming from excel, they aren't really related because FRP is so much more powerful.


This sounds like Resolver One, which was a Python based SpreadSheet (using IronPython) from a few years ago. I wish these guys better luck.


I would love to see a good open source Excel clone. It does not need to have all the unnecessary features of the recent versions. IMHO Excel 97, the version that added VBA, was just about perfect.

Adding support for Linux, other "macro" languages and easy interfaces to popular open source projects, which MS will never do, could potentially make it the dominant spreadsheet.


I abandoned LibreOffice's spreadsheet in favor of Gnumeric.



Perhaps but despite being nominally open source interference by Sun (and later Oracle) have crippled the OpenOffice project. Back in 2010 most of the developers left for LibreOffice. Currently it seems to have become an IBM sponsored project after being spun out of Oracle to Apache. Not sure if this history has made it impossible for OpenOffice to succeed.

However you are right OpenOffice does deserve another close look as does LibreOffice. I wish there were alternatives with less baggage.


Currently it seems to have become an IBM sponsored project after being spun out of Oracle to Apache. Not sure if this history has made it impossible for OpenOffice to succeed.

I'm not sure why you would think that. OpenOffice is progressing nicely in it's new home at the ASF. And calling it "IBM sponsored" might be a bit of a misnomer: IBM has contributed a lot of code and has paid developers serving as committers, yes. But one of the very points of the ASF incubation process is to ensure a sufficiently diverse community around a project, to where no one company has control of the project.

There's a lot of great work going on in the AOO project, and I heartily recommend that anyone looking for an office suite give it a look. It is definitely not the case that "LibreOffice won and OpenOffice is dead". We have two projects now, with a lot in common, but evolving in slightly different directions and - to some extent - competing with each other. Personally, I think this is good for the ecosystem as a whole and that both projects are becoming better as a result.


My understanding may be out of date. The history of OpenOffice is hard to follow: Star, Sun, Oracle, IBM, Apache. Last I remember IBM was merging Symphony back into OpenOffice and committing to continued development. Hard to tell how significant they are today to the overall effort.

It's mostly a good thing when large companies like IBM contribute to open source software. However you are exactly right when you mention the importance of having a diverse community. You can't depend on any single companies interests being aligned with all the users and developers on an important open source project. This is my big worry with MySQL today.

Almost everyone should want their to be a good alternative to Excel which seems to get worse with each version. I plan to give both OpenOffice and LibreOffice a try over the next few weeks. Eventually I want to be 100% Linux but Excel and to a lesser extent Word and Powerpoint is still keeping me on OSX (BTW - Excel on OSX is really crappy). I hope you are right that two competing projects are better than a single effort.


Last I remember IBM was merging Symphony back into OpenOffice and committing to continued development. Hard to tell how significant they are today to the overall effort.

Disclaimer: I am technically an AOO committer, but I'm not particularly active in the project right now. That said, I monitor the mailing lists from time to time, and from what I can see, a lot of the Symphony code has been merged back into AOO, but not all of it. And there's no question that IBM is a substantial part of the AOO community. In fact, that is probably one of the reasons it took so long for AOO to graduate the incubator... there was concern that the project was too "IBM heavy".

But, for better or worse, things seem to be moving forward and, personally, I'm excited for the future of the project. The one thing that disappoints me, however, is the perception that there's still some sense of conflict between AOO and LO and that the two projects don't collaborate as much as they could. I'm hoping that - over time - any sort of adversarial air will die off and the two projects will be "friendly competitors" but time will tell. shrug


LibreOffice has Python 3+ now baked-in (i.e. part of the default install).


Thanks for the tip. I would love to have access to all the Python libraries in a spreadsheet. In the past I have had to build custom C++ libraries for Windows Excel to do complex calculations quickly. It was quite tedious and involved lots of undocumented tricks.


There's a name for this product space! "Spreadsheet controls" describes the product space inhabited by various products that aim to help businesses manage the terrifying, complex Excel workbooks whose contents influence business decisions in subtle, interrelated ways.

Microsoft recognized that this was an important hole in the way enterprises used Excel, which is why they acquired a company in this space, Prodiance, in 2011. Office 2013 integrates some but not all of the company's products in its "enterprise risk management" / "spreadsheet controls" functionality; see some discussion at http://blogs.office.com/b/microsoft-excel/archive/2012/09/13... .

It was really interesting to read the prior blog post on this topic ( http://baselinescenario.com/2013/02/09/the-importance-of-exc... ) because it sounded so familiar! That blog post is spiritually identical to about the first third of the Prodiance sales pitch.

The "controls" space has a surprising amount of depth. Just a few problems that these folks think about:

(1) Excel spreadsheets are not a very good computation engine: business logic can be subtly altered by a typo that includes one too few lines in a range or a copy-paste error that accidentally includes a constant instead of a calculated cell. It turns out to be possible to programmatically inspect a workbook for "risky" or "possibly wrong" calculations and warn on their presence.

(2) Chained dependencies can be really hard to track: some businesses use one spreadsheet that refers to data in another spreadsheet that refers to data in another spreadsheet… and you have to hit "refresh" in all of them if you want your changes to propagate correctly.

(3) Data can be changed maliciously: by leaving some text white-on-white, by using Very Hidden sheets, or by deliberately changing a formula, you can put misleading information in a quarterly report to hide bad news!

(4) It's hard to find "one version of the truth": people constantly e-mail around things like "quarterly report - new calculations.xlsx" or "quarterly report - new calculations - REVISED USE THIS.xlsx" and we just have to hope that business processes use the right one. In some kinds of businesses, people really want to see a complete audited list of "important spreadsheets" and "all the versions of them" and "how they have changed and what specifically has changed".

These are all problems that can be solved at other layers — for example, this blog post advertises "move your business logic to Python with DataNitro/IronSpread" as a solution to problem (1).

If DataNitro's plugin is going to help provide "enterprise risk management", one really interesting thing to hear would be what their plan is for auditing. How can people see what ancillary Python code is attached to a workbook and how it's changed over time? That is, what assurances can someone get that the generateQuarterlyReport() method is working as designed, and can someone get an e-mail when it changes?


It's hard to find "one version of the truth"

This is where network shares, Sharepoint, and so on come in handy.

Even more so if Excel supports revisions the way Word apparently does now.


I love excel. Users love excel. Love it so much to the extent that we actively embed an excel sheet in our internal apps now instead of any third party "data grids".

Why?

They're easy to work with, copy/paste across apps, easy to create interface logic - even so much that the users can "code" up the sheet and we embed it and bind the dataset to a database or service layer.


I can't just install a random EXE file on my finance computer without knowing who you are. Even if I felt like this was a great product, there is no way I can risk my client's data to save some time.

I'm honestly not sure how you get over that hump, but you should think about it, hard.


The STEPS system provides a spreadsheet that is a simple extension of the system language: http://www.vpri.org/pdf/tr2011004_steps11.pdf


Great idea but not sure who is going to use this. I can't see enterprises installing this on their desktops and as someone already pointed out the larger enterprises have been using http://www.clusterseven.com or http://www.prodiance.com for years to control their spreadsheets


Yes, I agree that Excel is a programming language. This is one of the ways I learned programming also! Good way at looking at programming.

However, VBA is not the answer for most Excel users to move past because the expectation is set by Microsoft; it requires nothing more than introductory courses and common sense. This is where most people will use Excel.


As has been mentioned elsewhere, Excel users are programmers, whether they know it or not.

What they need is the discipline of software engineering, a completely separate discipline from programming. If Excel spreadsheets had better encapsulation, internal documentation, automated testing, and the like, the world would be a better place.


Slightly off topic, but are there any good books or other sources for experienced programmers who quickly needs to get up to speed on the more powerful features of excel/VBA? I've just been asked at work to take a look at a huge importan excel 'program' that's not quite working the way it should.


Excel VBA has libraries, they're just called references. It lacks an exception library, but it does have a similar style of attempt-fail-compensate (though you've got to put up with ugly labels). Personally I'd use IPython Notebook over it, but that's not always the reasonable choice.


Can someone explain why Excel disallows cycles (recursion) in formulas?


This is configurable. You can enable recursion and set a max for the number of iterations. You can implement relaxation-based differential equation solvers that way :).

Edit: You can also solve Sudoku this way: http://office.microsoft.com/en-us/excel-help/microsoft-excel...


The more general answer is that cell updates could just keep going forever, never settling down, or at least taking a long time to settle down. There isn't a good one-size-fits-all way to resolve this. Plus, these sorts of structures are most likely typos by the people writing the spreadsheets.

The more specific answer is that Excel performs a topological sort on the cell dependency graph in order to update cells in an efficient order.

Edit: as someone else has pointed out, if you really want cycles, there's an option to tell Excel how to terminate computation for your particular infinite loop use case.


It's possible to turn on circular calculations and to limit the number of iterations at a global level (I've never seen it limited at a local level), which could be used as recursion to some extent. I think it would be quite rare to find a case where circular calculations would be a better solution than doing the same thing in a macro, writing a custom function or finding a closed form solution.

The biggest problem with circular calculations is that they may not converge fast enough (before the limit is reached), and if the output is important that's not a good feature to have.


Suppose the formula in A1 is =A1+1. What should be the displayed value in A1? How long should it take to display that number? This is a simple case, which clearly diverges. There is no simple way to determine from a general formula whether it will be convergent or divergent. In the interests of definiteness, Excel defaults to just disallowing formulae with this kind of indecisive output.


There is a way to enable iterative calculations (which allows you to use cycles) (in the formula options) but note that it is disabled by default.


So, 6 months until the TDE (Test-Driven Excel) book comes out?


Why people don't use Microsoft Access when they need more complicated logic?


how does putting a "real" language on the backend of spreadsheets help fix what happened in London? seems like it's just a bigger gun with which to shoot yourself in the foot.


VBA should be replaced by VB.NET or (Iron)Python.


Excel can be extended using .Net for quite a while now... this even allows you to add to the functions within excel. Writing Custom Excel Worksheet Functions in C# http://blogs.msdn.com/b/gabhan_berry/archive/2008/04/07/writ... How To Create an Excel Macro by Using Automation from Visual C# .NET http://support.microsoft.com/kb/303872/en-us How To Create an Excel Macro by Using Automation from Visual Basic .NET http://support.microsoft.com/kb/303871/EN-US


Given the basic programming ability of the folks using Excel, I would expect something more along the lines of BizTalk's orchestration designer or ChipWits.

and, yes I have spent the odd moment programming such a thing


Why would you prefer those to Python? (And to every other language?)


VBA is a surprisingly flexible language (although it does have many quirks) and is visually much cleaner than python.

Read the MS-VBAL spec: http://msdn.microsoft.com/en-us/library/dd361851.aspx


> and is visually much cleaner than python

I must respectfully disagree. Python is required to be visually clean by making indentation part of the syntax.


visual cleanliness is not just about whitespace. There are other considerations like how well the clauses stand out:

    If x > 0 Then
        y = 3
    Else
        y = 4
    End If
is much easier to scan than

    if x > 0:
        y = 3
    else:
        y = 4
Even though there are more characters in the vb side, the fact that the blocks are explicitly ended give you a much better mental cue than merely shifting the whitespace indentation


It's probably subjective. I've always found capitalisation makes scanning harder.

VB was one of my early languages, and I recall trying to find a way to turn off VB capitalisation way-back-when when I was developing in it.

But it might be odd to me. Whenever I see a directory of C with a CVS directory, I feel a brief tension from all the shouting.


I used to program in VB/VBA and switched to Python several years ago. As the other person who replied notes, 'visual cleanliness' is subjective, but I vastly prefer looking at Python to looking at VB/VBA, and I immediately began to prefer it on learning Python, despite having programmed in VB for several years previous.

Part of it is that you can communicate the same information with fewer tokens and in less space. My eyes don't have to jump around as much to figure out what's going on.


a .net based language probably already has hooks into the office API's, that would be one reason to use iron python vs. regular cpython.


The problem with ironpython is that it's not always compatible with all python libraries (esp. those based on C extensions; ie. heavy, scientific libraries similar to scipy); Using regular CPython immediately solves this problem. Also, the official .Net office API (read: VSTO) is not targeted on newbie programmers (imho).


and the problem with CPython is that is wouldn't be compatible with any of the office api's that are likely based on .net technology. so which one do you think would be better for office. I haven't done ms office automation since office 2k7 but that is the way it was heading.

edit: oh I see what you are getting at. a new product from a new company created at mit in 2012. https://www.datanitro.com/

I would argue that a $500 license is also not targeted at newbie programmers either.

can you run native python from excel without shelling out $500 ?


That's exactly what DataNitro offers - the entire Excel api in (C)Python.

Oh, and it's free for non-commercial use (educational etc.)


so datanitro really is the future of excel as described in the blog post. good to know.

I'm still curious as to how putting any "real" programming language behind spreadsheets is supposed to fix the situation that happened last week. the blog wasn't very convincing and it feels like it's just giving a poor disciplined group of people a more powerful tool to screw things up with. That's not a future I look forward to.


but none of the office api



Or F#, so you're already in a more functional environment.


no for mac??? :(




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: