
The Good, the Bad and the GUI - ad93611
http://www.haskell.org/pipermail/haskell-cafe/2014-August/115586.html
======
jasode
>There is no excuse for a spreadsheet quietly taking a never-assigned cell as
zero, but indeed it does. WHAT THE HELL WERE THESE PEOPLE SMOKING?

I think the head scratching about Excel's (and probably other spreadsheets)
behavior on _empty cells_ is misguided. Yes, if you have a computer scientist
mindset then it might seem very mathematically satisfying to have strict
rigorous logic around empty cells but we're getting into 3-valued NULL
handling like databases. That type of extra logic checking is misplaced
considering the origins of spreadsheets (e.g. Dan Bricklin's VisiCalc) as a
_business computation_ tool instead of a GUI for an E.F. Codd relational db.

If a particular power user finds the typical "null" cell behavior to be
"wrong", he can add formulas to his cells such as:

=IF(ISBLANK(C3), NA(), C3)

... such that #NA# (Excel's quasi "null") propagates throughout any SUM() and
other calculations.

However, imagine if the situation was reversed and we had an alternate
universe where Excel propagated empty cells as #NA# instead of zero to satisfy
the logic of computer scientists. We'd inevitably have:

1) Microsoft adds a button wizard on the toolbar to bulk fill "null" cells
with 0.

2) User forums with power-users trading VBA macros to fill in zeros of empty
cells. _" Ugh! How do I get rid of all these useless #NA# littering my
spreadsheet???"_

3) Microsoft adds an user option checkbox to "treat empty cells as zero
instead of #NA#" \-- which everyone ends up activating as a defacto setting.
This cancels out the strict null handling the computer scientists were
imposing on the spreadsheet!

I contend this alternate universe scenario is much worse for business users of
Excel. In other words, you can't "force fit" the concept of NULL as a default
where it is not natural. The end users will just work their way around it.

~~~
danso
Yeah, hate to break it to the OP, because God knows I speak against the use of
spreadsheets whenever possible...but the most common use-case for a
spreadsheet is not as part of a reproducible, automated data pipeline. Many
times, it's hand-entered data entry, with the goal of making one of the pre-
baked Excel visualizations. For users who want type-checking, there's Access,
which is more attuned to handling non-trivial datasets.

~~~
reitanqild
> Many times, it's hand-entered data entry

Yes, many companies pays good money for people to update those Excel sheets.

Me? I see almost every single Excel sheet that is updated over more than a
week as a possible sale. But I'm not replacing Excel spreadsheets full time,
at least not yet : )

------
ef4
I get it, I really do. But I'm a programmer.

Spreadsheets are the dominant end-user-programmable tool in existence. Much of
what people user them for is horrifying to programmers who know better.

But if spreadsheets actually did the "right" thing, they would never have been
so widely adopted in the first place. That's an unprovable assertion on my
part, but I strongly suspect it's true. I think it's a clear case of Worse is
Better[0].

[0]
[https://en.wikipedia.org/wiki/Worse_is_better](https://en.wikipedia.org/wiki/Worse_is_better)

~~~
dragonwriter
> But if spreadsheets actually did the "right" thing, they would never have
> been so widely adopted in the first place.

I don't think spreadsheets are widely adopted because they do the wrong thing,
I think they are widely adopted because they provide UI affordances that
systems that do the Right Thing didn't at the time that spreadsheets became
widely adopted, and, since then, IT departments have imposed lockdown
requirements which prevent anything programmable from being accessible to end-
users except spreadsheets, _because_ spreadsheets were so widely adopted
before that lockdown began and end users simply refuse to give them up.

Its not, IMO, a worse-is-better situation, its simply better-is-better (in
terms of UI affordances at the time of wide adoption) combined with non-
technical, socially-imposed constraints which have locked the dominance in
place by preventing any competition in the end-user accessible programming
space.

------
noonespecial
I've learned to love Excel. Despite the horribleness mentioned and the
seemingly intractable mess people make when trying to codify business
processes with it, its still the very best way I've found for the end user to
tell me what they want the "real" program to do when I finally write it for
them. It forces them to stop thinking about specs in a genie-in-a-lamp way and
start thinking about how an actual machine might make it happen. They glimpse
the programmer's world through a glass darkly.

~~~
ianamartin
So much this. People who can't be convinced to write out a spec in a way
that's meaningful and unambiguous are often more than happy to take a
spreadsheet that mirrors the database, write the formulas that generate the
outputs they want and give that to me. Hours and hours of back-and-forth are
eliminated when I can convince people to do things this way.

------
radicalbyte
This should be in lesson 1 of any programming course.

    
    
      > As for "the developer is supposed to implement it",
      > next week I'll be giving my annual ethics lecture 
      > and I'll be pointing out to students that the codes
      > of practice of the various professional societies 
      > all agree that your duty goes beyond simply doing 
      > what you are told.
      >
      > If you are told to write consumer software that
      > gets its sums wrong, you should not do it.

~~~
teamonkey
I think rule #1 should be don't try to second-guess what the end user wants or
needs when you are writing or maintaining a piece of software that you
yourself don't use or need.

------
Practicality
This is a cultural issue. Most business I know see a program telling you data
is invalid just means the program is annoying and getting in the way.

They want the program to just assume you know what you are doing.

This is what they want, so this is what they get.

------
todd8
Back in the mid 1980's I used software by Javelin software that could be used
to serve the same purpose as a spreadsheet. Javelin was different in that one
wrote rules based on variables not cells. It was much easier to avoid the
errors that so often lurk hidden beneath the visible cells of a spreadsheet.

Unfortunately, I never met another user of the software and the company
eventually went nowhere. The spreadsheet was a concept very easy for
bookkeepers, accountants and other non-programmers to understand. Not so much
with (the better) Javelin.

Here is an add for the software from 1985:
[http://www.thecomputerarchive.com/archive/Software/Applicati...](http://www.thecomputerarchive.com/archive/Software/Applications/Javelin%20Plus%20\(1987\).pdf)

~~~
infinite8s
The closest modern equivalent to Javelin is Quantrix Modeler, although it
doesn't seem to have the breadth of functionality that Javelin did.

------
lifeisstillgood
In every language I know there is a set if conventions around Null values, not
set values, true, false and zero

We have only had the concept of zero for a thousand years and null for about
fifty. Expecting there to be a common way of handling this in all domains and
industries is a bit much and expecting it to be handled the way the OP prefers
is highly unlikely (although I do agree with their opinion)

If the convention in accountancy is BLANK == 0 then that's their convention.
Implement it using a single method that's nice and easy to instrument and
discover for later on, perhaps raising warnings out to the reports produced.

Leave the worrying over not set values for when Unicode and UTC handling in
excel is brought up :-)

~~~
flynnieous
The concept of zero is at least 3,700 years old. The Egyptians used it in
their accounting. See
[http://en.wikipedia.org/wiki/Egyptian_numerals#Zero_and_nega...](http://en.wikipedia.org/wiki/Egyptian_numerals#Zero_and_negative_numbers)

------
tehwalrus
I continue to be terrified at how spreadsheets are used in business
applications. Squashing data that shouldn't be in tables into tables "so you
can work with them" was the worst thing we ever taught non-programmers to do.
We should have taught them how to program instead.

(you can get close to programming in a spreadsheet if you know what you're
doing with Insert > Name > Create, and one day I'd like to see a spreadsheet
that lets you write your macro functions in Python rather than VB, but nobody
is taught to use Excel that way except people who also program.)

~~~
_delirium
It's not so much Excel's tables that people want, as its dataflow programming.
Excel is a widely available environment that lets you define data slots in
terms of functions on other data slots, with the whole chain updated live as
values change (no "manual" update logic, just functions of cells with auto-
update). The table layout is just a default way to view the slots. Until very
recently that programming style was not widely available elsewhere, especially
with a GUI. The only other semi-widely used system I can think of that _sort_
of has that functionality is Mathematica (where you can link slots in a
notebook), but Mathematica is more niche and expensive than Excel.

Heck, even without the GUI requirement, "real" programming languages have only
very recently added competitive functionality, with the exception of Common
Lisp, which had the Cells package ages ago. Now it's getting more common to
find various kinds of dataflow/reactive/data-binding constructs in mainstream
languages other than Excel, but it's quite new.

~~~
tehwalrus
In terms of auto-update, sure. But I was talking mostly about _how to
structure your data_ \- for most business activities, lists of objects with
properties (or dictionary keys) are far superior to cells, and collections of
cells, with obscure names.

How much easier would spreadsheets be to understand if you had

    
    
        sum([widget.price for widget in catalogue])
    

instead of

    
    
        SUM(F3:F405)
    

?

Teaching people to think more about their data structures, rather than
teaching them to squash everything into a table even if it doesn't really fit
into one, would dramatically expand their skills in this type of analysis!

~~~
dmethvin
You can essentially do what you're asking in Excel using named ranges. Getting
novices to use advanced Excel features is just as difficult as getting novices
to use advanced features in any other language.

~~~
tehwalrus
Indeed, I mentioned this in my original post (child of the article) - such
features are only taught to users so advanced that they are (more than likely)
also programmers (or certainly should be). I think this is a shame.

------
kalleboo
In the posters variant of a spreadsheet where a blank value is invalid, how
would the common practice of SUM(column A) be handled, where column A has an
unknown number of rows (that keeps on being added to)?

Require all formulas that refer to column A be continually changed to
reference the true amount of rows? Stop at the first blank row? Stop at the
last blank row? It seems more confusing to define rules for that case than to
assume "undefined values are zero".

~~~
dragonwriter
> In the posters variant of a spreadsheet where a blank value is invalid, how
> would the common practice of SUM(column A) be handled, where column A has an
> unknown number of rows (that keeps on being added to)?

IME, that's not really all that important of a case, because this:

> Require all formulas that refer to column A be continually changed to
> reference the true amount of rows?

Is the normal way to handle it in Excel, so normal in fact that Excel has a
couple of automated ways to handle it.

If you are doing a table (which is the only case where "sum of a column"
really makes sense), then for Table T and column C, the formula is
SUM(T[[#Data],[C]]).

The other is the fact that Excel _automatically_ updates ranges in formulas as
rows are inserted, etc. (though this doesn't work if you are appending, only
if you are inserting within the range.)

The two cases I've seen where relying on blank row handling is a common way of
dealing with data that can be added rather than relying on either structured
references in tables or excels automated updates are:

1) Spreadsheets written prior to Excel supporting structured references in
tables (or by people who learned Excel that long ago and haven't updated their
skills), and

2) Creating forms with a fixed number of rows (usually, to be printed in
hardcopy form, so that page layout is an issue), where simply zero-filling the
relevant cells with a formula that doesn't display zero values would be a
simple solution (since you have a fixed set of cells to fill.)

Of course, you could also have function fail by default on blank cells but
take an optional parameter to treat blank cells as the appropriate identity,
the same way Excel lookup functions rely on sorted data by default but can be
given flag that tells them to find an exact match without relying on sorted
input.

------
Tloewald
1\. Has anyone ever audited a non-trivial computer program and found it to be
free of errors? (I'm sure I've created trivial spreadsheets with no errors.)

2\. What if spreadsheets started with 0s in every cell, would this "solve" the
problem? I don't think so, and yet it would address this argument.

------
asuffield
I was particularly impressed by this detail further down the conversation:

Let A1 = 1, A2 = blank, A3 = 3

PRODUCT(A1:A3) is 1

A1 * A2 * A3 is 0

~~~
lifeisstillgood
Well that's just inconsistent and annoying and wrong.

I think the original comments are still over the top, as it seems to be arguin
that a defined convention exists but the OP disagreed with it.

However if your example is what they were actually complaining about then hell
yes.

~~~
endersshadow
Within Excel, ranges ignore blank cells in computation, but a reference to a
blank cell returns 0. The idea being that if you specify a cell in a
computation, you're expecting something to be there, whereas when you have a
range, you may have sparse data throughout that range, and therefore, the
result shouldn't break.

SQL behaves similarly. See:
[http://www.sqlfiddle.com/#!15/37025/5](http://www.sqlfiddle.com/#!15/37025/5)

------
kazinator
That appalled user would have a heart attack at Awk, a language in which a
previously unseen variable serves as zero or an empty string.

    
    
       count++    # if count doesn't exist, it becomes 1
    

A previously unseen array reference materializes automatically too:

    
    
       count[$1]++
    

(I think this is garbage too, but it leads to throwaway programs that are very
concise. If you write anything large, it's going to bite you on the butt:

    
    
       mispelled[foo] += bar; # wee

------
mischanix
I personally would prefer a spreadsheet that treated null as the identity for
whatever function it was acting as an input to (0 for addition, 1 for
multiplication, e for exponentiation, Identity matrix for matrix
multiplication, etc.), only throwing an error when a null does not resolve to
a single value (i.e. the same cell is used for both addition and
multiplication). But that would apparently require me to smoke some stuff.

~~~
cousin_it
Actually, I wouldn't even throw an error. It's quite intuitive for the user
when "sum" and "product" apply only to non-empty cells, and empty cells are
ignored. If all cells are empty, the return value should be the identity of
the function, i.e. 0 for sum and 1 for product. Also it's nice to have a
function "count" that counts non-empty cells, with identity value 0.

    
    
        A1 = 1, A2 = blank, A3 = 3
        sum(A1:A3) = 4
        product(A1:A3) = 3
        count(A1:A3) = 2
        average(A1:A3) = sum(A1:A3)/count(A1:A3) = 2
        geometric_mean(A1:A3) = product(A1:A3)**(1/count(A1:A3)) = sqrt(3)
    

Note that the last line wouldn't work under your proposal, because the
identity for "product" isn't the same as the identity for "count". I think the
user would be massively confused by an error like "using conflicting identity
values for cell A2", and would prefer a spreadsheet that just gave them the
damn geometric mean.

~~~
mcguire
But what is the geometric mean of 2, 3, 4 and potato?

~~~
cousin_it
An error.

------
dllthomas
I think the right thing might be, "default empty to a _tainted_ zero", and
then propagate and display taint. That gives immediate feedback and lets you
get some sense of things playing with an incomplete spreadsheet but won't
silently give you bad data that you think is good.

