
Why numbering should start at zero (1982) - bajsejohannes
https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html
======
chrispeel
I always like to point to the Julia package TwoBasedIndexing
([https://github.com/simonster/TwoBasedIndexing.jl/](https://github.com/simonster/TwoBasedIndexing.jl/))
which IMHO provides the very best way to index an array. Yes, Julia
defaults to 1-based indexing, and allows zero-based indexing
([https://github.com/JuliaArrays/OffsetArrays.jl](https://github.com/JuliaArrays/OffsetArrays.jl))
or indeed whatever sort of indexing you want, yet I think the world is going
to eventually come around to _THE ONE TRUE_ indexing scheme of being
two-based.

:-)

~~~
krylon
I remember reading a funny quote that the obvious compromise between 1- and
0-based indexing was 0.5, and that it was dismissed without proper
consideration. Unfortunately, I do not remember where I read that or who
wrote/said it. :(

~~~
ScottBurson
Stan Kelly-Bootle:
[https://www.azquotes.com/author/27328-Stan_Kelly_Bootle](https://www.azquotes.com/author/27328-Stan_Kelly_Bootle)

~~~
krylon
Thank you! Sad to see he passed away, though.

------
JamesCoyne
The discussion of how to denote the bounds of a sequence may be missing one
consideration: Intuition from language. When speaking we articulate a range by
stating the first and last permissible value. E.g. "Pick a number between one
and ten". My point is understanding the length of the sequence is rarely as
important as quickly grasping its bounds.

~~~
wodenokoto
But when we say "between 1 and 10", the bounds are 1 and 10, but the length of
the sequence is - as far as I can count my fingers - also 10.
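
A minimal Python sketch of the counting involved, assuming integer bounds; the function names are made up for illustration. With both bounds included the length is the difference plus one, while a half-open range has a length equal to the plain difference.

```python
def closed_length(lo, hi):
    """Number of integers i with lo <= i <= hi."""
    return hi - lo + 1


def half_open_length(lo, hi):
    """Number of integers i with lo <= i < hi."""
    return hi - lo


print(closed_length(1, 10))     # "between one and ten", both ends included
print(half_open_length(1, 11))  # the same ten numbers written as 1 <= i < 11
print(len(range(1, 11)))        # Python's built-in range is half-open
```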

~~~
beobab
I seem to remember using this in early versions of Visual Basic:

    
    
        Dim myArray(1 to 10) As Integer
    

Don't know if it still works, though.

------
rhacker
When I was learning QBasic some billion years ago I recall that I was happy
with arrays generally starting at 1 (even though this is user specified in
Basic). When I learned C I initially wasn't thrilled about re-learning arrays,
but eventually grew to only like 0-based arrays.

I'm thinking this is like people who drink coffee will cite studies saying it
is good for you, and people who hate it will cite opposite studies.

~~~
perl4ever

      my @arr = (1, 2, 3);
    
      for my $i ($[...$#arr) {
         ...
      }
    

It is, however, advised not to change $[.

~~~
perl4ever
And of course, there's Ada where you can use 'First and 'Last.

------
jkabrg
Starting at 1 is more intuitive, and therefore IMO better.

Regarding some of the advantages of zero-based:

    
    
      - Indexing backwards from the "end". Your language can always add an `end` keyword like Matlab does, and this stops being an advantage.
      - Indexing cyclically using modular arithmetic. Yes, this is an advantage. Albeit a rare one for me.
    

I commit fewer off-by-one errors with 1-based, and I don't have to double-check
as much -- so on balance, I prefer it.
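
For the modular-arithmetic point, a small Python sketch of wrapping an arbitrary integer onto a ring of `n` slots under each convention. The function names are illustrative, and the sketch relies on Python's `%` returning a non-negative result for a positive modulus.

```python
def wrap_zero_based(k, n):
    """Map any integer k onto the slots 0..n-1."""
    return k % n


def wrap_one_based(k, n):
    """Map any integer k onto the slots 1..n."""
    return (k - 1) % n + 1  # shift down, wrap, shift back up


n = 5
print([wrap_zero_based(k, n) for k in range(-1, 7)])
print([wrap_one_based(k, n) for k in range(0, 8)])
```

The 1-based version needs the two extra shifts; the 0-based one is the bare remainder.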

[edit]

The amount of karma this comment is getting is undergoing something like
Brownian motion.

~~~
umanwizard
> Starting at 1 is more intuitive, and therefore IMO better.

This is totally subjective. Starting at 0 is more intuitive to me.

My intuition is very simple. `x` is the name of some memory location. `x[k]`
is the location `k` spaces away from `x`.

~~~
throwawaymath
I can agree that's intuitive, but I'd like to point out that's not quite the
same thing as numbering a sequence.

If we have a sequence a_1, a_2, a_3, ... we can talk about a_3 by calling it
the third term. If we have a sequence a_0, a_1, a_2, ..., the third term is
actually a_2.

Whether or not we should index starting at 0 or 1 is probably dependent not
only on intuition, but the application at hand. For most analytic purposes
it's generally more useful to talk about the nth term, and we don't need to
know a specific number to reason about the distance between any two indices.
For other purposes, such as programmatic ones, it is useful to know e.g. the
traversal distance between two items in a list.

In my opinion it's best to first consider whether you're working in more of a
mathematical or programmatic context, and then secondarily who will have to
read it later on.

~~~
umanwizard
You don't have to talk about memory locations, or computers at all, for the
intuition to work. `a_2` is two spaces away from the beginning of the list.

The fact that our ordinal numbers are closely connected with the off-by-one
cardinal numbers (e.g., "third", meaning the element of a sequence in position
2, is closely etymologically related to the word "three") is an unfortunate
defect of language.

------
mayoff
As usual, let me point out that Dijkstra's handwriting is a treat if you
haven't seen it before. Here's the handwritten version of this note:

[https://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF](https://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF)

~~~
Scoping_dindus
Is that his real handwriting??? That's not a fancy font?

Wow, it really IS a treat. I've always heard people say how much a person's
handwriting can tell you about that person. Mine is a mangled mess, and I'm
pretty disorganized in most areas of my life.

~~~
rhplus
It's real, but if you'd like to emulate it, there's a TTF font floating around
out there:

[http://lucacardelli.name/indexartifacts.html](http://lucacardelli.name/indexartifacts.html)
(go to Artifacts > Fonts tab).

Or,
[https://www.google.com/search?q=dijkstra+ttf+font](https://www.google.com/search?q=dijkstra+ttf+font)

------
Aardwolf
Programmers seem in general to know that 0-based works best, but
mathematicians (and mathematical and statistical programming languages) seem
to prefer 1-based, and I don't understand why that is? Isn't mathematics also
easier with 0-based?

E.g. if you divide a matrix of 100 columns into 20 vertical bands of width 5
each.

Mathematicians use 1-based indexing for both the element index and the band
index, so for them band n would start at coordinate "(n - 1) * 100 / 20 + 1"

For a programmer, band n would start at "n * 100 / 20"

That's two correction terms that you need to add in math which programmers
don't!
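
The two formulas can be checked with a short Python sketch, using the 100 columns and 20 bands from the example; the constant and function names are illustrative.

```python
COLUMNS = 100
BANDS = 20
WIDTH = COLUMNS // BANDS  # 5 columns per band


def band_start_zero_based(n):
    """First column of band n when bands and columns both count from 0."""
    return n * WIDTH


def band_start_one_based(n):
    """First column of band n when bands and columns both count from 1."""
    return (n - 1) * WIDTH + 1


print([band_start_zero_based(n) for n in range(0, 4)])
print([band_start_one_based(n) for n in range(1, 5)])
```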

I had to use Matlab for microphone arrays once and it was full of + 1's and -
1's everywhere due to that.

Another example of mathematics and off by one errors: a polynomial. They call
it "degree n" if the highest power is n, except I see n+1 coefficients in
there and need to allocate an n+1 sized array to contain its coefficients, so
why not call its degree the number of terms, including the "x^0" one. The
powers themselves in the polynomial are already hinting at 0-based indexing in
this case.
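
A minimal Python sketch of that storage layout, assuming the coefficients are kept in a list where index `k` holds the coefficient of `x**k`; `evaluate` and `degree` are hypothetical helper names.

```python
def evaluate(coeffs, x):
    """Evaluate sum(coeffs[k] * x**k) using Horner's rule."""
    result = 0
    for c in reversed(coeffs):  # start from the highest power
        result = result * x + c
    return result


def degree(coeffs):
    """Highest index holding a non-zero coefficient (assumes one exists)."""
    return max(k for k, c in enumerate(coeffs) if c != 0)


p = [1, 0, 3]  # 1 + 0*x + 3*x**2
print(degree(p), len(p), evaluate(p, 2))
```

With 0-based storage the array index and the exponent coincide, and a polynomial of degree `n` occupies `n + 1` slots.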

Mathematicians, please use coordinate "0,0" for the top left element of a
matrix :)

~~~
gizmo686
Interesting that you mention polynomials, as the constant term in them is
generally given the subscript 0. In fact, it sounds like you want some form
of 1 based indexing, where constant polynomials are degree 1, linear degree 2,
etc.

~~~
Aardwolf
No, I don't want any form of 1-based indexing at all :)

an array with elements at index 0,1,2,3 has size 4

~~~
gizmo686
And the degree on a polynomial means the highest index of a non-0 coefficient.

I don't see how you can view calling a linear polynomial degree 2 an example
of 0 based index.

~~~
renox
Easy: the constant part of the polynomial is x^0.

------
protonfish
The initial sentence has always bothered me

> To denote the subsequence of natural numbers 2, 3, ..., 12 without the
> pernicious three dots

What's so pernicious about them? I don't see it. It seems like a clear and
intuitive way to communicate a sequence to me. I even wrote a little range
generator in JS to explore parsing declarations like that.
[https://github.com/chrisbroski/iterize](https://github.com/chrisbroski/iterize)
It seems to work fine.

~~~
umanwizard
That notation is ambiguous. It doesn't cause a problem for humans since we can
infer the correct meaning from the context, but it would for computers.

For example, does 3, 5, ... 11 mean the odd numbers between 3 and 11
inclusive, or the primes between 3 and 11 inclusive?
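
Both readings are easy to spell out in Python; the helper names below are illustrative. The two lists agree on the written terms 3, 5 and 11 and differ only in what the dots hide.

```python
def odds_between(lo, hi):
    """Odd integers n with lo <= n <= hi."""
    return [n for n in range(lo, hi + 1) if n % 2 == 1]


def primes_between(lo, hi):
    """Primes n with lo <= n <= hi, by trial division."""
    def is_prime(n):
        return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))
    return [n for n in range(lo, hi + 1) if is_prime(n)]


print(odds_between(3, 11))
print(primes_between(3, 11))
```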

~~~
SiempreViernes
No, it is ambiguous for humans too as evidenced by the gazillions of problems
of the form "N1, N2, N3, what number comes next?".

Just decide on some rule for how to decide and let people make their own standard
if they don't like it.

~~~
kvakil
Of course the answer is N1 - 3 N2 + 3 N3, since that naturally completes the
cubic interpolating polynomial.
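
A quick Python check of that formula, assuming the three given numbers sit at positions 1, 2 and 3 and the next one at position 4. It compares the closed form with a direct Lagrange evaluation of the polynomial of degree at most two through the three points; the function names are illustrative.

```python
from fractions import Fraction


def next_by_formula(n1, n2, n3):
    return n1 - 3 * n2 + 3 * n3


def next_by_lagrange(n1, n2, n3):
    """Value at x = 4 of the polynomial through (1, n1), (2, n2), (3, n3)."""
    xs = [1, 2, 3]
    ys = [n1, n2, n3]
    x = 4
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = Fraction(yi)
        for j, xj in enumerate(xs):
            if j != i:
                term *= Fraction(x - xj, xi - xj)  # Lagrange basis factor
        total += term
    return total


print(next_by_formula(1, 4, 9), next_by_lagrange(1, 4, 9))
```

Equivalently, the formula picks the fourth value that makes the third finite difference vanish.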

------
js2
The part of his argument I find most convincing is his first remark that in
practice convention (a) has the fewest programming errors.

As an aside, Antony Jay co-wrote _Yes Minister_ and _Yes, Prime Minister_.

[https://en.wikipedia.org/wiki/Antony_Jay](https://en.wikipedia.org/wiki/Antony_Jay)

I'm familiar with the shows but didn't recognize his name.

------
tzs
Once upon a time when I was young and foolish, I was doing something [1] in
C++, which has 0-based arrays. There were a lot of places in the code where
1-based arrays would have made the code quite a bit clearer, but a lot of
places where 1-based arrays would have made it much less clear.

I wanted the best of both worlds, and so overloaded the () operator so that if
arr is an array, then arr(i) = arr[i-1].
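
A rough Python analogue of that idea, not the original C++: `__getitem__` gives the 0-based `arr[i]` and `__call__` gives the 1-based `arr(i)`. The class name `DualIndexArray` is hypothetical.

```python
class DualIndexArray:
    """List wrapper where arr[i] is 0-based and arr(i) is 1-based."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, i):
        # 0-based access, delegated to the underlying list
        return self._items[i]

    def __call__(self, i):
        # 1-based access: arr(i) == arr[i - 1]
        if not 1 <= i <= len(self._items):
            raise IndexError("1-based index out of range")
        return self._items[i - 1]


arr = DualIndexArray([10, 20, 30])
print(arr[0], arr(1))  # both refer to the first element
```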

This worked reasonably well, especially for arrays where it was always clearer
to go 1-based, or arrays where it was always clearer to go 0-based, so that
the array was always accessed with the same operator.

The only places it was questionable whether or not it made the code clearer
was where I used both in the same section of code. E.g., something like b =
a[i] + a(i) is arguably less clear than b = a[i] + a[i-1] or b = a(i+1) +
a(i).

[1] I think it was implementing an arbitrary precision integer library using
algorithms from TAOCP, but I don't remember for sure.

------
Sharlin
There's one important use case for Dijkstra's alternative (c), that is, an
interval closed at both ends. That is when your set has a maximum element and
you want to be able to express an interval including the maximum. This is, of
course, important in any programming language whose basic integer types have a
bounded range. For instance, the following loop in C never halts:

    
    
        for(uint8_t i = 0; i < 256; i++) { ... }
    

What's even worse, the following loop has undefined behavior because signed
integer overflow is not defined:

    
    
        for(int8_t i = 0; i < 128; i++) { ... }

~~~
OscarCunningham
I don't quite understand what you're saying. If you change the code to

    
    
        for(uint8_t i = 0; i <= 255; i++) { ... }
    

then it still doesn't halt.
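
A rough Python simulation of the situation, assuming an 8-bit unsigned counter that wraps modulo 256. Python integers do not wrap, so the wrap is done by hand with `& 0xFF`, and the function names are made up for illustration. A loop that tests for the maximum after the body and before the increment does visit 255 and stop.

```python
def top_tested_loop_halts(max_steps=1000):
    """Mimic for (uint8_t i = 0; i <= 255; i++); report whether it ever exits."""
    i = 0
    for _ in range(max_steps):  # safety cap so the simulation itself halts
        if not i <= 255:        # always true for an 8-bit value
            return True
        i = (i + 1) & 0xFF      # 255 wraps around to 0
    return False                # the exit condition never became false


def bottom_tested_visit():
    """Visit 0..255 inclusive, checking for the maximum before incrementing."""
    visited = []
    i = 0
    while True:
        visited.append(i)
        if i == 255:
            break
        i = (i + 1) & 0xFF
    return visited


print(top_tested_loop_halts(), len(bottom_tested_visit()))
```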

~~~
Sharlin
You're right, of course. Apparently my brain doesn't work well today.

------
jkabrg
There are cases where zero-based indexing is more natural than one-based
indexing. An example is naming the centuries: It would be nicer if this were
called the twentieth _an_ century instead of the twenty-first. When people
talk about the fifteenth century, I have to think for a bit to understand what
they mean. As such, I propose adding a suffix "an" to the end of any ordinal
number to indicate a zero-based convention is being used -- this century would
then be the twentiethan century [twεntiθən] as well as the twenty-first. A bit
like degrees vs radians.

------
closed
In another thread last week, someone mentioned that 1-based indexing is more
common in numerical analysis, so it has appeal for languages that are "math
focused". Anyone have insight on that?

As far as I can recall, I've seen both 0 and 1-based indexing in statistics.

------
dnautics
Maybe indexing should be considered harmful. We're living in the age of really
smart compilers anyways. It's time to move on. Elixir sets the example - do
everything with TCR, provide an Enumerable interface, and beyond that use
map/reduce.

~~~
alanbernstein
Fits perfectly with Dana Scott's suggestion "a trend towards the removal of
explicit parameters."

[https://github.com/hypotext/notation#mathematical-notation-p...](https://github.com/hypotext/notation#mathematical-notation-past-and-future-by-stephen-wolfram-2000)

------
rhacker
So random question because I know I would get this wrong in an interview, but
would a 0 based indexing system have fewer assembly language instructions to
calculate a memory offset than a 1 based system?

~~~
ummonk
Nope, not in most architectures. In a 0-based indexing system, the base
pointer points to the first element of the array. In a 1-based indexing
system, the base pointer points to the slot in memory that is one spot behind
the first element in the array. This is how Julia (and Ada) does arbitrary
indexing.
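
The address arithmetic described above can be sketched in Python with plain integers standing in for addresses. `ELEMENT_SIZE` and the function names are assumptions for illustration, not any compiler's actual output.

```python
ELEMENT_SIZE = 8  # bytes per element, an assumed value


def address_zero_based(base, i):
    """Address of element i when the base points at the first element."""
    return base + i * ELEMENT_SIZE


def make_one_based(base):
    """Shift the base once so each 1-based lookup needs no extra subtraction."""
    virtual_base = base - 1 * ELEMENT_SIZE  # one slot before the first element

    def address(i):
        return virtual_base + i * ELEMENT_SIZE

    return address


one_based = make_one_based(1000)
print(address_zero_based(1000, 0), one_based(1))  # same first element
```

Per access, both versions do one multiply and one add; the offset is paid once when the shifted base is computed.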

------
ternaryoperator
Stan Kelly-Bootle's famous bon mot on the subject: "Should array indices start
at 0 or 1? My compromise of 0.5 was rejected without, I thought, proper
consideration."

~~~
flukus
That's a genius way to avoid fence post errors
([http://www.dsm.fordham.edu/~moniot/Opinions/fencepost-error-...](http://www.dsm.fordham.edu/~moniot/Opinions/fencepost-error-history.shtml)).

------
syphilis2
I see now

~~~
umanwizard
No it doesn't. Let's define the natural numbers to start with 42. Then the
left side of the sequence 42, 43, 44 being represented as 41 < x is unnatural,
since 41 is not a natural number.

It's clear that the author structured his essay carefully to avoid this exact
assumption. He explicitly avoids considering whether the natural numbers start
at 0 or at 1 until after he has chosen "a)".

