
Why numbering should start at zero (1982) - aaronchall
http://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html
======
TOGoS
You use zero and one-based counting for different things. Mixing them up is
where fencepost errors come from.

Say you're counting the length of a fence, so you are numbering the posts. Of
course it's preferable to number the posts starting at zero, because after
having numbered N posts, you know the length of the fence is N.

On the other hand, if you are counting the posts themselves, you would want to
say "one!" after the first post, as you have just counted one post. In this
case, the number you assign the post is the number of posts counted _after_
encountering it. Otherwise you end up having to +1 at the end.

Which is all to say: I've always felt that the disagreement over whether to
start at one or zero is due to people not making clear what it is they're
counting.

If you're numbering the points _between_ spans (as you tend to be when giving
names to dates, memory locations, and fence posts), always start at zero. If
you're giving numbers to the spans themselves, it's often convenient to use a
physically-based number such as the location of the center of the span (when
writing graphics applications I will often refer to the pixel in the upper-
left corner as 0.5,0.5, rounding down to get its address).
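A minimal Python sketch of that pixel-center convention, assuming unit-sized pixels. The helper names `pixel_center` and `pixel_address` are made up for illustration:

```python
import math

def pixel_center(index):
    """Center coordinate of the unit span with the given zero-based address."""
    return index + 0.5

def pixel_address(coord):
    """Zero-based address of the unit span containing coord (round down)."""
    return math.floor(coord)

print(pixel_center(0))      # 0.5
print(pixel_address(0.5))   # 0
print(pixel_address(2.99))  # 2
```

Rounding a center down always gives back the address it came from, so the two numberings never drift apart by one.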

~~~
Stratoscope
Those are great insights, thanks for laying it out like that.

This is also why rulers start with 0 (even if you can't see the 0) - because a
ruler isn't about counting the marks, it's about measuring the space _between_
the marks.

There's an excellent discussion and illustration of this in the original
Inside Macintosh - a few of us were kicking it around a couple of years ago
here:

[https://news.ycombinator.com/item?id=6601515](https://news.ycombinator.com/item?id=6601515)

------
mayoff
I've always been charmed by EWD's handwriting:
[http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF](http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF)

~~~
jonahx
That literally looks like it was typed in a hand drawn font -- remarkable.

------
fdej
One big advantage of zero-based indexing is that it's consistent with division
with remainder. This is especially nice when working with multidimensional
arrays.
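A rough Python sketch of what this looks like for a 2D array stored flat in row-major order. `WIDTH` and the function names are hypothetical, chosen just for the example:

```python
WIDTH = 4  # hypothetical row length

def to_row_col(i, width=WIDTH):
    """Zero-based flat index -> zero-based (row, col)."""
    return divmod(i, width)

def to_flat(row, col, width=WIDTH):
    """Zero-based (row, col) -> zero-based flat index."""
    return row * width + col

def to_row_col_1(i, width=WIDTH):
    """One-based flat index -> one-based (row, col); note the -1/+1 shuffling."""
    row, col = divmod(i - 1, width)
    return row + 1, col + 1

print(to_row_col(7))    # (1, 3)
print(to_flat(1, 3))    # 7
print(to_row_col_1(8))  # (2, 4)
```

With zero-based indices the plain quotient and remainder are the answer; the one-based version has to shift down by one before dividing and back up afterwards.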

~~~
TheLoneWolfling
I'd argue that in the presence of one-based indexing, remainders being defined
as 1..n, inclusive, makes more sense anyways.

~~~
shasta
So 8 divided by 4 is 1 with a remainder of 4?

~~~
TheLoneWolfling
No. Two with a "remainder" of 4. Ideally I'd call it by another name to avoid
confusion (and the implication that x == x//y * y + x % y - although this is
often incorrect in the case of negative values _anyways_ ), but I do not know
what other "better" name to call it.

Think about it. If you have a series of bins with 4 items each, and you ask
someone to grab the 8th item, where do they go? To the 4th item of the second
bin.

~~~
shasta
Looking at division and mod as a pair finding group and subgroup indices is
interesting. For zero based indexing, we want division to round down and the
usual zero-based mod. So Group(8,4) = 8/4 = 2, Index(8,4) = 8 % 4 = 0. The
invariant is x == x/y * y + x % y. With one based indexing, we want division
to round _up_ and want the modulo operator you're describing. So Group(8,4) =
8/'4 = 2, Index(8,4) = 8 %' 4 = 4. The invariant is x == (x/'y - 1) * y + x %'
y.
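Here is one way to write those two pairs of operators in Python, assuming positive x and y. The names `group1` and `index1` are made up and stand in for the /' and %' operators above:

```python
def group1(x, y):
    """Division rounding up: one-based group containing item x."""
    return -(-x // y)

def index1(x, y):
    """Remainder in 1..y: one-based position of item x within its group."""
    return (x - 1) % y + 1

# Check both invariants for a range of positive values.
for x in range(1, 50):
    assert x == x // 4 * 4 + x % 4                      # zero-based pair
    assert x == (group1(x, 4) - 1) * 4 + index1(x, 4)   # one-based pair

print(8 // 4, 8 % 4)               # 2 0
print(group1(8, 4), index1(8, 4))  # 2 4
```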

------
domador
I've long been a zero-based-index-loving geek, but I'm now coming around to
preferring one-based indices in programming. Hopefully in the near future I'll
write a blog post detailing why.

Regarding the article itself... sorry, Dijkstra, but I think conventions C and
D are more beautiful, as I prefer discrete ranges where the signs on both ends
are the same. Convention C is my favorite, as it only makes mention of members
of the range, not numbers that lie outside of it.

~~~
learnstats2
Reading Dijkstra's own argument convinced me that 1-indexing is better.
Convention C (consistently non-strict) is the one that we normally use to
describe ranges in English.

~~~
baddox
While the article is explicitly only dealing with natural numbers, convention
C is insufficiently expressive for anything more granular than the natural
numbers.

------
aaronchall
"When dealing with a sequence of length N, the elements of which we wish to
distinguish by subscript, the next vexing question is what subscript value to
assign to its starting element. Adhering to convention a) yields, when
starting with subscript 1, the subscript range 1 ≤ i < N+1; starting with 0,
however, gives the nicer range 0 ≤ i < N. So let us let our ordinals start at
zero: an element's ordinal (subscript) equals the number of elements preceding
it in the sequence. And the moral of the story is that we had better regard
—after all those centuries!— zero as a most natural number."

~~~
mjevans
To put this another way, using the fence post example another poster
mentioned above:

If you label the fence posts with numbers, from 1 to N, the value at a given
address along the length of the fence is the distance from the start of the
fence.

In this case 0 posts from the start is the value 1. What makes this confusing
is the arbitrary choice of the value of the object at location 0. Dijkstra
does a reasonable job of explaining why logically beginning numbering at 0
makes sense for the index/address, but does not reasonably explain that the
fault in the logic of those who would count an item is failing to use that
logically derived index/address as the identity of the object.

That is, the numeric id of the object at address 0 should also be 0, not 1, as
many assume it should be based on the conventions that were instilled in basic
education.

------
stared
..and why shouldn't:

if (someStr.indexOf("John") !== -1) { ... }

vs

if (someStr.indexOf("John")) { ... }

Or when _teaching_ it's kind of funny saying "So, we take the first element,
that is the zeroth, then the third (I mean the real third, not the one with
index three)...".

OK, but to be entirely honest, I think both sides have pros and cons. And as long as
a language/program/person is reasonably consistent (vide
[https://xkcd.com/163/](https://xkcd.com/163/)), there is not a big
difference.
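For what it's worth, the same trap from the `indexOf` example exists in other zero-based languages. A small Python analogue (not the JavaScript above), using `str.find`, which likewise returns -1 when nothing is found:

```python
some_str = "John Smith"

pos = some_str.find("John")  # the match starts at the first character
print(pos)                   # 0
print(bool(pos))             # False, although the substring is present
print(pos != -1)             # True
print("John" in some_str)    # True
```

A match at position 0 is falsy, so the explicit comparison with -1 (or a membership test) is needed.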

~~~
icebraining
Your example should really be (using a polyfill if necessary):

    if (someStr.includes('John')) {...}

While I don't mind it for containers, I've never liked using Boolean coercion
for numbers. Zero is obviously a special case, but I don't think it maps well
with the True/False dichotomy.

~~~
jewel
Zero doesn't have to be a special case. In ruby 0 is a truthy value, as well
as the empty string. The only things that are falsey are false and nil.

Truthy/falsey means when used in a conditional or after a unary !. So you
still get strong typing, e.g. false == nil is not a true statement.

------
DawkinsGawd
Can someone explain this sentence to me:

"Consider now the subsequences starting at the smallest natural number:
inclusion of the upper bound would then force the latter to be unnatural by
the time the sequence has shrunk to the empty one."

The previous statement asserts that inclusion of the lower bound is preferred
because the sequence then starts with a natural number (so for a set 1<=x<12
the sequence starts at 1 instead of 1.000000 .... 1 in 1<x<12). How is
inclusion of the upper bound forcing the "latter" (what is he referring to by
latter) to be unnatural by the time the sequence has shrunk to the empty one.

~~~
to3m
"The latter" here refers to the upper bound. By "unnatural" he means negative,
as he is discussing the natural numbers (positive integers and zero). By
convention these are the numbers used to index arrays. (So, for your examples:
for the set 1<=x<12, the sequence is 1,2,3,4,5,6,7,8,9,10,11; for the set
1<x<12, the sequence is 2,3,4,5,6,7,8,9,10,11.)

As for what he's saying: suppose you decide your array indexing starts at the
lowest natural number, i.e., 0. And imagine an array with 1 element. And
assume you've adopted the notation a<=i<=b. This array's indexes are then
0<=i<=0.

But what if you have an array of 0 elements? What then? You can't say
0<=i<=-1, because -1 is not natural and therefore not a valid array index.

But if you adopt the convention that a<=i<b, then your 1-element array's
indexes would be 0<=i<1, and your empty array would be 0<=i<0. (Or, indeed,
1<=i<1, or n<=i<n for any n you choose - the first section deals with how to
denote the range, so the choice of n isn't the key thing.)
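A quick Python sketch of the two notations, in case it helps. `half_open` and `closed` are hypothetical helpers, not anything from the article:

```python
def half_open(a, b):
    """Indices i with a <= i < b."""
    return list(range(a, b))

def closed(a, b):
    """Indices i with a <= i <= b."""
    return list(range(a, b + 1))

print(half_open(0, 1))  # [0]
print(half_open(0, 0))  # []
print(closed(0, 0))     # [0]
print(closed(0, -1))    # []  (the empty range needs the bound -1)
print(len(half_open(3, 7)) == 7 - 3)  # True
```

In the half-open form the empty range never needs a bound below zero, and the length is just the difference of the bounds.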

~~~
DawkinsGawd
Thank you for the explanation.

------
nfoz
I love one-based indexing. I prefer array[1..n] to array[0..n-1]. The reason
is that with one-based indexing, I don't have names for things that are
invalid. begin, end: these are valid places to be. Your "end" token can mean
"the last element", and you don't have to think about "being behind the thing
that represents the thing past your last element".

I acknowledge it's domain-specific, but I find there's something mentally
relaxing about one-based indexing. And I grew up on 0-based.

------
gnufx
The Fortran (not FORTRAN) standard current when Dijkstra wrote that allowed
arbitrary lower bounds in array declarations, and an array could be
equivalenced to one with different bounds. The language doesn't impose the
choice.

------
vacri
> _That is ugly_

This is the defining reason? Seems an awful lot like personal preference, to
me.

~~~
qnaal
ugly in that "it's 'ugly' to have to use a signed int when referring to a
range of unsigned ints"

that's pretty ugly

------
0xdeadbeefbabe
Starting with one considered harmful.

~~~
Gibbon1
I always had this feeling that with one based indexing, 70s era compilers
ended up subtracting one from the index. For each and every element access
made. Because memory decoding circuits don't start at one, they 'start' with
0000...000. Course it would be possible to just waste the first element, but
70's era memory... Waste a 16 bit word here, and there and suddenly you're
talking real money.

