

Dijkstra: Why numbering should start at zero - thunk
http://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html

======
lucumo
For my MSc thesis I'm programming some stuff for the Cell B.E. processor. It
provides AltiVec vector instructions which allow quick parallel computation.
It's great for getting nice speed ups, but it requires a lot of thought and
bit twiddling from time to time.

It has two C "intrinsics" called vec_mule() and vec_mulo(). These take two
vectors of an integer datatype (char, short), multiply them, and produce one
vector with items that are twice as big. char becomes short, short becomes
long. To make room for these larger datatypes, each one only multiplies the
even (vec_mule) or the odd (vec_mulo) items.

It's a nice construction, until you start to wonder which elements are even.
Is "element 0" even, or is the "first element" odd? Should I really have to
wonder about that?

I have never questioned the utter coolness of starting at index 0, until I
encountered this. I do not know why we start at 0 if the rest of the world
starts at 1, but it seems silly and without reason.
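
For reference, a scalar sketch of what the two intrinsics compute on vectors of 8 shorts. It assumes zero-based element numbering (element 0 counts as even), which is my reading of the AltiVec documentation. The names mul_even and mul_odd are made up for illustration and are not part of any AltiVec header.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar stand-in for vec_mule on 8 signed shorts: multiply the
 * even-indexed elements (0, 2, 4, 6) and widen each product to 32 bits.
 * Assumption: elements are numbered from 0. */
void mul_even(const int16_t a[8], const int16_t b[8], int32_t out[4])
{
    for (size_t i = 0; i < 4; i++)
        out[i] = (int32_t)a[2 * i] * (int32_t)b[2 * i];
}

/* Scalar stand-in for vec_mulo: same idea for the odd-indexed
 * elements (1, 3, 5, 7). */
void mul_odd(const int16_t a[8], const int16_t b[8], int32_t out[4])
{
    for (size_t i = 0; i < 4; i++)
        out[i] = (int32_t)a[2 * i + 1] * (int32_t)b[2 * i + 1];
}
```

Under that assumption the "first element" of the vector goes through the even variant, which is exactly the naming clash described above.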

~~~
jacquesm
Well, the rest of the world also starts at 0:

1 sheep -> the number of sheep present when there is one

0 sheep -> the number of sheep present when there are none

Because you normally only enumerate things that are there it seems like you
are starting at '1', but really you start relative to the situation where
there are 'none'.

You can't have 'negative' presence of anything so it took a while longer to
realize that there might be negative numbers too.

As for element 0 being odd or even, it's even.

So, it isn't silly and it has a reason.

0 is one of the biggest inventions in the history of mankind. Programmers have
given it its natural place because we could not make our computers work if we
didn't do that.

Imagine binary logic based on '1' and '2'....

~~~
thunk
Er, that's all wrong. The question isn't _number of_, but _index of_. The
index of the first element in a collection is 0. The index of the first sheep
in a flock is (conventionally) 1. The total elements up to index n (inclusive)
is n+1 -- the opposite parity -- which can be initially confusing.

~~~
jacquesm
An index is a label, a convenient way of finding something when you remember
where you left it. You can start your indexing at any offset, you don't even
need to use numbers (you could for instance look at an associative array where
you store stuff based on a hash of some value).

index = position, label

The count is the number of occupied slots and is totally independent of the
positions or labels.

Otherwise how would you count the number of elements in a sparse matrix ?

~~~
wcarss
In this particular case though, where you have operations depending on
evenness/oddness of some attribute related to position in a set (either
position by index, or count) and the index starts with 0, there is certainly
confusion to be had over which implementation would be used.

Your logic is sound, but it's counter-intuitive to think of Obj[2] as a member
of 'mulo', an odd number, "because 2 is really the 3rd element". A person
implementing the system could be logical by your ideal (making 2 odd) or
logical by the ideal of common sense and readability (making 2 even). Either
could validly hold in a person's mind.

Given the ambiguity, I think it just shows that you shouldn't base operations
on oddness/evenness, or should rigidly and obviously define them if you must
use them.

------
Radix
I'm happy to see this. I should have submitted it when I found it. The version
I found appears to be handwritten by Dijkstra and I like it better that way.

<http://www.cs.utexas.edu/~EWD/ewd08xx/EWD831.PDF>

~~~
thunk
Absolutely. His penmanship is as thoughtful as his bearing. I just hate pdf
links.

------
Locke1689
It's fairly straightforward if you're a C programmer and think of arrays as
contiguous blocks of memory.

int * a = (int * )malloc(5 * sizeof(int));

a is a pointer to the beginning of an array of ints (typically 4 bytes each).
&a[i] == a + i, which is the same address as ((uint8_t * )a) + i*sizeof(int).
Therefore, &a[0] == a + 0 == the address of the first element in a contiguous
block of memory.
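
A minimal sketch of that address calculation, for anyone who wants to check it. The helper name element_address is hypothetical; it does by hand what the compiler does for a + i.

```c
#include <stddef.h>
#include <stdint.h>

/* Address of a[i] computed by hand: start at the base address and step
 * i * sizeof(int) bytes forward. With i == 0 this is just the base. */
const int *element_address(const int *a, size_t i)
{
    return (const int *)((const uint8_t *)a + i * sizeof(int));
}
```

For any valid index i, element_address(a, i) compares equal to both a + i and &a[i], so index 0 is simply "zero elements past the start".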

~~~
discojesus
which would also illustrate what K&R said about array indices just being
syntactic sugar for pointer arithmetic.

My mind has been blown.

~~~
gjm11
If that blows your mind, you might want to avoid looking at the following:
"foo"[2] == _("foo"+2) ==_ (2+"foo") == 2["foo"]. And yes, you really can do
that: in general a[b] and b[a] are exactly the same thing in C.

~~~
hc
not quite. "foo"[2] is a char, ("foo"+2) is a string.

~~~
gjm11
oh, bother, I forgot what HN does to asterisks. In what I wrote above, look
for the bit in italics and mentally insert asterisks on each side of it. It
was right when I typed it in, I promise...
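
With the asterisks restored, the identity can be checked in a fenced block. This is only a sketch; the function name same_element is made up for illustration.

```c
#include <stddef.h>

/* Returns 1 when all four spellings pick out the same character.
 * s[i] is defined as *(s + i), addition commutes, so i[s] is legal too. */
int same_element(const char *s, size_t i)
{
    return s[i] == *(s + i)
        && *(s + i) == *(i + s)
        && *(i + s) == i[s];
}
```

Calling same_element("foo", 2) compares the character 'o' against itself four ways.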

------
BillGoates
I am starting to think Dijkstra is about the worst thing ever to happen to
software development.

He argues that option A is the least ugly one, but forgets that computers
have maximum numbers.

According to Dijkstra, you should test if a variable is a valid integer like:
0 <= var < MaxInt + 1. The best possible outcome of such a test would be a
compiler error.

His argument that sequences should be defined as min <= i < (max + 1) because
(max + 1) - min = total number of items is just silly. Maybe true for maths,
not so for programmers. When reading code, you want to know if the code is
valid, not how many items there are in a list. And reading i <= max instead of
i < (max + 1) is simpler.

Secondly, his article is about 0 or 1 based arrays, not random selections. And
in case of 1 based arrays, the max = total number of elements.

So option C is best.

About whether 0 or 1 based arrays are better. They both have their uses. 0
based for spatial coordinates, 1 based for lists and character positions.

~~~
thunk
1) The memo has _nothing to do_ with testing for MAXINT. Just test it
inclusively.

2) You _frequently_ want to know the length of a subseq, and you frequently
want successive subseq ranges to dovetail.
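
A small sketch of point 2, assuming half-open ranges [lo, hi). The struct range type and both helpers are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range [lo, hi): lo is included, hi is not. */
struct range {
    size_t lo;
    size_t hi;
};

/* Length is a plain subtraction, with no +1 correction. */
size_t range_len(struct range r)
{
    assert(r.lo <= r.hi);
    return r.hi - r.lo;
}

/* Split r at mid into [lo, mid) and [mid, hi). The two halves dovetail:
 * the end of the left one is the start of the right one. */
void range_split(struct range r, size_t mid,
                 struct range *left, struct range *right)
{
    assert(r.lo <= mid && mid <= r.hi);
    left->lo = r.lo;
    left->hi = mid;
    right->lo = mid;
    right->hi = r.hi;
}
```

Splitting anywhere, including at either end, leaves two valid ranges whose lengths add up to the original, and an empty range is just lo == hi.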

3) What in the world does rand() have to do with this?

~~~
BillGoates
1) He is using real world math arguments for why computer arrays should be 0
or 1 based. His example seems valid, but once you use Maxint it fails, proving
his argument wrong.

2) Yes, but that has nothing to do with anything I wrote. Also calculating the
length of a subset out of a 0 or 1 based array is the same, making your point
completely irrelevant.

3) Where in the world do you read anything about rand()?

~~~
jongraehl
For fixed size ints, no rule for mapping pairs of ints to a range of
consecutive ints can allow both the empty range and the range of maximal size,
e.g. you can't have both "", the empty range, and "0,...,Maxint".

You really need to give up one of the possibilities, or use an extra bit.
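
To make that concrete, here is a sketch with 8-bit bounds so the numbers stay small. The two helper names are made up, and both assume lo <= hi.

```c
#include <stdint.h>

/* Length of the half-open range [lo, hi) with 8-bit bounds.
 * Assumes lo <= hi. The empty range is lo == hi, but the longest
 * range is [0, 255), which leaves out element 255. */
unsigned halfopen_len(uint8_t lo, uint8_t hi)
{
    return (unsigned)(hi - lo);
}

/* Length of the closed range [lo, hi] with 8-bit bounds.
 * Assumes lo <= hi. All 256 elements fit in [0, 255], but the
 * shortest range has length 1, so there is no empty range. */
unsigned closed_len(uint8_t lo, uint8_t hi)
{
    return (unsigned)(hi - lo) + 1u;
}
```

There are 256 possible elements but 257 possible lengths (0 through 256), so one of them has to go unless the bound gets an extra bit.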

~~~
BillGoates
The upperbound = maxint

The total number of items = maxint + 1

------
JMostert
Did you read Dijkstra's article? He's making the case for 0, though he's doing
it a little abstractly. Simply put, if you start numbering at 1, you are
setting yourself up for more boundary problems and off-by-one errors than if
you start at 0 (and by extension, inclusive lower bounds and exclusive upper
bounds). That's not to say that some algorithms are not in fact more easily
expressed by numbering things from 1, just that they're not the majority.

Oh, and obviously 0 is even. Why? Because 0 mod 2 = 0, 0 is evenly divided by
2, and that's what "even" means. If you need more intuition, though: 1 is
indisputably odd, and even and odd numbers alternate, so 0 is even. The rest
is philosophy -- you can probably find definitions for "odd" and "even" where
0 is a problem. That's fine, but those don't help. Ignore them. Mathematically
there's no problem whatsoever.

The trickiness only comes if you insist on thinking in terms of "the first
element", rather than "element number 0". Some people use "zeroth", but this
seems to invite more confusion because it induces two meanings for all the
other ordinal forms -- if there's a zeroth element, does that mean the "first"
element is in fact the second element? Best to avoid ordinals altogether --
you usually don't need any more than "first" and "last" anyway.

~~~
lucumo
_> Did you read Dijkstra's article?_

Yes, I did. Does it hurt to offer a different perspective? I feel it doesn't.
Approaching a subject with an open mind or from a different angle tends to be
good for discussion. Offering a contradicting opinion or observation helps
everyone to understand an issue better.

 _> Oh, and obviously 0 is even._

I might counter this with a similar question you asked me: "did you read my
post?" I didn't make a case that 0 was odd, I asked how I should look at it:
as element 0, or as the first element. If you look at it as element 0, the
first element of the vector is an even element. If, on the other hand, you
look at it as the first element, you will think it is an odd element. This is
the case I made, not that 0 is even.

~~~
JMostert
_> Yes, I did. Does it hurt to offer a different perspective?_

No, I was questioning whether you had considered his arguments at all, given
the "why do we start counting at 0" question which he tried to answer.

 _> I didn't make a case that 0 was odd, I asked how I should look at it: as
element 0, or as the first element._

You can look at it either way, as both are correct.

 _> If you look at it as element 0, the first element of the vector is an even
element. If, on the other hand, you look at it as the first element, you will
think it is an odd element._

Ah, I see your point now. To me this wouldn't be a question _because_ everyone
knows programmers start counting at 0 -- a machine instruction would operate
on _indexes_ , not _ordinals_. "The first element" is the element with index
0. It is not "element number one", or if you do want to see it that way,
"number one" is the ordinal you use when you start counting on your fingers,
which is something we deliberately ignore.

~~~
lucumo
_> No, I was questioning whether you had considered his arguments at all_

Ah, yes. No, I wasn't trying to disprove him. I just gave a different approach
to the issue. My mind is not made up on this subject. I've been a programmer
for too long: 0 is burned in my fingers and my mind.

 _> a machine instruction would operate on indexes, not ordinals_

The interesting aspect isn't the difference between indexes and ordinals, but
the difference between indexes and cardinals. The cardinals match the ordinals
for normal people. For normal people, indexes have no meaning.

But the difference between indexes and cardinals isn't that easy for
programmers either. In an example presented by Dijkstra a young programmer
used 0 in everyday language as a cardinal replacing 1. Indexes have little
meaning in real life, but cardinals do.

Good response, BTW. Thanks. It's interesting to link the concepts of ordinals
and cardinals to the discussion.

------
compay
People sometimes complain about Lua's tables, which use 1 as their first index
when treated like arrays. I suppose it could make some algorithms more
complicated but in my day to day programming it's never made any difference to
me so far.

------
JBiserkov
I wish I had been shown this in my "introduction to programming" course.

All they told us was "because that's how things are. [MEMORIZE IT!]"

Perhaps I should read more of Dijkstra's writings.

Edit: I love when directory indexing is NOT forbidden.
<http://www.cs.utexas.edu/~EWD/ewd08xx/>

------
nishta
"Consider now the subsequences starting at the smallest natural number:
inclusion of the upper bound would then force the latter to be unnatural by
the time the sequence has shrunk to the empty one."

Maybe it is too late, but I don't understand this sentence.

A sequence starting at the smallest natural number would be: 0, 1, 2. The
variant Dijkstra is arguing about is: 0 <= i <= 2. How can the latter be
unnatural? And what does he mean by "shrunk"?

~~~
jongraehl

        [0,1) = (0)
        [0,0) = ()
        [0,0] = (0)
        [0,x] = ()
    

Probably you'd say x=-1, but if you're using unsigned indices, then you can't
distinguish the biggest possible sequence from the empty one (this is true of
all schemes, actually, but at least the others don't require negative numbers).
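
The same point as a sketch in C, summing a prefix of an array under both conventions. Both function names are hypothetical.

```c
#include <stddef.h>

/* Sum a[0..n) with an exclusive upper bound. Shrinking the sequence
 * to nothing gives n == 0, which is still a natural number. */
long sum_half_open(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Sum a[0..last] with an inclusive upper bound. The empty sequence
 * has to be written as last == -1, so the bound needs a signed type. */
long sum_inclusive(const int *a, long last)
{
    long total = 0;
    for (long i = 0; i <= last; i++)
        total += a[i];
    return total;
}
```

"Shrunk" here means taking shorter and shorter prefixes: 3 elements, 2, 1, then none. The inclusive bound runs 2, 1, 0, -1, and that last value is the "unnatural" one.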

------
edw519
I write primarily in 3 languages; 2 start indexes at 0 and the third starts
at 1.

Notwithstanding Dijkstra's logical arguments, I have adopted starting at 1
for 2 reasons:

1. I don't have to think about "shifting" data in order to look at it in the
third language.

2. I don't have to "think" at all. It's intuitive that the first element is
"1".

I just insert a null in the first position of any array in a 0-starting
language.

~~~
loup-vaillant
If you ever have to pack multidimensional data in a flat array, I wish you
luck, then. In this case, starting at 1 messes up lookup and insertion. You
have to insert "-1" or "+1" pretty often. That's too hard to get right in my
opinion; I prefer the easier way: to start at 0.
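
A sketch of the difference for a row-major 2D grid with cols columns per row. The function names are made up, and the one-based version assumes slot 0 of the flat array is left unused.

```c
#include <stddef.h>

/* Zero-based row and column into a zero-based flat array. */
size_t flat_index0(size_t row, size_t col, size_t cols)
{
    return row * cols + col;
}

/* One-based row and column into a one-based flat array (slot 0 unused).
 * Both coordinates are shifted down before the multiply, and the result
 * is shifted back up afterwards. Assumes row >= 1 and col >= 1. */
size_t flat_index1(size_t row, size_t col, size_t cols)
{
    return (row - 1) * cols + (col - 1) + 1;
}
```

Each of those corrections is a place to make an off-by-one mistake, and they multiply with every extra dimension.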

~~~
pbhjpbhj
Shouldn't the "index start value" just be a compilation directive?

------
TweedHeads
I don't care if computers think in 0s and 1s or if arrays should start at 0
because it is the way computers think.

Computers were made to serve us and as such they should translate all their
inner thoughts to more human consumable data.

Five is 5, not 101.

So no, numbering should start at ONE, even if most programmers have already
been hardwired to start counting from zero.

~~~
TweedHeads
Look at your hand and start counting your fingers while naming them:

[1] thumb

[2] index

[3] middle

[4] ring

[5] pinkie

so we have a simple range [1..5]

which can be represented in so many ways in computer programs:

for i=1 to 5 print finger[i]

for(i in [1..5]) print finger[i]

my first finger[1] is my thumb

my last finger[5] is my pinkie

how many fingers do we have? as many as the last index in the list: 5

-

now, C programmers like to count this way:

for(i=0;i<5;i++) print finger[i]

where finger[0] is thumb

and finger[4] is pinkie

how many fingers do we have?

as many as the last index in the list plus one: 4+1

how human-like!

And that's why you get so messy when trying to get the string position of a
substring:

if(pos>-1) exists, since pos=0 means the substring is in the starting position

again, how human-like!

But how dare I argue with C programmers without being burned at the stake?

~~~
jongraehl
What makes you think people who 0-index arrays and prefer half-open intervals
count any differently? Is this argument directed at a four year old?

4+1=5 never enters into it. What a rubbish argument. I might as well complain
that your [a,b] has (b-a+1) integers in it. (5-0)=5 so the half open interval
has 5 things in it.

What's the measure of a real interval 1<=x<=5? 4. 0<=x<5? 5.

~~~
TweedHeads
Don't try your cheap tricks on me.

We go from 1 to 5, so the math would be 5-1 +1

You go from 0 to 4, the math would be the same, 4-0 +1

Using 5 as your upper bound, then starting from 0 to 4 is the same as me using
6 as my upper bound then going from 1 to 5.

