> Depends on whether you're measuring or counting.
You run into issues when using 1-based counting as well as measuring.
Our calendar years use 1-based counting, which results in things like "2020" being the "20th year" of the "21st century". If we used 0-based counting instead, we would call the current year "2019" and it would be "year 19" of "century 20".
Explanation of conventional year counting:
1st century began in year 1
2nd century began in year 101 (100 years later)
..
21st century began in year 2001
If you run the same sequence as above, but start counting from 0, you get seemingly much more sensible numbers (0, 0; 1, 100; 20, 2000; x, x*100...). I suspect most people probably think that the 21st century began in 2000 anyway.
It should also be noted that the use of the number 0 in general is relatively recent. The Roman numeral system (which was initially used for our year numbering) has no representation for it, as it was only used conceptually hundreds of years later. It seems to me that the main reason people count from 1 is historical.
Counting from 0, 2019 would still be the 19th year of the 21st century. I'm puzzled why you completely changed grammar to make it seem like counting from zero also solved the century mismatch. If anything it suggests that counting from 1 is more consistent. Then 2020 is year 20 of century 20.
But then 2000 becomes year 0 of century 20, which is confusing English. But "century 20" isn't great English anyway. Calendars should be 0-based like any other measuring tool. The 3rd minute of the 2nd hour of a marathon is 1:02. The 3rd day of the 2nd month of my marathon is 1/02. If that happens to be the 3rd day of the second month of the year, it's 2/03?
> Counting from 0, 2019 would still be the 19th year of the 21st century.
I think you meant to say the "20th" year (off-by-one error!), since 2000 would be the "1st" year under your grammatical assumptions.
I think it's safe to assume that if we conventionally counted from 0 instead of 1, we wouldn't refer to the initial thing as "first" (or "1st"). If we did maintain that construction, we would probably just refer to the initial century as the "0th century" and then 2019 would be the "19th year of the 20th century".
> But then 2000 becomes year 0 of century 20, which is confusing English.
I think it's only confusing because it's unconventional, not because there's something inherently more confusing about it. There have been plenty of languages that haven't even had counting systems that do anything more than "alone, pair, many" .. sure, you can find contexts where counting in general is confusing.
> The 3rd minute of the 2nd hour of a marathon is 1:02.
Right, under current 1-based counting convention, but the point of this discussion is that this inconsistency exists for basically no reason (or rather, historical reasons—look up the history of "0"). When people want to do arithmetic on ordinal numbers, they end up subtracting 1 to turn them into a zero-based natural number, then add 1 again to turn them back into an ordinal:
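For instance, the century arithmetic from upthread, sketched in Python (my example, just to show the subtract-then-add-back dance):
year = 2019
century = (year - 1) // 100 + 1   # subtract 1 to get a zero-based value, do the arithmetic, add 1 to get an ordinal back
print(century)                    # 21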
I suspect most off-by-one errors can indeed be seen as due to this inconsistency between conventional ordinals and the more arithmetic-friendly zero-based naturals. Again, since the former is simply convention, I claim it would be better if our convention were different.
It doesn't go from -1 to 1, it goes from 1 BCE to 1 CE. Negatives aren't used at all. Personally I think we should use zero and negative CE to extend backwards instead. So the year we currently refer to as 1 BCE becomes 0 CE, 2 BCE becomes -1 CE and N BCE becomes -N+1 CE.
That's because we don't count decades. We say we live in the third millennium, and in the 21st century, but nobody calls this the 203rd decade. Instead we call it based on a common property of all the years in the decade (the count of these years can be abbreviated as twenty-something).
I would think that depends on whether that gets taught at schools early in life.
Intervals that are closed at the low end and open at the high end are also convenient when using zero-based indexing (you want an easy way to express 0,1,2,…,n-1, and if you’re used to the notation, [0,n) is nicer than [0,n-1]).
I would think overloading the meaning of the various parentheses makes parsing and generating clear error messages harder, though (but not as hard as when one would follow ISO 31-11 and allow such things as ]3,7] for a half-open interval excluding 3, but including 7).
I think the desire to not overload parentheses with yet another meaning is why ‘Modern’ languages tend to use an infix operator for ranges, e.g. ‘1 to 5’ or ‘1...5’, with Swift having the half-open variant ‘1..<5’ (IMO ugly, but clear, so I guess one would get used to it).
Swift may have borrowed it from E, where the half-open variant is `1..!5`, read "from 1 up to, but not including, 5". It is extremely useful for 1-to-n sorts of counting tasks.
After having many fencepost errors, I finally came to the conclusion that all my code shall henceforth be zero-based. I'm much happier not having those errors anymore. I.e.:
for (i = 1; i <= N; ++i)
The <= is always a huge red flag for me, and I rewrite it as:
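for (i = 0; i < N; ++i)   /* the zero-based, half-open form: no <= needed */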
It's usually a C++-ism, since in C++ generic iterators overload operator++, and when writing generic code you must allow for this, since ++i is equivalent to i.operator++(), while i++ is equivalent to `auto x=i; i.operator++(); return x`.
Given the author, I assume that this is also common in D.
You can override the postfix increment operator as well.
The argument I've heard for why in C++ prefix is preferred is that compilers have a harder time optimizing out the temporary object cruft (especially older compilers).
In theory extremely naive compilers may copy `i` if you use `i++` (since `i++` evaluates to the old value, whereas ++i can always be a destructive update), so some programmers have a habit of defaulting to ++i.
I think that is arguable. The form "array[idx++] = foo;" has undeniable elegance, and I still use it sometimes, even though I've developed a pretty verbose coding style in general.
maybe? It avoids repeating i and j twice and arr thrice, at least. Admittedly, there probably are cases where you really do want a range [x..^y] for arbitrary x and y, but it seems like most cases aren't.
The set of natural numbers should have zero, otherwise it's not even a semiring. However, from the perspective of Peano's axioms, it doesn't matter as far as I am concerned what the symbols are called, as long as you have a successor function.
Apart from that, I agree that it depends whether you are counting [1] or doing something else.
Except that even Turbo Pascal for MS-DOS allowed it.
People should actually stop using late-70's Pascal examples, especially when ISO Extended Pascal in 1990 fixed most of them, not to mention famous dialects like UCSD, Apple's Object Pascal and Turbo Pascal, or its modern variants.
Python follows the first convention, just as Dijkstra recommends. Mostly this worked out well, except for the case of descending sequences.
With half-open intervals, that case proved to be cryptic and error-prone, so we added reversed() to flip the direction of iteration. That allowed people to easily bridge from the comfortable and common case of looping forwards. Instead of range(n-1, -1, -1) we can write reversed(range(n)).
I was intrigued to see how this worked out in practice for Python 3, where range returns a lazy range object rather than a list - like, would reversing the iteration require allocating the whole list in memory?
$ python3 -c 'print(reversed(range(10)))'
<range_iterator object at 0x7fdb53ae96f0>
It turns out that reversed() only works on objects that expose the __len__ and __getitem__ methods:
If you try to use reversed() with your own generator it will fail with "TypeError: argument to reversed() must be a sequence", until you wrap the generator invocation in an explicit list().
Alternatively the object can expose a __reversed__ method, which is why reversed(range(10)) is a "range_iterator object" rather than a "reversed object".
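For example, with a toy Countdown class (hypothetical, just to show the __len__/__getitem__ sequence protocol at work):
class Countdown:
    def __len__(self):
        return 3
    def __getitem__(self, i):
        return 10 * (i + 1)      # pretend sequence: 10, 20, 30

print(list(reversed(Countdown())))   # [30, 20, 10] -- no __reversed__ needed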
The first time I read this, I was ready to be convinced and left disappointed. His entire argument is based on avoiding "the pernicious three dots."
What's so pernicious about them? They are short, clear, unambiguous, easy to type. Why don't we just get our computer languages to understand what "2, 3, ..., 12" means? Is there any other argument to start counting at 0 other than not defining a range using the three dots? If not, maybe starting at 1 (like everyone does outside of computing) is the better option.
I would argue that starting at zero comes more naturally if you’ve been working in binary and are often using bits to select or signal things. In that case the single bit zero is a rather important piece of information and where everything starts. Then when you translate these ideas up into higher languages it continues to feel somewhat natural — just as the hardware may be selecting for register addressed as 000000000 etc, so the items in an array would start from zero. We go ahead and represent the indexes after one in decimal, for convenience, but there’s a feeling that they are like the hardware memory addresses under the hood.
This isn’t so compelling in today’s world of mostly people who live in high level programming languages with no real notion of what goes on in the hardware. But backwards compatibility is a thing, and “following conventions” as well, so I see zero based indexing used in modern high level languages as a logical extension of where this all came from.
What’s hard, really, isn’t using zero based indexing, it’s switching back and forth between systems.
Agreed, what's weird to me is how high level languages like Python and JS adopted C's indexing. But I guess that's the huge C influence on the programming community. Most of us learn programming being taught zero-based indexing.
His logic is that we should cater to the empty sequence case, and it would be "unnatural" to write 2 <= x <= 1, so we have to write 2 <= x < 2. That is just not an improvement. Ask somebody to write down an empty sequence, starting at 2, they'll think you're crazy. The Common Case is give the First and Last.
Option C is, to me, obviously best. How you can replace 2 .. 12 and not use 2 & 12 is beyond me. (/s Dijkstra's opinions considered harmful /s)
You mean mathematically, or specifically for programming?
For programming, I imagine it's better for it to be infinity than 0 since having 0 in the denominator implies that it's changing (who would write a constant 0 in a denominator?), and so as the denominator is approaching 0, the fraction is getting closer and closer to pos/neg infinity, not 0. Making it evaluate to zero would imply a change of direction for whatever the fraction represents.
So, if it's a position, as the position goes 1000/1000, 1000/500, 1000/100, 1000/1, etc. it's getting closer to infinity. Having it suddenly go to 0 would break the pattern of movement.
Mathematically, there is no x that satisfies x = 1/0 -> x * 0 = 1, since anything multiplied by 0 is 0, so it's undefined.
Yes, there's a discontinuity between the positive and negative numbers, that was part of the argument. Another part was that certain mathematical formulas get simpler when x/0 = 0.
They’re not unambiguous when the endpoints are variables. What is [1,2,...,x] when x=1? You probably want just [1] in this case if you’re iterating over indices.
When I did maths in high school and university, most indices were 0-based, not 1-based. 0-based indexing is quite common in physics and engineering (e.g. the initial time of a system is T0, not T1).
Note also that, in maths, both the cardinal and the ordinal numbers start at 0 - defined as the cardinality of the empty set and as the empty set itself, respectively.
Lots of people have a strong opinion on this: those who say that 0-based indexing is the only logical way of doing things, and those who don't see what the fuss is about and prefer 1 because that's how we count objects. I suspect the people in the first category did some low-level programming that needed to do arithmetic on indices. For example, let's say you want to take a string "abc" and repeat it until the length is 10, getting "abcabcabca". Assuming some Python-like language you would start with:
a = "abc"
b = [" "]*10
With 0-based indexing you would do:
for i in range(0,10):
b[i]=a[i%3]
In a 1-based language that becomes:
for i in range(1,11):
b[i]=a[(i-1)%3+1]
So you need to shift the index twice. This is because modulo arithmetic needs 0 to form a ring. As a result, in situations where the choice between 0- and 1-based indexing makes a difference, it's usually 0-based indexing that leads to simpler code.
Julia actually just includes a dedicated function for this case (mod1), and that covers the vast majority of places where 0 base is easier.
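(mod1 maps 1..n back to 1..n rather than 0..n-1; roughly, in Python terms, and this is my sketch, not Julia's actual implementation:)
def mod1(i, n):
    return (i - 1) % n + 1    # mod1(1, 3) == 1, mod1(3, 3) == 3, mod1(4, 3) == 1

# the 1-based loop body above then becomes: b[i] = a[mod1(i, 3)]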
In my numerical code, 1-based indexing causes fewer ±1 adjustments than 0-based indexing. (Nitpick: in modulo arithmetic 0 = N, so having the index run from 1 to N forms a ring just as well. The problem is that the conventional representative of the equivalence class …,-N, 0, N, 2N, 3N,… is 0, which arises from arithmetic's definition of modulo.)
In pretty much any language other than python you'd have to be careful for the case that i might become negative anyway.
And in python you could just go:
import itertools as it
a = ''.join(it.islice(it.cycle("abc"),10))
Also you don't need 0 to form a ring. Modulo arithmetic forms a ring no matter which representatives you pick. Languages just tend to implement one that includes 0 because it's more convenient.
This used to be implementation-defined behavior. C99 then codified this wrong behavior.
Assume positive b for a moment.
We want b * (a/b) + (a%b) = a. If a%b is to always be within [0..b), then a/b has to round toward -infinity. C99 instead chose rounding towards 0.
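To make the difference concrete, a quick Python check (Python floors where C99 truncates):
a, b = -7, 2
print(a // b, a % b)    # -4 1  : floor division keeps the remainder in [0, b)
q = int(a / b)          # truncation toward zero, as in C99 -> -3
print(q, a - q * b)     # -3 -1 : the remainder takes the sign of a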
That's right. Why do you suppose they chose that behavior to standardize, rather than Python's? Conceivably it's because nobody on the C99 standards committee had enough technical expertise to make the argument you're making, but can you think of another explanation? Because the prior probability on that one is pretty low.
> In C89, division of integers involving negative operands could round upward or downward in an implementation-defined manner; the intent was to avoid incurring overhead in run-time code to check for special cases and enforce specific behavior. In Fortran, however, the result will always truncate toward zero, and the overhead seems to be acceptable to the numeric programming community. Therefore, C99 now requires similar behavior, which should facilitate porting of code from Fortran to C.
Ha - you’re correct of course. I saw the a^2 term and thought that can’t be right. Note to self - before attempting to correct others, check the “correction”.
According to wahern's quote above it was to "avoid incurring overhead in run-time code to check for special cases", like the ability to accelerate mod power-of-two by converting it to an AND.
In two's complement? You'd need to add extra masks and shifts to replicate the sign bit. It puts you in much more complex territory than throwing an AND at it.
Some things are more naturally numbered from 1, others from zero. Fence posts versus fence spans.
Pointer arithmetic is a great example of a situation well adapted to 0 based numbering, and was quite relevant to many programmers when this was written. However, far fewer programmers nowadays interact with raw pointers directly.
Personally, I run into more cases where I’d rather have 1 based numbering than zero, but your mileage may vary. It’s valuable to have a language which can support both.
> However, far fewer programmers nowadays interact with raw pointers directly.
It doesn't have to be literally a memory pointer to run into the same phenomenon. For example, offsets into a file. I can think of an example when I was doing work on a file format, and 0 based indexing certainly helped simplify some logic.
Correct, I was merely pointing out that the main motivator of Dijkstra’s argument is not nearly as relevant as it was when this was written.
I have no idea what indexing style is more appropriate for the hypothetical ‘average’ programmer, but I find myself preferring 1 to N the most, followed by -N to N, and finally 0 to N.
As I said originally, your mileage may vary, but I think it’s important that we use languages where many different indexing styles can be supported ergonomically, and in a way that avoids silly errors due to using the wrong index style.
Type systems and generic functions help a lot with this. I think of it a lot like say, signed versus unsigned integers. Multiple useful ways of looking at the same bitwise data. Type systems generally save us from nasty errors with the various types of integers. They can do it with arrays too.
In the business domain one is often counting and tracking explicit things: widgets, contracts, customers, complaints, pizzas, etc. When counting, you almost always start with 1. "1" fits the domain more naturally, and if you deviate, you'll likely spend extra code converting between the end-user's view and the code's view. The translation layer creates risk of errors and more code. Maybe weather forecasting or space-probe orbit calculation is different, but 1 better fits biz. Dijkstra was not (primarily) a business-domain coder. (Smarter languages, like Pascal, allow you to define the index range, although I'm not sure whether Pascal supports dynamic upper bounds.)
I think you'd still start counting those things from 0, because it's a convenient and natural way to express that you don't have complaints, widgets or pizzas.
Or, I suppose, you could start counting at 1, but use an optional type and use None for that case. But that makes doing arithmetic on those numbers ("widgets per pizza") harder than if you'd just used 0 to begin with.
Why would you need any special case? If 1 means there is 1 item then you can still use 0 to mean there are 0 items.
The 1-based counting would be to index or label them like item 1, item 2, item 3, ... which is quite natural to humans. If you had zero items, you'd express that by your list being empty, not by having a number 0 somewhere. It's not a special case.
Let's consider a measuring tool for fluids which is based on volume. What is a useful range for the index on the side that measures the current non-air capacity?
Many such devices, e.g. a measuring cup I've got at home in the kitchen, use a series of tick marks within a bounded accuracy range, with marks on both the top (of course) and also the bottom (it looks nice).
While 0 isn't expressly labeled on many scales, it is part of the inclusive range by implicit nature. Thus as you point out the case of 0 units, and 'empty set' are one and the same in this real world example.
I haven't done enough "fluid" applications to give domain-related suggestions. That's not really an integer "count". I'm only saying it doesn't fit well in typical business applications as I see them. Each domain is different.
Many of our existing conventions derived from military, science, and academic applications. Business applications came along later, emphasizing discrete counting and categorizing things along the lines of set theory, such as in the set or not in the set, not half in. (Money has decimals, but you don't typically increment through it with indexes.)
(Although COBOL was published around 1960, it took roughly 5 years for computers to get powerful enough to make it practical and widespread. The earliest COBOL compilers were dog-slow resource hogs relative to the hardware of the day.)
You forgot to explicitly mention that the bottom mark on your measuring cup is indeed 0, and without it you could only measure "at least this volume of fluid".
That doesn't change anything that I see. Indexes and counters are usually not used for "continuous" metrics such as liquid quantities. And again, the biz domain typically does not count fractional quantities, at least not in an indexed way.
It's not a practical problem for most coding. If you disagree, can you demonstrate a problem caused by indexing arrays starting at one in a typical code situation/scenario?
There are two separate concepts here: counting (how many things do I have) vs numbering/indexing (which one in a sequence am I talking about).
If I have five pizzas, would anyone argue that the variable storing that fact should have a value of 0x0004? On the other hand, if there’s a stack of five pizzas, they’re either numbered 0..4 or 1..5 depending on your religion.
I concur, because I have observed this exact problem for years now. I work for a company that has been producing modular hardware for decades. Until like 5-10 years ago all countable stuff was counted from 1, as is only logical: processing blades, interface ports, chassis, internal modules of the same type, logical elements (sessions, channels etc.). And then on the seventh day the devil came by :) . The biggest customer, along with several others, wrote a new revision of the standard we are using, and there everything is 0-based. And now we are in hell. Yes, it is mostly contained, but even after years of development of the new product and years in production we still sometimes find bugs around 0-based numbering. There are multiple places where 0-based is mixed with 1-based or just used incorrectly. And talking to humans about the hardware has become much more error-prone and inefficient - "please connect card one, port one to the switch" - first as in 0 or 1? Should I specify it explicitly this time? But he probably knows what I mean. Or maybe not? And if he is wrong I will waste another day waiting to change the wiring again. Screw it, I'll tell him explicitly. Every goddamn time.
All of this because some arrogant programmers (or ex-programmers) think that they know better than everyone else and that changing legacy and/or logically correct conventions is good because of their religious beliefs that "0-based is better for everyone and every task".
Re: All of this because some arrogant programmers (or ex-programmers) think that they know better than everyone else ...
It could be they lack real-world experience or work in "esoteric" domains. Theorem proving is a different animal than making Boss Bob's billing summary reports come out right.
I find the opposite. UI issues dominate too much since The Web murdered client-server stacks. Client-server was like parking a passenger car: you aim the front wheels where you want to go, and then you are there. Web stacks are like driving an 18-wheel truck: you have to plan your multi-point swings in advance and move carefully and slowly because rework is expensive.
The syntax I always want but no language I know of supports:
for 1 <= i <= 10:
do stuff
for 0 <= i < 10:
do stuff
for 1 <= i < j <= 10:
do stuff
(In the last case, the exact order in which the 45 iterations happen should maybe be left unspecified; at any rate, it would be bad style to depend on it.)
Downward iteration:
for 10 >= i >= 1:
do stuff
Obviously this doesn't cover every case of "arithmetic for loop" that would be useful: sometimes you want a step size that's neither 1 nor -1. I'd be quite happy with a language in which I had to do that using a more general iterate-over-an-arbitrary-sequence construction; I'm tempted by options like
for 100 <= i <= 1000 where i%3==1:
do stuff
but it's probably too clever by half; either you only support the simple "specify the value of the variable mod something that doesn't change while iterating" case, in which case users will complain that some obvious generalizations fail to work, or you support arbitrary predicates, in which case you have to choose between making the "easy" cases efficient and the "hard" cases not (in which case users will be confused when what seem like small changes have drastic effects on the runtime of their code) and making all of them inefficient (in which case users will be confused by how slowly their code runs in some cases).
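For comparison, rough Python spellings of those loops (just a sketch; the whole point of the proposal is the nicer syntax):
from itertools import combinations

for i in range(1, 11):                        # 1 <= i <= 10
    ...
for i in range(10):                           # 0 <= i < 10
    ...
for i, j in combinations(range(1, 11), 2):    # 1 <= i < j <= 10 (the 45 pairs)
    ...
for i in range(10, 0, -1):                    # 10 >= i >= 1
    ...
for i in range(100, 1001):                    # 100 <= i <= 1000 where i%3 == 1
    if i % 3 == 1:
        ...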
You may enjoy Common LISP's looping: it's a whole minilanguage where you can say stuff like
(loop for i from 1 upto 10 ..)
(loop for i from 1 below 11 ..)
(loop for i from 10 downto 1 ..)
(loop for i from 10 above 0 ..)
(loop for i from 3.5 to 7.5 by 0.25 ..)
It does much more than that, I should mention; it's a whole mini-language/DSL (like format).
> (In the last case, the exact order in which the 45 iterations happen should maybe be left unspecified; at any rate, it would be bad style to depend on it.)
Relying on something unspecified isn't just bad style; it's a bug.
Why would you introduce requirements into a higher level language that leave important aspects unspecified, opening the door to bugs?
1. It might be best not to specify what happens, because
2. if it were specified then people would rely on the specific order, which would be bad style because it would make code harder to read and be more liable to errors if anyone misremembers what order the iteration happens in.
(But it's true that people might rely on the observable behaviour even if it weren't officially specified, and that would be even worse. Which is why I said "should maybe be left unspecified"; it isn't obvious to me which way is better overall. If it's unspecified then fewer people will rely on it but the consequences each time will be worse.)
Indices should start at 1, but in programming we mostly use offsets and call them indices which is one source of confusion that leads to this endless debate.
This is a real problem, and I'm pretty sure quite a few bugs in mathematical toolsets happen because of it. It was a real pain translating formulae to code for numerical computing.
There is worse. In Perl, you have the global variable $[, which lets you specify the first index of an array. One could imagine setting it to 0.5. I don't think that would work, but with Perl being Perl, we never know.
This is, of course, a terrible idea, and that feature is now deprecated. $[ is always 0 and setting it to any other value is an error.
Sadly missed. Stan Kelly-Bootle was the first person to get a postgraduate degree in computer science. For more with the flavour of the quote above, check out his Devil's Advocate column in UNIX Review¹ magazine (not much UNIX, promise) and the post-paper Son of Devil's Advocate.²
On the assembly language level, you have compare instructions. These subtract, and throw away the result, but leave the flags.
Many CPUs have a flag that is automatically raised if a value is 0.
This means you don't have to execute a compare instruction to test if a value is 0, because the "Zero" flag will be set as soon as the 0 value is loaded.
That means you can load a value, and directly go to a BEQ (which is really shorthand for "branch if the zero flag is set") and save a few cycles by avoiding the CMP instruction.
So this is why numbering starts at zero. Testing if your list is empty, which is probably a common thing if you loop through each element, is slightly quicker.
In scientific-oriented languages (R, MATLAB, Julia, Fortran, etc.), array indices tend to start at 1. I think it is a culture thing. Software engineers prefer 0. Scientists prefer 1.
In science indexing can be quite inelegant too. We often denote time steps as t0, t1,...,tn. But if we put these steps in a vector v, we would get: v[1]=t0,...,v[n+1]=tn.
I think that's the case only if we're stuck in a programming language like Matlab. Usually we would get v[0] = t0, ..., v[n] = tn. Mathematicians are perfectly happy starting with zero, even for things like matrix indices, when it makes sense.
But note that the potential for underlying confusion is not limited to the choice of indexing in programming language syntax. Casually ask someone who lives on the 8th floor of an apartment building how many floors they go up in the elevator and you'll probably discover that people don't think in detail about the precise labeling of elements in a collection or how arithmetic with intervals work.
This is silly. Different domains have different needs. If you think of arrays as pointer + offset calculation, and your only for loop has an explicit index increment in it, then starting at 0 is natural. If you are a systems programmer then a language targeting you should accommodate that.
If you are anyone who doesn't want to care (too much) about how things are implemented, then it's a lot less error prone to mark the 1st element of an array a[1]. Having taught and written fairly complex scientific code in both 0 and 1 indexed languages, there really is no good reason to do anything else.
for a in array[3:6]
Should iterate over the third to sixth element of the array.
Nah everyone should just listen to the mathematicians and allow for arbitrary index sets.
Seriously though you save yourself a lot of trouble if you don't worry that much about what range your indices have but rather what they represent. The fact that they're integers starting at 0 is just an implementation detail.
I'm usually a start-at-zero guy, except if I'm implementing matrix routines... going from the 1-based indexing from the math theorems to 0-based code is so tedious in order to ensure the translation is correct.
I totally agree with this. However, I'm trying to teach my 1 year old how to count. He's got 1, and 2 down, and we're working on 3 (He can say the numbers 1 to 10, I'm talking more about the concept of numbers). I would love to get him to understand 0 too, but I'm not sure how.
People almost always start "stopwatch" counting at 1. If you time how long a juggling ball stays in the air, or how long it takes to run a short distance, a stopwatch will read 0.9 or something, whereas most people counting aloud start "1, tw-.." and end up saying 1.5 or so.
Yeah, it's pretty incredible, isn't it? Makes you realize that it's not just the training of the brain, that brain is growing and it is simply able to comprehend more as it grows, no training will do that. For some things, you just wait.
Yeah, I think I posed the question just to get ideas for how others think about it. I don't have a need for him to learn it per se, I just wanted to start a discussion about how zero is a difficult concept to teach. Heck, humanity had to "invent" the concept of zero.
I alluded to this in another comment, but what astonishes me about my son is how rapidly he grows and changes. Every night, my wife and I talk about how amazing he is, etc. and I can't help but think about the fact that 1 year ago, he was barely able to roll over, and that he's now running around, "reading," eating solid foods, and a full-blown chatterbox. If he's doing this much now, what the heck is he going to be doing next year?
It's forced me to reevaluate my own life. If my son can accomplish all of that in one year with proper support, nurturing, and guidance, and we are genetically related (i.e. 50% of him is me), then what could I do in the same amount of time?
It sure can be motivating. But unfortunately we do not benefit from any growing of our "hardware" anymore (or at least not as much). Where your son gets some extra RAM and CPUs plugged in every month, we have to make do with what we have.
He's actually 17 months if we want to get specific. On his 1st birthday, I managed to get him to answer the question "X, how old are you?" with "Wah!", which only became an articulate "One!" around Christmas/16 months, which was also about the time he started being able to say 1 to 10.
He actually is a pretty amazing kid. He's miles ahead of other kids his age, even some who are older. I wasn't expecting to be able to teach him counting until next year, so I'm being very relaxed about it right now and making it fun. He's 1, not 5, as far as I'm concerned, he can do what he wants, but if he's into it, I'll teach him.
Consider P₁ ∈ H and P₂ ∈ G, where H is the set of HN posters and G is the set of persons in the general population. Let p(P, U) be the probability that a person P will produce an utterance U using mathematical terms and/or notation. Let q(U) be the probability that U will be better understood in mathematical terms than in plain language.
I’m an embedded C programmer who typically avoids dynamic allocation... so I know the answer is zero apples... but it doesn’t matter because I have to hold my hand out making a space for an apple to go just in case some day you do decide to hand it to me.
It's trying to replace/inter-operate with Excel, so it has to maintain backwards compatibility; bug for bug, including design bugs.
I couldn't tell you offhand if the logic you speak of is distinct to Sheets or if it's Sheets maintaining the same interfaces other spreadsheet software defined for those macro names long ago.
Counting a physical process, so you have to allocate space. 0 is that allocation, and as you move along the number line, you update that space (using "add 1"). Note that you only update the space at discrete intervals, which has surprisingly deep implications.
Having a kid I've been thinking a lot about counting, and what it really is. It seems totally wrapped up in repetition, and so I'm wondering if teaching counting as a function of, say, circular motion doesn't give a better intuition than the usual "count this clump of things" approach. (Counting clumps requires the person to simultaneously introduce an ordering and then implement a kind of internalized repetition as they point and count rhythmically. My kid seems to struggle with the ordering part, and no wonder: N objects have N! orderings.)
Please note that “counting” there is really “indexing”, as in 1st, 2nd, etc. Apart from indexing in programming languages, some commenters in this thread seem to have an idea of counting (indexing) real things from zero, as if it solved the -1 problem easily. But that is a natural offset between a quantity and a position. We could in theory rename our ordinals one backwards like: zeroth, first, second, and so on. But that would shift the meaning temporarily, and 0th would become new 1st. With new counting in mind, for an empty set [] count is 0, for [x] count is one and an ordinal for x is “zeroth”, -1 again dammit.
Between 0 and 10 there are 10 unit-sized intervals touching 11 connecting/borderline points (the integers). You cannot make this fact go away, no matter which language you choose.
If the author had contemplated this sentence before writing this pained and narrow-minded treatise: "Why thinking should start before writing", engineers would be slightly more capable of interacting with and designing for other human beings. Numbering is highly domain specific -- the concept of zero is not always relevant. And sometimes the lay perspective has primacy over others. I remember the intense arguments engineers had regarding when the new millennium was to start. There weren't celebrations on New Year's 2001 that could compete with the scale of those for the year 2000.
Worse than starting at 0 or 1 is to give the option. In old Visual Basic the "Option Base" statement changed the starting index for a whole module. I had to debug a program with a mix of code with indices starting at 0 and at 1.
I (almost) always fall victim to off-by-one errors in competitive programming when working with counting numbers in a range. Sticking to a convention like 2 ≤ i < 13, as Dijkstra observed for Mesa programmers, sounds like a good idea.
One thing that’s nice about it is that you can use a positive difference as an array index directly. This is nice when you want to store a function of the “distance” between two things, e.g. potential[] = {-10, -5, -3, -2.2, ...}; ... potential[r1-r2]. It also works nicely with modular arithmetic as ‘0-_-0 mentions in their post. More generally this corresponds to the array being a function from the natural numbers to some other values, and this function can only start at zero if the arrays start at zero. But if you want a function that starts at 1, you can just set x[0] = invalid().
"In corporate religions as in others, the heretic must be cast out not because of the probability that he is wrong but because of the possibility that he is right." Antony Jay
« Exclusion of the lower bound forces for a subsequence starting at the smallest natural number the lower bound as mentioned into the realm of the unnatural numbers »
There's a parallel argument for being inclusive in the upper bound: it lets you specify a range which is the entire size of your integer type.
I can see that that might seem like a poor trade for losing the ability to represent an empty range starting at zero, but it seems a shame he didn't mention that this is the tradeoff you're making.
The only reason to count from zero is for the sake of array processing -- given a pointer to an array, the first element will be at pointer plus zero. That's significant for low-level code as it avoids an extra add one operation.
However for high-level programming, starting at one has advantages. For instance what index to use to represent insertion of an element at the beginning? Being able to use zero for this is cognitively easier.
That convention seems confusing to me, especially to someone new to the convention. If you need an explicit "insert at the start" operation, then do something like "x.insertAtStart(y)" instead of "x[0] = y".
Insertion at the beginning is insertion at zero in most zero-based systems too. Inserting at zero means that the new element will end up at index zero, and the other elements are shifted forward. In a one-based system, similarly, insertion at the beginning should be insertion at one, not zero; doing it any other way would just be a source of off-by-one bugs.
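Python's list.insert already behaves this way, for what it's worth:
xs = [10, 20, 30]
xs.insert(0, 5)      # insert at the beginning: the new element ends up at index 0
print(xs)            # [5, 10, 20, 30]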
It was an app I wrote, probably in algorithmic category. But even in business code, I very rarely see off by one problems that would be easier to handle in 1 based indexing.
I'm sorry, the "recent study" was "an app I wrote"? Sorry, but unless I'm misunderstanding... that's not a study. And the whole point of a study is to back up assertions like "I very rarely see" so they aren't just opinion subject to all the normal human biases we all have.
But if you ran such an analysis across public GitHub repositories per-language and wrote the results up in a blog post, I'm sure HN would love to see it. Definitely front-page material.
It wasn't a serious comment. I thought it was obvious from the tone.
> But if you ran such an analysis across public GitHub repositories per-language and wrote the results up in a blog post, I'm sure HN would love to see it.
I'll see what I can do. I do have some repos in sight that could be used for it.
But even then this won't be as straightforward as you say, since different languages have different applications (e.g. C++ for games, Julia for scientific computing). This would require writing the same code in both indexing patterns and then comparing them.
My experience strongly disagrees. I have done scientific computing in Matlab, Python, and C++, and Matlab is the one with the awkward +1, -1 everywhere.
Fair enough. I was thinking of operations like getting row i from a n*m matrix, which would be A(i,:) in 1-based compared to A(i-1,:) in 0-based (or A(i,:) with a 0..n-1 index).
While I agree with the conclusion, I think he could have expanded it even more; not everyone has as developed an aesthetic sense. Measuring, counting, combinatorics, indexing, arithmetic with ranges:
In 95% of cases the choice does not matter or make things more elegant or simple, but when it does, it's always zero-indexing that is more elegant.
At the end of the day they are both great and they both suck depending on context. I use 0 based when programming and 1 based pretty much everywhere else.
And in both cases I always run into an issue where I think, "This would be so much easier if we just started at [the other index]".
I use 0 based when programming in some languages, and 1 based in others. Everywhere else I use 1 based.
When I'm programming in 1-based languages, I'm almost always happy with that choice, because they're almost always languages designed for solving problems in a domain that were standardized on 1-based counting before Charles Babbage was born.
When I'm programming in 0-based languages, I'm sometimes very happy with that. Those times are when I'm working in C or C++ . When I'm using other languages, I don't care much maybe 1/3 of the time. And the majority of the time I still wish it were 1-based, because I'm using that language to solve problems in a domain that had standardized on 1-based counting before Charles Babbage was born.
Case: indexing an array in ASM, C, C++, Swift,…
Case: using an offset
0 <= i < N
Case: indexing an array in Pascal, Ada, …
Case: writing a math paper
Case: talking with a non-programmer
1 <= i <= N
Silly article. Most of its points are actually arguments against JavaScript's type system and semantics, not zero indexing per se. (Consider how they would apply if we used letters for indexing, where there's no longer a superficially appealing zero value to special-case: ‘We should use B-based indexing, because A is a falsey value in JavaScript! Also, now we can use A to denote missing elements!’.) Others are empty appeals to vaguely defined ‘mathematics’ that ultimately boil down to historical legacy.
It is true that most matrix theorems are formulated with 1-based indices. But mathematical notation is full of unfortunate historical accidents (like defining π to be half the circumference of the unit circle), so one should be cautious when drawing inspiration from it. And for what it's worth, set theorists start ordinal numbers from zero, and each ordinal is the half-open interval of the numbers which precede it:
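0 = {}, 1 = {0}, 2 = {0, 1}, and in general n = {0, 1, …, n-1}, i.e. the half-open interval [0, n).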
Purely functional languages have no reason to use 0-indexed arrays. Imperative languages that deal with for loops are more logically consistent with 0-indexed arrays. I think.
One is the first number. Until there is one of something there is no point to counting at all. Zero is a regression from one and only significant for the lack of that something which we started counting at 1.
But there absolutely is a point in setting out to count something when it turns out there is zero of that thing. If you don't know how many of something there is, but you know that this information is important, you will want to go out and count. If it's really important, then you will likely want to keep records of the count every time you perform it. This is, of course, very important in commerce.
It's numbering that's being talked about here. A guy finishes a marathon in first place; with zero-based numbering he would be the zeroth runner, while one runner has finished. Most people would intuitively like those numbers to line up.
Of course, when you build the natural numbers on a computer and use those as indices to arrays, it makes sense that the first index is zero. So the first (common usage) element would then naturally be called the zeroth element. This, of course, gives you the problem that array.size == x means that array.last_index == x - 1 and all the off by one errors that entails.
Sure, but I don't think anyone's complaining about how you would assign race completion position labels in a programming language. Surely you would do something like this:
// Start the position at one, because that's how we
// assign number labels to finishers in a race.
let position = 1
// Create a callback that will assign a label to
// each competitor when that competitor finishes
// the race.
let assignLabel = (competitor) => {
competitor.label = position
position = position + 1
}
// Begin the race, and pass it a callback that will
// be run each time a competitor finishes the race
// (in the order they finish).
race.begin(assignLabel)
Note that this "label" really is just a label. It happens to be an integer here, but we could also use a library that outputs English strings like "first", "second", "third", etc. This debate is focused more on how elements in an ordered collection are accessed.
Before any runners have completed the race, there are 0 runners in the set of runners that have completed the race (empty set/null set).
As soon as the first runner crosses the finish line 1 runner has finished the race, increasing the counter by one increment and it is customary to assign them the designation of the upper end of the unit range rather than the lower number. Thus, they are the First (1st) runner to have completed the race.
We probably count rooms the same way out of a similar convention: the first room on a floor is numbered 1, rather than 0, because it is more useful to annotate the end of the range than the start. It makes a lot of math easier too. If there are 10 rooms on a floor and 10 guests, there are 0 rooms free.
It does, ask any shepherd, accountant, quartermaster, etc. It has had a well-defined meaning and use over the centuries, and for hysterical reasons of economy it took on a second meaning equal to a physical 1. Luckily it's the only case in computing where established meanings in base ten were changed to fit a base-2 world.
CPU0 would disagree, for example. A non-technical person might be forgiven for thinking a system that says CPU0 has no CPUs, when in fact it has one, again, for example. Many logical objects start their enumeration at zero and increment from there. There are four rings of operation in Intel CPUs, but they are labeled rings 0 through 3.
I'm sorry, but you clearly haven't heard of unary, or base 1, which is probably the earliest counting system, in which 1 is 1, 2 is 11, 3 is 111, etc. Tally marks. You are correct that base 0 would make no sense, however.
But yes please explain to me the very thing we're talking about which is exactly what I was referencing when pointing out that using 0 as an identifier rather than a value is moderately counterintuitive.
The article mentions Mesa's interval syntax, which gave programmers all the possible options:
[1..100]   (0 .. 101)   (0 .. 100]   [1 .. 101)
were all the same interval, 1 to 100 inclusive. Bad idea. Great way to get off-by-one errors.
Another bad idea from that era - Wirth's insistence that you shouldn't be allowed compile-time arithmetic in Pascal. Like writing "array[0..tabsize-1]".
No, you had to have a separate named constant for "tabsize-1", and it couldn't be initialized as "tabsize-1", either.