
Indices Point Between Elements - nelhage
https://blog.nelhage.com/2015/08/indices-point-between-elements/
======
thwarted
The visuals are definitely valuable in explaining this.

 _It used to be popular, and still is in some circles, to debate whether
programming languages ought start array indexing at 0 or 1._

When talking about this with other programmers, I've discovered that a lot of
the issues/confusion could be avoided by consistent use of terminology:
Offsets/offsetting always being zero-based and indexes/indexing always being
one-based.

Using rulers and birthdays also helps to explain differences. You're in the
first year before your first birthday, being zero (whole) years old.

To make matters potentially more confusing, culturally, I remember something
about the ground floor in UK buildings not being "Floor 1" like it is in
the United States.

[http://www.hacker-dictionary.com/terms/fencepost-error](http://www.hacker-dictionary.com/terms/fencepost-error)

~~~
pavlov
_I remember something about the ground floor in the UK buildings not being
"Floor 1" like it is in the United States._

Actually, that's perfectly explained with your offset vs. index terminology.
In some countries, the floor number is an index within the array of floors. In
others, it's an offset from the ground.

~~~
fhars
Except in the UK, where there often is a Mezzanine floor somewhere above the
ground floor (usually, but not always, between the ground floor and the first
floor).

Is there an Esolang that numbers its arrays with 0,M,1,2,3...?

~~~
derefr
A mezzanine is, by definition, a floor offset a non-integral number of storeys
from the floors around it; a floor existing at a fractional floor number, in
other words. A mezzanine "between the first and second floor" (in American
parlance) would have a floor offset of 0.5 (or possibly ranging from 0.3 to
0.7, since mezzanines usually involve complex arrangements of stairs and
landings.)

------
Someone
See also
[https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/E...](https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html)
("Why numbering should start at zero")

~~~
hammock
And the HN discussion:
[https://news.ycombinator.com/item?id=9761355](https://news.ycombinator.com/item?id=9761355)

------
surrealize
People have taken different approaches to this in bioinformatics for numbering
intervals of dna bases in chromosomes. I think this approach is catching on,
though. In that realm, it's important to speak unambiguously about insertions
and deletions, and the "interbase" mental model makes it a lot clearer.

edit: Probably the most popular genome browser, based at UC Santa Cruz, uses
this zero-based, half-open numbering internally. But, at some point in the
past, biologists developed an expectation that numbering would be 1-based. So
the Santa Cruz genome browser actually adds 1 to the start position of
everything for display purposes.

~~~
rcthompson
One interesting side-effect of using 1-based closed indexing (i.e. numbering
the positions, not between the positions) is that a zero-width range (which is
something that actually comes up in genomics) starts at position N and ends at
N-1.
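
A minimal Python sketch of the two conventions discussed above, assuming a 0-based half-open ("interbase") internal form and a 1-based closed display form. The helper names `half_open_to_closed` and `closed_to_half_open` are hypothetical, not taken from any genome browser's code:

```python
def half_open_to_closed(start, end):
    """Convert a 0-based half-open interval [start, end) to 1-based closed [first, last]."""
    # Only the start moves: the gap before a base has the same number
    # as the 1-based position of the base before it.
    return start + 1, end


def closed_to_half_open(first, last):
    """Convert a 1-based closed interval [first, last] back to 0-based half-open."""
    return first - 1, last


seq = "ACGTACGT"

# An insertion site between the third and fourth bases is the empty
# half-open interval [3, 3): it selects nothing, but it has a location.
insertion_site = (3, 3)
selected = seq[insertion_site[0]:insertion_site[1]]  # empty string

# In 1-based closed form the same site starts at N and ends at N-1.
closed_form = half_open_to_closed(*insertion_site)
print(closed_form)  # (4, 3)
```

In the half-open form the length is always `end - start`, including the zero-width case; in the closed form it is `last - first + 1`.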

------
carapace
Once at a party I accidentally started an argument about which way toilet
paper should hang from the roll (front or back) by mentioning how silly it was
that people would argue about such a trivial matter.

~~~
leni536
Also, at which end do you start peeling a banana?

------
sago
I've found this a helpful way of explaining ranges in Python to students; it
also makes Python's negative indices understandable for the same reason.

But it is contextual. When it comes to languages like C, where arrays are more
directly mapped to pointers and memory layout, I've found it better to talk
about pointers, and allow people to derive the behavior that way.

Either way, I'd be careful of trying to claim that 'this is what it _is_ ',
rather than 'here is a way to remember it'.
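
As one way to picture it (a sketch for illustration, not a claim about how Python is implemented), slice bounds can be read as numbering the gaps between characters, with negative numbers counting gaps from the right:

```python
word = "python"

# Gap numbers, counted from the left and from the right:
#
#    0   1   2   3   4   5   6
#    | p | y | t | h | o | n |
#   -6  -5  -4  -3  -2  -1

first_three = word[0:3]   # everything between gap 0 and gap 3: "pyt"
last_two = word[-2:]      # everything from gap -2 to the end: "on"
middle = word[2:-2]       # between gap 2 and gap -2 (same as gap 4): "th"
nothing = word[2:2]       # a zero-width range at gap 2: ""

# The length of a slice is the difference of its gap numbers,
# and splitting at any gap loses nothing.
k = 4
rejoined = word[:k] + word[k:]
```

Note that the picture applies to slice bounds only: `word[-1]` as a plain element index still names the element to the right of gap -1.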

~~~
nelhage
I get into this a bit later on, but I think the exact same model applies to
pointers: You're much better off in most cases thinking of pointers as
pointing at the zero-width points between elements, than at elements
themselves.

~~~
rzzzt
I'm also having trouble meshing this way of thinking about indexes with the
idea of a fixed bit-width, discretely addressable RAM, which would suggest
that there is nothing "between" two storage elements.

I find it very useful, however, for imagining what the returned insertion
point index of a binary search would mean, when the item you are looking for
can not be found.
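
A short sketch of that binary-search case using Python's standard `bisect` module; the sample list is made up for illustration. The returned value is naturally read as a gap number, and a list of `n` items has `n + 1` gaps:

```python
import bisect

sorted_values = [10, 20, 30, 40]

# 25 is not present, so bisect_left returns the gap where it would go:
# gap 2, between 20 (element 1) and 30 (element 2).
gap = bisect.bisect_left(sorted_values, 25)

# Gap 0 is before everything; gap len(sorted_values) is after everything.
gap_low = bisect.bisect_left(sorted_values, 5)
gap_high = bisect.bisect_left(sorted_values, 99)

# Inserting at a gap keeps the list sorted.
with_new_value = sorted_values[:gap] + [25] + sorted_values[gap:]
print(gap, gap_low, gap_high)  # 2 0 4
```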

------
orclev
He undermines his own argument right out of the gate. When he shows the part
about which number should be used for the last element in the range, there
are still two valid choices: you could choose to stop at 3 with the
understanding that the element to the right of that is the last element to be
selected, or to stop at 4 with the understanding that in this context your
indexing is "special" and you're actually selecting the element to the left of
the index.

~~~
morpher
If you treat the indices as between elements and include all elements that are
between the start and end elements, there is only one reasonable choice. It
would be more "special" to include elements that are outside of the specified
range (i.e., the element to the right of the final index).

~~~
crusso
Yeah, but I think that orclev was saying that you have a rule at the beginning
that the element to the right of the index is important and then in ranges,
the element to the right of the index is no longer important.

One way or the other, someone needs to know the same number of rules in order
to understand how the indices work.

------
perlgeek
When you're talking about where zero-width regexes match, this mental model
certainly helps, and I find it consistent otherwise too.
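
For illustration, here is a small sketch with Python's standard `re` module (an assumption for the example; the comment does not name a regex engine). Zero-width matches report the same start and end, which reads most naturally as a gap position:

```python
import re

# The empty pattern matches at every gap: a 2-character string has 3 gaps.
gaps = [m.start() for m in re.finditer(r"", "ab")]

# \b matches at gaps where a word character meets a non-word character
# (or the edge of the string). Each match has zero width.
text = "hi there"
boundaries = [(m.start(), m.end()) for m in re.finditer(r"\b", text)]

print(gaps)        # [0, 1, 2]
print(boundaries)  # [(0, 0), (2, 2), (3, 3), (8, 8)]
```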

~~~
nelhage
zero-width matches (and empty lines) were a _huge_ source of stupid edge-case
bugs in livegrep[1], and being rigorous about maintaining this mental model
definitely helped a lot.

[1] livegrep.com

------
DougBTX
Further notes about half open intervals and why starting at zero is a good
idea:
[http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF](http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF)

------
bch
There is a reference to Emacs in the OP, but not directly to what immediately
sprang to mind[0]. Emacs has a cursor (the block) and a "point"
which is described as an imaginary spot at the left edge of the cursor - the
point is what's important when one is indexing data (ie: setting a mark and
selecting a bunch of text). The model fits for lots of things... Unfortunately
for me, tmux highlighting does _not_ follow this model.

[0]
[https://www.gnu.org/software/emacs/manual/html_node/emacs/Po...](https://www.gnu.org/software/emacs/manual/html_node/emacs/Point.html)

edit: clarify Emacs reference in OP

------
MrManatee
I have sometimes wondered whether it would be useful to have two different
types for indices, depending on if we are indexing the elements themselves or
the "gaps" between them. Let's say I'm thinking about some language like
Haskell, where types are already a big deal.

That way the compiler would be able to tell if I accidentally mixed the two.
Every conversion would have to be explicit: for example, there might be two
functions, "before" and "after", that take a gap index, and return an element
index.

I think I might actually enjoy programming this way, but perhaps others would
find it needlessly bureaucratic.
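
A rough sketch of the idea, written in Python with `typing.NewType` rather than Haskell. All names here (`ElemIndex`, `GapIndex`, `element_at`, `insert_at`) are hypothetical, the distinction is enforced only by a static checker such as mypy and not at runtime, and no bounds checking is done:

```python
from typing import List, NewType

# Two distinct index types that are plain ints at runtime.
ElemIndex = NewType("ElemIndex", int)  # names an element
GapIndex = NewType("GapIndex", int)    # names a gap between elements


def before(gap: GapIndex) -> ElemIndex:
    """Element immediately to the left of a gap."""
    return ElemIndex(gap - 1)


def after(gap: GapIndex) -> ElemIndex:
    """Element immediately to the right of a gap."""
    return ElemIndex(gap)


def element_at(items: List[str], i: ElemIndex) -> str:
    return items[i]


def insert_at(items: List[str], gap: GapIndex, value: str) -> List[str]:
    """Return a new list with value placed in the given gap."""
    return items[:gap] + [value] + items[gap:]


letters = ["a", "b", "c"]
gap = GapIndex(1)  # the gap between "a" and "b"

left = element_at(letters, before(gap))
right = element_at(letters, after(gap))
widened = insert_at(letters, gap, "x")

# A type checker would reject element_at(letters, gap): GapIndex is not ElemIndex.
```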

------
curiousDog
Isn't this the classic fence post vs fence section analogy? I like this
statement from Dijkstra: "an element's ordinal ... equals the number of
elements preceding it in the sequence"[1]
[1][http://c2.com/cgi/wiki?WhyNumberingShouldStartAtZero](http://c2.com/cgi/wiki?WhyNumberingShouldStartAtZero)

Another nice link: [http://betterexplained.com/articles/learning-how-to-count-av...](http://betterexplained.com/articles/learning-how-to-count-avoiding-the-fencepost-problem/)

------
marijn
And this is why JavaScript's `lastIndexOf` method gets it wrong, wrong, wrong.
(`"foo".lastIndexOf("o", 2)` yields 2, whereas I, and probably a lot of other
people, would expect it to be 1, with the search starting at the space between
element 1 and 2)
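
For contrast, a sketch in Python rather than JavaScript: `str.rfind` takes slice-style bounds, so its end argument behaves like a position between elements and gives the result the comment expects. This shows Python's behavior only and says nothing further about JavaScript's API:

```python
text = "foo"

# Search only the part of the string left of gap 2, i.e. text[0:2] == "fo".
# The "o" at element 2 lies to the right of that gap and is not considered.
found_before_gap_2 = text.rfind("o", 0, 2)

# Moving the bound to gap 3 (the end of the string) brings element 2 into range.
found_before_gap_3 = text.rfind("o", 0, 3)

print(found_before_gap_2, found_before_gap_3)  # 1 2
```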

------
Camillo
This is the mental model I use.

------
spartan37
Relevant [https://github.com/vr6/blog/blob/master/Zero-One.md](https://github.com/vr6/blog/blob/master/Zero-One.md)

------
colanderman
This is the same model used by the Cairo graphics library and the HTML5 Canvas
(albeit in two dimensions). All the same benefits apply when you're addressing
pixels as elements of an array.

------
ape4
We need a different name for this. We're too used to "index". If the name is
"Foo" then we can have fooSubString(i,j);

------
pshc
This echoes the way unums work in "The End of Error". (Recommended read by the
way.) Good way of thinking about ranges.

------
amelius
Ugh. Please no.

This only adds to the confusion, as now there are 2 ways in which indices can
be interpreted.

~~~
Retra
Specifically, this:

>"Indexing between elements, instead of indexing elements, helps avoid a large
class of off-by-one errors."

It only replaces them with indexing-method errors. Instead of remembering if
my ranges are open or closed, I have to remember if they are using between-
element indices or on-element indices. It's still going to cause the same
kinds of problems.

~~~
rcthompson
You always have to remember what kind of indexing you are using in order to
avoid making errors. But using this mental model will help people avoid making
off-by-one errors because they don't even understand the indexing in the first
place.

------
maaku
Great teaching tool, thank you.

------
anon5446372
Some people spend too much time overthinking simple concepts...

~~~
kmill
Some people overestimate the simplicity of common concepts...

