
Why Python uses 0-based indexing. It's because of slices  - dangoldin
https://plus.google.com/115212051037621986145/posts/YTUxbXYZyfi
======
ot
As Guido says, 1-based indexes basically force closed ranges, so that [1,
length] is the full array.

One thing that annoys me about closed ranges (and it's not much talked about)
is that it is impossible to express the empty range: in python [i:i] is an
empty range, while closed ranges always contain at least one element. That
makes closed ranges _strictly_ less expressive than half-open ranges, often
making it necessary to write special cases for zero length.

For example when you want to split an array or a string at a given position i,
the partitions will always be a[:i] and a[i:] _even if one of the two is
empty_.
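
A minimal sketch of the split described above, in Python. The helper name `split_at` and the sample string are made up for illustration; the point is only that no special case is needed when one side comes out empty.

```python
def split_at(seq, i):
    """Split seq into the part before position i and the part from i onward."""
    return seq[:i], seq[i:]

word = "index"

# Splitting at either end just yields an empty partition on that side.
left, right = split_at(word, 0)
whole, nothing = split_at(word, len(word))

# A half-open range [i:i] is always empty; a closed range cannot express this.
middle = word[2:2]

# For every valid position the two partitions rebuild the original.
rebuilds = all("".join(split_at(word, i)) == word for i in range(len(word) + 1))
```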

The other big issue, as many comment, is that the arithmetic to index multi-
dimensional arrays is cumbersome with 1-based indexing: for example if I is a
2d image in row-major, the element (x, y) is at position I[(y - 1) * width +
x] instead of I[y * width + x]. It's very easy to make off-by-one errors this
way.
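
To make the arithmetic concrete, here is a small sketch comparing the two conventions on a flat row-major buffer. The names `WIDTH`, `HEIGHT`, `at_zero_based` and `at_one_based` are assumptions for the example; the 1-based variant emulates a 1-based language on top of a Python list, so it needs one more `- 1` at the end.

```python
WIDTH, HEIGHT = 4, 3

# Flat row-major buffer whose values equal their own 0-based offsets.
image = list(range(WIDTH * HEIGHT))

def at_zero_based(x, y):
    """Element (x, y) with 0-based coordinates and a 0-based buffer."""
    return image[y * WIDTH + x]

def flat_one_based(x, y):
    """1-based flat position of (x, y) when coordinates are also 1-based."""
    return (y - 1) * WIDTH + x

def at_one_based(x, y):
    """Same element, emulating a 1-based array on top of a Python list."""
    return image[flat_one_based(x, y) - 1]

# Both conventions reach the same elements, but one carries the extra -1.
same = all(
    at_one_based(x + 1, y + 1) == at_zero_based(x, y)
    for y in range(HEIGHT)
    for x in range(WIDTH)
)
```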

EDIT: adding this just to try some ASCII art :)

I think that the preference about 0-based and 1-based indexing boils down to
how you reason about indexes.

I (0-indexing) think of them as the following:

    
    
          |-|-|-|...|-|
          ^           ^
        first   past-the-end
       position
    

that is, the index is _where the element begins_. This way there is a special
endpoint, the "past-the-end", where an appended element would begin. In
particular, an empty array has both first and last endpoint equal to 0.

I believe that people who favor 1-based indexing think like this

    
    
          |-|-|-|...|-|
           ^         ^
        first      last
       element    element
    

that is, in the _ordinal_ way, i is the i-th element.

~~~
Stratoscope
Indeed, you hit the nail on the head. Or more to the point, on the infinitely
thin edges _between_ the nailheads.

The first time I saw this explained clearly was in the original Inside
Macintosh, describing the QuickDraw coordinate plane. Some excerpts (with
emphasis added to a few key concepts):

# # #

The Coordinate Plane

All information about location or movement is given to QuickDraw in terms of
coordinates on a plane. The coordinate plane is a two-dimensional grid, as
illustrated in Figure 2.

Note the following features of the QuickDraw coordinate plane:

• All grid coordinates are integers (in the range -32767 to 32767).

• _All grid lines are infinitely thin._

These concepts are important. First, they mean that the QuickDraw plane is
finite, not infinite (although it's very large). Second, they mean that all
elements represented on the coordinate plane are mathematically pure.
Mathematical calculations using integer arithmetic will produce intuitively
correct results. _If you keep in mind that grid lines are infinitely thin,
you'll never have "endpoint paranoia"—the confusion that results from not knowing
whether that last dot is included in the line._

Points

There are 4,294,836,224 unique points on the coordinate plane. Each point is
at the intersection of a horizontal grid line and a vertical grid line. _As
the grid lines are infinitely thin, so a point is infinitely small._

Figure 3 shows the relationship between points, grid lines, and pixels, the
physical dots on the screen. ( _Pixels correspond to bits in memory_ , as
described in the next section.)

Rectangles

Any two points can define the top left and bottom right corners of a
rectangle. As these points are infinitely small, _the borders of the rectangle
are infinitely thin_ (see Figure 4).

# # #

The full PDF is available here (or via a search for "Inside Macintosh"), and
it's better with the illustrations:

[http://www.pagetable.com/?p=50](http://www.pagetable.com/?p=50)

The description of the QuickDraw coordinate plane is on pages 148-151 (I-138
to I-141).

Figure 3 is especially good. It shows how grid lines and points are infinitely
thin/small, but _pixels occupy the space between the gridlines_.

Disclaimer and shameless plug: My friend Caroline Rose wrote Inside Macintosh,
and she still writes. If you want someone who understands technical concepts
and can explain them clearly, look her up.

~~~
teddyh

                    │        │
      grid lines ─→ │        │
             ╲      │  point │
              ↘     │↙       │
             ───────┼────────┼───────
                    │░░░░░░░░│
                    │░░░░░░░░│
                    │░░░░░───┼── pixel
                    │░░░░░░░░│
                    │░░░░░░░░│
             ───────┼────────┼───────
                    │        │
                    │        │
                    │        │
                    │        │
                    │        │
    
           Figure 3.  Points and Pixels

------
BoppreH
There's a short article from Dijkstra on this reasoning:

[http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF](http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF)

He enumerates the four possible conventions one could use, and goes over the
pros and cons of each.

~~~
oftenwrong
His handwriting should be a font

~~~
ginko
[http://www.fonts101.com/fonts/view/Uncategorized/34398/Dijks...](http://www.fonts101.com/fonts/view/Uncategorized/34398/Dijkstra.aspx)

~~~
memracom
I don't trust sites that ask you to download an .EXE downloader. If you go
here you can download it as a .TTF file

[http://www.fontpalace.com/font-download/Dijkstra/](http://www.fontpalace.com/font-download/Dijkstra/)

~~~
dllu
The '1' in that font clearly differs from the one in
[http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF](http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF)

~~~
StavrosK
The "w" as well.

------
haberman
The article linked from this plus page
([http://exple.tive.org/blarg/2013/10/22/citation-needed/](http://exple.tive.org/blarg/2013/10/22/citation-needed/)) poses the
question "why do programmers start counting at zero?" and then shows contempt
towards anyone who thinks they know the answer (while dismissing Dijkstra as
"incoherent", without argument).

To the author, the correct way to answer this question is by investigating the
history of how things happened to end up the way they are, which (apparently)
no one besides him has ever done.

While history is interesting, I think the much more important question is: why
_ought_ we to count from zero (or not)? And the answer to that question has
nothing to do with history.

In addition to Dijkstra's argument, mentioned elsewhere in this thread,
0-based addressing is more efficient (since it can be used directly as an
offset, without having to bias it first).

As much as I like Lua, its 1-based indexing is not my favorite part of that
language. In particular, the numeric for loop is confusing, because the loop:

    
    
      for i=x,y do
        -- stuff
      end
    

...will execute (y-x+1) times, not (y-x) as it would in most languages, like
Python:

    
    
      for i in range(x,y):
        # stuff
    

Off-by-one errors are some of the most annoying bugs. 0-based indexing and
half-open ranges help avoid them in my experience.
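
A quick way to check the two iteration counts from Python, assuming x <= y. The helper names are invented for the example, and the closed variant only imitates what the Lua loop above does.

```python
def half_open_count(x, y):
    """Iterations of `for i in range(x, y)`: y is excluded."""
    return len(range(x, y))

def closed_count(x, y):
    """Iterations of a closed loop such as Lua's `for i=x,y`: y is included."""
    return len(range(x, y + 1))

x, y = 3, 7
half_open = half_open_count(x, y)   # equals y - x
closed = closed_count(x, y)         # equals y - x + 1
```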

~~~
eridius
There's another reason that I haven't seen anyone bring up. If your native
unsigned integer type is the same size as a word (which is common), and a word
defines the limits of your address space, and if you have a byte pointer
initialized to 0, then 0-based indexing allows you to index into every byte in
your address space, whereas 1-based indexing prevents you from indexing the
final byte in your address space.

Additionally, assuming an unsigned integer type is used for indexing, 0-based
makes every index valid and meaningful, whereas 1-based leaves one index value
(0) as invalid.

~~~
Dylan16807
Having a null value that's invalid is a _good_ thing. Thanks for the novel
argument in favor of 1-based indexing!

~~~
kenbot
Erk, no it isn't. If your semantics require an optional value, then encode it
that way explicitly. Don't inflict special cases and bugs on everyone else.

------
jerf
I've spent a lot of time in both domains, and the Python-style half-open
interval is the clear winner, because it produces the fewest "+1" and "-1",
_each and every one_ of which is _begging_ to be a bug. I disagree with
Dijkstra on some other counts, but in this case he is correct, both in theory
and in practice.

Being 1-indexed is easier today than it used to be, though, because there are
more languages that support some form of

    
    
        for element in collection: ...
    

instead of the _incredibly_ error-prone

    
    
        for (int i = 0; i < something; i++) { ... }
    

Or, was it

    
    
        for (int i = 0; i <= something; i++) { ... }
    

Or was it

    
    
        for (int i = 1; i <= something + 1; i++) { ...}
    

You have fewer places to make fencepost errors when you use the first style,
so the costs of 1-based indexing are fewer. They're still higher than Python-
style, though, because in Python you have both the correct for loop and still
have the correct intervals.

~~~
to3m
When I started programming I quickly found that 0-based indexing was generally
simpler. A lot of the time, it makes no difference one way or the other
(because you're only using indexes to get at each item in turn, so you just
memorize the boilerplate for whichever language you're using), but the moment
you introduce ranges, or try to convert dimensionality - particularly with
random access - or do some wraparound, or try to implement something like a
ring buffer, it's all too easy to come a cropper. Either you put "1+" on every
array access, which is error-prone, or you try to manage two different
"spaces" (the 0-based one that's the ideal representation and the 1-based one
used for indexing) and carefully convert between them... which is error-prone
too.

Alternatively, you could just decide to start indexing at zero.

...or you could stick with 1-based indexing, it's up to you. The right thing
to do doesn't become wrong just because you don't do it ;)
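
As an illustration of the ring-buffer case mentioned above, here is a minimal sketch. The `RingBuffer` class and `next_one_based` helper are hypothetical, written only for this example. With 0-based slots the wraparound is a bare modulo; the 1-based successor needs the extra `+ 1` shuffle.

```python
class RingBuffer:
    """Fixed-capacity buffer that overwrites its oldest item when full."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = 0   # 0-based slot of the oldest item
        self.size = 0   # number of items currently stored

    def push(self, item):
        capacity = len(self.slots)
        # With 0-based slots, wraparound is a plain modulo.
        tail = (self.head + self.size) % capacity
        self.slots[tail] = item
        if self.size < capacity:
            self.size += 1
        else:
            # Buffer was full: the oldest item was overwritten, so advance head.
            self.head = (self.head + 1) % capacity

    def items(self):
        """Stored items from oldest to newest."""
        capacity = len(self.slots)
        return [self.slots[(self.head + k) % capacity] for k in range(self.size)]


def next_one_based(i, capacity):
    """Successor of slot i when slots are numbered 1..capacity."""
    return i % capacity + 1


buf = RingBuffer(3)
for value in [1, 2, 3, 4, 5]:
    buf.push(value)
contents = buf.items()
```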

------
dpratt
This is not meant to be tongue-in-cheek - I thought this argument was settled
ages ago. Are there any respected languages that use 1 based indexing? This is
an honest question - every single one I've ever used (C, C++, Java, Scala,
Python, JS, Objective-C, a bunch of academic ones) has been zero based. It's
quite clear that it's the right solution, since the first element of an array
is zero distance away from the beginning of the array.

~~~
kevincrane
Matlab does (or at least it did when I used it for a class a few years ago). I
was trying to iterate through some array for a project and kept hitting an
error, and was totally stumped until some EE major came over and fixed it.
"Haha did you forget how to count? You're starting the index at 0, that's the
problem."

~~~
ihuman
I had to use matlab the other day. It still starts at one. The first time I
used it, that confused me too.

~~~
aidos
I've just started using Octave (for the coursera machine learning course) and
it also starts at 1. Takes a few exercises to reacclimatise. More annoying is
that it doesn't auto broadcast matrix operations like numpy does (and the
slicing doesn't feel as powerful either).

------
espeed
Ah yes, but don't forget this is all changing in Python 4 :) -- remember the
"List Revolution" thread when Guido finally agreed to switch to 1-based
indexing...

    
    
      On Fri, Sep 9, 2011 at 2:12 PM, Christopher King <g.nius.ck at gmail.com> wrote:
      > The first element in a list is element zero, the second is one, the third it
      > two, and so on. This some times confuses newbies to the language or
      > programming in general. This system was invited when single bits where
      > precious. It's time to update. Keep in mind this is something for version 4,
      > since its not reverse compatible. I say we make the first element 1, second
      > 2, third 3, and so on. Other languages would follow. We are python, made for
      > easiness and readability, and we are in the age where you can't even get
      > something as small as a kilobyte USB. We must make first one, second 2, and
      > third 3, like it is supposed to be. I give this:
      > +1
    
      Consider it done.
    
      -- 
      --Guido van Rossum (python.org/~guido)
    

Source: [https://mail.python.org/pipermail/python-
ideas/2011-Septembe...](https://mail.python.org/pipermail/python-
ideas/2011-September/011448.html)

~~~
ssafejava
Funny thread, but just in case somebody takes it seriously:

    
    
        (And to those taking the thread seriously: this is all 
        in jest. We won't change the indexing base. The idea 
        is so preposterous that the only kind of response 
        possible is to laugh with it.)
    
        --Guido
    

[https://mail.python.org/pipermail/python-
ideas/2011-Septembe...](https://mail.python.org/pipermail/python-
ideas/2011-September/011462.html)

------
xentronium
I think ruby has it nailed better.

    
    
        a[1..10] # elements from 1 to 10, including 10
        a[1...10] # elements from 1 to 9; k...n is a half open range notation
        a[3, 5] # 5 elements, starting from position 3
    

Arrays are still zero based, but I feel this is more a homage to for- and
while- loops from older languages.

(for (i = 0; i < n; i++) is just a very nice notation, and so is while (i <
n))
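
For readers more at home in Python, the three Ruby forms above correspond roughly to the slices below. This is only a sketch of the equivalence on a sample list; the variable names are made up.

```python
a = list(range(20))

closed = a[1:10 + 1]           # Ruby a[1..10]: positions 1 through 10
half_open = a[1:10]            # Ruby a[1...10]: positions 1 through 9
start_and_length = a[3:3 + 5]  # Ruby a[3, 5]: 5 elements starting at position 3
```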

~~~
teraflop
From the perspective of someone who doesn't know Ruby, the fact that a[1..10]
is _longer_ than a[1...10] is pretty counterintuitive.

~~~
xentronium
p..q is a fairly common math notation synonymous with [p,q].

Although, p...q is not very obvious, indeed.

~~~
marcosdumay
Well, they aren't synonymous in Ruby. Its [p,q] operator is also not
intuitive for me.

But then, "not intuitive" just means I'll spend a few minutes more learning
the language because of it (except in C++, where it means many hours).

------
shmageggy
The post he mentions is a really good read.
[http://exple.tive.org/blarg/2013/10/22/citation-needed/](http://exple.tive.org/blarg/2013/10/22/citation-needed/)

It was submitted here yesterday but got no love.
[https://news.ycombinator.com/item?id=6595521](https://news.ycombinator.com/item?id=6595521)

edit: fixed link

------
rogerbinns
I never have off-by-one errors in Python, but have always experienced them in
previous languages and their libraries. Guido chose very well.

(The one exception is a function in the random module.)
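
The comment does not name the function; presumably it is `random.randint`, whose upper bound is inclusive, unlike `random.randrange`. A small sketch of the difference, with a fixed seed and a sample count chosen arbitrarily for the example:

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# randint(a, b) uses a closed range: b itself can be returned.
closed_values = {rng.randint(1, 3) for _ in range(1000)}

# randrange(a, b) is half-open like range(a, b): b is never returned.
half_open_values = {rng.randrange(1, 3) for _ in range(1000)}
```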

~~~
calpaterson
I frequently have off-by-ones with range. len(range(1, 100)) == 99

~~~
aidos
Simplify it:

    
    
        len(range(100)) == 100
    

Though - I've written a lot of python code and I never use range. The beauty
of python is that all collections are iterable so you seldom need to specify
ranges.

------
knappador
Using zero reminds you that black is 0, 0, 0 in 24-bit integer RGB, while white
is 255, 255, 255, not 256, since you only have 8 bits and 2^8 - 1 = 255, not
256, and calling black 1, 1, 1 makes no intuitive sense. Using zero indexes
saves a tremendous number of errors by habituating on what is more natural
for digital memory and the representation capability of intrinsics while
creating a nice correspondence between hex zero, decimal zero, binary zero.

The only time this leads to bugs in Python is when using my_list[len(my_list)]
instead of my_list[len(my_list) - 1], where the difference between the
magnitude of indexing vs counting can lead to intuitive error. However, it's
easier to just write my_list[-1], knowing that zero is the first element, so
counting backwards has to start at a different value. Of course you can do
something silly like my_list[-len(my_list)] to just be silly.
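
The indexing cases from the paragraph above, spelled out on a three-element list (the list contents are arbitrary):

```python
my_list = ["a", "b", "c"]

last = my_list[len(my_list) - 1]   # length minus one is the last valid index
also_last = my_list[-1]            # negative indexes count back from the end
first = my_list[-len(my_list)]     # the "silly" way to reach element zero

# Using the length itself as an index is one past the end.
try:
    my_list[len(my_list)]
    past_the_end_raises = False
except IndexError:
    past_the_end_raises = True
```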

Indexing is about having some value to get some position out of an array. Zero
is a unique number, so it makes sense to use it as an index. If you start
counting with your fingers by holding up none and call it zero, suddenly life
makes intuitive sense. If you count the numbers that you count on the way to
ten fingers, you get eleven unique numbers for those ten fingers. The
difference between array length and array indexing. Magic.

~~~
edtechdev
1,1,1 is black, also

~~~
IanCal
0,0,0 is a slightly darker black.

------
frou_dh
While we're at it, the 0 key on a keyboard should be before the 1 key, not
after the 9 key :-)

------
rtkwe
Why is it zero indexed? Tradition. Why did that tradition start? The best
explanation I can think of is because array access is a convenient short hand
for pointer math:

array[i] translates perfectly into array_memory_start + i * element_size. Thus
the tradition was most likely born.
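
A rough Python imitation of that pointer arithmetic, using a packed byte string in place of raw memory. The 4-byte little-endian element format and the `load` helper are assumptions made for the sketch.

```python
import struct

ELEMENT_FORMAT = "<i"                           # 4-byte little-endian signed int
ELEMENT_SIZE = struct.calcsize(ELEMENT_FORMAT)

values = [10, 20, 30, 40]
# Stand-in for the array's memory: elements packed back to back.
memory = b"".join(struct.pack(ELEMENT_FORMAT, v) for v in values)

def load(i):
    """Read element i at byte offset i * ELEMENT_SIZE from the start."""
    offset = i * ELEMENT_SIZE
    (value,) = struct.unpack_from(ELEMENT_FORMAT, memory, offset)
    return value

first = load(0)   # offset 0: the array's own start address
third = load(2)   # offset 2 * ELEMENT_SIZE
```

With a 1-based index the offset would have to be `(i - 1) * ELEMENT_SIZE`, which is the bias mentioned elsewhere in the thread.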

~~~
thisisnotatest
> was most likely born

I take it you just posted this comment for fun and didn't RTFA? That was the
whole point; the author answered the question of how the tradition was born,
down to the person and year.

~~~
rtkwe
I did read it. And all he says is that C was zero based but doesn't say why
that happened. Unless it was in the comment thread or one of the several
second order articles which I'll freely admit I didn't read.

------
Siecje
I don't think it is intuitive for

my_list[0:3]

to be the first three elements of the list.

Going from position 0 to position 3 should give 0, 1, 2, 3

~~~
Peaker
It's nice that my_list[0:3] + my_list[3:6] == my_list[0:6]

It's nice that 3-0 = 3 elements.

It's nice that an index counts the number of elements before it.

It's nice that my_list[0:length] is the whole list, with no extra arithmetic
needed.

So many nicer things about this way of doing things.
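
Each of the properties listed above can be checked directly. A small sketch on an arbitrary eight-element list, with variable names invented for the example:

```python
my_list = list("abcdefgh")

# Adjacent half-open slices concatenate with no overlap and no gap.
adjacent_slices_join = my_list[0:3] + my_list[3:6] == my_list[0:6]

# The length of a slice is simply end minus start.
length_is_difference = len(my_list[0:3]) == 3 - 0

# An index equals the number of elements before it.
index_counts_predecessors = all(
    len(my_list[:i]) == i for i in range(len(my_list) + 1)
)

# The whole list needs no +1 or -1 on either bound.
whole_list = my_list[0:len(my_list)] == my_list
```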

~~~
yoodenvranx
I think it is just personal preference.

I switched from Matlab to Python and I am totally happy that Python starts
counting at 0 (which is very important if you do image processing). But even
after four years of working in Python, the half-open interval notation, which
excludes the last element, still seems wrong to me. I think it is related to
the fact that the Matlab version is more direct. a[1:3] means: give me the
elements 1, 2, 3. That is very clear and obvious. In Python a[1:3] means: give
me elements 1 and 2. This is not that obvious, and even after four years of
Python I sometimes have to stop and think about it.

~~~
Peaker
I envision the array on a continuous axis, and place markers at 1 and 3 (the
exact positions, not the range of whole numbers). The range on the axis
between them is the right range. With this intuition, everything is very
simple and makes perfect sense.

~~~
yoodenvranx
Yes, it makes perfect sense, but it still feels wrong to me. I guess it is
just related to the fact that I grew up with Matlab and not Python. I was kind
of hoping I would get used to it, but even after several years of writing image
processing stuff in Python I am still not 100% convinced.

------
U2EF1
> My compromise of .5 was rejected without, I thought, proper consideration.

------
nodata
Humans use 0-based indexing. You're not born one year old.

------
analog31
I've been teaching Python to my kids (both in middle school), and had to
explain zero based arrays. Fortunately, all explanations automatically
translate into "yet another stupid thing that grown-ups do." ;-)

Being somewhat of an old timer, I understand the justification of zero based
arrays based on making the hardware simpler and maintaining a tight coupling
between the representation of a program in source code and its implementation
in hardware. Zero is special because all of the bits are the same. Arrays are
zero based because the physical location of the first element is the same as
the memory location assigned to the array itself, plus zero.

For the last element, I just remember the mnemonic "up to but not including."

~~~
vidarh
Drag out a ruler and ask them about distances.

It's a simple way of addressing the difference: 0-indexed "points to" the
start of each element. 1-indexed represents the ordinal position.

So to relate to the ruler, the first cm/inch interval is, well, the 1st. But
the distance to the start of the first is 0. So if you want to emphasise which
numbered interval it is, 1-indexed makes most sense. If you want to emphasise
where it starts, or the distance from origin, 0-indexed makes more sense.

------
DigitalJack
1-based counting would make binary very uncomfortable. b0 would be d1 and b1
would be d2. You would have the strange situation of needing more than 1 bit
to represent 0. Doesn't seem very intuitive to me, but that's due to
familiarity more than anything I suppose.

------
evanmoran
It doesn't matter how cool the slice notation is when people have trouble
accessing single elements!

1-based is better for human counting and 0-based is better for certain
mathematical expressions, usually when thinking about offset from the start
rather than list position.

Clearly it is a judgement call, but I personally think counting is more
important for the same reason PI is wrong. We are confusing ourselves because
easy things aren't intuitive.

Imagine you have:

var list = ['first','second', 'third']

`list[2]` should clearly be the second element (not third), just as PI / 2
should mean half a circle (not 1/4).

We wonder why there aren't as many computer science / math grads. Well in
every intro class in the world people are getting confused by this.

~~~
IanCal
Depends on how you read `list[2]` "Start at the beginning and move two
elements along" works perfectly well.

------
rch
This was nice to read. I only wish itertools count was available by default,
in the same manner as range.

------
michaelochurch
I like 0-based indexing because it makes mathematical sense. By the ordinal
construction of the natural numbers, e.g. 5 ~= {0, 1, 2, 3, 4}. So an array of
length 5 has indexes corresponding to the five set-theoretic elements of 5.
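
The ordinal construction can be sketched with frozensets. The function name `von_neumann` is chosen for the example; it builds each number as the set of all smaller ones, so the number 5 has exactly the five members 0 through 4, matching the valid indexes of a length-5 array.

```python
def von_neumann(n):
    """Return n as the set of all smaller von Neumann ordinals."""
    ordinal = frozenset()              # 0 is the empty set
    for _ in range(n):
        ordinal = ordinal | {ordinal}  # successor: n + 1 = n U {n}
    return ordinal

five = von_neumann(5)

# The members of 5 are exactly the ordinals 0..4.
members_match = all(von_neumann(k) in five for k in range(5))

# Likewise, the valid indexes of a length-5 list are exactly 0..4.
indexes = set(range(len(["a", "b", "c", "d", "e"])))
```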

