Hacker News new | past | comments | ask | show | jobs | submit login

As Guido says, 1-based indexes basically force closed ranges, so that [1, length] is the full array.

One thing that annoys me about closed ranges (and it's not much talked about) is that it is impossible to express the empty range: in python [i:i] is an empty range, while closed ranges always contain at least one element. That makes closed ranges strictly less expressive than half-open ranges, making often necessary to write special cases for zero-length.

For example when you want to split an array or a string at a given position i, the partitions will always be a[:i] and a[i:] even if one of the two is empty.

The other big issue, as many comment, is that the arithmetic to index multi-dimensional arrays is cumbersome with 1-based indexing: for example if I is a 2d image in row-major, the element (x, y) is at position I[(y - 1) * width + x] instead of I[y * width + x]. It's very easy to do off-by-one errors this way.

EDIT: adding this just to try some ASCII art :)

I think that the preference about 0-based and 1-based indexing boils down to how you reason about indexes.

I (0-indexing) think of them as the following:

      |-|-|-|...|-|
      ^           ^
    first   past-the-end
   position
that is, the index is where the element begins. This way there is a special endpoint, the "past-the-end", where the an appended element would begin. In particular, an empty array has both first and last endpoint equal to 0.

I believe that people who favor 1-based indexing think like this

      |-|-|-|...|-|
       ^         ^
    first      last
   element    element
that is, in the ordinal way, i is the i-th element.



> One thing that annoys me about closed ranges (and it's not much talked about) is that it is impossible to express the empty range: in python [i:i] is an empty range, while closed ranges always contain at least one element. That makes closed ranges strictly less expressive than half-open ranges, making often necessary to write special cases for zero-length.

In matlab/octave you can write an empty range as a(i:i-1). Not to say that that's at all elegant or aesthetically appealing, but it does work. But I 100% agree with the sentiment; one of my favorite things about using NumPy rather than matlab in my research code is the vastly superior indexing, in large part due to the fact that it's 0-based. matlab's indexing really is painful and off-by-one-error prone for doing anything other than a(1:n).


> In matlab/octave you can write an empty range as a(i:i-1). Not to say that that's at all elegant or aesthetically appealing, but it does work.

I agree, it feels like an ad-hoc workaround (BTW, I didn't know that).

It kind of makes sense once you mentally reverse-engineer what Matlab is doing underneath, since ultimately the indexes are pointers, but from an abstract point of view there is no mathematical notation where the second endpoint of an interval is smaller than the first. If you go down that road, now you might ask what is (i:i-2) and so on.


http://planetmath.org/emptysum

Although I vastly prefer the zero-based variant.


Indeed, you hit the nail on the head. Or more to the point, on the infinitely thin edges between the nailheads.

The first time I saw this explained clearly was in the original Inside Macintosh, describing the QuickDraw coordinate plane. Some excerpts (with emphasis added to a few key concepts):

# # #

The Coordinate Plane

All information about location or movement is given to QuickDraw in terms of coordinates on a plane. The coordinate plane is a two-dimensional grid, as illustrated in Figure 2.

Note the following features of the QuickDraw coordinate plane:

• All grid coordinates are integers (in the range -32767 to 32767).

All grid lines are infinitely thin.

These concepts are important. First, they mean that the QuickDraw plane is finite, not infinite (although it's very large). Second, they mean that all elements represented on the coordinate plane are mathematically pure. Mathematical calculations using integer arithmetic will produce intuitively correct results. If you keep in mind that grid lines are infinitely thin, you'll never have "endpoint paranoia"—the confusion that results from not knowing whether that last dot is included in the line.

Points

There are 4,294,836,224 unique points on the coordinate plane. Each point is at the intersection of a horizontal grid line and a vertical grid line. As the grid lines are infinitely thin, so a point is infinitely small.

Figure 3 shows the relationship between points, grid lines, and. pixels, the physical dots on the screen. (Pixels correspond to bits in memory, as described in the next section.)

Rectangles

Any two points can define the top left and bottom right corners of a rectangle. As these points are infinitely small, the borders of the rectangle are infinitely thin (see Figure 4).

# # #

The full PDF is available here (or via a search for "Inside Macintosh"), and it's better with the illustrations:

http://www.pagetable.com/?p=50

The description of the QuickDraw coordinate plane is on pages 148-151 (I-138 to I-141).

Figure 3 is especially good. It shows how grid lines and points are infinitely thin/small, but pixels occupy the space between the gridlines.

Disclaimer and shameless plug: My friend Caroline Rose wrote Inside Macintosh, and she still writes. If you want someone who understands technical concepts and can explain them clearly, look her up.


                │        │
  grid lines ─→ │        │
         ╲      │  point │
          ↘     │↙       │
         ───────┼────────┼───────
                │░░░░░░░░│
                │░░░░░░░░│
                │░░░░░───┼── pixel
                │░░░░░░░░│
                │░░░░░░░░│
         ───────┼────────┼───────
                │        │
                │        │
                │        │
                │        │
                │        │

       Figure 3.  Points and Pixels


Awesome reference! In fact, I was going to write that the interpretation with positions is "geometric" or "spatial", as opposed to "ordinal", but then I thought the analogy might be a little far-fetched so I left it out.

EDIT: I just remembered reading about Caroline Rose in Andy Hertzfeld diary: http://www.folklore.org/StoryView.py?story=Inside_Macintosh....

This passage was especially remarkable:

    Pretty soon, I figured out that if Caroline had trouble
    understanding something, it probably meant that the 
    design was flawed. On a number of occasions, I told her
    to come back tomorrow after she asked a penetrating 
    question, and revised the API to fix the flaw that she 
    had pointed out.
It should be a reminder that to the engineer who wrote the API everything is obvious in retrospect, even the inconsistent or poorly thought details. A fresh set of (smart) eyes is essential to bring some perspective.


Inside Macintosh was an absolutely excellent series of books. I read volume I from front to back in middle school, and it taught me much of my English.


APL uses, by default, an origin of 1. Having used APL extensively for probably around ten years for everything from database management to financial, mathematical, image processing and even DNA sequencing applications I would conclude that an origin of 1 is not a problem. At least it isn't for APL.

That said, APL does have a system variable known as the "index origin" that allows you to change the origin as desired when needed. The variable is called ⎕IO (pronounced "quad i o". The "⍳" operator is roughly equivalent to python's range(n) function.

With the default ⎕IO set to 1:

        ⍳10
    
    1 2 3 4 5 6 7 8 9 10

        a ← "This is a string"

        a[1]

    T
    
If you set it to 0:

        ⎕IO ← 0
    
        ⍳10
    
    0 1 2 3 4 5 6 7 8 9

        a[1]

    h
You can use this to your advantage during computation. For example, to go along with your image element calculation, you could set your index origin to 0, do the math index and whatever else needs to be done with ⎕IO at 0 and then go back to 1. I found the concept to be really powerful.


This post reminds me of an excellent stack overflow answer[1] explaining how slices in python refer to the spaces between the elements.

[1] http://stackoverflow.com/questions/509211/pythons-slice-nota...

(Perhaps in the context of this post I should index my first footnote as 0 :p)


There is nothing less expressive about 1-based counting, it is only... different. Empty ranges are usually marked with 0 or empty index. Negative indices tend to mean "get everything except those" instead of "count them from tail". Also in 1-based languages element x,y is usually just A[x,y].


> That makes closed ranges strictly less expressive than half-open ranges, making often necessary to write special cases for zero-length.

Be a little creative. If you have slice like list[a:b] a could be the the first element and b could the length of the slice.

Alternatively the empty slice could just be b < a, instead of b <= a as it is now.

> The other big issue, as many comment, is that the arithmetic to index multi-dimensional arrays is cumbersome with 1-based indexing: for example if I is a 2d image in row-major, the element (x, y) is at position I[(y - 1) width + x] instead of I[y * width + x]*

In most cases it's better to use numpy for that anyway.

However, note that is more or less the reason we have 0-indexed arrays in C. In C the array is bascially just a another synatax for pointer arithmetics. So, I[0] is the same I + 0 * sizeof(int) (assuming I is of type of int).


I agree, also I find that the original article linked in TFA is very interesting but a bit disingenuous.

He starts of by saying "you think 0-indexing was chosen because it makes pointer arithmetic simpler but you're completely wrong". Then he digs up the origins of the concept in BCPL: the reason is that it made pointer arithmetic simpler. Oh.

And then he goes on a contrived (but still interesting) explanation of how it's not because it made pointer arithmetic simpler but rather that it made it less CPU-intensive. Definitely worth a read but I didn't really like the patronizing tone.


Some versions of BASIC going way back into the 80's (I very dimly remember Microsoft BASIC for Macintosh having it) have an OPTION BASE keyword allowing you to actually specify which array base to use (0 or 1).

Although I kind of like the following solution I found in a dusty corner of the Net:

http://vb.mvps.org/hardcore/html/optionbasezero-basedarrays....

Basically, you yourself specify the range indices.

    Dim ai(1 To 10) As Integer
    Dim ad(0 To 49) As Double
    Dim av(-10 To 10) As Variant




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: