As Guido says, 1-based indexes basically force closed ranges, so that [1, length] is the full array.
One thing that annoys me about closed ranges (and it's not much talked about) is that it is impossible to express the empty range: in Python [i:i] is an empty range, while a closed range always contains at least one element. That makes closed ranges strictly less expressive than half-open ranges, often making it necessary to write special cases for zero-length.
For example, when you want to split an array or a string at a given position i, the partitions are always a[:i] and a[i:], even if one of the two is empty.
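A minimal Python sketch of that split (s and i are illustrative):

    s = "abcdef"
    i = 0                       # split at the very start
    left, right = s[:i], s[i:]  # left == "", right == "abcdef"
    assert left + right == s    # holds for every i, no zero-length special case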
The other big issue, as many commenters note, is that the arithmetic for indexing multi-dimensional arrays is cumbersome with 1-based indexing: for example, if I is a 2D image stored in row-major order, the element (x, y) is at position I[(y - 1) * width + x] instead of I[y * width + x]. It's very easy to make off-by-one errors this way.
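A small Python sketch of the 0-based version, with a flat list standing in for the image buffer (width, height, and the pixel helper are made up for illustration):

    width, height = 4, 3
    img = list(range(width * height))   # row-major pixel buffer, 0-based

    def pixel(x, y):
        # 0-based: the flat offset needs no -1 corrections
        return img[y * width + x]

    assert pixel(0, 0) == 0             # first pixel
    assert pixel(3, 2) == 11            # last pixel, at index width*height - 1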
EDIT: adding this just to try some ASCII art :)
I think that the preference about 0-based and 1-based indexing boils down to how you reason about indexes.
I (0-indexing) think of them as the following:
    |-|-|-|...|-|
    ^           ^
    first       past-the-end
                position
that is, the index is where the element begins. This way there is a special endpoint, the "past-the-end", where an appended element would begin. In particular, an empty array has both endpoints (first and past-the-end) equal to 0.
I believe that people who favor 1-based indexing think like this:
    |-|-|-|...|-|
     ^         ^
     first     last
     element   element
that is, in the ordinal way, i is the i-th element.
> One thing that annoys me about closed ranges (and it's not much talked about) is that it is impossible to express the empty range: in Python [i:i] is an empty range, while a closed range always contains at least one element. That makes closed ranges strictly less expressive than half-open ranges, often making it necessary to write special cases for zero-length.
In matlab/octave you can write an empty range as a(i:i-1). Not to say that that's at all elegant or aesthetically appealing, but it does work. But I 100% agree with the sentiment; one of my favorite things about using NumPy rather than matlab in my research code is the vastly superior indexing, in large part due to the fact that it's 0-based. matlab's indexing really is painful and off-by-one-error prone for doing anything other than a(1:n).
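For instance, a quick NumPy sketch of the empty slice falling out for free (array contents are arbitrary):

    import numpy as np

    a = np.arange(10)
    i = 4
    assert a[i:i].size == 0   # the empty range is directly expressible
    assert a[0:0].size == 0   # even at the boundary, no a(i:i-1) workaround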
> In matlab/octave you can write an empty range as a(i:i-1). Not to say that that's at all elegant or aesthetically appealing, but it does work.
I agree, it feels like an ad-hoc workaround (BTW, I didn't know that).
It kind of makes sense once you mentally reverse-engineer what Matlab is doing underneath, since ultimately the indexes are pointers; but from an abstract point of view there is no mathematical notation where the second endpoint of an interval is smaller than the first. If you go down that road, you might next ask what a(i:i-2) means, and so on.
Indeed, you hit the nail on the head. Or more to the point, on the infinitely thin edges between the nailheads.
The first time I saw this explained clearly was in the original Inside Macintosh, describing the QuickDraw coordinate plane. Some excerpts (with emphasis added to a few key concepts):
# # #
The Coordinate Plane
All information about location or movement is given to QuickDraw in terms of coordinates on a plane. The coordinate plane is a two-dimensional grid, as illustrated in Figure 2.
Note the following features of the QuickDraw coordinate plane:
• All grid coordinates are integers (in the range -32767 to 32767).
• All grid lines are infinitely thin.
These concepts are important. First, they mean that the QuickDraw plane is finite, not infinite (although it's very large). Second, they mean that all elements represented on the coordinate plane are mathematically pure. Mathematical calculations using integer arithmetic will produce intuitively correct results. If you keep in mind that grid lines are infinitely thin, you'll never have "endpoint paranoia"—the confusion that results from not knowing whether that last dot is included in the line.
Points
There are 4,294,836,224 unique points on the coordinate plane. Each point is at the intersection of a horizontal grid line and a vertical grid line. As the grid lines are infinitely thin, so a point is infinitely small.
Figure 3 shows the relationship between points, grid lines, and pixels, the physical dots on the screen. (Pixels correspond to bits in memory, as described in the next section.)
Rectangles
Any two points can define the top left and bottom right corners of a rectangle. As these points are infinitely small, the borders of the rectangle are infinitely thin (see Figure 4).
# # #
The full PDF is available here (or via a search for "Inside Macintosh"), and it's better with the illustrations:
The description of the QuickDraw coordinate plane is on pages 148-151 (I-138 to I-141).
Figure 3 is especially good. It shows how grid lines and points are infinitely thin/small, but pixels occupy the space between the gridlines.
Disclaimer and shameless plug: My friend Caroline Rose wrote Inside Macintosh, and she still writes. If you want someone who understands technical concepts and can explain them clearly, look her up.
Awesome reference! In fact, I was going to write that the interpretation with positions is "geometric" or "spatial", as opposed to "ordinal", but then I thought the analogy might be a little far-fetched so I left it out.
    Pretty soon, I figured out that if Caroline had trouble understanding
    something, it probably meant that the design was flawed. On a number of
    occasions, I told her to come back tomorrow after she asked a penetrating
    question, and revised the API to fix the flaw that she had pointed out.
It should be a reminder that to the engineer who wrote the API everything is obvious in retrospect, even the inconsistent or poorly-thought-out details. A fresh set of (smart) eyes is essential to bring some perspective.
Inside Macintosh was an absolutely excellent series of books. I read volume I from front to back in middle school, and it taught me much of my English.
APL uses, by default, an origin of 1. Having used APL extensively for probably around ten years for everything from database management to financial, mathematical, image processing and even DNA sequencing applications I would conclude that an origin of 1 is not a problem. At least it isn't for APL.
That said, APL does have a system variable known as the "index origin" that allows you to change the origin as desired when needed. The variable is called ⎕IO (pronounced "quad i o"). The ⍳ operator is roughly equivalent to Python's range(n) function.
With the default ⎕IO set to 1:
⍳10
1 2 3 4 5 6 7 8 9 10
a ← "This is a string"
a[1]
T
If you set it to 0:
⎕IO ← 0
⍳10
0 1 2 3 4 5 6 7 8 9
a[1]
h
You can use this to your advantage during computation. For example, to go along with your image element calculation, you could set the index origin to 0, do the indexing math and whatever else needs to be done with ⎕IO at 0, and then go back to 1. I found the concept to be really powerful.
There is nothing less expressive about 1-based counting, it is only... different. Empty ranges are usually marked with 0 or an empty index. Negative indices tend to mean "get everything except those" instead of "count from the tail". Also, in 1-based languages element x,y is usually just A[x,y].
> That makes closed ranges strictly less expressive than half-open ranges, often making it necessary to write special cases for zero-length.
Be a little creative. If you have a slice like list[a:b], a could be the first element and b could be the length of the slice.
Alternatively the empty slice could just be b < a, instead of b <= a as it is now.
> The other big issue, as many commenters note, is that the arithmetic for indexing multi-dimensional arrays is cumbersome with 1-based indexing: for example, if I is a 2D image stored in row-major order, the element (x, y) is at position I[(y - 1) * width + x] instead of I[y * width + x].
In most cases it's better to use numpy for that anyway.
However, note that this is more or less the reason we have 0-indexed arrays in C. In C the array is basically just another syntax for pointer arithmetic. So I[0] is the element at address I + 0 * sizeof(int) (assuming I is a pointer to int).
I agree; also, I find the original article linked in TFA very interesting but a bit disingenuous.
He starts off by saying "you think 0-indexing was chosen because it makes pointer arithmetic simpler but you're completely wrong". Then he digs up the origins of the concept in BCPL: the reason is that it made pointer arithmetic simpler. Oh.
And then he goes on a contrived (but still interesting) explanation of how it's not because it made pointer arithmetic simpler but rather that it made it less CPU-intensive. Definitely worth a read but I didn't really like the patronizing tone.
Some versions of BASIC going way back into the 80's (I very dimly remember Microsoft BASIC for Macintosh having it) have an OPTION BASE keyword allowing you to actually specify which array base to use (0 or 1).
Although I kind of like the following solution I found in a dusty corner of the Net:
The article linked from this Google+ page (http://exple.tive.org/blarg/2013/10/22/citation-needed/) poses the question "why do programmers start counting at zero?" and then shows contempt towards anyone who thinks they know the answer (while dismissing Dijkstra as "incoherent", without argument).
To the author, the correct way to answer this question is by investigating the history of how things happened to end up the way they are, which (apparently) no one besides him has ever done.
While history is interesting, I think the much more important question is: why ought we to count from zero (or not)? And the answer to that question has nothing to do with history.
In addition to Dijkstra's argument, mentioned elsewhere in this thread, 0-based addressing is more efficient (since it can be used directly as an offset, without having to bias it first).
As much as I like Lua, its 1-based indexing is not my favorite part of that language. In particular, the numeric for loop is confusing, because the loop:
    for i = x, y do
      -- stuff
    end
...will execute (y-x+1) times, not (y-x) as it would in most languages, like Python:
    for i in range(x, y):
        # stuff
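The trip counts are easy to check in Python (a quick sketch; x and y are arbitrary):

    x, y = 3, 7
    assert len(range(x, y)) == y - x          # Python: 4 iterations
    assert len(range(x, y + 1)) == y - x + 1  # Lua-style inclusive bounds: 5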
Off-by-one errors are some of the most annoying bugs. 0-based indexing and half-open ranges help avoid them in my experience.
There's another reason that I haven't seen anyone bring up. If your native unsigned integer type is the same size as a word (which is common), and a word defines the limits of your address space, and if you have a byte pointer initialized to 0, then 0-based indexing allows you to index into every byte in your address space, whereas 1-based indexing prevents you from indexing the final byte in your address space.
Additionally, assuming an unsigned integer type is used for indexing, 0-based makes every index valid and meaningful, whereas 1-based leaves one index value (0) as invalid.
Erk, no it isn't. If your semantics require an optional value, then encode it that way explicitly. Don't inflict special cases and bugs on everyone else.
Ruby has "nil", and 0. They are not the same, and should not be considered the same.
Would you say that accessing coordinate 0,0 is a "logically invalid operation"? What is an array but a 1 dimensional coordinate system?
Are you suggesting that accessing an array via an uninitialized parameter will show up as a runtime error because of 1-based indexing? That only happens if your language initializes values to 0.
Ruby initializes everything to nil, not 0, and yes, indexing by nil will fail.
Perhaps I'm not seeing your problem or solution because Ruby doesn't even have the kind of problem that is solved by a 1-based array index. See: the Sapir-Whorf Hypothesis http://en.wikipedia.org/wiki/Linguistic_relativity , your thinking may be suffering due to the language you are typically working in.
>Would you say that accessing coordinate 0,0 is a "logically invalid operation"?
If you have a matrix that starts at cell 1,1 then trying to access 0,0 is absolutely a logically invalid operation. I'm not saying that 0-indexing is illogical, I'm saying that dereferencing null/nil is illogical.
As for the rest of your post, the scenario eridius came up with is pretty weird, at this point I'm just going to shrug and give up on analysis.
Would you rather I write "index that is 0 in a world where indexes start at 1" over and over? I am willing to do that if it helps make the conversation smoother.
>much less an operation to "access a null index".
foo_array[0]; //operation to do an access with a null index
>Also, believe it or not, on various machines and OS's it's legal to map address 0.
Which is a bad thing. You get a couple extra variable slots in exchange for thousands of privilege-escalation bugs that could instead be crash bugs.
How in the hell does "accidentally" accessing the first element of an array (by "accidentally" passing 0 to an array in a language that has 0-based indexing) result in "privilege escalation errors"? That's just "shitty programming".
For that matter, how does a 1-based indexing language NOT result in far more "off by one" errors which would include the kind of errors commonly known as "buffer overflow" errors?
You specifically said the native word size and address space. That's going to be a pointer, where a value that causes the CPU to trap is quite handy.
And there are already plenty of invalid index values; -1, for example. There's no way to make an allocation the size of your entire RAM, so 0xff...f is going to be invalid.
> That's going to be a pointer, where a value that causes the CPU to trap is quite handy.
You're confusing an invalid pointer with an invalid index. There is absolutely no reason for any index to ever be invalid.
> And there's already plenty of invalid index values. -1 for example.
-1 is not representable with unsigned integers. There are no invalid index values in a 0-based indexing system (using unsigned integers, as I initially stated).
> There's no way to make an allocation the size of your entire ram, so 0xff...f is going to be invalid.
This is so completely off the mark I'm surprised you even came up with it. There are several things wrong with this (off the top of my head):
1. I don't need an allocation the size of my address space in order to want to address both the first and last bytes in the address space.
2. You don't know what machine I'm running on. This could be an embedded 16-bit machine without multiprocessing with 64KB of RAM. Every single addressable byte is valid and corresponds to RAM storage.
3. It doesn't even matter whether or not a given address can be loaded. Indexing a pointer does not necessarily require loading it. As a trivial example, I could take the address of the resulting indexed value.
>You're confusing an invalid pointer with an invalid index. There is absolutely no reason for any index to ever be invalid.
I'm not confusing them. You are very clearly describing a pointer.
>-1 is not representable with unsigned integers.
When I wrote -1 I meant "the value you get when you type -1 in your source code".
>This is so completely off the mark I'm surprised you even came up with it. There are several things wrong with this (off the top of my head):
>1. I don't need an allocation the size of my address space in order to want to address both the first and last bytes in the address space.
That's a pointer issue. If I allocate 20 bytes near the end of ram, my last index is only going to be 19 or 20, so it doesn't matter if I can't go to UINT_MAX.
> 2. You don't know what machine I'm running on. This could be an embedded 16-bit machine without multiprocessing with 64KB of RAM. Every single addressable byte is valid and corresponds to RAM storage.
Again, that's a pointer.
> 3. It doesn't even matter whether or not a given address can be loaded. Indexing a pointer does not necessarily require loading it. As a trivial example, I could take the address of the resulting indexed value.
The article you link to is interesting with a lot of research, but it is rather hostile and reaches a strange conclusion. The article quotes the creator of BCPL (precursor language to C): "if p is a pointer [...] Naturally p+0 has the same value as p. [... discussion of the indirection operator ...] I can see no sensible reason why the first element of a BCPL array should have subscript one." So the explanation from the creator of BCPL is basically what you'd expect, that 0-based is the natural thing to do if you're working with machine-level pointers.
But then the article goes off into compiler optimization, a total misunderstanding of time-sharing, and a misunderstanding of how indexing works and concludes that in BCPL arrays started at 0 to minimize compile time to avoid job preemption, and that's why many languages today use 0-based arrays. The article is still worth reading, though.
VBScript is 0-based for some stuff, but 1-based for slicing. Drove me nuts for a day until I figured it out, in spite of the cryptic error message. I hate VBScript for this reason alone.
One thing to remember is that Lua arrays are 1-based by convention, not requirement, since they are really just tables indexed by integers. So you can have zero-based arrays in Lua, but you'll be making life difficult for yourself as the convention, the libraries, and array literals will all want to give you 1-based arrays.
...and if you're using LuaJIT, 0-based arrays will even be optimized appropriately (i.e., array[0] will be stored internally in the array rather than in the hash).
But you'll still be fighting with the libraries and array literals, as you say.
Wow, that's a great little read. I'm surprised I haven't come across that before. I've always had that intuition, as I'm sure most experienced folks have, but it's the first time I've seen it articulated.
I love things like this, i.e., demonstrations that these mathematical primitives are naturally elegant. It's often so hard to talk about, or even realize why, when given 2 or more viable choices, one is naturally better.
I studied at Eindhoven University of Technology, where Dijkstra walked around for quite some time. His students stayed with him and became teachers. They all copied his handwriting.
No kidding - my first two years in college consisted, for a large part, of watching elderly men with beards write on blackboards in exactly that handwriting.
I can't help wonder whether they all drove the same car as Dijkstra too.
I am not following his language, partially the grammar and partially his lexicon -- specifically what he means by "natural number".
> There is a smallest natural number. Exclusion of the lower bound --as in b) and d)-- forces for a subsequence starting at the smallest natural number the lower bound as mentioned into the realm of unnatural numbers.
The natural numbers are defined as either the positive integers (1, 2, 3, ...) or the nonnegative integers (0, 1, 2, ...). In either case there is a smallest one (either 0 or 1). And so if you use a notation that excludes the lower bound then to include the smallest natural number (either 0 or 1) you have to define your sequence in terms of a number outside the set, which is awkward.
A more practical example is that you are using unsigned integers, so the smallest number is 0. If you use an exclusive lower bound, you can't actually represent any range containing 0, because -1 is not a valid number you can store.
When I read the headline I immediately thought of a short article by Knuth making the same argument; now I think I may just have remembered the wrong name. Is anybody aware of a similar article or lecture notes by Knuth? I failed to find it by googling.
I've spent a lot of time in both domains, and the Python-style half-open interval is the clear winner, because it produces the fewest "+1" and "-1", each and every one of which is begging to be a bug. I disagree with Dijkstra on some other counts, but in this case he is correct, both in theory and in practice.
It's easier to be 1-indexed today than it used to be, though, because there are more languages that support some form of
for element in collection: ...
instead of the incredibly error-prone
for (int i = 0; i < something; i++) { ... }
Or, was it
for (int i = 0; i <= something; i++) { ... }
Or was it
for (int i = 1; i <= something + 1; i++) { ...}
You have fewer places to make fencepost errors when you use the first style, so the costs of 1-based indexing are fewer. They're still higher than Python-style, though, because in Python you have both the correct for loop and still have the correct intervals.
When I started programming I quickly found that 0-based indexing was generally simpler. A lot of the time, it makes no difference one way or the other (because you're only using indexes to get at each item in turn, so you just memorize the boilerplate for whichever language you're using), but the moment you introduce ranges, or try to convert dimensionality - particularly with random access - or do some wraparound, or try to implement something like a ring buffer, it's all too easy to come a cropper. Either you put "1+" on every array access, which is error-prone, or you try to manage two different "spaces" (the 0-based one that's the ideal representation and the 1-based one used for indexing) and carefully convert between them... which is error-prone too.
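As an illustration, here is a minimal Python sketch of a ring buffer, where 0-based wraparound is a single modulo (the class and names are invented for the example):

    class Ring:
        def __init__(self, capacity):
            self.buf = [None] * capacity
            self.head = 0

        def push(self, item):
            self.buf[self.head] = item
            # 0-based wraparound is one modulo, with no +1/-1 fixups
            self.head = (self.head + 1) % len(self.buf)

    r = Ring(3)
    for v in "abcd":
        r.push(v)              # 'd' wraps around and overwrites 'a'
    assert r.buf == ['d', 'b', 'c']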
Alternatively, you could just decide to start indexing at zero.
...or you could stick with 1-based indexing, it's up to you. The right thing to do doesn't become wrong just because you don't do it ;)
This is not meant to be tongue-in-cheek - I thought this argument was settled ages ago. Are there any respected languages that use 1-based indexing? This is an honest question - every single one I've ever used (C, C++, Java, Scala, Python, JS, Objective-C, a bunch of academic ones) has been zero-based. It's quite clear that it's the right solution, since the first element of an array is zero distance away from the beginning of the array.
> Are there any respected languages that use 1-based indexing?
Fortran, Lua, Mathematica, MATLAB, Julia, and Smalltalk among current languages. Among historically important languages: COBOL, Algol 68, PL/I.
> It's quite clear that it's the right solution, since the first element of an array is zero distance away from the beginning of the array.
You think it is clear because you are used to thinking of array indexes as offsets. If you think of them as ordinal numbers for describing the position in a list, then it is clear they should start at 1. If someone's grocery list is:
bread
milk
eggs
flour
and you ask them "what item is #1 on your list?", most people will say bread.
If you look at that list I gave of 1-based languages, you might note that many of them are oriented toward math/science/engineering. In math, both 0-based and 1-based are used. For example, matrix elements are specified by row and column number, and both of those usually start at 1. On the other hand, polynomial coefficients usually start with 0.
Ignoring the fact that equating what "normal people" would think in contrast to what a trained professional would think in the context of said profession, 1-indexing leads to other problems.
foreach is great, until you need an index. When you do, 0-indexing is more natural. Take, for example, looping over an image.
    unsigned char *buf = get_image();

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // simple
            unsigned char pixel = buf[y * width + x];
        }
    }
That doesn't work when y begins at 1. All of a sudden you need to add and/or subtract 1 everywhere to keep things in line. I used C here, but this applies to many languages in many different circumstances where an index/count is required.
"Ignoring the fact that equating what "normal people" would think in contrast to what a trained professional would think in the context of said profession isn't always an apt comparison..."
Fortran really lets you start from any integer index, including negative ones; 1 is just the default if you omit the lower bound. But it's not half-open when you declare (or use slicing), i.e. it uses inclusive lower and upper bounds, so A(-17:-17, 10:10) has one element, not none. It might or might not have been nicer if they'd been half-open. Or one could also imagine a start+length type of declaration, but meh.
Lua is probably the strongest candidate for a respected 1-based language. Smalltalk, Fortran, and MATLAB are other examples of widely used languages that are 1 based.
I am going to learn Lua and use it much more now, thanks for pointing this out. As someone said, having 0-based indexes is an inhuman way of dealing with arrays; it only made some sense in C, and others just copied it.
Matlab does (or at least it did when I used it for a class a few years ago). I was trying to iterate through some array for a project and kept hitting an error, and was totally stumped until some EE major came over and fixed it. "Haha did you forget how to count? You're starting the index at 0, that's the problem."
Did the same guy remind you that "i" is a poor choice for an index in Matlab because it's actually sqrt(-1)? And _then_ tell you that you're supposed to do array reshaping instead of loops in Matlab? Because if the same guy had hit me with all three I might have killed him.
I've just started using Octave (for the Coursera machine learning course) and it also starts at 1. It takes a few exercises to reacclimatise. More annoying is that it doesn't auto-broadcast matrix operations like NumPy does (and the slicing doesn't feel as powerful either).
> since the first element of an array is zero distance away from the beginning of the array.
That's much less of a win in any language but C, where "array" means something more than "pointer." If you're checking array bounds on every access then the extra assembler instruction to subtract one from your index doesn't matter in comparison, whereas if you're just dereferencing a pointer it could potentially double the cost of array accesses (two instructions instead of one if your instruction set has base + offset addressing modes).
I'm just thinking logically - not even on a pointer level. If I'm doing index/offset math to calculate an array position, it's foolish to start at 1 instead of 0.
Adding things that aren't conceptually indices to indices works the same way in both conventions (so a[idx] and a[idx+k] have k-1 things in between). You miss out on stuffing multiple dimensions into one with a[x*xlen+y], but there's generally no reason to do that.
Could you elaborate on the uses you have in mind?
> On top of that other languages that antedate BCPL and C aren’t zero-indexed. Algol 60 uses one-indexed arrays, and arrays in Fortran are arbitrarily indexed – they’re just a range from X to Y, and X and Y don’t even need to be positive integers.
It was actually a pretty interesting read, I recommend it if you have the time.
Mathematica is LISP, so it's really 0-based indexing, where the first element is the function you're applying. All the sugar and literals are made to hide this, but you can code like you're dealing with a real LISP.
In my opinion Matlab the language sucks even for its intended use. What makes the complete package barely acceptable to work with is the environment, the documentation and the included libraries.
I did 5 years of image processing in Matlab but then switched to Python 4 years ago. One of the best choices I've made in my programming career.
Interesting. I make a living developing backend systems in Python, but picked up Matlab (Octave) for machine learning, and really enjoyed doing linear algebra on it. For PoCs and general experimentation I find it more amenable than messing with Numpy. Of course it isn't great for building complete systems, but for self-contained programs - which I believe is the intended purpose - it works pretty well.
This is not correct. Perl has arrays and, as the person you are replying to said, allows the index of the first array element to be changed globally by setting a special variable. $[, if I recall correctly.
Off the top of my head, Lua uses primarily 1-based arrays. (Of course, Lua arrays are actually associative arrays, so negative numbers and 0 are also valid indices, but all the standard library functions expect your arrays to start at 1.)
This won't be a big deal. The 0 key will be in the hash part, and the 1-#t keys will be in the array part. The language doesn't expose the distinction, so the loop-related awkwardness is limited to a tiny performance hit.
Ah yes, but don't forget this is all changing in Python 4 :) -- remember the "List Revolution" thread when Guido finally agreed to switch to 1-based indexing...
On Fri, Sep 9, 2011 at 2:12 PM, Christopher King <g.nius.ck at gmail.com> wrote:
> The first element in a list is element zero, the second is one, the third it
> two, and so on. This some times confuses newbies to the language or
> programming in general. This system was invited when single bits where
> precious. It's time to update. Keep in mind this is something for version 4,
> since its not reverse compatible. I say we make the first element 1, second
> 2, third 3, and so on. Other languages would follow. We are python, made for
> easiness and readability, and we are in the age where you can't even get
> something as small as a kilobyte USB. We must make first one, second 2, and
> third 3, like it is supposed to be. I give this:
> +1
Consider it done.
--
--Guido van Rossum (python.org/~guido)
Funny thread, but just in case somebody takes it seriously:
(And to those taking the thread seriously: this is all
in jest. We won't change the indexing base. The idea
is so preposterous that the only kind of response
possible is to laugh with it.)
--Guido
a[1..10] # elements from 1 to 10, including 10
a[1...10] # elements from 1 to 9; k...n is a half open range notation
a[3, 5] # 5 elements, starting from position 3
Arrays are still zero-based, but I feel this is more a homage to for- and while-loops from older languages.
(for (i = 0; i < n; i++) is just a very nice notation, and so is while (i < n))
Ruby offers three very similar syntaxes for the same concept. Contrast that with Python's "there should be one—and preferably only one—obvious way to do it"[1] ethos.
I found it quite unintuitive too at first. My way of remembering it was to think of the distance between the endpoints. The first example is inclusive because it's closer and the endpoint is "included" whereas the second range endpoint is shunned and shoved further away and is exclusive because it's "excluded"!
I don't know anything about Ruby, but that seems extremely error-prone to me.
EDIT: I think I know why I think it's error prone:
In all programming languages besides the core syntax of the language idioms appear among the community of coders. For instance, in C you travel through an array using something like:
    for (i = 0; i < n; i++)
        a[i] = ...;
Where n is the length of the array. This is an idiom so common that if for some reason in some code the length of an array is (n + 1), instead of writing for (i = 0; i <= n; i++) I'll write for (i = 0; i < (n + 1); i++).
Because the idiom is so strongly entrenched in my brain (and probably many other C coders) if I read the <= version there's a great chance I'll read it as a "<" and assume i goes from 0 to n - 1.
My point is that adding several alternative pieces of "syntactic sugar" to express the same thing can end up confusing newbies, because they'll have to learn the difference between all the versions (and recognize them), as well as experienced coders, because you're more likely to mistake the meaning of an expression at a cursory glance.
Ruby doesn't nail it using ranges. Ruby nails it by implementing Array#each.
0-index versus 1-index should not be an issue at this level of programming. You shouldn't be using for-loops; you should be iterating from start to finish.
Though I've written a lot of Python code, I never use range. The beauty of Python is that all collections are iterable, so you seldom need to specify ranges.
randrange is the fixed randint, which works just as you'd expect. It's far preferable to randint for the aforementioned reason, and ever since I started using it, all those annoying off-by-one errors that randint kept giving me have vanished.
The one thing I still use randint for is rolling dice. randint(1,6) is the equivalent of a six-sided die, and is that bit more readable as such than the equivalent using randrange.
But for any other purpose, randrange beats randint hands down. Off-by-one errors, be gone!
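A short sketch of the difference (the list is arbitrary):

    import random

    xs = ['a', 'b', 'c', 'd']
    i = random.randrange(len(xs))       # half-open: pairs directly with len()
    j = random.randint(0, len(xs) - 1)  # closed: needs the bug-prone "- 1"
    assert xs[i] in xs and xs[j] in xs
    die = random.randint(1, 6)          # dice: where closed ranges read naturally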
Using zero reminds you that black is 0, 0, 0 in 24-bit integer RGB, while white is 255, 255, 255, not 256, since you only have 8 bits per channel and 2^8 - 1 = 255. Calling black 1, 1, 1 makes no intuitive sense. Using zero-based indexes saves a tremendous number of errors by habituating you to what is natural for digital memory and the representational capability of intrinsics, while creating a nice correspondence between hex zero, decimal zero, and binary zero.
The only time this leads to bugs in Python is when using my_list[len(my_list)] instead of my_list[len(my_list) - 1], where the difference between indexing and counting can lead to intuitive error. However, it's easier to just write my_list[-1], knowing that zero is the first element, so counting backwards has to start at a different value. Of course you can do something like my_list[-len(my_list)] just to be silly.
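Concretely (a tiny sketch):

    my_list = [10, 20, 30]
    assert my_list[-1] == my_list[len(my_list) - 1] == 30
    assert my_list[-len(my_list)] == my_list[0] == 10   # the "silly" spelling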
Indexing is about having some value to get some position out of an array. Zero is a unique number, so it makes sense to use it as an index. If you start counting on your fingers by holding up none and calling it zero, suddenly life makes intuitive sense. If you count the numbers you say on the way to ten fingers, you get eleven unique numbers for those ten fingers: the difference between array length and array indexing. Magic.
Why is it zero-indexed? Tradition. Why did that tradition start? The best explanation I can think of is that array access is a convenient shorthand for pointer math:
array[i] translates perfectly into array_memory_start + i * element_size. Thus the tradition was most likely born.
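You can even watch that arithmetic from Python with the standard array module (a small sketch; variable names are illustrative):

    import array

    a = array.array('i', range(10))
    base, _length = a.buffer_info()   # (address of first element, element count)
    # The address of a[i] is base + i * a.itemsize; for i == 0 that is just
    # base itself, which is why 0-based subscripts fall out of pointer math.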
> was most likely born
I take it you just posted this comment for fun and didn't RTFA? That was the whole point; the author answered the question of how the tradition was born, down to the person and year.
I did read it. And all he says is that C was zero-based, but he doesn't say why that happened. Unless it was in the comment thread or one of the several second-order articles, which I'll freely admit I didn't read.
I switched from Matlab to Python and I am totally happy that Python starts counting at 0 (which is very important if you do image processing). But even after four years of working in Python, the half-open interval notation for the last element still seems wrong to me. I think it is related to the fact that the Matlab version is more direct: a[1:3] means "give me elements 1, 2, 3". That is very clear and obvious. In Python, a[1:3] means "give me elements 1 and 2". This is not that obvious, and even after four years of Python I sometimes have to stop and think about it.
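In code (a two-line sketch):

    a = ['x0', 'x1', 'x2', 'x3']
    assert a[1:3] == ['x1', 'x2']   # Python: half-open, two elements
    # MATLAB's a(1:3) would return three elements (positions 1, 2, 3)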
I envision the array on a continuous axis, and place markers at 1 and 3 (the exact positions, not the range of whole numbers). The range on the axis between them is the right range. With this intuition, everything is very simple and makes perfect sense.
Yes, it makes perfect sense, but it still feels wrong to me. I guess it is just related to the fact that I grew up with Matlab and not Python. I was kind of hoping I would get used to it, but even after several years of writing image processing stuff in Python I am still not 100% convinced.
I have read it, and Dijkstra is a good call to authority, but I don't agree.
His opening sentence is instructive: using the 'pernicious three dots', everything is totally clear: with an ellipsis, it's impossible to understand anything other than a closed interval.
It's primary school mathematics that 2 <= x <= 12, i.e. the interval mathematically written [2,12] (or 2,...,12), contains 11 integers. A programmer who doesn't understand that is in a world of trouble as half-open intervals won't stop the inevitable rain of off-by-one errors.
A programmer who actually calculates the size of a list in the way that Dijkstra suggests (specifically, by calculating b-a) is also perhaps acting questionably.
Dijkstra's argument here might make more sense when correctly framed in 1982.
As an aside, if Dijkstra would agree to use a fully-closed interval, his suggested advantage of 0-indexing is eliminated and he would presumably then prefer 1-indexing.
I've been teaching Python to my kids (both in middle school), and had to explain zero based arrays. Fortunately, all explanations automatically translate into "yet another stupid thing that grown-ups do." ;-)
Being somewhat of an old timer, I understand the justification of zero based arrays based on making the hardware simpler and maintaining a tight coupling between the representation of a program in source code and its implementation in hardware. Zero is special because all of the bits are the same. Arrays are zero based because the physical location of the first element is the same as the memory location assigned to the array itself, plus zero.
For the last element, I just remember the mnemonic "up to but not including."
It's a simple way of addressing the difference: 0-indexed "points to" the start of each element. 1-indexed represents the ordinal position.
So to relate to the ruler, the first cm/inch interval is, well, the 1st. But the distance to the start of the first is 0. So if you want to emphasise which numbered interval it is, 1-indexed makes most sense. If you want to emphasise where it starts, or the distance from origin, 0-indexed makes more sense.
1-based counting would make binary very uncomfortable. b0 would be d1 and b1 would be d2. You would have the strange situation of needing more than 1 bit to represent 0. Doesn't seem very intuitive to me, but that's due to familiarity more than anything I suppose.
It doesn't matter how cool the slice notation is when people have trouble accessing single elements!
1-based is better for human counting, and 0-based is better for certain mathematical expressions, usually when thinking about offset from the start rather than list position.
Clearly it is a judgement call, but I personally think counting is more important for the same reason PI is wrong. We are confusing ourselves because easy things aren't intuitive.
Imagine you have:
var list = ['first','second', 'third']
`list[2]` should clearly be the second element (not third), just as PI / 2 should mean half a circle (not 1/4).
We wonder why there aren't as many computer science / math grads. Well in every intro class in the world people are getting confused by this.
I like 0-based indexing because it makes mathematical sense. By the ordinal construction of the natural numbers, e.g. 5 ~= {0, 1, 2, 3, 4}. So an array of length 5 has indexes corresponding to the five set-theoretic elements of 5.
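range makes that correspondence literal (a tiny sketch):

    a = ['p', 'q', 'r', 's', 't']               # length 5
    assert list(range(5)) == [0, 1, 2, 3, 4]    # the set-theoretic elements of 5
    assert [a[i] for i in range(len(a))] == a   # exactly the valid indexes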