Seriously, it all depends on whether u're counting the items themselves (1-based) or the spaces btwn them (0-based).
The former uses natural numbers, while the latter uses non-negative integers
For instance, when dealing with memory words, do u address the word itself or its starting location (the first byte)?
The same consideration applies to coordinate systems: r u positioning urself at the center of a pixel or at the pixel's origin?
Almost 2 decades ago I designed a hierarchical ID system (similar to OIDs[1]). First I made it 0-based for each index in the path. After a while I understood that I needed to be able to represent invalid/dummy IDs as well, so I used -1 for that. It was ugly - so I made it 1-based, & any 0 index would make the entire ID invalid.
The mix of "The same consideration applies to coordinate systems:" and "r u positioning urself..." in the same message is ridiculous to me and I can't take anyone seriously who speaks like this.
It's not illegal or immoral, but it is, at the very least, demonstrably distracting from their own actual substantive point. Here we all are talking about that when they otherwise had a perfectly good observation to talk about.
Everyone is jumping on your use of the word "serious" as though you meant serious like cancer instead of just not texting ya bro bout where u at tonite, as though there are only two levels of formal and informal, the two extremes.
If this is so casual and we are txting, then why did they use punctuation and no emojis? Ending sentences with periods is verbal abuse now, so, I guess we're not txting after all? Pick a lane!
People taking such offense to something so absolutely inconsequential, on an internet forum no less, is ridiculous to me and I can't take anyone seriously who gets worked up about it.
You, and parent poster, understood them fine. You, and the parent poster, are the ones who are steering the conversation in the direction of how they typed, not what they typed.
They had a "perfectly good observation to talk about", yet u decided not to talk about it.
It's at least as valid, in fact more so, to say that I was distracted by something they decided to say.
There is no objective way to assign all blame for the tangent to just us or just them; the closest you can come is to say that whoever speaks first is more responsible for their unprompted speech than responders are for their reactions. They chose their reactions, but those are not reactions in a vacuum: they are reactions to something that came from someone else.
I'm a native English speaker and whether it's harder to read depends on what you mean by "harder". It's immediately obvious what it means (so it's not "harder" in that sense), but on the other hand, it's jarring and distracts me from the point being made.
“serious discussion” notwithstanding, I thought the “u’re” was particularly interesting. Why not “ur”? “u” is obviously easier to write, so why worry about properly contracting it? Just interesting imo
Abbreviations can only be useful when they're unambiguous within their context. Maybe “u’re” is unambiguous here, but “re” might have been a step too far to still know what word the writer was abbreviating.
Not to contradict you, but fascinating coincidence: my favorite alleged originator of OK — "Old Kinderhook," Martin Van Buren — is to date the one and only U.S. president who did not speak English as his first language.
(But "oll korrect" is apparently attributed to Andrew Jackson, who was a native speaker, yes.)
Not sure if it's the case here, but over the decades I've observed a cultural aspect to this. When I've had colleagues from the Middle East and/or South Asia, I've found they're much more likely to use `u` and `r` colloquially than western counterparts.
This could be what you're observing. Or perhaps they just like the aesthetics.
I've never understood the universal acceptance of poor writing on mobile; everyone immediately throws their hands up and goes ahhh okay that makes sense then.
If only smartphones had some means of seeing the typed output... a screen perhaps? Icing on the cake would be some kind of backspace button, which together would enable proofreading! You know, like in other forms of written communication.
Basically everyone will insist it's entirely to be blamed on the phone, and we're expected to believe that no, really, the moment they sit at a physical keyboard they reliably distinguish your from you're etc.
I knew who it was, sorry, I was just trolling. Both twitter and AI-bro culture are extremely distasteful to me so I thought citing an AI bro’s twitter account as an example of how it’s acceptable to behave online was absurd.
Ah. I took it to be less "sama did it, so it's OK," and more "you wouldn't gatekeep sama's English, so why gatekeep the English of a random HN commenter?" But maybe that was the wrong interpretation.
I have started using “u” in my posts as there are a surprising number of sites that block or delay for approval posts that use “you”. They seem to feel that posts with “you” are too often inflammatory. It is very frustrating when you use grammatical “you” as I just did and suddenly your post is stuck in limbo.
Zero is a natural number. It is in the axioms of Peano arithmetic, and any other definition is just teachers choosing a taxonomy that best fits their lesson.
> Zero is a natural number. It is in the axioms of Peano arithmetic, and any other definition is just teachers choosing a taxonomy that best fits their lesson.
It is, but it need not be. In the category of pointed sets with endofunctor, (Z_{\ge 1}, 1, ++) and (Z_{\ge 0}, 0, ++) are isomorphic (to each other, to (Z_{\ge 937}, 937, ++), and to any number of other absurd models), so either would do equally well as a model of Peano arithmetic.
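Explicitly, the isomorphism is just a shift (sketching it in the same notation, with ++ for the successor):

    φ : (Z_{\ge 0}, 0, ++) → (Z_{\ge 1}, 1, ++),  φ(n) = n + 1
    φ(0) = 1  and  φ(n++) = φ(n)++

so the basepoint goes to the basepoint and the map commutes with the endofunctor.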
I may be misunderstanding your argument, but if it's that of a simple offset, then only the one starting from 0 forms a monoid (a group without an inverse to each element). Though, of course, you could redefine the + operation...
> I may be misunderstanding your argument, but if it's that of a simple offset, then only the one starting from 0 forms a monoid (a group without an inverse to each element). Though, of course, you could redefine the + operation...
Yes, agreed, there is other algebraic structure that can tell the difference, but Peano arithmetic by itself cannot.
I think I’m missing something here. PA defines x * 0 = 0 for all x. So while we could take (Z+, 1, ++) as a model of it, we would be imposing a completely different definition of multiplication than the usual. Would this not be simply choosing to label 1 as 0 and work from there?
> I think I’m missing something here. PA defines x * 0 = 0 for all x. So while we could take (Z+, 1, ++) as a model of it, we would be imposing a completely different definition of multiplication than the usual. Would this not be simply choosing to label 1 as 0 and work from there?
Despite the name, in the usual mathematical meaning of the term, Peano arithmetic does not define arithmetic at all, only the successor operation, and everything else is built from there. Once we have those, for the model (Z_{\ge 0}, 0, ++), we certainly usually do define x * 0 = 0 for all x; and, you're right, if for the model (Z_{\ge 1}, 1, ++) we defined x * 1 = 1 for all x (as no-one could stop us from doing), then we'd just be dealing with "0 by another name." But it might be equally sensible, if our model of Peano arithmetic is (Z_{\ge 1}, 1, ++), to define x * 1 = x for all x, in which case we recover the expected arithmetic.
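Spelled out, the two recursions are (a sketch; S is the successor, and + is assumed already defined by the analogous recursion):

    x * 0 = 0,   x * S(y) = x * y + x     (basepoint read as 0)
    x * 1 = x,   x * S(y) = x * y + x     (basepoint read as 1, recovering the usual arithmetic)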
In the usual terminology, these are not axioms; as your wording itself says, they are definitions. (Indeed, I'd argue that it's almost ungrammatical to say something is "defined in the axioms"; axioms may, and probably must, be stated in terms of definitions, but the definitions are not themselves axioms.) As I say, one can quibble about terminology, since what's important is less what's axiom and what's definition, and more what we can build on top of both; but the usual mathematical presentation separates out the axioms (numbered 1–9 at https://en.wikipedia.org/wiki/Peano_axioms#Historical_second..., though things like 2–5 wouldn't usually be stated as axioms of the theory but rather of the ambient logic) from the definitions (see https://en.wikipedia.org/wiki/Peano_axioms#Defining_arithmet...).
(Now having written that and looking back, I see that, in my previous post https://news.ycombinator.com/item?id=43442074, I wrote "Despite the name, in the usual mathematical meaning of the term, Peano arithmetic does not define arithmetic at all, only the successor operation, and everything else is built from there." Perhaps this infelicitous-to-the-point-of-wrong wording of mine is the source of our difference? I meant to say that Peano arithmetic does not axiomatize arithmetic at all, but that arithmetic can be defined from the axioms. Thus the specific definition x[pt] = [pt] is eminently sensible if we consider the distinguished point [pt] to be playing the usual role of 0; but the definition x[pt] = x is also sensible if we consider it to be playing the usual role of 1, and even things like x[pt] = x + x + x + x + x can be tolerated if we think of [pt] as standing for 5, say. The axioms cannot distinguish among these options, because the axioms say nothing about multiplication.)
No, they are axioms. Peano arithmetic itself is a first-order theory, and a theory is just a recursively enumerable set of axioms.
Enderton, “A Mathematical Introduction to Logic”, 2nd ed., pp. 203, 269–270
Kleene, “Mathematical Logic”, p. 206
EDIT: It seems like you're talking about Peano's original historical formulation of arithmetic? That's all well and good, but it is categorically not what is meant by "Peano Arithmetic" in any modern context. I've provided two citations, from editions of common logic texts published pretty far apart in time (well, "Mathematical Logic" is a bit of a weird book, but Kleene is certainly an authority), and I hope that demonstrates this.
There are a lot of reasons that the theory is pretty much always discussed as a first-order theory. The biggest, of course, is that when taken as a first-order theory it fits neatly into the proof and statement of Gödel's Incompleteness Theorems, but iiuc it's just generally much less useful in a model-theoretic context to take it as a second-order theory (to the point where I only ever saw this discussed as a historical note, not as a mathematical one).
EDIT 2: This is all a digression anyway. Both first- and second-order PA label the start of the Z-chain as 0; so any model of PA contains 0 when interpreted as a model of PA.
I'm away from my library, but fortunately the books you referenced are a Google away, so I could consult them and confirm that they say what you say. I'm not quite willing to accept Kleene's word as an authority on common modern mathematical practice, since he was a theoretical computer scientist before the term existed, but, though I'm not familiar with Enderton's book, it certainly looks like a reasonably standard one.
But these are all referring to Peano arithmetic as a model of the theory of the natural numbers. And that seems a bit silly: the impact of Peano's work wasn't because he showed that there was a model of the theory of the natural numbers, which everybody believed if they bothered to think about it, but because he showed that all you needed to make such a model was a successor operation satisfying certain axioms. Yes, they may be less model-theoretically congenial because they're second order, but to change Peano's work from what he did historically and still call it Peano's seems strange to me. (I'm fine with dressing it up in modern language, and calling it an initial object in the category of pointed sets with endofunctor, which perhaps is biased but still seems to me to be capturing the essential idea.)
Certainly I was taught the second-order approach, though it was as an undergraduate; I've never taken a model-theory class. As I say, I'm away from my library and so can't consult any other sources to see if they still teach it this way, and anyway I am a representation theorist rather than a logician; but, if the common logical approach these days really is to discard Peano's historical theory and to call by Peano's name something that isn't his work, even if it is more convenient to use, then I think that's a shame from the point of view of appreciating the novelty and ingenuity of his ideas. But just because I think something is a shame doesn't mean it's not true, and so far you've produced evidence for your view and I can't for mine, so I can't argue any further.
It’s not really considered throwing away Peano’s work. Peano was working in the very infancy of logical formalism.
As it turns out, further work building on his ideas discovered that using a recursively enumerable schema for induction rather than a second-order induction axiom gives rise to a simpler abstraction that still has all the properties that Peano actually desired, and which makes further developments in the space much easier.
Continuing to call it Peano Arithmetic is respect for the fact that the guy got it mostly right, and it took the mathematics community many more years to refine the ideas to their current point.
Is it a shame that Galois theory isn’t presented as a historical fossil and frozen to its state of development in Galois’s lifetime? I may be making a rather big assumption, but I like to think he would be proud, and so would Peano.
> Continuing to call it Peano Arithmetic is respect for the fact that the guy got it mostly right, and it took the mathematics community many more years to refine the ideas to their current point.
> Is it a shame that Galois theory isn’t presented as a historical fossil and frozen to its state of development in Galois’s lifetime? I may be making a rather big assumption, but I like to think he would be proud, and so would Peano.
Oh, by no means do I object to calling an updated and generalised version by the name of the person who originated the subject! Since you've brought up Galois, I hardly think that he'd recognize the modern theory of Galois connections, but I think that the name is wholly appropriate.
No, what I thought was a shame is if the original theory doesn't get discussed at all. If my only exposure to Peano's work was, for example, the axiom schema in Enderton, then I don't think I'd be able to appreciate why it's such a big deal. That would feel to me like teaching the theory of Galois connections without ever saying anything about field theory! Whereas, on the other hand, I did immediately understand as an undergraduate the magic of being able to define everything in terms of the pointed set using induction, and I think I'd appreciate even more having seen that first and then seeing how it is updated for modern mathematical logic.
In fact, at a casual glance, I still don't see why L1, L3, and the A, M, and E axioms can't be omitted in the presence of the axiom(s) on p. 269, which has been the whole substance of my objection. I believe that there's an answer, but, if I don't see it as a professional mathematician (though not a logician), then surely it can't be true that every undergraduate will appreciate it!
Second addendum: chapter 4 is about second-order logic, and apparently I just forgot that exercise 1 is simply showing that you get all of the structure built up in chapter 3 with Peano's original formulation in second-order logic. Seems that here I'm the one suffering from a lack of historical context!
I think from a logic standpoint this also makes sense -- getting to undecidability quickly makes taking the direct route through first-order logic more appealing.
If I'm being honest, I now do feel a little bit deprived; I probably would have enjoyed the categorical view when I was learning this too.
Oh, then yeah, I totally agree. In general I think it's a shame that so little emphasis is placed on the history of mathematics, though at the same time I appreciate that most of my peers just didn't care :(
> EDIT 2: This is all a digression anyway. Both first- and second-order PA label the start of the Z-chain as 0; so any model of PA contains 0 when interpreted as a model of PA.
Ah, good point that this was the actual source of the discussion. This one at least can be argued, because the question is about how things should be axiomatized/defined, not how they are. And certainly the theory of the "natural numbers starting with 1" can be axiomatized just as well as the "natural numbers starting with 0." All these axioms are made by humans, and an appeal to existing axioms here can only say what's been done, not what should be. (And I say this as someone who does start my naturals at 0.)
There is no consensus on that, and it's not just about teachers. It depends on the mathematical field and tradition. It usually starts at 1 in German, and at 0 in French due to the influence of Bourbaki; in English I think it's more field-dependent.
I never realized it was controversial. I think I've always included 0 in the nat numbers since learning to count.
But there are some programming books I've read (I want to say The Little Typer, or similar) that say "natural number" or "zero". Which actually confuses me.
> zero represents an absence of quantity and doesn't appear in Nature
From one point of view, zero never appearing in nature is exactly an example of it appearing in nature!
From another point of view, do you not think a prairie dog has ever asked another prairie dog, "how many foxes are out there now?" with the other looking and replying "None! All clear!"? Crows can count to at least 5, and will count down until there are zero humans in a silo before returning to it. Zero influences animal behavior!
From a third point of view, humans are natural, so everything we do appears in nature.
From a fourth point of view, all models are wrong, but some models are useful. Is it more useful to put zero in the natural numbers or not? That is: if we exclude zero from the natural numbers, do we just force 90% of occurrences of the term to be "non-negative integers" instead?
> From another point of view, do you not think a prairie dog has ever asked another prairie dog, "how many foxes are out there now?" with the other looking and replying "None! All clear!"?
type PrairieDogFoxCount = NoFoxesAllClear | SomeFoxes 1..5 | TooManyFoxes
type CrowCount = Some 1..5 | UpsideDown 5..1
type HumanProgrammerCount = 0..MAXINT
type HumanMathematicianCount = 0..∞
My point is: "No Foxes - All Clear" is not the same thing (the same level of abstraction) as 0.
> From a third point of view, humans are natural, so everything we do appears in nature.
using this definition everything is Natural, including for example Complex numbers, which is obviously incorrect, and thus invalidates yr argument
> From a fourth point of view, all models are wrong, but some models are useful. Is it more useful to put zero in the natural numbers or not? That is: if we exclude zero from the natural numbers, do we just force 90% of occurrences of the term to be "non-negative integers" instead?
all models are wrong, but some are really wrong
If all u care about is the length of the terms, i.e. "Natural" vs "non-negative integers", then what's wrong with 1-letter set names, like N, W, Z?
I think the usefulness of including 0 into the set of natural numbers is that it closes the holes in various math theories like [1,2]
> using this definition everything is Natural, including for example Complex numbers, which is obviously incorrect
No, that's not "obviously incorrect", nor does it invalidate my argument: that is my exact argument. Complex numbers appear in electromagnetism, in exactly the same sense of "appear" as whole numbers appear in herds of sheep. Which is to say, it's the simplest and most useful model of the situation. And what's more natural than one of the four fundamental forces of nature? And the weak & strong nuclear forces have even more esoteric math structures appearing in their most parsimonious models as well.
> "No Foxes - All Clear" is not the same thing (the same level of abstraction) as 0.
In your model. In my model, it is the same thing. All models are wrong; some models are useful. Which one is more useful? Almost always, the one with 0 as a natural number. What about this:
> using this definition everything is Natural, including for example Complex numbers, which is obviously incorrect, and thus invalidates yr argument
Except you’re wrong here; should we thus call your argument “obviously incorrect”?
Complex numbers are natural; they’re fundamental in quantum mechanics. Ever since Schrödinger’s equation fundamentally required them for time evolution of states, physicists (and philosophers) wondered if they could be removed. Recent experiments say “no.” QED and QFTs are the most precise theories known in all of science.
They’re more than useful: they’re required, hence the research demonstrating it that I linked. That you don’t understand it doesn’t mean the rest of science is at your level of ignorance on the topic.
Your repeated, willful ignorance on a topic, especially when shown to you, is why you have such low understanding of the incorrect claims you make.
Take a moment and learn. Then maybe you’ll not repeat claims shown to be wrong.
Yes, they have been observed with the same rigor as any number you claim has been observed. If you'd spend a moment and read instead of repeating willful ignorance, you'd learn something.
Now go do your homework. Attaching idiot phrases like complex apples is as stupid as claiming we don’t see radio waves so they can’t exist or that matter cannot be mostly empty space because you can stack books.
Your limited imagination, understanding, and unwillingness to learn, even when given a source and phrases to look into, doesn’t apply to those scientists that have done the work.
A symbol being arbitrary doesn't influence the reality of the meaning behind a thing. I've always thought about `zero` while counting, it never was about `0`.
I observe zero.
I don't think zero is an absence of quantity. I don't think zero is the null set.
You can write types in a programming language, but there are other type theory books that do include zero in the natural numbers. And type theory comes from number/set theory. So it's ok if you decide to exclude it, but this is just as arbitrary.
In fact I'd be happy to write `>=0` or `>0` or `=0` any day instead of mangling the idea of zero representing 0 and zero representing something like `None`, `null` or any other tag of that sort. I don't think the natural world has anything like "nothing" it just has logical fallacies.
it cannot be observed directly at any static point in time, but it can be observed as a dynamic process when some quantity goes down to empty and back up over time
> In fact I'd be happy to write `>=0` or `>0` or `=0` any day instead of mangling the idea of zero representing 0 and zero representing something like `None`, `null` or any other tag of that sort. I don't think the natural world has anything like "nothing" it just has logical fallacies.
N, W, R, etc. - r just well-known names for sets of numbers, nothing stops us from defining better or additional names for them (with self-describing names)
We can discuss Empty type[1] vs Unit type[2], but I think it goes off-topic
It's funny because pi is the joke compromise between 0 and tau (the circumference of the unit circle, and the period length of the trigonometric functions).
I think we can pretend now that anyone talking about pi is just being sarcastic? It is such a random and useless number, a perfect candidate for a funny meme
Actually, the duality arises from counting (there can be 0 items) and ordering (there is only a 1st item), conceptually. Which is why the year 2000 can and cannot be the start of the 3rd millennium, for instance.
Dates and times are prime examples of modular systems that make the most sense when they start at 0, but most commonly start at 1. Think how stupid it is that the day starts at hour 12 and then goes back to hour 1; at least 24-hour clocks do away with this absurdity.
My personal take is that we should not let one short-sighted decision 1500 years ago mess us up: the first century covers years 0 to 99, and the 21st century 2000 to 2099.
I have a database where I tried keeping once-per-year periodic events (like birthdays or holidays) as postgres intervals, i.e. number_of_months:number_of_days_in_month past the start of the year, or effectively a 0-based month-day. This looks weird but solved so many calculation problems that I kept it, and I just convert to the more normal 1-based form on display. And a shout-out to postgres intervals: they do a heroic job of tracking months, a thankless job to program I am sure.
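A rough Python sketch of the scheme (the event and dates are made up for illustration):

    from datetime import date

    # stored 0-based: 6 whole months and 3 whole days past the start of the year
    offset = (6, 3)                                   # hypothetical annual event
    shown = date(2025, offset[0] + 1, offset[1] + 1)  # 1-based for display: 2025-07-04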
Fun fact: the words Ordinal and Cardinal respectively derive from Ordering and Counting.
So Ordinal quantities represent the ordering of coordinates and can be negative, while Cardinal quantities describes the counting of magnitudes and must be positive.
You can transform between ordinal domains and cardinal domains via the logarithm/exponential function, where cardinal domains have a well-defined “absolute zero” while ordinal domains are translation-invariant.
I don't follow the distinction you're making. The number line is ordered and contains a 0....
The GP's explanation seems more fitting for the year 2000 ambiguity. Are you measuring completed years (celebrate the millennium on NYE 2001), or are years the things happening between 0 and 1, 1 and 2, etc. (celebrate on 2000, because we're already in the 2000th "gap")?
Thought about this a bit more... Not sure if this is what you're saying, but the concept of "space between" I alluded to seems to arise naturally whenever you have ordered items, and vice versa. Because once you have order you have the concepts "greater than"/"less than", and once you have that you have a border between your items, and your items are between those borders. This connects back to Dijkstra's consideration of <, <=, etc....
That makes sense for the ID system or a database, but for arrays in a language I still prefer starting at 0. It makes frame buffers easier to index.
I prefer thinking from the first principles, and not according to the current computer architecture fashion.
And BTW that ID system was used in a system processing PBs of data per day in real time back in the early 2000s, so it’s not that it was super-inefficient.
The center of the debate is that outside of pure mathematics numbers and number systems can only be signifiers for some physical or conceptual object. It is the signified object that determines the meaning of the number and the semantics of the mathematics.
I totally disagree, but it's only my opinion and probably not scientific at all.
From a logical point of view I think it's totally unnatural to start at 1. You have 10 different "chars" available in the decimal system. Starting at 1 mostly leads to counting up to 10. But 10 is already the next "iteration". How do you explain to a kid who's learning arithmetic that the decimal system is based on 10 numbers, while at the same time you always refer to this list as running from 1 to 10?
I think it's totally natural to start counting at 1, because you start with one of something, not zero. How do you explain to a kid that although they're counting objects, the first one is labelled zero, and that when they've counted 10 objects, they use the number 9?
Python supports a third way: start at -1. And if you think about it a little (but not too much) then there's some real appeal to it in C. If you allocate an array of length n and store its length and rewrite the pointer with (*a+=n)=n, then a[0] is the length, a[-1] is the first element (etc) and you free(a-a[0]) when you're done. As a nice side effect, certain tracing garbage collectors will never touch arrays stored in this manner.
Upshot: if you take the above seriously (proposed by Kelly Boothby), the median proposed by Kelly-Bootle returns to the only sensible choice: zero.
He was right. If the first fencepost is centered at x=0 and the second at x=1, and you want to give the rail in-between some identifier that corresponds to its position (as opposed to giving it a UUID or calling it "Sam" or something), 0.5 makes perfect sense.
In computer programming we often only need the position of the gap to the left, though, so calling it "the rail that starts at x=0" works. Calling it "the rail that ends at x=1" is alright, I guess, if that's what you really want, but leads to more minus ones when you have to sum collections of things.
I can't find a reference, but I have a vague memory that in original Mac OS X, 1-pixel-width lines drawn at integer locations would be blurred by antialiasing because they were "between" pixels, but lines drawn at e.g. x = 35.5 were sharp, single-pixel lines. Can anyone confirm/refute this?
Perhaps ideally we'd change English to count the "first" entry in a sequence as the "zeroth" item, but the path dependency and the effort required to do that are rather large, to say the least.
At least we're not stuck with the Roman "inclusive counting" system that included one extra number in ranges* so that e.g. weeks have "8" days and Sunday is two days before Monday since Monday is itself included in the count.
Hmm, "en 8" makes sense to me in that you're using it to reference the next Whateverday that is at least 8 days apart from now.
If we're on a Tuesday, and I say we're meeting Wednesday in eight, that Wednesday is indeed 8 days away.
Now I'm fascinated by this explanation, which covers the use of 15 as well. I'd always thought of it as an approximation for a half month, which is roughly 15 days, but also two weeks.
To partially answer the other Latin languages, Portuguese also uses "quinze dias" (fifteen days) to mean two weeks. But I don't think there is an equivalent of the "en huit". We'd use "na quarta-feira seguinte" which is equivalent to "le mercredi suivant".
Music only settled on 12 equal tones after a lot of music theory and a lot of compromise. Early instruments often picked a scale and stuck with it, and even if they could produce different scales, early music stuck to a single scale without accidentals for long stretches. Many of these only had 5 or 6 notes, but at the time and place these names were settling down, 7-note scales were common, so we have the 8th note being the doubling of the 1st.
Most beginners still start out thinking in one scale at a time (except perhaps Guitar, which sorta has its own system that's more efficient for playing basic rock). So thinking about music as having 7 notes over a base "tonic" note, plus some allowed modifications to those notes, is still a very useful model.
The problem is that these names percolated down to the intervals. It is silly that a "second" is an interval of length 1. One octave is an 8th, but two octaves is a 15th. Very annoying. However, it still makes sense to number them based on the scale, rather than half-steps: every scale contains one of every interval over the tonic, and you have a few choices, like "minor thirds vs. major thirds" (or what should be "minor seconds vs. major seconds"). It's a lot less obvious that you should* only include either a "fourth" (minor 3rd) or a "fifth" (major 3rd), but not both. I think we got here because we started by referring to notes by where they appear in the scale ("the third note"), and only later started thinking more in terms of intervals, and we wanted "a third over the tonic" to be the same as the third note in the scale. In this case it would have been nice if both started at zero, but that would have been amazing foresight from early music theorists.
* Of course you can do whatever you want -- if it sounds good, do it. But most of the point of these terms (and music theory in general) is communicating with other musicians. Musicians think in scales because not doing so generally just does not sound good. If your song uses a scale that includes both the minor and major third, that's an unusual choice, and unusual choices requiring unusual syntax is a good thing, as it highlights it to other musicians.
> At least we're not stuck with the Roman "inclusive counting" system that included one extra number in ranges* so that e.g. weeks have "8" days and Sunday is two days before Monday since Monday is itself included in the count.
Yes, we are. C gives pointers one past the end of an array meaningful semantics.
That's in the standard. You can compare them and operate on them but not de-reference them.
Amusingly, you're not allowed to go one off the end at the beginning of a C or C++ array. (Although Numerical Recipes in C did it to mimic FORTRAN indices.) So reverse iterators in C++ are not like forward iterators. They're off by 1.
Note that 'first' and 'second' are not etymologically related to one or two, but to 'foremost'. Therefore, it would make sense to use this sequence of ordinals:
In terms of another thread, the item is the "rail" between the "fence posts". The address of the 'first' item starts at 0, but it isn't complete until you've reached the 1.
Where is the first item? Slot 0. How much space does one item take up* (ignoring administrative overheads)? The first and only item takes up 1 space.
The 1980s were not a particularly enlightened time for programming language design; and Dijkstra's opinions seem to carry extra weight mainly because his name has a certain shock and awe factor.
It isn't usual for me to agree with the mathematical convention for notations, but the 1st element of a sequence being denoted with a "1" just seems obviously superior. I'm sure there is a culture that counts their first finger as 0 and I expect they're mocked mercilessly for it by all their neighbours. I've been programming for too long to appreciate it myself, but always assumed it traces back to memory offsets in an array rather than any principled stance because 0-counting sequences represents a crazy choice.
I've heard the statement "Let's just see if starting with 0 or 1 makes the equations and explanations prettier" quite a few times. For example, a sequence <x, f(x), f(f(x)), ...> is easier to look at if a_0 has f applied 0 times, a_1 has f applied 1 time, and so on.
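In symbols (just restating the above):

    a_n = f^n(x)          so a_0 = x, a_1 = f(x), ...
    a_{m+n} = f^m(a_n)    with no ±1 corrections anywhere

With 1-based numbering you'd instead have a_n = f^(n-1)(x), and the "-1" leaks into every identity like that.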
0-based indexing aligns better with how memory actually works, and is therefore more performant, all things being equal.
Assuming `a` is the address of the beginning of the array, the 0-based indexing on the left is equivalent to the memory access on the right (I'm using C syntax here):
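    a[i]  ==  *(a + i)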
The comment you are replying to essentially said exactly that:
> but always assumed it traces back to memory offsets in an array rather than any principled stance because 0-counting sequences represents a crazy choice.
> The 1980s were not a particularly enlightened time for programming language design; and Dijkstra's opinions seem to carry extra weight mainly because his name has a certain shock and awe factor.
Zero based indexing had nothing to do with Dijkstra's opinion but the practical realities of hardware, memory addressing and assembly programming.
> I'm sure there is a culture that counts their first finger as 0
Not a one, because zero as a concept was discovered many millennia after humans began counting.
For math too, 0-based indexing is superior. When taking sub-matrices (blocks), with 1-based indexing you have to deal with + 1 and - 1 terms for the element indices. E.g. the third size-4 block of a 16x16 matrix begins at (3-1)*4+1 in 1-based indexing, at 2*4 in 0-based indexing (where the 2 is naturally the 0-indexed block index).
Also, the origin is at 0, not at 1. If you begin at 1, you've already moved some distance away from the origin at the start.
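A quick NumPy sketch of that block arithmetic (the shapes here are my own example):

    import numpy as np

    M = np.arange(256).reshape(16, 16)     # a 16x16 matrix
    s, k = 4, 2                            # block size 4; the third block has 0-based index 2
    block = M[k*s:(k+1)*s, k*s:(k+1)*s]    # starts at 2*4 = 8, no +1/-1 anywhere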
Just speaking anecdotally, I had the impression that math people prefer 1-based indexing. I've heard that Matlab is 1-based because it was written by math majors, rather than CS majors.
Indeed. I was going to point out that mathematicians choose the index based on whatever is convenient for their problem. It could begin at -3, 2, or whatever. I've never heard a mathematician complain that another mathematician is using the "wrong" index. That's something only programmers seem to do.
That's arguably one of the only downsides of zero-based, and can be handled easily with negative indexing. Basically all indexing arithmetic is easier with zero-based.
`l[:n]` gives you the first `n` elements of the list `l`. Ideally `l[-n:]` would give you the last `n` elements - but that doesn't work when `n` is zero.
I believe this is why C# introduced a special "index from end" operator, `^`, so you can refer to the end of the array as `^0`.
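For example:

    l = [1, 2, 3, 4, 5]
    l[:2]     # [1, 2] - first two
    l[-2:]    # [4, 5] - last two
    l[:0]     # [] - first zero elements, fine
    l[-0:]    # [1, 2, 3, 4, 5] - "last zero" is everything, because -0 == 0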
> Yes, negative indexing as in e.g. Python (so basically "from the end") can be incredibly convenient and works seamlessly when indexes are 0-based.
I'd claim 0-based indexing actually throws an annoying wrench in that. Consider for instance:
    arr = list(range(10))  # example data (assumed); any sequence shows the issue
    for n in [3, 2, 1, 0]:
        start_window = arr[n: n+5]    # fine for every n
        end_window = arr[-n-5: -n]    # breaks when n == 0: arr[-5:0] is empty
The start_window indexing works fine, but end_window fails when n=0 because -0 is just 0, the start of the array, instead of the end. We're effectively missing one "fence-post". It'd work perfectly fine with MatLab-style (1-based, inclusive ranges) indexing.
Pretty much any algorithm that involves mul/div/mod operations on array indexes will naturally use 0-based indexes (i.e. if using 1-based indexes they will have to be converted to/from 0-based to make the math work).
To me this is a far more compelling argument for 0-based indexes than anything I've seen in favor of 1-based indexes.
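For example, flattening and unflattening a 2-D index (a trivial sketch):

    ncols = 4
    r, c = 2, 3
    idx = r * ncols + c                  # 11 - no corrections with 0-based indexes
    assert divmod(idx, ncols) == (r, c)  # and the inverse is just divmod
    # with 1-based indexes: idx = (r - 1) * ncols + c, plus ±1 fixups to invert it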
Both are fine, IMO. In a context where array indexing is pointer plus offset, zero indexing makes a lot of sense, but in a higher level language either is fine. I worked in SmallTalk for a while, which is one indexed, and sometimes it made things easier and sometimes it was a bit inconvenient. It evens out in the end. Besides, in a high level language, manually futzing around with indexing is frequently a code smell; I feel you generally want to use higher level constructs in most cases.
I've always appreciated Ada's approach to arrays. You can create array types and specify both the type of the values and of the index. If zero based makes sense for your use, use that, if something else makes sense use that.
e.g.
type Index is range 1 .. 5;
type My_Int_Array is
array (Index) of My_Int;
It made life pretty nice when working in SPARK if you defined suitable range types for indexes. The proof steps were generally much easier and frequently automatically handled.
Many BASIC dialects had this too, which could make some code a bit easier to read e.g.
DIM X(5 TO 10) AS INTEGER
I recall in one program I made the array indices (-1 TO 0) so I could alternate indexing them with the NOT operator (in QuickBASIC there were only bitwise logical operators).
On the other hand, if you receive an unconstrained array argument (such as S : String, which is an array (Positive range <>) of Character underneath), you are expected to access its elements like this:
S (S'First), S (S'First + 1), S (S'First + 2), …, S (S'Last)
If you write S (1) etc. instead, the code is less general and will only work for subarrays that start at the first element of the underlying array.
So effectively, indexing is zero-based for most code.
I think lower..higher index ranges for arrays were used in Algol 68, PL/I, and Pascal long before Ada.
At least in standard Pascal, arrays with different index ranges were distinct, incompatible types, so it was hard to write reusable code like sort or binary search. The solution was either parameterized types or proprietary language extensions.
I found it devastating that there are no distinct agreed-upon words denoting zero- and one-based addressing. Initially I thought that the word "index" clearly denotes zero-base, and for one-base there is "order", "position", "rank" or some other word, but after rather painful and humiliating research I stood corrected. ("Index" is really used in both meanings, and without prior knowledge of the context, there is really no way to tell what base it refers to.)
So to be clear, we have to tediously specify "z e r o - b a s e d " or "o n e - b a s e d" every single time to avoid confusion. (Is there a chance for creating some new, terser consensus here?)
I like this. I feel like "offset" hints at the reason for starting at 0. "How far do you have to offset your feet (from the beginning of whatever space we're talking about) before you're touching the thing in question?" If it's the first thing, you don't have to move at all, so zero offset
Of course you could also "offset your feet" until they're past the end of the last thing, and then you've counted the number of things. But the offset of the thing itself (as opposed to that of your feet) could be considered zero, assuming the natural position of the thing is for its left edge to be at the left edge of the space.
But maybe its natural position is to be centered at x=0 and it had to be moved by 0.5 for the left edges to line up, in which case see my other comment.
In any case, I think the argument over 0 or 1 or 0.5-based indexing can be resolved just by being clear about what it is you're counting.
That sounds reasonable at first, but humans are messy and so the distinction is not always clear.
For example, in music we use numbers for the notes of a scale (G is the 5th of C, because in the C major scale: C=1, D=2, E=3, F=4, G=5, A=6, B=7). The numbers are clearly indices for the notes of the scale.
But we often think of stuff like: what's the 3rd of the 5th -- that is, get the note at the 5th position (G in our example) and then get the 3rd of that note (the 3rd of G is B). But note that B is the 7th of C major, not the 8th you'd get from adding 5 and 3.
The problem, of course, is that we ended up using the numbers as offsets, even though they started as indices.
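A toy sketch of that mismatch (the helper function is mine, not any standard notation):

    SCALE = ['C', 'D', 'E', 'F', 'G', 'A', 'B']

    def up(note, interval):                    # interval uses the 1-based music name
        i = SCALE.index(note)
        return SCALE[(i + interval - 1) % 7]   # but the actual offset is interval - 1

    up('C', 5)            # 'G' - the 5th of C
    up(up('C', 5), 3)     # 'B' - the 3rd of the 5th: offsets 4 + 2 = 6, so a 7th, not an 8th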
Yes, that's the point: the numbers are used as both indices and offsets. To any musician or music student, moving from G to B is "going up a (major) third", even though it's obviously going up by 2 notes. The name of that offset ("interval" in music speak) is "third", even though it has a distance of 2 notes.
My point (more generally) is that even though it looks reasonable to make indices start from 1 and offsets from 0, in practice these things can get mixed together. It's not reasonable to get people to use two different numbers for what they see as the same thing (because their use got mixed).
I don't think "index" by itself should imply any starting value. After all, many indices start at higher numbers, and then you'd have to invent words for 2-based, 3-based, and so on as well.
This article is one of my pet peeves. It always shows up in discussions as "proof" that 0-indexing is superior, but it sweeps under the carpet all the cases where it is not. For instance, backwards iteration needs a "-1" and breaks with unsigned ints.
Always beware the word should. I agree with Dijkstra's logic in the context that he presents it, but there are other contexts where I don't think it applies.
Personally, I find that in compiler writing, which is the only programming I do these days, the only things I use indexes for are line numbers and character offsets into strings. Calling the first character the zeroth character is ridiculous to me, so I just store a leading 0 byte in all strings and then can use one based indexing with no performance hit. Alternatively, since I am the compiler writer, I could just internally store the pointer to the string - 1 to avoid the < 1 byte per string average overhead (I also machine word align strings so often the leading zero doesn't affect the string size in memory).
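The same trick is easy to picture even in Python terms (purely illustrative):

    line = "\0" + "print('hi')"   # burn index 0 with a sentinel
    line[1]                       # 'p' - the first character, 1-based, no arithmetic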
If you are often directly working with array indices, you are likely doing low level programming. It is worth asking if the task at hand requires that, or if you would be better off using higher level constructs and/or a higher level language. Low level details ideally should not leak into the application level.
Not only is this preference not restricted to compiler programming, but it's not even restricted to programming.
Try to count 4 seconds: if you start at 1, you messed up.
Babies start at 0 years old. Etc.
I do agree it's a convention though. Months and years start at 1, but especially for years, only intervals are meaningful, so it doesn't really matter what zero is (even though christ is totally king)
In the Boole Library in Ireland (which has entrances on different floors) they use an algebraic (affine) system. There is a floor designated "Q" and then other floors are labelled relatively, "Q+2", "Q-1", etc.
Using the insight from the top comment that "it all depends on whether u're counting the items themselves (1-based) or the spaces btwn them," the American way of numbering floors is based on counting actual floors (ie, the things you stand on) -- and the one at earth level is one floor. If you go up a flight of stairs, there is second floor to stand on, and so on.
For buildings that go underground, the "-" sign can now act as a signifier of being underground, and the counting works as normal. If you take the stairs down one level, you are on the first underground floor, -1.
Of course, you want to interpret it like a y-axis number line, where 0 is the earth, 1 is "1 floor unit" above the earth, -1 is "1 floor unit" below the earth, etc. This is the "space between" model.
Elegance aside, both can be viewed as logically consistent depending on your lens.
At the University of Arizona (or at least in most of the buildings there), the lowest floor of the building is always 1, even if it’s a basement. So the ground floor is often 2. Maddening.
1-based numbering is nonsense. How many years old are you when you’re born?
I notice almost all defenses of 1-based indexing are purely based on arbitrary cultural factors or historical conventions, e.g. “that’s how it’s done in math”, rather than logical arguments.
You have lived zero full years and are in the first year of your life. In most (but not all) countries the former is considered "your age".
That's consistent with both zero-based and one-based indexing. Both agree on cardinal numbers (an array [1, 2] has length 2), just not on ordinal numbers (whether the 1 in that array is the "first" or "zeroth" element).
> I notice almost all defenses of 1-based indexing are purely based on arbitrary cultural factors or historical conventions, e.g. “that’s how it’s done in math”, rather than logical arguments.
I think it's largely a matter of taste in either direction. But, I'd raise this challenge:
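Something like this (the exact list is an arbitrary stand-in):

    arr = ['A', 'B', 'C', 'D', 'E']
    print(arr[3:1:-1])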
If you're unfamiliar with Python (zero-based, half-open ranges exclusive of end), that's taking a slice from index 3 to index 1, backwards (step -1). How quickly can you intuit what it'll print?
Personally I feel like I have to go through a few steps of reasoning to reach the right answer - even despite having almost exclusively used languages with 0-based indexing. If Python were instead to use MatLab-style indexing (one-based, inclusive ranges), I could immediately say ['C', 'B', 'A'].
> Both agree on cardinal numbers (an array [1, 2] has length 2), just not on ordinal numbers (whether the 1 in that array is the "first" or "zeroth" element).
I think whenever people say "zeroth" they speak in jest, and I doubt that there is any disagreement on the fact that the element without predecessor (nowadays in programming most often assigned index 0) is the first element.
You used "first" in that sense naturally just in the sentence before without the slightest notion of ambiguity.
> You have lived zero full years and are in the first year of your life.
What people wrongly get riled up about is the fact that the ordinal (first) is not in sync with the cardinal (index 0), but it rarely is anyway. If you go to an administrative office and pull number 5357, no one assumes that there are 5356 people in the queue before them. You are still the 5th or 10th in line, even if your index is 5357.
> I think whenever people say "zeroth" they speak in jest, and I doubt that there is any disagreement on the fact that the element without predecessor (nowadays in programming most often assigned index 0) is the first element.
"Zeroth" sounds silly because English has generally settled on one-based indexing, and so typically you'd convert to one-based indexing when speaking out loud or writing something other than code (`users[3]` is the "4th user").
Maybe you could argue that the words "first"/"second", as just some letters/sounds, are not inherently one-based? But I feel that gets a bit dubious from "third" onwards where the connection to numbers is obvious, and with the ordinals frequently spelled as 1st/2nd/etc.
But, if you want (and if I'm understanding what you mean), you could replace my statement with: "whether the '1' in that array is the index-1 or index-0 element" - purpose was just to distinguish between ordinals and cardinals, in response to a comment that seemingly implied the cardinality of an empty collection would be 1 with one-based-indexing.
I don't think it's "wrong" per se - it's zero-based indexing and ranges are consistently inclusive of start, exclusive of end. But, it breaks the "fence-post" mental model that some people use, and is less intuitive than MatLab's indexing approach IMO.
I do. I think it only makes sense for ranges to be inclusive at the lower end and exclusive at the higher end. A slice with a negative step `-n` should contain the same elements as the same slice as with step `n`, just reversed.
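Concretely, what that rule would change:

    l = list("ABCDE")
    l[1:4]       # ['B', 'C', 'D']
    l[1:4:-1]    # [] today; under that rule it would be ['D', 'C', 'B']
    l[3:0:-1]    # ['D', 'C', 'B'] - what you have to write today instead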
Consider the following from the Irish Constitution:
> 12.4.1° Gach saoránach ag a bhfuil cúig bliana tríochad slán, is intofa chun oifig an Uachtaráin é.
and the official translation to English:
> 12.4.1° Every citizen who has reached his thirty-fifth year of age is eligible for election to the office of President.
For those unfortunate few who do not understand Irish, that version says "Every citizen who is at least thirty-five years old", whereas the translation should in principle (arguably) allow a thirty-four-year-old.
Luckily the Irish version takes precedence legally. A constitutional amendment which would have lowered the minimum age to 21 and resolved the discrepancy was inexplicably rejected by the electorate in 2015.
The key word in "birthday" is "birth". Not conception.
For human cultural purposes, you are 0 days old at the time and date recorded for your birth. It goes on your birth certificate. If you find a culture that celebrates conceptiondays and issues conception certificates, let me know.
The comment I’m replying to said nothing about “birthday”. They asked how many years old you are when you’re born, which is ambiguous.
You know some cultures don’t even celebrate birthdays, right? It’s not even uncommon for people to not know their date of birth. Even my own grandmother, born in Chicago in the 1920s, didn’t know exactly what year she was born in.
When you're born. That is your birth day, because the person to whom you were born gave birth to you that day.
No culture I know of tracks the date of your conception as a measure of your age. If they measure it at all, the starting point is your birth.
You are zero days old on the day of your birth. Even if those cultures think you are "in the first year of your birth" for days 0-365 of your existence, they would accept that 1 day after your birth, you are 1 day old, not 9 months old.
You would, if English were logical as opposed to being a randomly-evolved cultural practice. With computer languages we have the opportunity to fix the mistakes of the past and do better.
The question arises when people get confused between a cut and a span. These two are opposite concepts; they make up a continuum, and they define each other.
So, it depends on what you understand by "numbering". If it is about counting objects, the phrase "first object" refers to the existence of a non-zero number of objects. This shows why the first one can't be called zero, as zero is not equal to non-zero.
If the numbering is about a continuous scale, such as a tape measure, then the graduations can start with zero. But still, the first meter refers to one meter, not zero meters.
It looks silly when people number their book chapters beginning with zero. They have no clue whether the chapter refers to a span or a cut. Sure, they can call the top surface of their book cover zero, though. But still they can't number a page as zero.
The use of a zero index for memory locations comes from the possible magnetic states of an array of bits. Each such state is a kind of cut, not a span. It's like a graduation on the tape measure, or a milestone on the side of the road. So it can start with zero.
So, if you are counting markers or separators, with zero magnitude, you can start at zero. And when you count spans or things of non-zero magnitude, you start at one. If you count apples, start at one. If you count spaces between apples, start at zero.
I see people bringing up arrays, and an array index is represented by a number - you can do math on it - but it's not a regular number for counting a sequence of items. It's a unique reference to a location in memory, and it's dangerous to treat an array index like it's just any old number.
Behold, the really stupid things you can do in Javascript:
    let myArr = [];
    let index = 0;
    myArr[--index] = 5;        // index is now -1; this sets a plain property "-1", not an array element
    console.log(myArr.length); // 0 - negative keys don't count toward length
    console.log(myArr[index]); // 5 - but the property is still readable
I recently picked up Lua for a toy project, and I've got to say that decades of training with 0-based indexes make it hard for me to write correct Lua code on the first try.
I suppose 1-based index is more logical, but decades of programming languages choosing 0-based index is hard to ignore.
> decades of programming languages choosing 0-based index is hard to ignore
Yes - many file formats also work with zero-based indices, not to mention the hardware itself.
One-based indexing is particularly problematic in Lua, because the language is designed to interoperate closely with C - so you're frequently switching between indexing schemes.
Interestingly, it also poses great challenges for LLMs. GPT-4 can translate Perl into Python almost flawlessly, but its Lua is full of off-by-one errors.
It just matches the convention used in the language that one has learned as a child, so one is already familiar with it.
The association between ordinal numbers and cardinal numbers such that "first" corresponds to "one" has its origin in the custom of counting by uttering the string "one, two, three ..." while pointing in turn to each object of the sequence of objects that are counted.
A more rigorous way of counting is to point with the hand not at an object, but at the space between 2 successive objects, whereupon it becomes clearer that the number that is spoken is the number of objects located on one side of the hand.
In this case, it becomes more obvious that the ordinal position of an object can be identified either by the number spoken when the counting hand was positioned at its right or by the number spoken when the counting hand was positioned at its left, i.e. either "0" or "1" may be chosen to correspond to "first".
Both choices are valid and they are mostly equivalent, similarly to the choice between little-endian and big-endian number representation. Nevertheless, exactly as little-endian has a few advantages for some operations and so has eventually mostly replaced big-endian representations, the choice of "0" for "first" has a few advantages, and it is good that it has mostly replaced the "1 is first" convention.
For people who use only high-level programming languages, the differences between "0 is first" and "1 is first" are less visible, exactly like the differences between little-endian and big-endian. In both cases the differences are much more apparent for compiler writers or hardware implementers.
Besides "1 is first" vs. "0 is first" and little-endian vs. big-endian, there exists another similar choice, how to map the locations in a multi-dimensional array to memory addresses. There is the C array order and the Fortran array order (where elements of the same column are at successive addresses).
Exactly like "1 is first" and big-endian numbers match the conventions used in writing the European languages, the C array order also matches the convention used in the traditional printed mathematical literature.
However, exactly like in the other 2 cases, the convention opposite to traditional written texts, i.e. the Fortran array order, is the superior convention from the point of view of implementation efficiency. Unfortunately, because far fewer people are familiar with the implementation of linear algebra than with the simpler operations on numbers and uni-dimensional arrays or strings, and so are not aware of the advantages and disadvantages of each choice, the Fortran array order is used only in a minority of programming languages. (An example of why the Fortran order is better is the matrix-vector product, which must never be implemented with scalar products as misleadingly defined in textbooks, but with AXPY operations, which are done with sequential memory accesses when the Fortran order is used, but which require strided memory accesses if the C order is used. There are workarounds when the C order is used, but with the Fortran order it is always simpler.)
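A sketch of that AXPY-style product, with NumPy standing in for Fortran storage (the sizes are arbitrary):

    import numpy as np

    n = 4
    A = np.asfortranarray(np.random.rand(n, n))  # column-major (Fortran) layout
    x = np.random.rand(n)
    y = np.zeros(n)
    for j in range(n):
        y += A[:, j] * x[j]        # AXPY: column A[:, j] is contiguous in Fortran order
    assert np.allclose(y, A @ x)   # same answer as the textbook dot-product formulation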
I have a similar experience with Python's negative indexing. In Python, you can access elements counting from the back by using negative numbers. But for this, the counting starts at 1, not 0, which is inconsistent with normal forward indexing, which starts at 0. I guess it comes from reducing n.length-1 to -1, but it's still kind of annoying to have two different indexing systems at work.
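A minimal illustration of the asymmetry:

    xs = ["a", "b", "c"]
    xs[0]    # "a" -- forward indexing starts at 0
    xs[-1]   # "c" -- backward indexing starts at -1
    xs[-0]   # "a" -- because -0 == 0, it cannot mean "the last element"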
Your visualization makes sense if you always count them going from left to right. But with negative indices you naturally count from right to left, from the last element backward to the first. So -0 would be the natural starting point, except it's -1 in Python.
> Your visualization makes sense if you always count them going from left to right.
Yes - personally, I do always count them left-to-right.
I don't think negative indices implies counting right-to-left. A negative step does, but I never use one because IMO it doesn't make sense to have an exclusive LHS and inclusive RHS.
Indexing is not really math; it's a reference system that uses numbers for convenience. You could use letters or even emojis to get the same result.
Yes, but Python is not math. This is a syntax feature we are talking about; math is just a tool here, not the purpose. And it's also not running on the hardware, but multiple layers higher.
I'm proposing nothing. I'm pointing out a flaw, for me. Nobody will change this at this point. Nobody should change it at this point. There is no benefit in this, just harm.
Values take up space. When you manipulate a value or pass it around, it makes no sense to sometimes refer to the beginning of the value and sometimes to the end.
It makes a little more sense if you have an array of something other than plain integers. Let's say you have 2-tuples:
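    arr = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]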
Now -3 clearly refers to the beginning of an array element. At least I wouldn't expect -3 to refer to (1, 1), even though I'm mentally traversing right to left for negative indexes.
Or another way to think about it: arr[5] does not exist in the above example. It's the end of the array, and the end is exclusive. Negative indexes count from the end. -0, as a result, refers to the (unmodified) end, which is the nonexistent thing, same as arr[5].
And yet another way: think of positive indexes as going forward, negative as going back. Imagine a syntax arr[3][-2] where arr[3] gives you the subarray starting at offset 3. (In C or C++, this would be like (&arr[3])[-2] with an array type that supported negative indexes, which implies it tracks subarray length.) Where should you end up? Start with the simpler case of arr[3][-0] -- clearly that should be the same as arr[3], not arr[2], if you are "going back 0". And if you're starting out with 0-based indexes, then the "going forward"/"going back" interpretation is inescapable.
As a bonus, arr[-n] is the same as arr[arr.length() - n]. But that's just a lucky happenstance; I wouldn't argue that the semantics of negative indexes should depend on it. Well... one could argue that arr[arr.length()] is the (nonexistent, exclusive) end.
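In Python notation, that bonus identity is:

    arr = [10, 11, 12, 13, 14]
    n = 2
    assert arr[-n] == arr[len(arr) - n]   # both are 13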
You can visualize it as wrapping around to the sequence's other side. That is, you start at element 0, and going backwards to -1 takes you to the other end (continuing up to -len(seq) returns you to element 0). Kind of like border wrapping in most modern Snake variants. Although this only works for negative indexing.
I don't particularly like negative indexing, but if we assume that -0 represents the address of the first element past the end of the array (the logical "end" of the array's span in memory), then -1 is naturally the starting address of the last element in the array, as measured in its offset from the end.
I don't use Lua as much anymore, but there were a few years when I used both Lua and C++ daily, and you quickly learn to handle both zero- and one-based indexing, even while switching between languages frequently. As with most things, it's just practice.
It's interesting how most people find learning 0-based indexes confusing, but after a few years of programming, they don't even notice how odd it is.
How do you number things in real life? If you have two options, do you number them "option 0" and "option 1" or "option 1" and "option 2"? If you create a presentation with numbered bullet points, do you start numbering them from 0 or 1?
1. This is my first point
2. This is my second point
It would be odd to have your first point be point number 0 and your second point be point number 1, wouldn't it?
Outside of programming, even most programmers use zero-based indexing only when they're making a programming joke.
Zero-based indices create odd effects even in programming, although we don't really notice them anymore.
$array has 3 entries, but your for-loop says you should stop iterating before you reach 3. This isn't really consistent with how we intuitively understand numbers to work outside of programming.
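For instance, a sketch of the same loop in Python (the names here are mine):

    items = ["a", "b", "c"]   # 3 entries
    for i in range(3):        # i takes the values 0, 1, 2 -- the loop stops before 3
        print(i, items[i])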
If you see how it's implemented in machine code (on most modern archs), you will see how it is logical. Once you see that, you cannot unsee it.
People call these things an index, but really it is an offset from the base of the array, not an index into it. Hence base + 0 is the first entry, as the offset to that entry is 0. That's how it works in generated machine code on all the machines I've seen (I did not see all of them, obviously).
I think people struggle with these concepts because they never bother to see what's under the hood. A bit of an assumption on my part, of course; I can't see into people's minds, nor do I know all architectures. x86 and x64 (AMD/Intel) definitely work like this.
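You can even watch this from Python via ctypes (my sketch, not the commenter's):

    import ctypes

    arr = (ctypes.c_int32 * 4)(10, 11, 12, 13)
    base = ctypes.addressof(arr)
    size = ctypes.sizeof(ctypes.c_int32)

    # "arr[2]" is really: read at address base + 2 * size
    p = ctypes.cast(base + 2 * size, ctypes.POINTER(ctypes.c_int32))
    assert p.contents.value == arr[2] == 12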
Yes, it's the offset from a pointer in some languages. But to the user, it's presented as an index, not an offset. In normal syntax, you don't access an array value using a construct like
*(arraypointer+offset)
If you did that, then using 0 as the start offset would make intuitive sense.
In fact, if not for this technical reason, I doubt that any programming language would have 0-based indices.
I'm not debating the user-level stuff. Honestly, I think there's a fair argument for not letting users get bogged down in system-level details. But in my opinion, if people don't understand this "why", it's a useful thing to learn. It's only a small detail in the end.
I think, because software is built up as abstractions over abstractions, it's unavoidable that such details creep into languages from the depths of the system. It's not too long ago that people didn't even have C or such high-level constructs. So coming from there and building up, it's logical. Now, making something new, from above, one might use 1 as the index of the first element, but it's likely that during the writing of a language you'll end up in the depths and come up with 0 anyway.
Think of how a 1-based index works vs. a 0-based index on pointers (pseudocode):

    ptr += index * size_of_object          -- 0-based indexing
    ptr += (index - 1) * size_of_object    -- 1-based indexing

If you want to run that on the CPU, either the compiler needs to reduce the 1-based form to the first one, or the generated code will carry an additional subtract or decrement.
Or do you want the compiler to take on the duty of folding away the index - 1? It will often arrive at the original code again, which is also possible, but it takes extra compiler or interpreter time.
I don't disagree with anything you say. I'm not saying that C should have 1-based indices, for example. I'm only saying that from a language design perspective, ignoring technical limitations and historical precedents, 1-based indices would be preferable.
Are there any languages that call the first element the zeroth element in everyday speech? I can't think of one, Google can't come up with one, and neither can ChatGPT. This isn't just cultural; it's universal or almost universal.
That's a good example! Elevators sometimes have a 0 for the ground floor, but they often have an "E" in German-speaking countries or a "C" in English-speaking countries.
In this example, people also call the floor with index 1 the "first floor," although they don't call the ground floor the zeroth floor, as you say.
Since floors can go below ground, the first underground floor is floor -1, so everything works out. There's a floor for every number, unlike with 1-based floor numbering, where the floors jump straight from -1 to 1 and no floor 0 exists.
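The discontinuity is easy to state as code (a toy sketch; the function name is mine):

    def us_floor(european: int) -> int:
        """Map European floor numbers (ground = 0) to US-style numbers (ground = 1).
        Below ground both schemes use -1, -2, ...; the US scheme simply has no 0."""
        return european + 1 if european >= 0 else european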
By the way, has anyone else had this problem: you're on floor 10, say, and you want to get to floor 15, say, so you run up 5 flights of stairs and try to find the room but then after a while you realise that somehow you've ended up on floor 16 so you think you're going demented and can't count but after this has happened a few times you realise ... THE IDIOTS HAVE OMITTED FLOOR 13!
If there's one thing worse than 1-based indexing it's leaving-out-13-based indexing ... Has any programming language tried doing that with arrays, I wonder?
Somebody should create a programming language that implements all real-world number idiosyncrasies. Don't have 4 or 13; define π as 3 and τ as 6; print 6 three times every time it occurs, for good luck (or bad luck, depending on where you live); replace all numbers close to 8 with 8 in money values, for great prosperity; have a constant for "dozen" that's 13 in case you're counting loaves of bread; have a rounding function that rounds to a close number that's easy to say, depending on your locale...
And let's go with 0.5-based indexing as a compromise.
Your arguments using "first" and "second" are invalid, because those words have nothing to do with numbers (and neither does "last").
If you argue based on English, you should point only to ordinals like "third", "fourth" and so on.
In the older European languages, the initial position in a sequence was named using words like "first", which mean "closest to the front". The next position was named with words meaning "the other", e.g. "alter" in Latin (the old European languages had 2 distinct words for "the other from two" and "another one from many", e.g. "alter/alius" in Latin).
However, when the need appeared to name other positions in a sequence besides the first, the second, the last, or the next to last (penultimate), ordinals derived from numbers were invented, like third, fourth, etc.
In later Latin, the word meaning "the other" was replaced with a word meaning "the following". This has been inherited in the Romance languages, and it was borrowed into English as "second". In Old English too, "other" had been used for "second" before "second" was taken from French.
> Your arguments using "first" and "second" are invalid,
> because those words have nothing to do with numbers
"First" is the ordinal number corresponding to the number one, while "second" is the ordinal number corresponding to the number two. You can represent "first" as 1st and "second" as 2nd. I believe the origins of these two words do not significantly impact my argument.
Nothing in the word "first" indicates that it corresponds to "one". The same for "second". Those words are not numerals, "first" is a superlative adjective, while "second" is an active participle. They are perceived as ordinal numerals only because English does not have ordinal numerals for the 2 initial positions of a sequence and "first" and "second" are used instead of the missing ordinal numerals.
The abbreviations "1st" and "2nd" are very recent and they cannot be used as an argument that there is a long tradition of correspondence between "first" and "one".
As I have said, the correct argument based on English is that the position after the second is called "third", which is derived from "three", and the next position is called "fourth", from "four"; extrapolating that sequence backwards, decrementing "3" gives a correspondence between "2" and "second", and decrementing once more gives a correspondence between "1" and "first".
This is the exact reasoning that has led to the abbreviations "1st" and "2nd", which have no relationship with the pronunciation or the meaning of the abbreviated words, whose actual meanings ("closest to the front" and "the following") are unrelated to any numbers.
It doesn't matter whether there is a long tradition of correspondence between first and 1 and second and 2. What matters is that the correspondence exists today, because I'm making my argument today.
Das erste Element in dieser Aufzählung ist jedoch null, und das zweite ist eins. Deutsche verwenden einsbasierte Indizes, genau wie die Amerikaner, auch wenn sie manchmal bei null zu zählen beginnen.
(The first element in this list, however, is zero, and the second is one. Germans use one-based indices, just like Americans, even though they sometimes start counting from zero.)
It is easy to miss that his argument boils down to zero-based being "nicer" in one specific, selected case. The paper is written in the style of a mathematical proof but hinges on a completely subjective opinion.
>> when starting with subscript 1, the subscript range 1 ≤ i < N+1; starting with 0, however, gives the nicer range 0 ≤ i < N.
What about the range 0 < i ≤ N which starts with 1? Why only use ≤ on the lower end of the range? This zero-based vs one-based tends to come up in programming and mathematics, and both are used in both areas. Isn't it obvious that there is no universally correct way to index things?
I believe the main argument (from the OP) is that you have to specify the range with two bounds, and that it is common to want a 0 (assuming a 0-based indexing world), and so in order to refer to a range that includes index 0 you'll need to use a number that is not in the set of valid indexes to define the bound.
I would note that the argument is weakened when you look at the upper bound, since you have the same problem there; it's just more subtle and less commonly encountered -- yet it routinely creates security bugs!
It's because we don't work with the integers; we work with fixed-size intervals within the set of integers (usually a power-of-two count of consecutive integers). So `for (i = 0; i < 256; i++)` is just weird when you're using 8-bit integers: your upper bound is an inexpressible value, and could easily be compiled down to `for (i = 0; i < 0; i++)` with two's complement, e.g. if you did `uint8_t upper = 256; for (uint8_t i = 0; i < upper; i++)`. That case is simple, but it gets nastier when you're trying to properly check for overflow in advance and the actual upper value is computed. `if (n >= LIMIT) { return error; }` doesn't work if your LIMIT is based on the representable range. Nor does `if (n * elementSize >= LIMIT) { return error; }`. Even doing `limit = LIMIT / elementSize; if (n >= limit) { return error; }` requires doing the `LIMIT / elementSize` intermediate calculation in larger-width numbers. (In addition to the off-by-one if LIMIT is not evenly divisible by elementSize.)
So when dealing with overflow checks, 0 ≤ i ≤ N may be better. Well, a little better. `for (i = 0; i <= LIMIT; i++)` could easily be an infinite loop if LIMIT is the largest in-domain value. You want `i = 0; while (true) do { ...stuff...; if (i == LIMIT) break; i++; }` and at that point, you've lost all simple correspondence with mathematical ranges.
> Isn't it obvious that there is no universally correct way to index things?
I don't know about "obvious", but I agree that there is no universally correct way to index things.
"The above has been triggered by a recent incident, when, in an emotional outburst, one of my mathematical colleagues at the University —not a computing scientist— accused a number of younger computing scientists of "pedantry" because —as they do by habit— they started numbering at zero. "
Same in Germany, just that we usually call it ground floor instead of 0th floor.
You could argue it's a bit of a translation error. The French and German words for floor refer to ways of adding platforms above the ground, whether by reference to walls, wooden columns, or floor joists. Over the course of language evolution those words have both broadened and specialized, coming to refer to building levels in general. But the way they are counted still reflects that they originally referred to levels built above the ground. The English "floor", on the other hand, counts the number of levels that are ground-like, which naturally starts at the actual ground.
Zero means nothing (not that it has no importance :-) but that it symbolises the void). So the symbol 0 could just as well be a single space or any other predetermined symbol. So it is not a number and should not be used like one (pun intended).
Perhaps we can extend this to everyday language? Taylor Swift had a number zero hit, one's company, two's a crowd, I won the race and came in at number zero, and so on?
I appreciate Dijkstra's arguments, but the fact remains that no non-technical user is ever going to jibe with a zero-indexed system, no matter the technical merits.
Languages aimed at casual audiences (e.g. scripting languages like Lua) should maybe just provide two different ways of indexing into arrays: an `offset` method that's zero-indexed, and an `item` method that's one-indexed. Let users pick, in a way that's mildly less confusing than languages that let you override the behavior of the indexing operator (an operator which really doesn't particularly need to exist in a world where iterators are commonplace).
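A tiny sketch of that idea, in Python for concreteness (the method names `offset` and `item` come from the comment above; everything else is assumed):

    class Array:
        def __init__(self, items):
            self._items = list(items)

        def offset(self, i):
            """Zero-indexed access: offset(0) is the first element."""
            return self._items[i]

        def item(self, n):
            """One-indexed access: item(1) is the first element."""
            return self._items[n - 1]

    a = Array(["x", "y", "z"])
    assert a.offset(0) == a.item(1) == "x"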
Dijkstra's objective was to make programming into an intellectually respectable, rigorous branch of mathematics, a motivation he mentions obliquely here. He was, generally speaking, opposed to non-technical programmers and languages aimed at casual audiences, such as BASIC and APL; they were at best irrelevant to his goal and at worst (I suspect, though he never said) threats to its credibility.
A lot of Lua users are kids playing Luanti or Roblox or WoW who also spend a little time modding them: mostly editing textures or 3-D model meshes, but also scripting. Lua is a small and simple language that can be learned easily; it prefers to produce incorrect answers rather than throw exceptions when confronted with ambiguous situations (for example, it permits undeclared variables); it has an interactive REPL; it is memory-safe, avoiding crashes; and it uses dynamic typing (thus avoiding type declarations), garbage collection, and 1-based indexing.
All of these design features seem to be helpful to casual programmers and are common in languages and programming environments designed for them, such as BASIC, Smalltalk, sh, Python, Tcl, and Microsoft Excel.
> And at that time, the only other option would be Tcl, "tickle." But we figured out that Tcl was not easy for nonprogrammers to use. And Lua, since the beginning, was designed for technical people, but not professional programmers only. In the beginning, the typical users of Lua were civil engineers, geologists, people with some technical background, but not professional programmers. And Tcl was really, really difficult for a non-programmer. All those substitutions, all those notions, etc. So we decided to create a language because we actually needed it.
(Tcl, of course, was designed for chip designers.)
Starting from zero saves memory.
If I have a variable used as an index for an array of 256 elements, starting from 0 allows me to store it in a single byte. If I start from 1, I need two bytes, effectively doubling the memory usage—an unnecessary 100% increase.
Now, multiply this inefficiency across every instance where programs encounter a similar situation.
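Concretely (a Python illustration of the boundary; `to_bytes` just makes the width explicit):

    # The largest 0-based index of a 256-element array fits in one byte...
    (255).to_bytes(1, "big")    # b'\xff'
    # ...but the largest 1-based index does not.
    (256).to_bytes(1, "big")    # raises OverflowError: int too big to convert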
This discrepancy appears in physics too. It's common to use 1,2,3 for spatial indices, but when you reach enlightenment and think in terms of spacetime you add a zero index and not a four.
That's only because you insist on explicitly mentioning the last element, which you can only do when the sequence is finite and non-empty (more generally, when it is indexed by a successor ordinal). So your choice of notation is not only inelegant, it cannot even express all possible sequences.
Hard ask considering there are effectively 2 Americas: only the scientific one using scalable units like mg/g/kg, cm/m/km; everyone else using randomized trash... ft, mile, yard, inch, pound....
I don't know of evidence that he did. But Dijkstra left us a famous quote:
"LISP has jokingly been described as “the most intelligent way to misuse a computer”. I think that description a great compliment because it transmits the full flavour of liberation: it has assisted a number of our most gifted fellow humans in thinking previously impossible thoughts."
This is obviously a compliment; it even mentions that word.
Even a less positive remark than this would still be a resounding compliment from a computer scientist who said things such as that BASIC causes irreparable brain damage!
So count this as a piece of evidence that he liked Lisp.
Lisp emphasizes structured approaches, and from the start it has encouraged (though not required) techniques that avoid destructive manipulation. There is a lot in Lisp to appeal to someone with a mindset similar to Dijkstra's.
"I must confess that I was very slow on appreciating LISP’s merits. My first introduction was via a paper that defined the semantics of LISP in terms of LISP, I did not see how that could make sense, I rejected the paper and LISP with it."
Even McCarthy initially rejected the idea that the Lisp-in-Lisp specification could simply be translated into working code so that an interpreter pops out; at first he thought Steve Russell was misunderstanding something.
I know I'll get downvoted to Hell for this, but I have a mental list of traits poor programmers have, and one of them is "excessively complains about 1-based indexing".
---
1. https://en.wikipedia.org/wiki/Object_identifier