
I'd put the blame on languages that don't allow exceptions, and whose return value in case of error belongs to the same domain as the solution.

I've coded binary searches and sorts tons of times in C++, and yet none was susceptible to this bug. Why? Because, whenever you're talking indices, you should ALWAYS use unsigned int. Since an array can't have negative indices, if you use unsigned ints the problem is solved by design. And, if the element is not found, you throw an exception.

Instead, in C you don't have exceptions, and you have to figure out creative ways of returning errors. errno-like statics work badly with concurrency. And doing something like int search(..., int* err), and setting err inside your functions, feels cumbersome.

So what does everyone do? Return a positive int if the index is found, or -1 otherwise.

In other words, we artificially extend the domain of the solution just to include the error. We force into the signed integer domain something that was always supposed to be unsigned.

This is one of the most common causes of the integer overflow problems out there.
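Roughly, the two styles side by side (a minimal sketch; the names and signatures are just illustrative):

  #include <stdbool.h>
  #include <stddef.h>

  /* The usual C pattern: -1 doubles as "not found", so the error lives
   * in the same (signed) domain as the index. */
  int find_signed(const int *a, int n, int key) {
      for (int i = 0; i < n; i++)
          if (a[i] == key)
              return i;
      return -1;
  }

  /* Alternative: the index stays unsigned and "found" is reported
   * separately, so no sentinel value is forced into the index domain. */
  bool find_unsigned(const int *a, size_t n, int key, size_t *out) {
      for (size_t i = 0; i < n; i++)
          if (a[i] == key) {
              *out = i;
              return true;
          }
      return false;
  }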




When you’re talking indices, you should NEVER use int, unsigned or not. The world is 64-bit these days and int is stuck at 32 bits almost everywhere. And even on 32-bit systems indexing with unsigned int may not be safe unless you think about overflow, as this bug demonstrates (at least unsigned overflow is not immediate UB in C and C++ like signed overflow is…)

C has size_t. Use it.


To be fair, size_t doesn't solve this particular problem; you also need to use the correct array slice representation (ptr,len) rather than (start,end), and calculate the midpoint accordingly (i.e. (ptr,len/2) or (ptr+len/2,len-len/2)).

(And because C doesn't mandate correct handling of benign undefined behavior, you still have a problem if you `return ptr-orig_ptr` as a size_t offset (rather than returning the final ptr directly), because pointer subtraction is specified as producing ptrdiff_t (rather than size_t), which can 'overflow' for large arrays, despite being immediately converted back to a correct size_t value.)
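A minimal sketch of that approach (assuming a plain int array; names are illustrative): the slice is carried as (ptr,len), the midpoint is ptr+len/2, and the pointer itself is returned, so no sum of indices and no ptrdiff_t offset is ever formed.

  #include <stddef.h>

  /* Binary search over a (ptr,len) slice. */
  const int *bsearch_slice(const int *ptr, size_t len, int key) {
      while (len > 0) {
          size_t half = len / 2;
          const int *mid = ptr + half;      /* always in bounds: half < len */
          if (*mid < key) {                 /* continue in (mid+1, len-half-1) */
              ptr = mid + 1;
              len -= half + 1;
          } else if (*mid > key) {          /* continue in (ptr, half) */
              len = half;
          } else {
              return mid;                   /* found: return the pointer directly */
          }
      }
      return NULL;                          /* not found */
  }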


The problem is not solved by using unsigned ints though, because it stems from integer overflow. I'm afraid your implementations are, alas, also incorrect.


Confused, how does using unsigned integers not solve this particular problem? Doesn't the article itself show solutions with unsigned integers?


Example using 16-bit size_t for convenience:

  char array[60000]; // 5KB left for code and stack if not segmented
  size_t i = 40000;
  size_t j = 50000;
  size_t mid = (i+j)/2; // should be 45000
  // i+j = (size_t)90000 = 24464
  // mid = 24464/2 = 12232 != 45000
Larger integers make the necessary array size bigger, but don't change the overall issue.
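For comparison, the usual fix (the one the article also suggests) computes the midpoint without ever forming the overflowing sum; with the same 16-bit values:

  size_t mid = i + (j - i)/2; // j - i can't overflow since i <= j
                              // 40000 + (50000 - 40000)/2 = 45000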


Unsigned int is 32-bit, most address spaces are 48-bit or more.


Array.length is 32-bit in Java. This is from 2006.


Or you can return an unsigned int which is the highest valid index+1.

    ['a','b','c'].indexof('b') == 1 // found - return index
    ['a','b','c'].indexof('w') == 3 // not found - return size of array
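In C terms that convention looks roughly like this (a sketch; the function name is just illustrative):

  #include <stddef.h>

  /* "Not found" is signalled by returning len, one past the highest
   * valid index, so the result never leaves the unsigned domain. */
  size_t index_of(const int *a, size_t len, int key) {
      for (size_t i = 0; i < len; i++)
          if (a[i] == key)
              return i;       /* found */
      return len;             /* not found */
  }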


There should be a pointer type that is not an int and whose size depends on the host platform.

Casts from int to pointer should be explicit.

We are enamoured with programmer convenience at the expense of the safety of our systems. It’s unprofessional and we should all aim to fix it.


The type that’s meant for indexing in C and C++ is called `size_t`. It is pointer-sized. In Rust it’s called `usize` and Rust does not have implicit conversions, so if you accidentally use too narrow an integer type to compute an index, at least Rust forces you to add an explicit cast somewhere.


I've seen libraries that add a size_t type as an alias to int on certain systems. Rust gets it right here.


Seeing the new erroneous assumption being made in this post reminded me of Linus Torvalds' rant about C++: http://harmful.cat-v.org/software/c++/linus


For most practical purposes, an int64 index can include a universe of negative return codes with no loss of functionality.

The problems here are about using integers that are too narrow and not properly doing arithmetic to prevent overflow from impacting the result.


> For most practical purposes, an int64 index can include a universe of negative return codes with no loss of functionality.

Isn't this article a counterexample to that? Where using signed instead of unsigned actually does result in a loss of functionality?


No. This article explicitly mentions the "int" type, which is exactly 32 bits in Java and typically 32 bits in C and C++. 32-bit ints are not large enough for this purpose: they can only index 2 billion items directly (which will overflow a lot given that a standard server now has 256-512 GB of RAM), and this average calculation hits problems at around 1 billion items. Overflows on 64-bit ints (when used to store 63-bit unsigned numbers) are not going to happen for a very long time.


Wasn't Array.length 32-bit on Java when the article was written? In fact, isn't it 32-bit even now?

Moreover, I don't see how you can deny that using signed loses functionality in this case: it's pretty undeniable that it gives the wrong answer in cases where unsigned would give the correct answer; the code is right there in the article and you can test it out. This is true irrespective of any other cases that neither might handle correctly (assuming you believe any exist, but see my previous paragraph).


I didn't say that a signed int would be fine. I said that a signed 64-bit int would be fine.

Moreover, it is trivial to convert from a 32-bit signed or unsigned type to a 64-bit int, so you are not constrained by the size type of Java.


The naive (x+y)/2 returns the wrong number for x=UINT_MAX and y=UINT_MAX, for a trivial counterexample.
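Concretely, with 32-bit unsigned arithmetic, UINT_MAX + UINT_MAX wraps to 0xFFFFFFFE, and 0xFFFFFFFE / 2 is 0x7FFFFFFF rather than the correct midpoint UINT_MAX.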


"Because, whenever you're talking indices, you should ALWAYS use unsigned int."

sounds like a lot of your code is in fact broken...


I think it'd be nice if you gave some examples of how using unsigned integers for indices breaks code in cases where signed integers don't, because otherwise your comment is very unilluminating.


You need to google why size_t exists. size_t is guaranteed to be able to represent any array index. unsigned int can be arbitrarily small, and a big enough loop may cause unsigned overflow on some architectures. In other words, your code would be broken. size_t will make it work correctly everywhere; it is the correct way of representing array indices.


C & C++ allow implementations to trap (raise an exception) on signed integer overflow, since it's undefined behavior.



