
Does the algorithm described for finding a number in an unsorted array using minimal checks really work? It's probably addressed further in the book, but I didn't see my concerns covered in the post.

The algorithm I am referring to is: "1. Check if the current item is the desired one. If it is, return success if the pointer is within the array bound and return failure if it isn't; otherwise 2. Increment the pointer and continue."

I think it assumes that all values will be found in memory somewhere (otherwise an exception will be raised for trying to read memory at an address that does not exist). This is probably the case in practice but I don't think there are guarantees of it being the case.

Also, this algorithm can have worse worst-case performance than a more naive solution, because when the value cannot be found its running time depends on the number of possible values (assuming a random distribution) rather than on the size of the array.



Read the sentence before it: “Tack the desired item on to the end of the array, then start your pointer at the head of the array and do the following in a loop”

(https://paws.kettering.edu/~jhuggins/humor/elephants.html: “Experienced computer programmers modify Algorithm A by placing a known elephant in Cairo to ensure that the algorithm will terminate.”)


> Does the algorithm described for finding a number in an unsorted array using minimal checks really work? Probably is addressed more in the book but didn't see my concerns in the post.

> I think it assumes that all values will be found in memory somewhere

That assumption is not made; the post does cover this. You left out the beginning of the algorithm:

> A more clever search algorithm can do it with just one bound check in all cases. Tack the desired item on to the end of the array, then start your pointer at the head of the array and do the following in a loop:

It's not clear to me how you're supposed to "tack the item on to the end of the array", though. That's not an operation arrays naturally support. If you recopy the array so you can add your sentinel value, then you're replacing an average of n/2 bounds checks with a guaranteed n copies.
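For concreteness, here's a sketch of the sentinel search as the post describes it, assuming the caller provides one writable slot past the last element (names and signature are mine):

```c
#include <stddef.h>

/* Sketch of the sentinel search the post describes; the caller must
 * guarantee that a[n] is a writable extra slot. */
static ptrdiff_t sentinel_find(int *a, size_t n, int key) {
    a[n] = key;                        /* "tack the item on to the end" */
    size_t i = 0;
    while (a[i] != key)                /* no per-iteration bounds check */
        i++;
    return i < n ? (ptrdiff_t)i : -1;  /* the single bound check */
}
```

The loop is guaranteed to terminate because the sentinel always matches; whether the match happened inside the real array is decided once, at the end.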


>It's not clear to me how you're supposed to "tack the item on to the end of the array", though. That's not an operation arrays naturally support.

It's a small change to avoid the append. First check the final array item separately, and return success if it matches. If it doesn't match, overwrite the final array item with the desired item, and continue as before except this time checking if you're within array bound - 1.
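A sketch of this variant, assuming a writable int array; restoring the saved value afterwards is my addition:

```c
#include <stddef.h>

/* Variant that avoids the append: check the last item separately, then
 * reuse its slot for the sentinel. */
static ptrdiff_t sentinel_find_inplace(int *a, size_t n, int key) {
    if (n == 0) return -1;
    if (a[n - 1] == key)                   /* check the final item first */
        return (ptrdiff_t)(n - 1);
    int saved = a[n - 1];
    a[n - 1] = key;                        /* plant the sentinel */
    size_t i = 0;
    while (a[i] != key)
        i++;
    a[n - 1] = saved;                      /* undo the overwrite */
    return i < n - 1 ? (ptrdiff_t)i : -1;  /* bound check vs n - 1 */
}
```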


It probably is easier to reserve an array item for that sentinel marker.

In either case, it requires your array to be writable, and you’re giving up having multiple searches through the same array operating at the same time.

Also, on modern hardware, the ‘ran out of items’ check is essentially free.

Nowadays, the only realistic use for this would be a case where you have a fixed-size array that you search frequently, where the same end marker can be used all the time, so that you can add that sentinel marker once.

I’ve a hard time thinking of examples, but I wouldn’t rule out cases inside OS kernels, or in search trees for computer chess or similar games.

A variant would be to use this in C: when the program would allocate N+1 chars to hold a string of length < N, allocate N+2 chars instead and set the last one to zero, to ‘guarantee’ that later strlen or strcpy calls don’t run on forever, even if a strncpy copies N+1 bytes. I don’t see how that ‘guarantee’ would add much, though.
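A sketch of that guard-byte idea (the function name is mine):

```c
#include <stdlib.h>
#include <string.h>

/* Allocate one byte beyond the usual N+1 and keep it zero, so that
 * strlen/strcpy stop even if an unterminated strncpy fills all of the
 * N+1 preceding bytes. */
static char *alloc_guarded(size_t n) {
    char *s = malloc(n + 2);    /* N chars + NUL + guard byte */
    if (s)
        s[n + 1] = '\0';        /* guard stays zero for the lifetime */
    return s;
}
```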


>Nowadays, the only realistic use for this would be a case where you have a fixed-size array that you search frequently, where the same end marker can be used all the time, so that you can add that sentinel marker once.

The sentinel needs to be identical to the value you're searching for, otherwise you're trading "compare a pointer" for "compare a value with an arbitrary equality implementation" which is no better (and probably worse).


Do most array structures have a pointer to the last element? Or, similarly useful, do they track their length?


Not in C and similar languages, but you already need to know the length for the bounds check.


I think you could argue Pascal is similar to C? And Pascal typically encodes the length/size in the first slot.


Thanks everyone; I had actually heard of this algorithm before, but wasn't thinking of it today and missed this when I wrote the above post.

>It's not clear to me how you're supposed to "tack the item on to the end of the array", though

The algorithm mentions pointers, so I think it means temporarily overwriting the value in memory just past the array with the desired value. I'd be worried about how this would work with multiple threads, though, in case another thread wants to access the memory that was temporarily swapped out. (Edit: as another poster mentions, it would be helpful to have an extra entry in the array for this purpose.)

If the 'array' is internally something like a vector, then you can add it on to the end quickly by appending a node to the tail of a linked list.


It'd be problematic for some operations, but if your access is dominated by reads, then setting aside an extra entry for a sentinel value 'only' requires serialising the operations that need the sentinel value against mutations. You'll need to protect against mutations during the search anyway, so it boils down to whether it's better to wait or to fall back to a bounds-checking version.

Though I'd question whether it's worthwhile versus some unrolling (you might even have a use for Duff's device...) to just do whatever number of iterations you want per bounds check.
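A sketch of that alternative, amortizing the bounds check over blocks of four elements (names are mine, and the block size is arbitrary):

```c
#include <stddef.h>

/* Scan in blocks of four, paying one bounds check per block, then
 * handle the remaining 0-3 elements with an ordinary loop. */
static ptrdiff_t find_unrolled(const int *a, size_t n, int key) {
    size_t i = 0;
    while (n - i >= 4) {               /* one bounds check per 4 items */
        if (a[i] == key)     return (ptrdiff_t)i;
        if (a[i + 1] == key) return (ptrdiff_t)(i + 1);
        if (a[i + 2] == key) return (ptrdiff_t)(i + 2);
        if (a[i + 3] == key) return (ptrdiff_t)(i + 3);
        i += 4;
    }
    for (; i < n; i++)                 /* tail */
        if (a[i] == key) return (ptrdiff_t)i;
    return -1;
}
```

Unlike the sentinel trick, this needs no writable array and is safe under concurrent readers.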


Yes, multi-threaded access to the array would invalidate the assumptions made when designing that algorithm, as so often happens.

And copying the array would cost more than you would save, so you would want to be sure that you don't have to do that. In general a growable array implementation will double the storage of the array (or multiply it by some other factor, such as the square root of 2) whenever it does grow it, so that it doesn't immediately have to grow it again. You could just arrange for your implementation to grow whenever there's only one storage location left, and then you could use this trick when searching it.
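A sketch of such a growable array that grows whenever only one slot remains, so a spare slot is always free for the sentinel trick (a hypothetical minimal implementation; error handling omitted for brevity):

```c
#include <stdlib.h>

/* Growable int array that always keeps at least one unused slot. */
typedef struct { int *data; size_t len, cap; } vec;

static void vec_push(vec *v, int x) {
    if (v->len + 1 >= v->cap) {          /* grow with one slot to spare */
        v->cap = v->cap ? v->cap * 2 : 4;
        v->data = realloc(v->data, v->cap * sizeof *v->data);
    }
    v->data[v->len++] = x;
}
```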

But recall that this algorithm was designed in a context where neither of those constraints apply.


I think a more typical, high-performance approach to copying arrays would be to use memcpy-style operations. This is how Go handles a lot of slice operations under the covers.

If it's a C style array you could always just keep it one larger than the actual size; perhaps place a null value at the end and swap it out as needed. Just build an abstraction over it.
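A sketch of such an abstraction, assuming the buffer is kept one slot larger than the logical length so the spare slot can hold the sentinel during a search (names are mine):

```c
#include <stddef.h>

/* The buffer behind 'data' must have capacity len + 1; the slot at
 * data[len] is reserved for the sentinel. */
typedef struct { int *data; size_t len; } sarray;

static ptrdiff_t sarray_find(sarray *a, int key) {
    a->data[a->len] = key;             /* plant the sentinel */
    size_t i = 0;
    while (a->data[i] != key)          /* no bounds check in the loop */
        i++;
    return i < a->len ? (ptrdiff_t)i : -1;
}
```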


You're missing some of the implementation. From the blog:

> A more clever search algorithm can do it with just one bound check in all cases. Tack the desired item on to the end of the array, then start your pointer at the head of the array and do the following in a loop: ...

So you will do exactly one more value check in the case where the element is not in the list, and n - 1 fewer bounds checks.



