
Sentinels Can Be Faster - panic
https://lemire.me/blog/2020/09/03/sentinels-can-be-faster/
======
lalaland1125
This is a rather weird article. Of course sentinels can be faster in some very
particular algorithms. That's not the core contention. The core contention is
that sentinels in general cause more speed losses than speed gains over the
entirety of the application. Importantly, many useful algorithms such as
string copying, string comparison, and string concatenation can be much more
efficient using a size field.

~~~
colejohnson66
I think this is something C++ (and other later languages) got right: a size
field _and_ a sentinel’d (null-terminated) char[]. Getting the length only
requires reading 4 or 8 bytes, but if you want to pass that string to a
function that expects a null-terminated string, you just pass x.c_str().

~~~
throwaway894345
This seems useful if you're expecting to pass your strings to a lot of C
libraries, which is probably sensible for C++ (especially whatever vintage of
C++ happened to introduce this string design); however, surely this is a
negligible concern for most mainstream languages?

~~~
pengaru
> This seems useful if you're expecting to pass your strings to a lot of C
> libraries, which is probably sensible for C++ (especially whatever vintage
> of C++ happened to introduce this string design); however, surely this is a
> negligible concern for most mainstream languages?

No, the system call interface of the operating system expects NUL-terminated
strings as parameters all over the place, especially the filesystem APIs.

~~~
steveklabnik
Who came first, the chicken or the egg?

EDIT: Okay, this is at -1, maybe I shouldn't be slightly obtuse. These
questions are, in my mind, two sides of the same coin. The OS expects null
terminated strings, not for some inherent reason, but because C and UNIX grew
up together. There's no _inherent_ reason you can't have an OS that does not
use sentinels to demarcate strings.

~~~
throwaway894345
Probably my fault for phrasing the question ambiguously, but the parent
interpreted my question as I intended it to be interpreted. Interfacing with
the system libraries is a common enough case for using null-terminated
strings.

I'm actually a little curious what these low-level APIs look like and how
"modern" languages handle efficiently converting to null-terminated strings.
Do they just copy everything, while C/C++/etc. would pass a straight
reference? Also, what do the underlying syscall APIs look like?

~~~
steveklabnik
It's all good. You are 100% right that it's still a good use-case.

And yeah, in Rust we have a separate CString type that's null terminated, so
you may need a copy to move between the two worlds. It just depends, if you're
only reading, for example, then you can create a pointer + len where the
length doesn't include the 0, and that's effectively zero cost.

You got me curious about my days as a kid putting "pascal" in front of Mac
System 7 stuff; I believe that it only accepted Pascal strings. I also vaguely
remember people doing funny stuff like null terminating pascal strings anyway
and then putting the length before the actual pointer so that you get both in
one...

~~~
colejohnson66
Weren’t “Pascal strings” limited in that the length was only a byte, though?
So your string could only be 255 bytes long (conveniently going to a nice
round 256 when you add the length byte).

~~~
steveklabnik
Yep!

------
sild
I put the benchmark into quick-bench but could not replicate the 40% result.
The sentinel version was faster but only slightly.

[https://quick-bench.com/q/314Z81FskTlcDqMCUHFVhWmDz8Q](https://quick-bench.com/q/314Z81FskTlcDqMCUHFVhWmDz8Q)

Update 1: After moving some constants around, I get the 40% result:

[https://quick-bench.com/q/lPrpQTAyDQuOoKS9MBWCTBXk1TE](https://quick-bench.com/q/lPrpQTAyDQuOoKS9MBWCTBXk1TE)

No idea why it made such a big difference to the benchmark.

Update 2: If the test order is reversed, the result goes back to being only
slightly faster for the sentinel version:

[https://quick-bench.com/q/Ds7aqe5-6md_tTPndOK54ltYZmE](https://quick-bench.com/q/Ds7aqe5-6md_tTPndOK54ltYZmE)

~~~
brandmeyer
In the first two links you posted, the 40% result looks like the baseline case
getting slower, not the unit under test getting faster. The core assembly
looks identical in both cases.

~~~
ggrrhh_ta
Well spotted, and good work by sild! It looks like the 40% claim was due to a
benchmarking bug (which makes sense, and can happen).

------
bvrmn
I think most modern languages have shifted towards slices due to the security
issues with sentinels: C++'s span, Rust, Zig, Go. It's very sad that we still
need to use libc interfaces and deal with null-terminated byte sequences.

~~~
steveklabnik
Not just security issues, it also makes substrings zero copy.

~~~
Someone
That can be problematic, too.

If your language has a garbage collector, it may mean tiny substrings keep
huge strings alive. Java moved away from using slices
([https://dzone.com/articles/changes-stringsubstring-java-7](https://dzone.com/articles/changes-stringsubstring-java-7)), I think
for that reason.

If your language isn’t garbage collected, it can cause problems if the
substring must outlive the string it was extracted from. That means that, for
some functions, you’ll have to decide whether to return the slice or make the
copy.

(In Rust, that means the slice has a different ‘type’ (not sure what they call
it), that of a string whose lifetime is bound to the original string, and the
compiler will check that you won’t make the mistake of having the slice
outlive its source.)

~~~
xfer
How huge are typical strings that this becomes a problem? It could be a
problem for arrays, but it seems unlikely for strings.

~~~
Someone
Some strings start life as part of multi-megabyte text files that get slurped
in. This happened quite a bit with HTML/XML parsers. The workaround was to do
_new String(s.substring(…))_ just about everywhere. Real world case (with few
details) at [https://stackoverflow.com/questions/10951812/java-not-garbage-collecting-memory](https://stackoverflow.com/questions/10951812/java-not-garbage-collecting-memory).

Also, using slices doesn’t come for free; it makes all String objects larger
because, apart from the reference to the underlying character array, an offset
and length must be kept around.

------
dpc_pw
I never thought of null-termination as a "sentinel". I thought a sentinel
means strategically placing a value somewhere to avoid having to check for a
termination condition. Null-termination is not a sentinel, IMO. It is the
opposite - a terminator, which has to be explicitly checked for. Am I wrong
here?

BTW. If you really want speed, SIMD will obliterate any non-SIMD solution, and
my bet would be that most of the time SIMD will be easier to implement with
length prefix because it will not have to handle a special case of null-
termination. Someone correct me if I'm wrong.

The trailing zero is a typical case of a short-sighted hack. The more naive,
simpler, natural solution (a length prefix) actually works better, especially
long term, as other things change (hardware architecture, performance and
resource constraints, use cases).

------
lowiqengineer
This is really less “sentinels can be faster” and more “assuming invariants
can be faster because you’re reducing comparisons”.

~~~
lifthrasiir
This. The whole argument only holds because you can choose `isspace` to return
false for '\0'. You should be able to choose whichever sentinel fits the
required invariant, and C's global string "sentinel" is less useful in that
regard. If you need a sentinel, you should be able to push one beforehand
anyway.

------
pornel
I really like Rust's approach of keeping string length in the pointer.

It allows making substrings without copying (and there's no costly substring
gotcha you get in GC languages, because the borrow checker will loudly remind
you when you try to keep the substring longer than the original string).

In string-processing algorithms the length usually stays in a register, so
(unlike the Pascal string approach) it doesn't contribute to the on-heap cost
of the string. Knowing the length ahead of time also allows SIMD-optimized
loops.

------
klodolph
This is not weird or theoretical.

Sometimes you find that in some complicated system, there's a big chunk of
overhead spent just parsing things. So, after carefully thinking through the
consequences, you can decide to use a sentinel for end of input rather than a
length.

A larger, practical example: a C++ compiler spends a shocking amount of time
just doing lexical analysis on its input. One thing that Clang does to speed
it up is to get rid of the length check, just as suggested in the article.

[https://github.com/llvm-mirror/clang/blob/master/lib/Lex/Lexer.cpp#L3171](https://github.com/llvm-mirror/clang/blob/master/lib/Lex/Lexer.cpp#L3171)

    
    
        /// LexTokenInternal - This implements a simple C family lexer.  It is an
        /// extremely performance critical piece of code.  This assumes that the buffer
        /// has a null character at the end of the file. [...]
        bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
          [...]
          if ((*CurPtr == ' ') || (*CurPtr == '\t')) {
            ++CurPtr;
            while ((*CurPtr == ' ') || (*CurPtr == '\t'))
              ++CurPtr;
            [...]
          }
          [...]
        }
    

There are a number of places in the world where you might find that text
parsing is performance-critical. Note that in the rest of the code base, Clang
uses really quite standard and boring std::string to represent strings. You
don't have to use sentinels everywhere, but it's a trick to keep up your
sleeve for the cases where it makes a difference.

And of course, you keep the parts of your code with sentinels self-contained
because it can be hard to figure out whether the code is correct without a
close reading.

------
augustk
Another classical example of sentinels is improving a linear search by
inserting the key as the last element.

~~~
Someone
Classical, but often not the best idea nowadays.

You often cannot know whether that element is writable, and if it is, writing
to it will cause problems when running concurrent searches (even for the same
key).

------
recursive
Any string representation is faster for some hand-picked task. But still, I
think it's reasonable to assert that certain ones are better than others for
general use.

------
jiggawatts
EDIT: Never mind, I mis-read the code in the blog...

~~~
klodolph
I think you may have misinterpreted the code.

> Zero spaces is a number of leading spaces. If you feed his efficient
> solution a string without a leading space, it'll read past the end of the
> string and into undefined memory until the program either crashes or reads
> some arbitrary space character (0x20) from some other string. If it contains
> spaces but doesn't start with one, you'll think it has more leading spaces
> than it actually does.

I don't know what code you are referring to, but I'm looking at this code:

    
    
        const char * skip_leading_spaces(const char * start) {
          while(is_space(*start)) {
            start++;
          }
          return start;
        }
    

And this correctly handles the case with zero leading spaces. It correctly
handles empty strings.

Recall that is_space('\0') is false (or at least isspace('\0') is false). Also
recall that it is valid to read the \0 byte at the end of a string.

I find it a bit unusual that you are capitalizing NULL. My interpretation is
that you are talking about the byte '\0', which is also called the null byte,
and not the NULL pointer.

