fn is_match(&self, haystack: &[u8]) -> Result<bool, Error> { ... }

burntsushi · on Aug 8, 2022

That's exactly what the next section of my post explores and explains why it's complete bonkers: https://blog.burntsushi.net/unwrap/#why-not-return-an-error-...

So it looks like I already provided a rewrite of the code for you in your preferred style:

    // Returns true if the DFA matches the entire 'haystack'.
    // This routine always returns either Ok(true) or Ok(false) for all inputs.
    // It never returns an error unless there is a bug in its implementation.
    fn is_match(&self, haystack: &[u8]) -> Result<bool, &'static str> {
      let mut state_id = self.start_id;
      for &byte in haystack {
        let row = match state_id.checked_mul(256) {
          None => return Err("state id too big"),
          Some(row) => row,
        };
        let row_offset = match row.checked_add(usize::from(byte)) {
          None => return Err("row index too big"),
          Some(row_offset) => row_offset,
        };
        state_id = match self.transitions.get(row_offset) {
          None => return Err("invalid transition"),
          Some(&state_id) => state_id,
        };
        match self.is_match_id.get(state_id) {
          None => return Err("invalid state id"),
          Some(&true) => return Ok(true),
          Some(&false) => {}
        }
      }
      Ok(false)
    }

My favorite part is the docs that say "this never returns an error." Because if it did, the code would be buggy. And now all sorts of internal implementation details are leaked.

You can't even document the error conditions in a way that is related to the input of the routine, because if you could, you would have discovered a bug!

Another nail in the coffin of your preferred style is that this will absolutely trash the performance of a DFA search loop.

fatherzine · on Aug 8, 2022

Man, I only gave some friendly feedback ;)

"this will absolutely trash the performance of a DFA search loop."

Do you happen to have some hard data to quantify? I'm curious.

ninkendo · on Aug 8, 2022

One additional data point: The Rust compiler got ~2% faster when they made some of the core traits in the serializer infallible (ie. panic instead of using Result): https://github.com/rust-lang/rust/pull/93066

Writeup from the contributor: https://nnethercote.github.io/2022/02/25/how-to-speed-up-the...

> The Decoder trait used for metadata decoding was fallible, using Result throughout. But decoding failures should only happen if something highly unexpected happens (e.g. metadata is corrupted) and on failure the calling code would just abort. This PR changed Decoder to be infallible throughout—panicking immediately instead of panicking slightly later—thus avoiding lots of pointless Result propagation, for wins across many benchmarks of up to 2%.

burntsushi · on Aug 8, 2022

Yes. I work on regex engines. Here's the code with all the crazy checks that return errors that can never happen:

    fn find_fwd_imp<A: Automaton + ?Sized>(
        dfa: &A,
        input: &Input<'_, '_>,
        pre: Option<&'_ dyn Prefilter>,
        earliest: bool,
    ) -> Result<Option<HalfMatch>, MatchError> {
        let mut mat = None;
        let mut sid = init_fwd(dfa, input)?;
        let mut at = input.start();
        while at < input.end() {
            let byte = match input.haystack().get(at) {
                None => return Err(MatchError::GaveUp { offset: at }),
                Some(&byte) => byte,
            };
            sid = dfa.try_next_state(sid, byte)?;
            if dfa.is_special_state(sid) {
                if dfa.is_match_state(sid) {
                    let pattern = dfa.match_pattern(sid, 0);
                    mat = Some(HalfMatch::new(pattern, at));
                } else if dfa.is_dead_state(sid) {
                    return Ok(mat);
                }
            }
            at = match at.checked_add(1) {
                None => return Err(MatchError::GaveUp { offset: at }),
                Some(at) => at,
            };
        }
        eoi_fwd(dfa, input, &mut sid, &mut mat)?;
        Ok(mat)
    }

And here's a search (I ran it a few times):

    $ regex-cli count matches dfa dense @$medium '\w{42}'
                parse time:  45.331µs
            translate time:  28.767µs
          compile nfa time:  9.577558ms
                nfa memory:  698148
    compile dense dfa time:  516.497476ms
          dense dfa memory:  6562848
     dense alphabet length:  113
              dense stride:  128
               search time:  531.808124ms
                    counts:  [2]

Here's code that's safe that will panic when there's a bug:

    fn find_fwd_imp<A: Automaton + ?Sized>(
        dfa: &A,
        input: &Input<'_, '_>,
        pre: Option<&'_ dyn Prefilter>,
        earliest: bool,
    ) -> Result<Option<HalfMatch>, MatchError> {
        let mut mat = None;
        let mut sid = init_fwd(dfa, input)?;
        let mut at = input.start();
        while at < input.end() {
            sid = dfa.next_state(sid, input.haystack()[at]);
            if dfa.is_special_state(sid) {
                if dfa.is_match_state(sid) {
                    let pattern = dfa.match_pattern(sid, 0);
                    mat = Some(HalfMatch::new(pattern, at));
                } else if dfa.is_dead_state(sid) {
                    return Ok(mat);
                }
            }
            at += 1;
        }
        eoi_fwd(dfa, input, &mut sid, &mut mat)?;
        Ok(mat)
    }

And here's a search with it:

    $ regex-cli count matches dfa dense @$medium '\w{42}'
                parse time:  33.701µs
            translate time:  24.264µs
          compile nfa time:  9.55935ms
                nfa memory:  698148
    compile dense dfa time:  550.959698ms
          dense dfa memory:  6562848
     dense alphabet length:  113
              dense stride:  128
               search time:  468.932568ms
                    counts:  [2]

So that's 10% right there. We can keep going though. Let's get rid of the bounds checks. We have to use unsafe though. The stakes have risen. Now if there's a bug, we probably won't get a panic. Instead we get UB. Which might lead to all sorts of bad stuff.

    fn find_fwd_imp<A: Automaton + ?Sized>(
        dfa: &A,
        input: &Input<'_, '_>,
        pre: Option<&'_ dyn Prefilter>,
        earliest: bool,
    ) -> Result<Option<HalfMatch>, MatchError> {
        let mut mat = None;
        let mut sid = init_fwd(dfa, input)?;
        let mut at = input.start();
        while at < input.end() {
            sid = unsafe {
                let byte = *input.haystack().get_unchecked(at);
                dfa.next_state_unchecked(sid, byte)
            };
            if dfa.is_special_state(sid) {
                if dfa.is_match_state(sid) {
                    let pattern = dfa.match_pattern(sid, 0);
                    mat = Some(HalfMatch::new(pattern, at));
                } else if dfa.is_dead_state(sid) {
                    return Ok(mat);
                }
            }
            at += 1;
        }
        eoi_fwd(dfa, input, &mut sid, &mut mat)?;
        Ok(mat)
    }

And now the same search:

    $ regex-cli count matches dfa dense @$medium '\w{42}'
                parse time:  22.511µs
            translate time:  26.843µs
          compile nfa time:  8.709352ms
                nfa memory:  698148
    compile dense dfa time:  565.365005ms
          dense dfa memory:  6562848
     dense alphabet length:  113
              dense stride:  128
               search time:  456.402963ms
                    counts:  [2]

For a smaller gain.

fatherzine · on Aug 8, 2022

This is so awesome, thanks for sharing! I am happy that the "~10% perf hit" intuition is reasonably close to what the data shows. What is the right tradeoff? I am not an author of widely popular regex engines, so I'll leave that evaluation to more qualified people.

burntsushi · on Aug 8, 2022

10% is an excellent win. And given that I consider the first code snippet to be effectively nonsense, the right trade off is obvious to me. :)

I bet if me and you sat down with a big codebase side-by-side, I could convince you that the style you're proposing is totally unworkable.

Now, whether it's worth it to use unsafe and elide bounds checks is harder. The gain I showed here is small, but it can be bigger in other examples. It depends. But it is certainly less clear. It helps that this is probanly the easiest kind of unsafe to justify, because you just have to make sure the index is in bounds. Other types of unsafe can be quite tricky to justify.