I think grep (and wc) are included in that table as a sort of baseline. Like, "r... | Hacker News

burntsushi on March 15, 2021 | on: Performance comparison: counting words in Python, ...

I think grep (and wc) are included in that table as a sort of baseline. Like, "roughly speaking, what can we expect?" Or maybe, "what can we hope for?" grep (or ripgrep) isn't otherwise particularly relevant here.

In the case of grep and wc, the "optimized" variant is run with LC_ALL=C, which sets the locale to C. Presumably, the non-optimized variant uses the OP's default locale, which is maybe something like en_US.UTF-8. (Apologies for the US assumption, but what matters is that it's not the C locale.)

This is a common trick for speeding up GNU utilities when you're only working with ASCII data. It also works for things like `sort`. In the case of grep, running with and without the C locale will sometimes be about equally fast, particularly if you're only searching for a simple ASCII literal pattern.
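
A rough Python analogy for why the C locale helps and why it's only safe on ASCII data (this is not how GNU grep, wc or sort are actually implemented, and the function names are hypothetical): the byte-oriented version skips decoding and treats only ASCII whitespace as a word separator, while the text-oriented version decodes UTF-8 first and splits on any Unicode whitespace. On pure ASCII input they agree; on input with a no-break space they don't.

```python
def count_words_bytes(data: bytes) -> int:
    # Roughly like the C locale: no decoding, and only ASCII whitespace
    # (space, \t, \n, \r, \v, \f) separates words.
    return len(data.split())


def count_words_text(data: bytes, encoding: str = "utf-8") -> int:
    # Roughly like a UTF-8 locale: decode to text first (extra work),
    # then split on any Unicode whitespace.
    return len(data.decode(encoding).split())


if __name__ == "__main__":
    ascii_only = b"the quick brown fox"
    print(count_words_bytes(ascii_only), count_words_text(ascii_only))  # 4 4
    nbsp = "caf\u00e9\u00a0au lait".encode("utf-8")  # contains a no-break space
    print(count_words_bytes(nbsp), count_words_text(nbsp))  # 2 3
```

The exact rules a real utility uses for "word" characters vary by tool and version; the point is only that byte-wise processing does less work and gives the same answer when the data is plain ASCII.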

rtb on March 15, 2021

The "grep" and "wc" commands that are included in that table don't attempt to solve the problem. That wasn't clear to me on first reading.

I suppose they are included as a baseline for reading a file and for tokenising strings into words. See https://github.com/benhoyt/countwords/blob/eb2a8adf21c895907...
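
To make the distinction concrete, here's a minimal Python sketch, assuming the benchmark's real task is a per-word frequency count (lowercased, most frequent first); the linked repo has the actual spec. `total_words` is a wc -w-style baseline that reads and tokenises but keeps only a total, while `word_frequencies` does the extra tallying work the baseline skips. Both names are hypothetical.

```python
from __future__ import annotations

import io
from collections import Counter
from typing import Iterable


def total_words(lines: Iterable[str]) -> int:
    # Baseline, similar in spirit to `wc -w`: read and tokenise,
    # but keep only a running total.
    return sum(len(line.split()) for line in lines)


def word_frequencies(lines: Iterable[str]) -> list[tuple[str, int]]:
    # The (assumed) full task: normalise case, tally each word, sort by count.
    counts: Counter[str] = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts.most_common()  # most frequent first


if __name__ == "__main__":
    sample = io.StringIO("the cat The dog the\n")
    print(total_words(sample))
    sample.seek(0)
    print(word_frequencies(sample))
```

The per-word dictionary updates are where most of the real work (and the differences between languages in that table) would show up; the baseline never pays that cost.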