That's not good enough. Rust already has a pure-Rust regex library. (I'm its aut...

sitkack · on Jan 4, 2021

> Those things are never going to be inlined across library boundaries.

What prevents this?

I trust you on on this, but where is the remaining work? Language semantics, compiler, third choice?

burntsushi · on Jan 4, 2021

It's prevented by good sense. The functions are likely multiple KB in size. Inlining them would seriously bloat the binary and would be unlikely to help due to how much work most regex engines do on each search.

The remaining work _on this particular benchmark_ is the regex algorithm itself. I'm on mobile so I can't do a deep dive, but I haven't yet figured out how to easily improve on this particular case. It has to do with the fact that the benchmark has a high match count and the finite automata approach in the regex crate has a bit higher overhead than the typical backtracking solution used in PCRE2 (which is also JIT'd in this case).

It's not the language, compiler, inlining or any other such thing. It's algorithms.

But this is one single benchmark. Before regex-redux there was regex-dna, and Rust's regex crate was #1 there. Why? Same reason. Algorithms.

You can't judge regex performance by a single benchmark. Two won't do it. Not even ten. It's one of the many problems with the Benchmark Game. This would be fine if everyone was circumspect and careful with their interpretation of the data provided, but they aren't. And the Benchmark Game doesn't really do much to alleviate this other than some perfunctory blurbs in some corners of the web site.

With that said, running a benchmark is hard work. It's easy to criticize.

igouy · on Jan 4, 2021

> … doesn't really do much to alleviate this … It's easy to criticize.

Indeed.

Especially easy if it's just fault-finding without any suggestion as to what might be done to "alleviate this" :-)

burntsushi · on Jan 4, 2021

You can do better than that, but I suppose I don't expect more than an insubstantial pithy quip from you.

Add analysis to each benchmark. Require submissions to come with analysis. Or more minimally, make the existing disclaimers on the web site more discoverable through one of any number of means, up to taste.

igouy · on Jan 4, 2021

> Add analysis to each benchmark.

Perhaps you'd like to take on that task?

> Require submissions to come with analysis.

Which raises the barrier for program contributors and presumably would require me to judge whether their analysis was acceptable? Me? Really? ;-)

> disclaimers on the web site more discoverable

The problem is that no one wants to "discover" disclaimers.

We want to see something that supports whatever it is we already believe — and we're very good at ignoring anything else.

burntsushi · on Jan 4, 2021

You asked me what could be done. I answered. I'm sure you have your reasons why you don't do more. But you asked.

> The problem is that no one wants to "discover" disclaimers.

Again repeating something I already said. It is indeed a problem. There are ways to address it. I personally think you do the absolute minimum.

igouy · on Jan 4, 2021

> It is indeed a problem. There are ways to address it.

Please make a specific suggestion.

burntsushi · on Jan 5, 2021

I did. You just didn't like their costs. Which is reasonable. They do have costs. I just happen to think they are worth it.

For analysis, it increases review time and increases the burden on contributors. The extent to which any one person can review all analyses isn't clear to me, and they would likely need to rely on contributors to self regulate them. But I would expect the maintainers to review them for some minimal level of quality. Being more specific than this would require a more thorough proposal. Given my time constraints and how much I dislike interacting with you, it's not something I can do right now.

As for increasing contributor barriers, yes, adding an analysis requirement would do that. However, I think you could compensate for that partially by reducing existing barriers. I've submitted code to your game before, and I'm unlikely to ever do it again. One reason is that interacting with you was unpleasant for me. But I found the interface difficult to navigate (that has since changed) and the rules difficult to discover. I think barriers could be reduced by switching to github (one of the reasons I use github) with more automated checks on submissions and better documented rules. (At the time when I worked on a submission it was unclear for example how to submit a Rust program that used external crates.)

Invariably though, providing an analysis is still more work. IMO, I think it would add rnough value to be worth it. I imagine many amalyses would build on top of others, similar to how code submissions do that today. As can be seen in any thread about the Benchmark Game, there are tons of comments that misunderstand things or make wild guesses. (Followed up invariably a day later with coy quips from you.) Some of those comments are from people that are unlikely to be helped by my analysis idea. But a lot aren't. I think a lot of people would benefit from an explanation of the performance characteristics of the program.

I generally try not to publish benchmarks without analysis. Here's one example.[1] I care a lot about it and I think it has immense benefits in terms of educating others. But reasonable people can disagree. You might have different values than me, which influences whether my suggestions are applicable to you or not.

As for the disclaimer, I don't really see the point of being more specific. There isn't much to this one? A simple suggestion might be to put it at the top of each page or even in the description of each benchmark. I don't know how much this would help things. Maybe people would still just ignore it.

People regularly misinterpret benchmarks and especially those in your Benchmark Game. From what I can tell, you as the maintainer have done very little to address that. Maybe you don't care. Maybe you think it isn't your problem to solve. Maybe you think it can't be solved or even improved, and therefore it isn't worth doing anything. I don't know. We might just disagree, which is fine.

Invariably, you can just respond and tell me to go do it. I think that's a cheap response, but I know you're fond of it. I've been thinking a lot about it and considering starting my own benchmark. But it will likely be a few years before I'll get to it. It is a trying daunting amount of work.

[1] - https://blog.burntsushi.net/ripgrep/

igouy · on Jan 5, 2021

> I think a lot of people would benefit from an explanation of the performance characteristics of the program.

For sure!

And that's way beyond the modest aims of the benchmarks game.

> People regularly misinterpret benchmarks … From what I can tell, you as the maintainer have done very little to address that.

Seems to me that misinterpretation is not something that is effectively addressed by website content; it's something that is effectively addressed by responding to what people say when they discuss benchmarks in forums like HN and proggit and … one person at a time.

> … you can just respond and tell me to go do it. I think that's a cheap response…

It isn't intended as a brush-off.

otoh as you show on proggit, it's when you "Just sit down and actually try to plan out what you would do." that you start to understand what is involved.

otoh I'd like to see others do all that stuff the benchmarks game does not do, in whatever way they choose https://pybenchmarks.org/

igouy · on Jan 5, 2021

> I did. … As for the disclaimer, I don't really see the point of being more specific.

I really was asking for a specific suggestion for "disclaimers on the web site more discoverable".

> A simple suggestion might be to put it at the top of each page or even in the description of each benchmark. I don't know how much this would help things. Maybe people would still just ignore it.

https://www.reddit.com/r/rust/comments/kpqmrh/rust_is_now_ov...

People see what they are looking for.