I've listed several reasons why I decided to write and use this implementation:
- better call stacks in crash reports
- smaller and faster at runtime
- faster compilation because less complicated, less templated code
- I understand it
So there's more to it than just that one point.
Did I lose useful attributes? Yes. There's no free lunch.
Am I going too far to achieve small, fast code that compiles quickly? Maybe I am.
My code, my rules, my joy.
But philosophically, if you ever wonder why most software today can't start up instantly and ships 100 MB of stuff to show a window: it's because most programmers don't put any thought or effort into keeping things small and fast.
Oh, I definitely agree with some of your other points, just not the one I argued against.
BTW, I would also contest that your version is faster at runtime. Your data is always allocated on the heap. Depending on the size of the data, std::function can use its small-function optimization and store everything in place. This means there is no allocation when setting the callback and also better cache locality when calling it. Don't make performance claims without benchmarking!
Similarly, the smaller memory footprint is not as clear cut: with small function optimization there might be hardly a difference. In some cases, std::function might even be smaller. (Don't forget about memory allocation overhead!)
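If you want to check the no-allocation claim yourself, here is a quick experiment (mine, not from the original post) that counts global heap allocations when a small lambda is assigned to std::function; on mainstream implementations a single captured pointer fits in the small-object buffer, so the count should be zero:

```cpp
#include <cstdio>
#include <cstdlib>
#include <functional>
#include <new>

static int g_allocs = 0;

// count every global allocation
void* operator new(std::size_t n) {
    ++g_allocs;
    void* p = std::malloc(n);
    if (!p) throw std::bad_alloc();
    return p;
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

struct MyWindow { void onButtonClicked() { std::puts("clicked"); } };

int main() {
    MyWindow w;
    int before = g_allocs;
    // capturing a single pointer: fits in the small-object buffer of
    // mainstream std::function implementations, so no allocation is expected
    std::function<void()> cb = [pw = &w] { pw->onButtonClicked(); };
    std::printf("allocations while setting the callback: %d\n", g_allocs - before);
    cb();
}
```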
The only point I will absolutely give you is compilation times. But even there I'm not sure if std::function is your bottleneck. Have you actually measured?
That's a fair point. I just looked and out of 35 uses of MkFunc0 only about 3 (related to running a thread) allocate the args.
All others use a pointer to an object that exists anyway. For example, I have a class MyWindow with a button. A click callback would have MyWindow* as an argument because that's the data needed to perform that action. That's the case for all UI widgets, and they are the majority of callback uses.
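A minimal sketch of that pattern (the real MkFunc0 API in SumatraPDF may look different): the callback is just a function pointer plus a pointer to data that already exists, so setting it allocates nothing.

```cpp
#include <cstdio>

struct Func0 {
    void (*fn)(void*) = nullptr;  // the function to invoke
    void* userData = nullptr;     // e.g. the MyWindow* that owns the button
    void Call() const { if (fn) fn(userData); }
};

struct MyWindow {
    Func0 onButtonClick;
    void HandleClick() { std::puts("clicked"); }
    static void OnClickThunk(void* d) { static_cast<MyWindow*>(d)->HandleClick(); }
};

int main() {
    MyWindow w;                                        // the object exists anyway
    w.onButtonClick = {&MyWindow::OnClickThunk, &w};   // two pointers, 16 bytes, no heap
    w.onButtonClick.Call();                            // prints "clicked"
}
```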
I could try to get cheeky and implement a similar optimization as a Func0Fat, where I would have an inline buffer of N bytes and use it as backing storage for the struct. But see above for why it's not needed.
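A rough sketch of what that Func0Fat idea could look like (again, not actual SumatraPDF code): an inline N-byte buffer inside the callback struct, used as backing storage for small payloads so they need no heap allocation.

```cpp
#include <cstdio>
#include <cstring>
#include <type_traits>

struct Func0Fat {
    void (*fn)(void*) = nullptr;  // function to call with the data pointer
    void* data = nullptr;         // points at external data or at buf
    char buf[16] = {};            // inline storage for small payloads

    // note: a real version would re-point data at the copy's buf when copied
    template <typename T>
    void SetInline(void (*f)(void*), const T& payload) {
        static_assert(sizeof(T) <= sizeof(buf), "payload too big for inline buffer");
        static_assert(std::is_trivially_copyable_v<T>, "payload must be trivially copyable");
        std::memcpy(buf, &payload, sizeof(T));
        fn = f;
        data = buf;
    }
    void Call() const { if (fn) fn(data); }
};

int main() {
    Func0Fat cb;
    int value = 42;
    cb.SetInline([](void* d) { std::printf("%d\n", *static_cast<int*>(d)); }, value);
    cb.Call();  // prints 42, no heap allocation involved
}
```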
As to benchmarking: while I don't disagree that benchmarking is useful, it's not the ace card argument you think it is.
I didn't do any benchmarks and I don't plan to.
Because benchmarking takes time, time I could spend writing features.
And because I know things.
I know things because I've been programming, learning, benchmarking for 30 years.
I know that using 16 bytes instead of 64 bytes is faster. And I know that likely it won't be captured by a microbenchmark.
And even if it was, the difference would be minuscule.
So you would say "pfft, I told you it was not worth it for a few nanoseconds".
But I know that if I do many optimizations like that, it'll add up even if each individual optimization seems not worth it.
And that's why SumatraPDF can do PDF, ePub, mobi, cbz/cbr and uses fewer resources than Windows' Start menu.
First, thanks for providing SumatraPDF as free software! I don't want to disparage your software in any way. I don't really care how it's written as long as it works well - and it does! This is really just about your blog post.
> I just looked and out of 35 uses of MkFunc0 only about 3 (related to running a thread) allocate the args.
In that case, std::function wouldn't allocate either.
> All others use a pointer to an object that exists anyway. For example, I have a class MyWindow with a button. A click callback would have MyWindow* as an argument because that's the data needed to perform that action. That's the case for all UI widgets, and they are the majority of callback uses.
That's what I would have guessed. Either way, I would just use std::bind or a little lambda:
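Something along these lines, for example (an illustrative sketch with made-up names, not code from SumatraPDF):

```cpp
#include <functional>

struct Button {
    std::function<void()> onClick;
};

struct MyWindow {
    Button button;
    void onButtonClicked() { /* handle the click */ }
    MyWindow() {
        // capture `this`; a single pointer fits in std::function's
        // small-object buffer, so no heap allocation is expected
        button.onClick = [this] { onButtonClicked(); };
        // or, equivalently:
        // button.onClick = std::bind(&MyWindow::onButtonClicked, this);
    }
};
```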
If your app crashes in MyWindow::onButtonClicked, that method would be at the top of the stack trace. IIUC this was your original concern. Most of your other points are just speculation. (The compile time argument technically holds, but I'm not sure to what extent it really shows in practice. Again, I would need some numbers.)
> I know things because I've been programming, learning, benchmarking for 30 years.
Thinking that one "knows things" is dangerous. Things change and what we once learned might have become outdated or even wrong.
> I know that using 16 bytes instead of 64 bytes is faster. And I know that likely it won't be captured by a microbenchmark.
Well, not necessarily. If you don't allocate any capture data, then your solution will win. Otherwise it might actually perform worse. In your blog post, you just claimed that your solution is faster overall, without providing any evidence.
Side note: I'm a bit surprised that std::function takes up 64 bytes in 64-bit MSVC, but I can confirm that it's true! With 64-bit GCC and Clang it's 32 bytes, which I find more reasonable.
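Anyone can verify the sizes with a one-liner (this snippet is mine, not from the post):

```cpp
// Prints the size of std::function<void()> on the current toolchain.
// MSVC x64 reports 64 bytes; libstdc++ and libc++ report 32 bytes.
#include <cstdio>
#include <functional>

int main() {
    std::printf("sizeof(std::function<void()>) = %zu\n", sizeof(std::function<void()>));
}
```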
> And even if it was, the difference would be minuscule.
That's what I would think as well. Personally, I wouldn't even bother with the performance of a callback function wrapper in a UI application. It just won't make a difference.
> But I know that if I do many optimizations like that, it'll add up even if each individual optimization seems not worth it.
Amdahl's law still holds. You need to optimize the parts that actually matter. It doesn't mean you should be careless, but we need to keep things in perspective. (I would care if this was called hundreds or thousands of times within a few milliseconds, like in a realtime audio application, but this is not the case here.)
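(To put hypothetical numbers on the Amdahl's-law point: if callback dispatch is a fraction p of total runtime and you speed it up by a factor s, the whole program speeds up by 1 / ((1 - p) + p / s). With p = 0.1%, even an infinitely fast callback gives at most a ~1.001x overall speedup.)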
To be fair, in your blog post you do concede that std::function has overall better ergonomics, but I still think you are vastly overselling the upsides of your solution.
> You can't understand all internals, and that's perfectly fine.
C++ takes this to another level, though. I'm not an expert Go or Rust programmer, but it's much easier to understand the code in their standard libraries than in C++'s.
Fair enough :) Unfortunately, this is just something one has to accept as a C++ programmer. Should we roll our own std::vector because we can't understand the standard library implementation? The answer is, of course, a firm "no" (unless you have very special requirements).
Why do you even care how std::function is implemented? (Unless you are working in very performance critical or otherwise restricted environments.)