
Blurred rounded rectangles - jobstijl
https://raphlinus.github.io/graphics/2020/04/21/blurred-rounded-rects.html
======
gfxgirl
> elaborate 3d scenes built up out mostly out of distance field primitives, a
> stunning demonstration of the power and flexibility of the technique.

Also a demonstration of how slow that technique is. I can run stunning games
with entire cities of buildings and people and cars and mountains in the
distance and trees and grass and clouds all running at 60fps or faster. Or I
can run some SDF that runs at 0.2 to 3 fps on the same machine.

Don't get me wrong, I'm blown away by those shaders but they aren't remotely
performant.

This particular technique might be okay, but you'd still arguably be better
off running it on 4 quads that make a frame. There's no reason to compute
pixels in the middle of the frame where there is no shadow.
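
A minimal sketch of that four-quad split, assuming axis-aligned rectangles
given as (x0, y0, x1, y1) with y growing downward, and an inner rectangle that
lies entirely inside the outer one. The function names are hypothetical, not
taken from the article:

```python
def frame_quads(outer, inner):
    """Split the ring between `outer` and `inner` into four non-overlapping quads.

    Both rectangles are (x0, y0, x1, y1); `inner` is assumed to lie inside `outer`.
    """
    ox0, oy0, ox1, oy1 = outer
    ix0, iy0, ix1, iy1 = inner
    return [
        (ox0, oy0, ox1, iy0),  # strip above the inner rect, full width
        (ox0, iy1, ox1, oy1),  # strip below the inner rect, full width
        (ox0, iy0, ix0, iy1),  # strip left of the inner rect
        (ix1, iy0, ox1, iy1),  # strip right of the inner rect
    ]


def area(rect):
    """Area of an (x0, y0, x1, y1) rectangle."""
    return (rect[2] - rect[0]) * (rect[3] - rect[1])
```

Only the ring gets shaded, so the fragment work scales with the blur margin
rather than with the whole bounding box.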

~~~
kroltan
> Also a demonstration of how slow that technique is.

Or how fast! SDFs can compute approximations of volumetric effects that on a
regular raytracing engine would take a few seconds to render.

Additionally, games have been using (baked) 2D SDFs [1] for ages to render
world-space (and recently even plain screen-space) text, so it's plausible to
use the same technique to generate other kinds of shapes.

It can be very useful for UI elements since you can have just a few source
assets, and with some shader parameters you get fully animatable effects such
as drop shadows, glow, and even normals.

[1]:
[https://steamcdn-a.akamaihd.net/apps/valve/2007/SIGGRAPH2007...](https://steamcdn-a.akamaihd.net/apps/valve/2007/SIGGRAPH2007_AlphaTestedMagnification.pdf)
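
As a rough sketch of how one distance value can drive several effects, the
following evaluates an analytic rounded-rectangle distance on the CPU as a
stand-in for a value sampled from a baked SDF texture. The falloff curves and
parameter names (`aa`, `spread`) are illustrative assumptions, not what any
particular engine does:

```python
import math


def sd_rounded_rect(px, py, half_w, half_h, radius):
    """Signed distance from (px, py) to a rounded rect centred at the origin.

    Negative inside, zero on the outline, positive outside.
    """
    qx = abs(px) - (half_w - radius)
    qy = abs(py) - (half_h - radius)
    outside = math.hypot(max(qx, 0.0), max(qy, 0.0))
    inside = min(max(qx, qy), 0.0)
    return outside + inside - radius


def clamp01(t):
    return min(max(t, 0.0), 1.0)


def fill_alpha(d, aa=1.0):
    """Opaque inside the shape, fading to zero over `aa` units across the edge."""
    return clamp01(0.5 - d / aa)


def glow_alpha(d, spread):
    """Soft falloff outside the shape; `spread` is a hypothetical glow width."""
    return clamp01(1.0 - d / spread) ** 2
```

Animating `spread` (or offsetting the sample point for a drop shadow) changes
the look without touching the source asset.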

~~~
BubRoss
You are conflating multiple things here. Marching through volumes for
volumetric effects is generally not going to be made faster by SDFs, because
you need to march through the volumes along an eye ray and march back to the
light sources while doing it.

2D signed distance fields are also different, since as textures, each shader
fragment is still looking at the same pixels it would have seen before and is
just able to do a little extra work with the values it finds to create sharp
text.

------
grenoire
Fun fact: iOS app icons are not rounded rectangles (rectangles with 90 degree
arc corners) but squircles, which are roughly superellipses with n=5. The
linked Wikipedia articles also remark on this.
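
For reference, a superellipse is the set of points with
|x/a|^n + |y/b|^n = 1. A small sketch of the inside test, using n=5 as the
approximate exponent mentioned above (the function names are hypothetical):

```python
def superellipse_value(x, y, a=1.0, b=1.0, n=5.0):
    """Left-hand side of |x/a|^n + |y/b|^n; equals 1 on the outline."""
    return abs(x / a) ** n + abs(y / b) ** n


def inside_superellipse(x, y, a=1.0, b=1.0, n=5.0):
    """True when (x, y) lies inside or on the superellipse."""
    return superellipse_value(x, y, a, b, n) <= 1.0
```

With n=2 this reduces to an ordinary ellipse; larger n pushes the outline
toward the corners of the bounding box, which is why a point like (0.85, 0.85)
is inside for n=5 but outside the unit circle.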

~~~
busymom0
Same thing with the rounded corners of the iPhone X series. One time I was
trying to make one of the UIViews match the corner radius of the screen, and
it wasn’t matching until I learnt that the rounded corners of the
screen/device are a 38.5 squircle (not sure if that’s the real term).
Basically, instead of having a sharp curve, it starts much earlier.

~~~
saagarjha
That’s a slightly different curve, and I believe the corner radius for that is
defined to be 39.

~~~
busymom0
This is where I originally learnt about it. Seems like it's called a
continuous corner, or “squircle”:

[https://kylebashour.com/posts/finding-the-real-iphone-x-corn...](https://kylebashour.com/posts/finding-the-real-iphone-x-corner-radius)

~~~
saagarjha
The corner radius is 39; you can find it by peeking inside iOS. CALayer has
"continuous corners", which are very similar to but just slightly different
than the app icon shape, which is a 16-part Bézier curve generated inside of
the MobileIcons framework. This curve is as far as I can tell identical to the
one that UIBezierPath.init(roundedRect:cornerRadius:) will give you when the
corner radius is 22.5% of the side length. (Note that the app icon on the home
screen, being of a constant size, is actually generated via an image mask
rather than dynamic clipping.)

------
pixelpoet
See also the great article from 2001 by Michael Herf (these days best known
for f.lux):
[http://stereopsis.com/shadowrect/](http://stereopsis.com/shadowrect/)

------
lainga
Why again are explicit `min` and `max` faster? Is that GLSL specific, and
(unlike say C++, where std::min and std::max are just `if (__a < __b) return
__a; return __b;`) the compiler won't be able to turn a one-line conditional
into an ARB MIN or MAX instruction?

~~~
Jasper_
The classic explanation is "divergence", and it goes something like this: On
GPUs, branching is tricky because the same code is evaluating many pixels at
the same time. If half of those pixels go one way, and half of those pixels go
another way, the GPU has to run _both_ pieces of code, with half the results
"masked out" [1]. This is why branchless code tends to be more idiomatic in
shading languages.

You might ask how max() and min() are implemented with a branchless model.
Sometimes the GPU has a native instruction for it, and a "sufficiently stupid
compiler" might not be able to recognize the branch and turn it into the
corresponding max/min.

The modern reality is that almost all GPUs have conditional move
instructions, which allow them to do some amount of branchless conditional
selection across vector lanes, like "x >= 1.0 ? x : 0.0;", without incurring
the penalty of true flow control.

However, some are still uncomfortable with trusting the compiler to recognize
and support this, especially on mobile chipsets with poor quality compilers.
Others still just prefer the coding style of the idiomatic branchless
expressions, since it's what they're used to.

[1] Footnote: On super old GPUs, like those in the Direct3D 8 era, flow
control was emulated _completely_ through branchless systems. The native
machine ISA was something like a series of r=lerp(A,B,C)+D instructions, and flow
control amounted to clever abuse of this paradigm -- lerping to 0.0 or 1.0 can
get you a form of conditional move.
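
To make the lerp trick concrete, here is a small CPU-side sketch in Python
(standing in for shader code). The comparison yields a 0.0/1.0 mask and the
result is blended rather than chosen by control flow. `select`,
`branchless_max` and `keep_if_at_least_one` are hypothetical helper names, and
real hardware would use a native select or min/max instruction where one
exists:

```python
def lerp(a, b, t):
    """Linear blend; returns a when t == 0.0 and b when t == 1.0."""
    return a * (1.0 - t) + b * t


def select(mask, if_true, if_false):
    """Conditional move built from lerp; `mask` must be exactly 0.0 or 1.0."""
    return lerp(if_false, if_true, mask)


def branchless_max(a, b):
    # float(a >= b) turns the comparison into a 0.0/1.0 mask, no `if` needed.
    return select(float(a >= b), a, b)


def keep_if_at_least_one(x):
    # Same result as the ternary `x >= 1.0 ? x : 0.0` quoted above.
    return select(float(x >= 1.0), x, 0.0)
```

Both sides of the "branch" are always evaluated, which is the cost of this
style; it pays off when the two sides are cheap. The blend also assumes finite
inputs, since multiplying an infinity or NaN by a zero mask does not give zero.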

------
benkoller
It really gives me great pleasure to find gems like this article on HN. I
can't see a future in which I'd otherwise have gained the insights I've now gained
through reading (your?) piece about the complexity of blurring complex shapes.
Thanks for that.

------
dahart
Nice article! I have a couple of questions.

> [reciprocal square root] it is particularly well supported in SIMD and GPU
> and is generally about the same speed as simple division.

Curious, Raph - why is the erf using f64? Reciprocal square root is well
supported for single precision, but not double precision. And the spline fit
constants in there are single precision anyway. I’m guessing it’d be a lot
faster with no harmful effects as f32. (Seems to work fine on ShaderToy BTW).

Also curious if erf() might be overkill? Did you compare to using a
smoothstep()? What are the quality indicators you’re looking for? It seems
like I get very close to the same results as your erf approximation if I use
smoothstep(0., blurwidth, sqrt(d)) where d is the SDF distance to the box.
(With the added benefit that I automatically have a strict bound on the blur.)
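
One way to check the comparison numerically, sketched for the simplest case of
a straight edge blurred by a Gaussian. This is not the exact expression above
(it omits the sqrt), and the smoothstep width is an assumption chosen so both
curves have the same slope at the edge; `math.erf` stands in for the article's
approximation:

```python
import math


def gaussian_edge(d, sigma):
    """Coverage of a Gaussian-blurred half-plane at signed distance d from the edge."""
    return 0.5 * (1.0 + math.erf(d / (sigma * math.sqrt(2.0))))


def smoothstep(e0, e1, x):
    """Cubic Hermite ramp from 0 at e0 to 1 at e1, clamped outside that range."""
    t = min(max((x - e0) / (e1 - e0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


def smoothstep_edge(d, sigma):
    # Assumed width: smoothstep's slope at its midpoint is 1.5 / w, and the
    # Gaussian edge's slope at d = 0 is 1 / (sigma * sqrt(2 * pi)).
    w = 1.5 * math.sqrt(2.0 * math.pi) * sigma
    return smoothstep(-0.5 * w, 0.5 * w, d)


def max_abs_error(sigma=1.0, samples=2001, extent=4.0):
    """Largest difference between the two edges over [-extent, extent] * sigma."""
    worst = 0.0
    for i in range(samples):
        d = -extent * sigma + 2.0 * extent * sigma * i / (samples - 1)
        worst = max(worst, abs(gaussian_edge(d, sigma) - smoothstep_edge(d, sigma)))
    return worst
```

The smoothstep version reaches exactly 0 and 1 at a finite distance, which
gives the strict bound, while the erf version keeps a small nonzero tail.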

~~~
raphlinus
Sure, you'd want to do this with 32 bit floats in production, the f64 was
really for prototyping.

I didn't compare smoothstep. It's worth doing an analysis of the tradeoff
between performance and quality. In any case, I think in practice this erf
approximation will be plenty fast, and probably a bit better quality,
especially in the tail region.

