
I've found sorting-network-based filters to be faster than these methods up to size 25x25 or so. It has worse computational complexity, but filters up to that size cover a lot of ground in practice. See Figure 10 in https://andrew.adams.pub/fast_median_filters.pdf
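
To make the idea concrete, here's roughly what the 3x3 case looks like. This isn't the code from the paper (which shares and vectorizes much larger networks for bigger windows); it's just the classic 19-exchange median-of-9 building block, written as branch-free min/max so it vectorizes well:

    #include <algorithm>

    // Compare-exchange: after this, a <= b. Branch-free, so it maps
    // to single min/max SIMD instructions when vectorized.
    inline void cswap(float &a, float &b) {
        float lo = std::min(a, b), hi = std::max(a, b);
        a = lo; b = hi;
    }

    // Median of a 3x3 neighborhood via the well-known 19-exchange
    // network (partially sorts p in place). The control flow is
    // data-independent: the same ops run for every pixel.
    float median9(float p[9]) {
        cswap(p[1], p[2]); cswap(p[4], p[5]); cswap(p[7], p[8]);
        cswap(p[0], p[1]); cswap(p[3], p[4]); cswap(p[6], p[7]);
        cswap(p[1], p[2]); cswap(p[4], p[5]); cswap(p[7], p[8]);
        cswap(p[0], p[3]); cswap(p[5], p[8]); cswap(p[4], p[7]);
        cswap(p[3], p[6]); cswap(p[1], p[4]); cswap(p[2], p[5]);
        cswap(p[4], p[7]); cswap(p[4], p[2]); cswap(p[6], p[4]);
        cswap(p[4], p[2]);
        return p[4];
    }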


Placing pixel centers at 0.5, 0.5 is only the obvious choice if you think pixels are little squares rather than point samples. Pixels-as-squares makes intuitive sense to people, especially those raised on pixel art, but it's just one possible choice you can make for your reconstruction function. It's not even a particularly good choice. It doesn't model real sensors or real displays, and it doesn't have particularly nice theoretical properties. The only thing going for it is that it's cheaper to compute some things.


It’s true that pixels aren’t most accurately modeled as squares, but they should still be centered at (0.5, 0.5), because you want the center of mass of a W×H pixel image to be at exactly (W/2, H/2) no matter what shape the pixels are. Otherwise it shifts around when you resize the image—perhaps even by much more than 1 pixel if you resize it by a large factor.


Unfortunately, that doesn't make sense when you need to look up pixel (0.5, 0.5) in the framebuffer.

When dealing with cameras, the central point is rarely (h/2, w/2). So you're really dealing with two sets of coordinates, camera coordinates and sensor coordinates, that need to be converted between.

Integer coordinates are convenient for accessing the sensor pixels, and the camera-to-sensor transform should theoretically account for the (0.5, 0.5) offset. However, getting a calibration accurate to within 0.5 pixels is going to be hard to begin with.
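
As a rough sketch of what I mean (names made up): the calibrated intrinsics carry the principal point, so the half-pixel offset can live there, while integer indices stay convenient for addressing the sensor array.

    #include <cmath>

    // Project a camera-space point to integer sensor indices. The
    // intrinsics (fx, fy, cx, cy) come from calibration; if cx, cy are
    // expressed in "pixel centers at half-integers" coordinates, the
    // 0.5 offset is absorbed here.
    struct Intrinsics { double fx, fy, cx, cy; };

    void project(const Intrinsics &K, double X, double Y, double Z,
                 int &ix, int &iy) {
        double u = K.fx * X / Z + K.cx;   // continuous sensor coords
        double v = K.fy * Y / Z + K.cy;
        ix = (int)std::floor(u);          // index of the pixel whose
        iy = (int)std::floor(v);          // area contains (u, v)
    }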


Nobody’s suggesting that pixels are stored at half-integer memory addresses. After all, only a small subset of the continuous image space will lie exactly on the grid of pixel centers—and this is true no matter how the grid is offset. The point is that the grid should be considered as being lined up with (0.5, 0.5) rather than with (0, 0).

So, for example, if you’re scaling an image up by 10× with bilinear interpolation, and you need to figure out what to store at address (7, 23) in the output framebuffer, you should convert that to continuous coordinates (7.5, 23.5), scale these continuous coordinates down to (0.75, 2.35), and use that to take the appropriate weighted average of the surrounding input pixels centered at (0.5, 1.5), (1.5, 1.5), (0.5, 2.5), and (1.5, 2.5), which are located at address (0, 1), (1, 1), (0, 2), and (1, 2) in the input framebuffer. The result will be different and visually more correct than if you had done the computation without taking the (0.5, 0.5) offset into account. In this case the naive computation would instead give you a combination of the pixels at (0, 2), (1, 2), (0, 3), and (1, 3) in the input framebuffer, and the result would appear to be shifted by a subpixel offset. This was essentially the cause of a GIMP bug that I reported in 2009: https://bugzilla.gnome.org/show_bug.cgi?id=592628.
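
In code, the mapping I'm describing is just this (a minimal sketch with illustrative names; edge clamping omitted):

    #include <cmath>

    // Map an output pixel index to the corresponding continuous input
    // coordinate, assuming pixel centers at half-integers. "scale" is
    // output size / input size (10 in the example above).
    double out_index_to_in_coord(int out_index, double scale) {
        return (out_index + 0.5) / scale;  // e.g. 7 -> 0.75, 23 -> 2.35
    }

    // Given a continuous input coordinate, find the two neighboring
    // pixel indices and the interpolation weight for 1-D linear
    // interpolation (apply once per axis for bilinear).
    void neighbors(double coord, int &i0, int &i1, double &w1) {
        double c = coord - 0.5;            // distance past the first center
        i0 = (int)std::floor(c);           // e.g. 0.75 -> index 0
        i1 = i0 + 1;                       //             index 1
        w1 = c - i0;                       // weight of the i1 sample (0.25)
    }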


Sub-pixel accuracy is needed for good results when using stereo camera pairs, and can routinely be achieved.


If your reconstruction function is symmetric, which it should be, then your pixel centers are at 0.5, 0.5. All commonly used resampling algorithms are symmetric, except perhaps truncating implementations of point sampling (which are thus inconsistent with the rest and arguably wrong for this reason). It doesn't matter how you choose to reconstruct, samples should still be centered around their reconstruction function.

Otherwise, you're shifting the image around every time you reconstruct. This causes errors and can be very hard to reason about.


In particular, (0.5, 0.5) pixel centers together with a 1×1 box filter makes for a decent approximation to the square pixels on a display.


Exactly. For example, a path tracer takes multiple samples between (0, 0) and (1, 1) within a pixel to create anti-aliasing.

But one thing I find missing from the article: shouldn't the range of a pixel be (-0.5, -0.5) to (0.5, 0.5)?

If you map a pixel to, for example, an LED on a screen, it makes more sense for its location to be the center of the LED.
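
Something like this, as a rough sketch (names made up): the sample position is the pixel's integer index plus a jitter in [0, 1), which is the same as the center at +0.5 plus an offset in [-0.5, 0.5).

    #include <random>

    struct Vec2 { double x, y; };

    // Generate a jittered sample position inside pixel (px, py),
    // assuming the pixel's center is at (px + 0.5, py + 0.5) and its
    // footprint is the unit square around it. Averaging many such
    // samples per pixel gives box-filtered anti-aliasing.
    Vec2 sample_in_pixel(int px, int py, std::mt19937 &rng) {
        std::uniform_real_distribution<double> u(0.0, 1.0);
        return { px + u(rng), py + u(rng) };  // center + offset in [-0.5, 0.5)
    }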


Halide is generally good for pipelines that do math on multidimensional arrays, so quite possibly. It has a property that I think would make it hard to use for some applications in scientific computing though - while you can mutate in-place, you can't mutate an array ("Func" in Halide) after it has been used by another pipeline stage. This means that the dependency graph between your pipeline stages has no cycles except for self-cycles, which lets us do lots of clever stuff like bounds inference to guarantee the program is correct regardless of the schedule. I can imagine things like flow simulation having a hard time with that constraint. In the deep learning context this means that CNNs are ideal, but RNNs can be awkward or impossible.
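
To illustrate the style (a toy pipeline, not anything from a real application): each Func below is defined once in terms of earlier Funcs, so the stage graph is a DAG and bounds inference can work backwards from the output, whatever the schedule.

    #include "Halide.h"
    using namespace Halide;

    int main() {
        Var x, y;
        Func input, blur_x, blur_y;

        // Each Func is defined purely in terms of earlier Funcs, so the
        // stage-level dependency graph is acyclic; you can't come back
        // later and overwrite blur_x after blur_y has consumed it.
        input(x, y) = cast<float>(x + y);
        blur_x(x, y) = (input(x - 1, y) + input(x, y) + input(x + 1, y)) / 3.f;
        blur_y(x, y) = (blur_x(x, y - 1) + blur_x(x, y) + blur_x(x, y + 1)) / 3.f;

        // Because the graph is a DAG, Halide can infer the region of each
        // stage required to produce this output.
        Buffer<float> out = blur_y.realize({64, 64});
        return 0;
    }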


Andrew: have you considered adding something akin to Rust's lifetimes here? While you'd have restrictions on the schedule, it seems like it'd still let you express who in the "graph" messes with the data as tightly as possible.


You again? The only other time your account has posted was to plug TVM in a different post on Halide. If you're not a Tianqi Chen sockpuppet, then you should apologize to him for making him look like an asshole.

For the benefit of others not aware: TVM took Halide, deleted the half of it they didn't understand, then reimplemented it (copy-pasting large chunks without credit, hacks and all) and claimed it was a revolutionary new deep learning IR. People who already knew Halide were left scratching their heads unable to tell how this wasn't just Halide re-marketed in a different community.


The HDR+ SIGGRAPH paper (hdrplusdata.org) states that they use Halide, as does the newer SIGGRAPH paper on the Pixel's portrait mode (http://graphics.stanford.edu/papers/portrait/wadhwa-portrait...), and the paper on the Pixel's video denoising algorithm (http://www.chiakailiang.org/papers/ehmann_2018_rtv.pdf). The Pixel camera is riddled with Halide code, but of course it's all closed-source.

I can also vouch for the YouTube use, having been tangentially involved in it and given permission to state that they use it.


It works but still needs a bit of cleanup. I was planning to get back to it after an upcoming conference deadline. Here's the PR: https://github.com/halide/Halide/pull/3220


Halide doesn't currently have good ways to handle sparse matrices, unless they have some sort of special structure that lets you pack them into rectangular shapes (e.g. a constant number of non-zero entries per row). It would be an interesting language extension. The big thing to add would be a way to reduce over the non-zero entries more efficiently than checking them all.
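
For example, something like an ELL-style layout, where every row stores exactly K nonzeros, can be expressed today. This is just a sketch with made-up names, not an official recipe:

    #include "Halide.h"
    using namespace Halide;

    // Sparse matrix-vector product where every row has exactly K stored
    // nonzeros, so the data fits in rectangular buffers.
    Func spmv(Buffer<float> values,  // values(k, row): k-th nonzero of `row`
              Buffer<int> col_idx,   // col_idx(k, row): its column index
              Buffer<float> vec) {   // the dense vector
        Var row;
        RDom k(0, values.dim(0).extent());
        Func out("spmv");
        // Reduce over the K stored entries of each row, gathering from the
        // vector at the stored column indices. The clamp keeps bounds
        // inference happy for the data-dependent load.
        Expr col = clamp(col_idx(k, row), 0, vec.dim(0).extent() - 1);
        out(row) = sum(values(k, row) * vec(col));
        return out;
    }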


What Americans call a scone is entirely unlike a British scone. A British scone is closest to an American biscuit.


And to add to the confusion, Scone is also a place in Scotland of some historical significance (not for baking, though).


I find it in poor taste to be plugging TVM on a post about Halide, given TVM's history of lifting code from Halide without attribution (well beyond the "improved" fork of the IR).


I have not heard about that before, could you elaborate/point to other discussions on the internet that do the elaborating?

(BTW, based on your user-name: do you happen to be Andrew Adams?)


I am indeed Andrew. I'd rather not elaborate - it wasn't a huge amount of code, they added attribution when we complained about it, and they've been very careful to credit us appropriately since then. I shouldn't have even mentioned it but the TVM shilling in a Halide post struck a nerve.

