Deblur-GS: 3D Gaussian splatting from camera motion blurred images (chaphlagical.icu)
170 points by smusamashah 14 days ago | 38 comments



The example blurred images look very "clean", for lack of a better term, as if they were produced synthetically by motion-blurring crisp images in a particular direction. I wonder how well it fares with real shaky-camera footage (where the path of motion blur in a given frame might not even be a straight line).


So much so that, to me, it looks like they took the clean image and added the blur in post, though I really don't believe (nor claim) that's what they did; it's just what it looks like. The speeds of the motion aren't even the same, so the interpolation is just off, which is what gives it the uncanny-valley feeling for me.


The camera motions do not need to be the same. Gaussian splatting reconstructs the scene in 3d, and you can then render the scene from arbitrary angles, so they just gave it a random camera motion to show you the 3dness of it.


Very cool! A next step could be to model a rolling shutter.


I’m interested in the number of unsolved cases this tech will help solve.


It might help a detective take a new look at a scene as if they were there. I suspect there are generally enough crime scene photos for a given crime.

However, I’m not sure how admissible this would be as evidence, given its somewhat generative nature. I would assume any lawyer would tear apart the fact that this is guesswork/estimation and not a known “truth”. Deblurring tech already exists to mathematically undo motion blur, but Gaussian splatting would effectively be “creating” evidence.

Allowing generated images like that to be considered evidence of crimes would be an incredibly dangerous precedent to set, in my opinion. But with this court, and the right case, dangerous, unpopular, and otherwise questionable precedents seem to be the name of the game these days. Especially when judges can be bought off without any repercussions.


Absolutely impressive - seems on par with what's happening in our eyes and brain. If this becomes realtime, we could turn the noisy, low-fps images from cameras on AR headsets in dark environments into a smooth, bright image.


The opposite might be useful too. Often when shooting outside there is too much light, and a high shutter speed is automatically used, resulting in footage that doesn't have enough motion blur. Ideally you would want the shutter speed at twice the frame rate (a 180-degree shutter) for smooth-looking motion blur. But I think this might be a little harder to add in camera or in post. For example, when filming someone "talking with their hands", you would want to add blur to their hands, but not necessarily to their head, which is probably mostly stationary.
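For what it's worth, when the source footage has a higher frame rate than the delivery frame rate, the naive fix is just to average the source frames that fall inside a simulated shutter window. A rough numpy sketch under that assumption (function and parameter names are mine, not from any particular tool) — and it also shows why this blurs everything uniformly rather than just the hands:

    import numpy as np

    def add_motion_blur(frames, in_fps=120, out_fps=24, shutter_fraction=0.5):
        """Downsample high-fps footage to out_fps, averaging the source frames
        that fall inside a simulated shutter window (0.5 = 180-degree shutter,
        i.e. shutter speed at twice the output frame rate)."""
        step = in_fps / out_fps                  # source frames per output frame
        window = max(1, int(round(step * shutter_fraction)))
        output, start = [], 0.0
        while start + window <= len(frames):
            s = int(round(start))
            chunk = np.asarray(frames[s:s + window], dtype=np.float64)
            output.append(chunk.mean(axis=0))    # every pixel gets the same blur
            start += step
        return output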


I’m not sure it works out that simply in practice because of camera movement vs. subject movement.


I know it's a meme at this point, but this is real-life "Enhance, please". Incredibly impressive what we're able to do to reconstruct missing data.


Except that it's not really reconstructed but hallucinated. Not to downplay how cool this is, though.



This could make various “night mode” photos much clearer, and the motion might even be helpful.


Huawei does something similar with an algorithm on their newest phones. AFAIK it's a double-exposure recombination method though, not Gaussian splatting. Cool nonetheless!


The reconstruction looks even better than ground truth images in their examples.


I really wish the examples included moving subjects. It's hard not to think this was intentionally excluded :<

My friend doesn't quite grasp this yet; can someone explain? Is the reconstructed detail all "real" and extracted from the blurred input, or is there some model at work here, filling in the image with plausible details but basically making up stuff that wasn't really there to start with?


That's accurate. What's worth noting, though, is that everything we 'see' with our own eyes is constructed from sampling our environment. The image we construct is what we expected to see given the sample data. This is one reason why eyewitness testimony can be vivid and false without any foul play.


No, it does not "make up things" using generative AI. Current GS implementations assume each image was captured from a single static camera pose; this paper instead assigns a linear motion trajectory to the camera during the exposure and optimizes it during training.


So can it handle the case where both the camera and multiple objects in the scene are moving along different trajectories?


Not with traditional 3D Gaussian splatting, but it is potentially possible to separate the time axis and do a 4D Gaussian splatting with some regularization to accommodate dynamic scenes.

Here's some early work in this area which seems promising: https://guanjunwu.github.io/4dgs/


I skimmed the Overview and am not an expert.

It seems to me they don't use any ML at all. They use backpropagation to jointly optimise the entire physics/motion model, which covers the camera motion and the generated blurry images (they render multiple images for each camera frame along the camera's path of motion, then average them, simulating motion blur).
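As a rough illustration of that idea, here's a minimal PyTorch-style sketch (not the paper's code; render_gaussians, the pose parameterization, and the naive linear pose interpolation are all assumptions made for brevity):

    import torch

    def render_blurred(scene_params, pose_start, pose_end, render_gaussians, n_samples=8):
        """Simulate motion blur by averaging renders at poses sampled along
        the (learned) camera trajectory covering one exposure."""
        renders = []
        for t in torch.linspace(0.0, 1.0, n_samples):
            pose_t = (1.0 - t) * pose_start + t * pose_end  # real code interpolates in SE(3)
            renders.append(render_gaussians(scene_params, pose_t))
        return torch.stack(renders).mean(dim=0)

    def training_step(optimizer, scene_params, pose_start, pose_end,
                      observed_blurry, render_gaussians):
        """One joint optimisation step: gradients flow into both the Gaussians
        and the per-image trajectory endpoints, so the camera motion is recovered too."""
        optimizer.zero_grad()
        predicted = render_blurred(scene_params, pose_start, pose_end, render_gaussians)
        loss = torch.nn.functional.l1_loss(predicted, observed_blurry)
        loss.backward()
        optimizer.step()
        return loss.item()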


It is ML in the sense of optimizing a nonconvex loss function over a dataset. It is not a fancy diffusion model or even a generative model, but it is no less a machine learning problem.


“Not ML” as in “not learning from data to apply in new situations” but rather they do “mathematical optimisation”.

The data they optimise over is just the images of the current camera trajectory (as far as I understand)


Gaussian Splatting creates an "approximation" of a 3D scene (captured from a video) using hundreds of thousands (or even millions) of tiny gaussian clouds. Each gaussian might be as small as a couple of pixels, and all these 3D gaussians get projected onto the 2D image plane (fast on a GPU) to realize a single image (i.e. a single pose of the video camera). These gaussians are in 3D, so they explicitly represent the scene geometry, e.g. real physical surfaces, and an approximation of physical textures.

When a camera motion-blurs an image, a physical surface/object gets blurred across many pixels. But if you can reconstruct the 3D scene accurately, then you can re-project the 3D gaussians into 2D images that are not blurry. Another way to view the OP is that this technique is a tweak to last year's "sharp images only" Gaussian Splatting work to deal with blurry images.
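For intuition, here's a heavily simplified sketch of that project-and-composite step for a single pixel (real implementations compute the 2D covariance from the 3D one via the perspective Jacobian, sort splats by depth per tile, and run it all on the GPU; the dict layout below is just an assumption for illustration):

    import numpy as np

    def pixel_color(pixel_xy, splats):
        """splats: 2D means, 2D covariances, opacities and colors of gaussians
        already projected into the image plane, sorted front-to-back."""
        color = np.zeros(3)
        transmittance = 1.0
        for s in splats:
            d = pixel_xy - s["mean2d"]
            # Gaussian falloff of this splat's contribution at this pixel
            alpha = s["opacity"] * np.exp(-0.5 * d @ np.linalg.inv(s["cov2d"]) @ d)
            color += transmittance * alpha * s["color"]
            transmittance *= 1.0 - alpha
            if transmittance < 1e-4:  # early exit once the pixel is effectively opaque
                break
        return color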

The OP paper is cool but isn't alone, here's some concurrent work: https://github.com/SpectacularAI/3dgs-deblur

Also related from a couple years ago, using NeRF methods (another area of current 3D research) to denoise night images and recover HDR: https://bmild.github.io/rawnerf/ NeRF, like Gaussian Splatting, seeks to reconstruct the scene in 3D, and RawNeRF adapts the approach to deal with noisy images as well as large exposure variation.

In terms of Gaussian Splats vs GenAI: GenAI models have usually been trained on a prior of millions of images, so they can impute / infer some part of the 3D scene or some part of the input images. However, Gaussian Splats (and NeRF) lack those priors.


Gaussian blur is in principle a reversible operation (it's a deconvolution problem), but in practice it's not really possible on a single still image. With multiple pictures you might have enough information.
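For context, "mathematically undoing" a known blur is classic deconvolution. A minimal Wiener-deconvolution sketch with numpy (it assumes the blur kernel is known and the regularization constant k is hand-tuned; unknown kernels and noise amplification are exactly what make this fragile on a single photo):

    import numpy as np

    def wiener_deconvolve(blurred, kernel, k=1e-2):
        """Invert a known blur kernel in the frequency domain, damping
        frequencies where the kernel response is weak (noise would explode)."""
        H = np.zeros_like(blurred, dtype=np.float64)
        kh, kw = kernel.shape
        H[:kh, :kw] = kernel
        H = np.roll(H, (-(kh // 2), -(kw // 2)), axis=(0, 1))  # center kernel at the origin
        Hf = np.fft.fft2(H)
        Bf = np.fft.fft2(blurred.astype(np.float64))
        If = np.conj(Hf) / (np.abs(Hf) ** 2 + k) * Bf
        return np.real(np.fft.ifft2(If))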


Both. The paper mentions using a deblurrer and a novel view synthesis model (ExBluRF).


finally, all the UFO videos can be clear!


The aliens are actually pan-dimensional light beings. That is why they are afraid of high quality cameras: if they get caught in a photo they are stuck here forever. Running this algorithm on pictures of UFOs is actually an intergalactic war crime.


What benefits does it have over existing algorithms such as ESRGAN, CCSR, DAT, SPAN, SUPIR?


I really want to be impressed, but I've been reading papers about breakthroughs in deblurring and upscaling for two decades now, and the state of the art in commercial and open-source tools is still pretty underwhelming. Chances are, if you have a low-res keepsake photo, or take a blurry nature shot, you're gonna be stuck with that.

Video, where the result needs to be temporally coherent and make sense in 3D, can't be the easier one.


> Video, where the result needs to be temporally coherent and make sense in 3D, can't be the easier one.

Why not? Video is a much more tractable problem because you have much more information to go on.


At this stage there's really only a couple of options. More than there used to be, but still.

When you want to stay faithful to the actual data, your options are limited: for quite a large part of the image, a simple convolution is about as good as it gets, except at the edges. Basically the only problem we couldn't solve 20 years ago was excessive ringing (which is why softer scaling algorithms were preferred). You can put quite a lot of effort into getting clearer edges, especially for thin lines, but for most content you can't expect much more sharpness than what the basic methods give you.

And then there is the generative approach, where you just make stuff up. It's quite effective but a bit iffy. It's fine for entertainment, but it's debatable whether the result is actually a true rescale of the image (and if you average over the distribution of possible images, the result is too soft again).

In theory video can do better by merging several frames of the same content.
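A toy illustration of that multi-frame idea (a hedged sketch, not a production pipeline: it assumes grayscale frames and purely translational shifts between them, and uses scikit-image's phase correlation to align before averaging):

    import numpy as np
    from scipy.ndimage import shift as nd_shift
    from skimage.registration import phase_cross_correlation

    def merge_frames(frames):
        """Align each 2D frame to the first by its estimated sub-pixel translation,
        then average: noise drops roughly with sqrt(N) while shared detail stays."""
        reference = frames[0].astype(np.float64)
        accumulated = reference.copy()
        for frame in frames[1:]:
            frame = frame.astype(np.float64)
            offset, _, _ = phase_cross_correlation(reference, frame, upsample_factor=10)
            accumulated += nd_shift(frame, offset)
        return accumulated / len(frames)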


Video is absolutely the easier case - there's a lot more information to go on. A single blurry photo has lost information compared to the original, but you can theoretically recover that information in a video where you get to see the subject with a variety of different blurs/distortions applied.

Note that a limitation of this result is that it assumes a static scene, but that's already a typical limitation of most gaussian splat applications anyway, so it kind of doesn't matter?


This work won't solve that; it requires a video (a sequence of images).


If your brain can imagine it de-blurred or de-scuffed, we'll get there with techniques eventually.


Hi,

I'm trying to make an open-source NN camera pipeline (the objective is to run on smartphones, but offline, not in real time), and I'm still barely managing the demosaicing part... would you be open to discussing it with me?



