So 500 GB is about 2,800 full video downloads (roughly 180 MB per download).
/kenburns/promo.mp4 | GET | 175574 | 33.4183748571803
Does that mean the video was watched 175,574 times? That would indeed be a lot!
Hope you're able to release it ASAP!
I skimmed the paper about the Ken Burns effect, and if you don't mind, I have some questions. I hope I didn't miss the answers in the paper itself; I'll be sure to read it more carefully when time permits.
1. The loss function for the depth estimation is L_depth = 0.0001 · L_ord + L_grad. Is L_grad much bigger than L_ord by design, or will this basically make L_ord tiny and almost unnecessary? How did you arrive at this number and not one an order of magnitude bigger or smaller?
2. How are you rendering the point cloud, like tiny discs in free space? When I think of point clouds I think of the typical LIDAR output renderings, but your results are continuous images, or video frames. Are the points rendered to an image plane and the result then interpolated?
And a couple more general ones:
1. Are all depth estimation techniques other than NNs obsolete now? Is there no point in estimating intrinsic camera parameters and epipolar lines when the state of the art seems to be to feed one or more images into a NN and let it produce the depth for you?
2. How do you decide what the NN should look like, with the various downsampling layers, convolutions, etc.? I've seen people start with pretrained networks and retrain them, but how do you build a novel network based on the input and desired output?
1. I am not sure about the scale of the individual loss functions anymore, my apologies. I determined the combination of the two losses via a simple grid search and seeing what works best (plus or minus an order of magnitude did not make that much of a difference).
2. The points are just splatted to an image plane (there is a small sketch after this list), though more advanced point cloud rendering techniques would be better. There is no video frame interpolation; each individual frame in the output video is a rendering of the point cloud from a different camera perspective.
3. I am not sure about multi-view stereo; COLMAP still seems like the state of the art for that. But neural networks definitely outperform classic techniques for single-image depth estimation.
4. Common architectures just did not do as well as I was hoping for, so I tried about 1500 model architectures. I started with an architecture that intuitively seemed right and then gradually explored and refined alterations of it. It ultimately was a lot of trial and error.
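To make the splatting in point 2 concrete, here is a minimal sketch of z-buffered, single-pixel splatting in NumPy. The pinhole intrinsics K, the one-pixel splat footprint, and the nearest-point-wins rule are my assumptions; the actual renderer is likely more sophisticated:

    import numpy as np

    def splat_points(points, colors, K, height, width):
        """Render a colored point cloud by splatting each point to one pixel.

        points: (N, 3) camera-space coordinates with z > 0
        colors: (N, 3) RGB values, one per point
        K:      (3, 3) pinhole camera intrinsics (assumed)
        """
        image = np.zeros((height, width, 3), dtype=colors.dtype)
        zbuffer = np.full((height, width), np.inf)

        # Perspective projection: apply the intrinsics, then divide by depth.
        proj = points @ K.T
        xs = np.round(proj[:, 0] / proj[:, 2]).astype(int)
        ys = np.round(proj[:, 1] / proj[:, 2]).astype(int)

        for x, y, z, c in zip(xs, ys, points[:, 2], colors):
            # Z-buffer test: keep only the nearest point per pixel.
            if 0 <= x < width and 0 <= y < height and z < zbuffer[y, x]:
                zbuffer[y, x] = z
                image[y, x] = c
        return image

Each frame of the output video would then be one such rendering, with the point cloud first transformed into that frame's camera coordinates.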
Was it the university that didn't want to release it? Are they looking at commercializing it, or how does that work? Is it available in any commercial software? It kind of looks like magic and would probably be very useful for a lot of purposes.
2. So basically each point is projected to the image plane without perspective mapping? So in 3D, the further away from the camera they are, the bigger they are, so they all have the same size on the image? And that prevents any seams from occurring in the pixel grid as things move around?
4. Experience, intuition, and elbow grease. Kind of what I thought, but I guess it's reassuring to see an expert in the field having to try 1500 variants.
2. Yes, and there are two mechanisms for handling seams. First, inpainting, which extends the point cloud and can provide a higher sample rate. Second, a postprocessing step that heuristically fills in any seams that may still be present despite the inpainting (a crude sketch of this idea follows below).
4. The downside of it is that one needs a lot of resources in order to try all of these variants, which not everyone is lucky enough to have access to.
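As an illustration of the second mechanism, a crude seam filler could average the valid neighbors of every pixel that received no splat. This averaging rule is my assumption for illustration, not the paper's actual postprocessing heuristic:

    import numpy as np

    def fill_seams(image, valid):
        """Fill pixels that received no splat from their valid neighbors.

        image: (H, W, 3) splatted frame
        valid: (H, W) bool mask, True where at least one point landed
        """
        out = image.copy()
        for y, x in np.argwhere(~valid):
            # Average the valid pixels in the surrounding 3x3 neighborhood.
            y0, y1 = max(y - 1, 0), min(y + 2, image.shape[0])
            x0, x1 = max(x - 1, 0), min(x + 2, image.shape[1])
            patch = image[y0:y1, x0:x1][valid[y0:y1, x0:x1]]
            if len(patch):
                out[y, x] = patch.mean(axis=0)
        return out

Running the pass repeatedly (updating the mask each time) would close seams wider than one pixel.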
The forward movement also helps in this case as those foreground objects grow in size (relative to the background) with time, so there's even less need for interpolation. If the movement were primarily lateral (or reversed relative to the original image), I imagine the algorithm would have a much harder time producing good results.
EDIT: After skimming the paper, it appears that the algorithm is automatically choosing these best-case virtual camera movements:
> This system provides a fully automatic solution where the start- and end-view of the virtual camera path are automatically determined so as to minimize the amount of disocclusion.
That is pretty impressive. I had originally assumed the paths were cherry-picked by humans, so it's cool that the paths themselves are automatically chosen (and that the algorithm matches the intuitive best-case scenario in most cases). It's still slightly misleading in terms of results because they mention that user-defined movements can be used instead, but of course, the results are likely to suffer significantly if the movement doesn't match the optimal path chosen by the algorithm.
The last example shown in the full results video illustrates the issue with too much background interpolation in lateral movement: http://sniklaus.com/papers/kenburns-results
The Ken Burns effect is panning and zooming to highlight features in a photo, then fading into another.
I think your results are wonderful. I was just pointing out that from my background (a television producer - most recently Shark Week, etc.) that the Ken Burns effect and parallax effects are considered two different concepts (that can be combined, but mean different things.)
- Ken Burns effect -> zoom and pan around the image.
The Ken Burns effect is a type of panning and zooming effect used in video production from still imagery. The name derives from extensive use of the technique by American documentarian Ken Burns. (Wikipedia)
- 2.5D parallax effect -> what we see in the link
Note that these two can (and frequently are) be combined, and Ken Burns himself does so occasionally.
I know a few online services that do manual still-photo-to-2.5D-parallax conversion (cutting in Photoshop, setting layers on a 3D plane, etc.). There are Fiverr gigs doing it, and tons of filmmakers (me included, when I dabble with that) do it ourselves from time to time (for news, documentaries, and such), which takes time in Photoshop, Premiere, etc.
In fact I am about to pay a good 100-200 to have 4-5 photos done this way for a project...
Again, your project is great. I’ve been trying to find a simple workflow to do something like this (currently it’s remove.bg and OpenCV - with a lot of tinkering.)
I would absolutely love to read that paper if it existed. Also, congratulations on the special!
I don't have any experience in any of this; I'm just offering another reading of the discussion.
Here's the page and video explaining it: http://sniklaus.com/papers/kenburns
I find Apple’s automatic Ken Burns effect to be way too noisy and almost useless because it zooms in/out from random places in the pictures. When I manually edit a Ken Burns effect myself, there’s a lot less movement, and the movement is so much more meaningful and relevant to the video.
No it is not what is generally referred to as a ‘Ken Burns’ effect, which I think is a distraction in the title.
Because this is awesome!
Also how can I use this for a project I am literally doing production on right now??
Edit: People usually just call this parallax.
They're absolutely worth watching and give the BBC a run for its money.
Perhaps start with The West and then move on to Civil War. My personal favorite was perhaps the one on Jazz, made in 2001.
Civil War... I guess I already knew a lot about it, so what I noticed mostly was more sympathy than I'd like for a sort of Lost Cause nostalgia, and not much talk of slavery. But I guess he had to sell it for its time...
I'm a 40-year-old European, so I knew close to nothing about the Vietnam War other than what I had seen in some movies. It's very in-depth and emotionally powerful. It also seemed pretty objective to me as a complete outsider.
One is in the White House. What he did in this case was unconscionable.
Unfortunately it's unavailable for streaming. Every time someone uploads it, it gets taken down.
This Peabody Award clip is all that's available. The DVD can be had for cheap on Amazon.
His Vietnam is particularly powerful since it's filled with personal accounts.
I also really enjoyed Empire of the Air, covering the history of radio.
* There needs to be some variation in the zooming if you use this in a slideshow. I'd even say bias it towards zooming out more often, since that gradually reveals more of the image.
* You might have enough information to work with here to develop a similar "focus pull effect from single image" algo. That would be really cool.
* Maybe you could also try to develop a "sunset fade" for images where you detect blue sky: gradually add a yellow-orange gradient fade to the sky, and make the foreground and non-sky background gradually warmer and darker.
* There definitely should be more variety in terms of camera paths. Our framework either supports fully automatic results (as shown) or ones with a manual camera path. For an automatically generated slideshow, one should probably add more variety.
* There is some great work in this area, commonly focusing on portrait images. But yes, once the scene geometry has been estimated sufficiently well, one can definitely add some nice out-of-focus effects (a toy sketch follows below).
* That would be an interesting research direction. I am not an expert in relighting, but I can imagine that this requires a lot of work to make sure that the scene composition looks believable.
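As a toy sketch of the out-of-focus idea above, one could fake a focus pull by blending in a blurred copy of the frame in proportion to each pixel's distance from a chosen focal plane. The uniform Gaussian blur (via OpenCV) is a simplification I'm assuming; real defocus varies the kernel size per pixel:

    import numpy as np
    import cv2

    def fake_focus(image, depth, focus_depth, ksize=15):
        """Blend a blurred copy of the image by distance from the focal plane.

        image: (H, W, 3) uint8 frame
        depth: (H, W) float depth map, same scale as focus_depth
        ksize: Gaussian kernel size, must be odd
        """
        blurred = cv2.GaussianBlur(image, (ksize, ksize), 0)
        # 0 at the focal plane, 1 at maximum defocus; shape (H, W, 1).
        alpha = np.abs(depth - focus_depth) / (np.ptp(depth) + 1e-6)
        alpha = np.clip(alpha, 0.0, 1.0)[..., None]
        return (image * (1 - alpha) + blurred * alpha).astype(np.uint8)

Sweeping focus_depth from near to far over successive frames would produce the pull.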
macOS has a "Ken Burns" screen saver if you are on a Mac. There are also plenty of examples of the "Ken Burns" effect on YouTube.
The authors might want to change the description of their amazing effect if they want to avoid lots of threads and comments about it not being the Ken Burns effect (or maybe that will generate more comments).
Our framework takes a single still image and animates it via panning and zooming while adding 3D parallax. Does adding the 3D parallax not make it the Ken Burns effect anymore? Please let me know in case I am misunderstanding anything.
Yes, the examples that we have shown do not zoom as much as common Ken Burns examples. This stems from our framework processing the input at low resolution due to limitations in deep learning and the overall complexity of the problem. As such, there is not enough detail that could be zoomed into. This will be improved in later generations.
Specifically, the Ken Burns effect describes taking a still, 2D image and panning and zooming on the 2D image in a motion picture.
Adding 3D parallax makes it something quite different in the eyes of many of us on this forum. It is cool! We would just not use the words "Ken Burns effect" to describe what you've made.
One might even argue Ken Burns would have paired the panning and zooming with this parallax effect if he could.
I look forward to this enhancing basically all photo slideshows for a long time to come.
Googling "How to make a parallax video from 2d pictures" brings up lots of tutorials, none of which mentions "Ken Burns Effect"
Conversly searching for "How to make the Ken Burns Effect" will show that it's not the same effect your software makes
For instance, you take the source image at resolution n×n and do all processing at n/2 or n/4. Then, at the moment you are going to composite the finished image, you instead draw "pixels" from a source image that contains X,Y indices rather than RGB. Then you upscale the output image 2x (or 4x) and replace each X,Y index with the relevant pixel from the full-resolution source (adding X%2, Y%2 or X%4, Y%4 to the indices to return to source resolution).
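If I'm reading that right, the compositor outputs per-pixel source coordinates instead of colors, and the color lookup happens only once, at full resolution. A minimal NumPy sketch of that idea, assuming a half-resolution pipeline with integer coordinates and nearest-neighbor upscaling of the coordinate map:

    import numpy as np

    def composite_at_full_res(source_full, coord_map_half):
        """Upscale a half-resolution (x, y) coordinate map and use it to
        pull pixels from the full-resolution source image.

        source_full:    (H, W, 3) original image
        coord_map_half: (H//2, W//2, 2) integer (x, y) indices into the
                        half-resolution source, produced by the pipeline
        """
        # Nearest-neighbor upscale of the coordinate map to full resolution.
        coords = coord_map_half.repeat(2, axis=0).repeat(2, axis=1)

        # Double the half-res indices and add the X%2 / Y%2 offsets so that
        # neighboring output pixels sample distinct source pixels.
        h, w = coords.shape[:2]
        oy, ox = np.mgrid[0:h, 0:w]
        xs = np.clip(coords[..., 0] * 2 + ox % 2, 0, source_full.shape[1] - 1)
        ys = np.clip(coords[..., 1] * 2 + oy % 2, 0, source_full.shape[0] - 1)

        return source_full[ys, xs]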
If you grew up watching PBS documentaries there was always a rostrum camera operator somewhere in the credits.
In the UK, Ken Morse was/is best known for this.
(Sorry for linking a million tweets but I didn't put these on a blog or anything, that's the only place they exist.)
Your version does look better though! But I was impressed I could just run random images through depth estimates and get anything cool like this.
I would love to see your method take on some Escher paintings; can you try it for me?
Yeah, I love trying things that aren't supposed to work! They often do work and surprise you, or fail in interesting ways.
But only certain titles have the effect ... so I would imagine the studios provide a layered asset that can get composed together.
The background artwork is typically static, although I suspect this tool will be included in Photoshop at some point, at which time it will make it much easier to give this effect to backgrounds too.
Here are some specs if you're interested (there are various different docs depending on which assets you're submitting, this is just one example):
Why does the algorithm think that is part of her, rather than the background?
I thought maybe it grabbed it because it was the same color as the bouquet she is holding, but that is entirely within the white of her dress.
Every time I see an awesome landscape in person, I think about what's missing when capturing it for other people to experience, and I've concluded it comes down to depth perception and how our eyes dart around to create a composite experience, resulting in ever-so-slight shifts of depth.
A 2D image can't capture that, but this seems a lot closer, and it's great that it can use a 2D image as a base.