Film: Frame Interpolation for Large Motion (film-net.github.io)
209 points by memorable on Aug 31, 2022 | 57 comments



Half a dozen articles on ML-based image manipulation on HN at once. Seems we're really entering a golden age of AI-based real-world applications, at least in specific niches. Personally I'm really excited about the potential of this in design, art, movies, games and interactive storytelling. Hard to imagine what will be possible 5-10 years from now, but I kind of expect RPG games with fully AI-generated aesthetics / graphics and stories, where only some core gameplay mechanics are still determined by the designers of the game. Really can't wait to see that.

The work described in the linked article is also extremely impressive and feels almost unreal, in any case.


> AI-based real-world applications

I don't know; I feel the real-world applications are still missing and what we are now seeing are tech demos (impressive ones!) and gimmicks. I'm still waiting to see all this ML stuff being used in a productive context.


I personally use ML stuff in a productive context almost every day. Topaz labs software for denoising and upscaling images, Star Xterminator and Starnet for astrophotography edits. Don’t know if this counts as productive but the tools I use to analyze my chess games are also all utilizing neural nets. So if you’re not seeing useful ML applications in your day to day life it could just be that the things they’re good at don’t align well with your job or hobbies just yet.


Pixelmator Pro has an ML-powered image resizer built in. I like that.

Also Retrobatch has an ML-based image classifier, but I haven't tried that yet.

Other than those, all of these are still impressive tech demos. We need them in production, and preferably as open-source (model + training + process) versions.


>but I kind of expect RPG games with fully AI-generated aesthetics / graphics and stories,

I think Dwarf Fortress has the story generation part. The aesthetics/graphics part not yet...

And I think it's procedurally generated, but with complex and strange results.

https://www.reddit.com/r/dwarffortress/comments/2ztnkw/i_thi...


> I think Dwarf Fortress has the story generation part. The aesthetics/graphics part not yet..

I suspect Ultima Ratio Regum is a step up from DF even in terms of procgen world lore. I mean, the creator (author?) of that game wrote their PhD thesis on the subject.

https://www.markrjohnsongames.com/games/ultima-ratio-regum/


I’ve been waiting for an AI that can fix the color on all the old color television footage from the 1960s, 1970s, etc.

News and sports, in particular.


That has been possible for a few years now using GANs: https://towardsdatascience.com/colorize-black-and-white-phot...


I'm very interested in this.

It got some weird pushback, but Peter Jackson's film "They Shall Not Grow Old" really helped make its subjects so much more real by cutting through the limitations of the old footage from the First World War. Being able to apply similar techniques more cheaply and quickly will bring a lot of old footage to life and make the past much more real for the viewer.



I don’t think the pervasiveness of ML articles on HN is an indicator of anything except hype trends around certain subjects. ML research in these spaces has been very high output for many years now.

As someone in the field of computer graphics, where there’s been considerable ML research over the past few years that is more reliably applicable to people’s lives, most of the exciting stuff doesn’t make it to the front page of HN even if it’s posted here.

There’s been lots of research in the past few years. The initial shiny stuff makes it on here, but it’s the follow-up iterations, the ones that really catalyze change, that don’t, because public interest in those topics has waned in the interim.


Would be great if you could share your 3-5 most exciting developments that weren't discussed much here.


My mind might have quite a bit of recency bias since I just wrapped SIGGRAPH a few weeks ago (without catching Covid!), so I’m a little scattershot. I might come back to this later to post more concrete lists, but here’s what came to mind first.

I should add that my point is more that there hasn’t been a slowdown in ML research, and that the ebb and flow of interest on HN isn’t indicative of accelerated progress in the field, simply of what has captured the public mind.

Anyway on to the links…

Disney had a slew of papers out this year; the facial motion retargeting ones in particular are very interesting for use in production of films and “metaverse” characters:

https://studios.disneyresearch.com/machine-learning/

Luma has made a lot of progress with their NeRF capture and on-device representation, which will likely have huge effects for e-commerce use cases, among other things:

https://captures.lumalabs.ai/unbounded

Nvidia released an ML-based version of OpenVDB that will potentially improve effects in films, but could be huge for games:

https://youtu.be/uAs8X5es1DE

There were also a ton of neural rendering papers at SIGGRAPH that I still need to sort out in my head since I saw them presented back to back, so I apologize for just sharing a dump:

https://twitter.com/neural_fields/status/1555947856271446018...

Apple released some neural rendering content too that has a lot of implications for spatial product training:

https://arxiv.org/abs/2207.13751


You started with downvotes but showed up with the links. Neural VDB looks wild - I can see games going nuts with detailed, interactive volumetrics now, with such a tiny memory footprint.


And imagine its graphics being generated on EVERY NEW GAME START, so there would never be the same experience twice :O


Also can't wait to see where all of this will be in a couple of years!


Most likely somewhere around the bottom of the trough of disappointment: https://en.wikipedia.org/wiki/Gartner_hype_cycle


Speaking of which, is there any good ML-based super-resolution algorithm out there? I'm trying to print a poster but some of my figures are in low resolution...


Maybe... exciting times are back.


Or it's just hype.


> Seems we're really entering into a golden age of AI-based real-world applications...

I wouldn't call moving pixels on a screen "real-world". Are these technologies going one day to have a physical effect on our lives, like, in the real real-world? I very much doubt it.


> very much doubt it

It's week two, give it two decades.

Or come and build some!

Amazon in 2000 didn't know it would become infra for the world with AWS.


Is it only me who noticed how teeth appear absolutely out of nowhere when people smile in the demo footage? And it doesn't look fascinating. It looks horrifying.

Probably because of falling into the uncanny valley [0].

0 - https://en.m.wikipedia.org/wiki/Uncanny_valley


Indeed, this one example makes me want to eat garlic with every meal and hang crosses over all the doorways: https://film-net.github.io/static/images/000628/interpolated...

Don't get me wrong, it's an incredible feat, and seems to handily beat the other automagic interpolators (eg, 3:49 in the video at the bottom of TFA) in terms of minimizing "pop-in", but it's still clearly present in dentition.


Ha, I'm glad you pointed that out. I didn't notice it at all and was actually thinking about what an amazing application that last transition was.

I was going to download this thing and generate a bunch of samples to send to my family tomorrow, possibly dumping them right into the uncanny valley and being too unobservant to notice I was doing it.



You can also see a lot of distortion on the left side of the frame here as the ear comes into frame: https://film-net.github.io/static/images/000032/interpolated...

Would be interesting to see if this could be made more context-sensitive - i.e., the algorithm recognizes this as a person's head and fills in details more intelligently.


Wow, it looks like the teeth don't move with the head, uncanny indeed.



Inspired by their own Gulliver's Travels[0] example I tried it out on two frames of a 15 FPS anime. Not quite ready for that type of animation[1], although that is to be expected since the differences in arm positions of the input frames are pretty extreme. Having said that, it got a lot of other details right! (A sketch of how to call the hosted model from Python is below the links.)

[0] https://replicate.com/google-research/frame-interpolation/ex...

[1] https://imgur.com/6GZSZSO
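
For anyone who wants to poke at it the same way, the hosted version can also be driven from Python. A rough sketch; the input field names and return format are my best guess from the web form, so verify them on the model page before relying on this:

    # Rough sketch of calling the hosted model from Python. Requires
    # `pip install replicate` and REPLICATE_API_TOKEN in the environment.
    # Input field names and return format are guesses from the web UI.
    import replicate

    output = replicate.run(
        "google-research/frame-interpolation",  # may need a pinned version hash
        input={
            "frame1": open("anime_frame_a.png", "rb"),
            "frame2": open("anime_frame_b.png", "rb"),
            "times_to_interpolate": 4,  # more passes = more in-between frames
        },
    )
    print(output)  # typically a URL to the rendered result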


This feels like something that would be perfect for one-man or small-team animation studios. If this could draw the in-betweens, I imagine a talented artist (which I am not) could produce films in a literal fraction of the time it takes to draw every frame. If you're not happy with the result, just add another frame.


Hard to say, but this is (a) kind of what 3D animation already does and (b) sort of a misunderstanding of animation.

Animated frames are supposed to convey intention. They’re fantastic at doing this since you can manipulate every detail of every frame. The idea that you’ll just run an AI through it might work for dialogue scenes of a typical Japanese TV anime, where intention is low and it’s mostly indeed grunt work. But I would imagine the result would be a bit lifeless - unless someone trains a model specifically for anime using good animation as a reference.

Basically just moving between two frames is an example of extremely poor animation.

Source: am animator, sort of.


Isn't this what flash tweening already allowed 20 years ago? The technique here seems ideal for already-existing drawn images or photographs, but if you're drawing something from scratch you can provide a lot more context for interpolation by starting with vector data instead of raster frames.


This was my first thought too, even for large studios, even for existing media. Would be neat to take an existing animation or stop-motion that was done at 12 fps and see it scaled up.


Whatever this is, it's barely animation. It's interpolation, and it's linear and far away from anything that's animated or animation.

In an animation those two photos would be drawn and created as keyframes, which would then get interpolated in many ways (hopefully not linearly, and not as robotic and weird as this; see the toy sketch below).

Very interesting technology though. I could see this coming to a smartphone near you any day now. And there will be ways people animate with these tools, but this isn't it.
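
To illustrate what I mean by "not linear": even for a single animated value, the timing curve you interpolate along changes the feel completely. A toy sketch (nothing to do with the paper's method):

    # Toy illustration: the same two keyframe values interpolated with a linear
    # timing curve vs. an ease-in/ease-out curve. Real animation picks (and
    # hand-tweaks) curves like the latter instead of plain linear tweens.

    def lerp(a, b, t):
        """Plain linear interpolation -- the 'robotic' tween."""
        return a + (b - a) * t

    def ease_in_out(t):
        """Smoothstep easing: slow out of the first pose, slow into the second."""
        return t * t * (3 - 2 * t)

    key_a, key_b = 0.0, 100.0  # e.g. a joint rotation at two keyframes
    for i in range(5):
        t = i / 4
        print(f"t={t:.2f}  linear={lerp(key_a, key_b, t):6.1f}  "
              f"eased={lerp(key_a, key_b, ease_in_out(t)):6.1f}")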


We're going to get an explosion of indie animated shows. Will soon be possible to make as a year-long passion project what used to require $15 million and network exec buy-in.


I still can't wrap my head around how people absolutely ignore kids' rights to privacy by putting their photos/videos online without their consent.

I would have been pretty bummed by my teens if I had found out all my life's history was there for the whole world to crawl, collect, train their ad/surveillance NNs on, etc.


Don't worry, by the time this kid's old enough to even care, he'll be unrecognizable. If it's any consolation, I cannot recognize this kid as anything other than a "kid". Good-looking kid for sure, but still a kid.


No doubt that an AI could identify the same person at different ages given sufficient source material.


And thus prove that they took a bath 20 years before?

Sure you could potentially identify the kid, but nobody would ever have any reason to go through the effort.


That's the privilege rights give us: once granted, we don't need to explain why we want them.


Which right are we talking about here, the right to run an image through an AI for no reason, or the right to prevent an image from being run through an AI, also for no reason?

Privacy is not a right, it is a condition under which you have different rights. Whether that condition exists depends on social norms - for example a picture of someone in their underwear at a public locker room is very different from a picture of someone in an equal state of undress at the beach. A major factor in whether something is an invasion of privacy is the amount of effort others need to take for it to become public - you can for example have a private conversation in a public restaurant despite the fact someone could theoretically eavesdrop, it only stops being private when you start talking so loudly that there is no need to eavesdrop. Also to be considered is the likelihood of someone maliciously trying to gain information - a bank failing to shred financial documents might be a violation of privacy as someone going through their trash is a real risk; but my grandma doesn't need to shred old post cards. I would definitely consider trawling obscure websites with an ai to be in the eavesdropping/dumpster diving regime.

Coming back to the original point, yes you don't have to explain to anyone else why you are exercising your rights, but freedom isn't free and you need to be able to justify to yourself that what you gave up in exchange for your rights was worth it. Idealistic platitudes might at first glance seem comforting, but they make serious conversation impossible. At the end of the day privacy on the internet is an extremely nebulous concept, and without questioning "what's the point?" every now and then, it's easy to lose perspective.


I appreciate you going into such detail, but I think we're losing the original perspective here. What I was talking about is that parents decide so many things for their children, justifiably so. This is just one thing where they could leave the decision to the child himself, if and when he starts to care and understand the consequences, with the only downside being a slight hindrance to pleasing their own ego.

PS: Privacy is a right, not a mere condition. In my country of residence it's meant to be protected by the constitution, so I think it qualifies.


You may not approve of it but I doubt you "can't wrap [your] head" around it.



Does anyone know if it's possible to run this on Apple Silicon GPU? I've been playing with Stable Diffusion on M1 and having fun, I'd love to be able to use this to interpolate between frames as shown in another recent post.
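
In case it helps anyone else: the model is TensorFlow-based, so the usual route on Apple Silicon would be tensorflow-macos plus the tensorflow-metal plugin. A rough sketch of checking that the Metal GPU is picked up and loading the published model; the TF Hub handle and the tensor names are from memory of the FILM demo, so treat them as assumptions and verify against the repo:

    # Rough sketch for Apple Silicon. Assumes `pip install tensorflow-macos
    # tensorflow-metal tensorflow-hub`. The hub handle and input/output names
    # below are from memory of the FILM demo and may need adjusting.
    import numpy as np
    import tensorflow as tf
    import tensorflow_hub as hub

    # With the metal plugin installed, the M1/M2 GPU should show up here.
    print(tf.config.list_physical_devices("GPU"))

    model = hub.load("https://tfhub.dev/google/film/1")  # assumed handle

    # Two RGB frames in [0, 1], batch dimension first; time=0.5 asks for the midpoint.
    x0 = np.random.rand(1, 256, 256, 3).astype(np.float32)
    x1 = np.random.rand(1, 256, 256, 3).astype(np.float32)
    t = np.array([0.5], dtype=np.float32)  # shape may need to be [batch] or [batch, 1]

    out = model({"x0": x0, "x1": x1, "time": t})  # assumed input keys
    print(out["image"].shape)  # the synthesized in-between frame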



> synthesizes multiple intermediate frames from two input images

That's a neat use case, and definitely a good way to show off, but what about more than one image?

The overwhelming majority of video that exists today is 30fps or lower. The overwhelming majority of displays support 60hz or more.

Most high-end TVs do some realtime frame interpolation, but there is only so much an algorithm can do to fill in the blanks. It doesn't take long to see artifacts.

I would be more interested to see what an ML-based approach could do with the edge cases of interpolating 30fps video than 2 frames.


They can upsample the FPS of videos (more than 2 frames): https://github.com/google-research/frame-interpolation#many-...
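
The many-frames mode is essentially the two-frame model applied recursively: interpolate a midpoint, recurse on each half, and do that for every consecutive pair of video frames. A minimal sketch of that idea, where interpolate_pair is a hypothetical stand-in for whatever two-frame model you call:

    # Minimal sketch of recursive midpoint interpolation. `times` passes yield
    # 2**times - 1 in-between frames per input pair (times=1 roughly doubles
    # the frame rate). `interpolate_pair` is a hypothetical stand-in for any
    # two-frame model (e.g. FILM at t=0.5).

    def interpolate_recursive(frame_a, frame_b, interpolate_pair, times):
        """All synthesized frames strictly between frame_a and frame_b."""
        if times == 0:
            return []
        mid = interpolate_pair(frame_a, frame_b)  # one model inference
        left = interpolate_recursive(frame_a, mid, interpolate_pair, times - 1)
        right = interpolate_recursive(mid, frame_b, interpolate_pair, times - 1)
        return left + [mid] + right

    def upsample_video(frames, interpolate_pair, times=1):
        """Insert interpolated frames between every consecutive pair of frames."""
        out = []
        for a, b in zip(frames, frames[1:]):
            out.append(a)
            out.extend(interpolate_recursive(a, b, interpolate_pair, times))
        out.append(frames[-1])
        return out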


Actually, most of the video frame interpolation programs on the market use two-frame interpolation. Theoretically, you can do a better job with multiple frames, but it doesn't bring much more value outside of some extreme cases.


> Actually, most of the video frame interpolation programs on the market use two-frame interpolation.

> Theoretically, you can do a better job with multiple frames, but it doesn't bring much more value outside of some extreme cases.

Edge cases that require more information than is present in two frames are very common. That's why most frame interpolation methods also have an "artifact masking" feature.

But what if we did use the information from surrounding frames? That would probably be too complicated for traditional frame interpolation, but that's not what we're talking about.

What if we used a model trained on the entire video file - or even a collection of similar video files - to fill in the gaps?


Just wait until someone releases a model trained on 10-second TikTok videos. That's going to be fascinating.


This seems like a good tool to turn <60fps videos into 60fps videos.


Yep. I'd also be interested at least in A/B-ing this against current motion interpolation methods used in televisions. Does it perform perceptually better in blind viewer tests? Does it get rid of the soap opera effect? Does it have its own flavor of "something's off about this video"? All questions I'd love to see answered.


For historical footage, I could see some use cases. For cinema, I don't know why you'd want to do this. < 60 fps playback of video that was shot at < 60 fps looks just fine. Even if the interpolation was perfect, what's the benefit?


Personally, I love 60fps+ videos. They just seem more "realistic" to me, as if the person moving on the screen was right in front of me. Ordinary 24fps is okay, but there's a certain "not real" feeling I get while watching it. It's like playing a videogame that stutters all the time.


It seems like this could be a good way to provide smooth weather / cloud animations using real or raw cloud images rather than those heat maps most apps use.



