Using neural networks to upscale a famous 1896 video to 4k quality (arstechnica.com)
349 points by eyegor on Feb 5, 2020 | hide | past | favorite | 158 comments

Am I the only one that finds the "improved" version much worse?

There are digital artifacts all over the place, like the steam at the beginning, the people's faces, any place there is movement.

I see that some things don't move as "jerky" but I think the original is far superior to watch.

Edited to add: Here's a link to a quick comparison image. These aren't exactly the same frame or size, but close enough to see what I'm talking about. https://imgur.com/a/A3roH54

I agree, the improved version is worse. At first glance, it looks better than the original, but then numerous artifacts become apparent. For example, check out the little girl's face. It's jumbled in the "improved" version, but very clear in the original.

It looks like most of the perceived improvement comes simply from the upgrade to 60fps, which is achieved trivially through interpolation.
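The "trivial" interpolation can be sketched in a few lines. This is a hypothetical illustration, not what the video's author actually used (real tools such as optical-flow interpolators warp pixels along estimated motion vectors instead of blending):

```python
import numpy as np

def interpolate_frames(frames):
    """Roughly double the frame rate by inserting blended frames.

    Each inserted frame is the pixel-wise average of its neighbours;
    the ghosting this produces on fast motion is the classic artifact
    of naive interpolation.
    """
    out = []
    for a, b in zip(frames[:-1], frames[1:]):
        out.append(a)
        out.append(((a.astype(np.float32) + b) / 2).astype(a.dtype))
    out.append(frames[-1])
    return out
```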

I wonder if the video would have gotten as much attention had AI not been mentioned.

In most ways, yes. For me the first 5 seconds or so were fantastic. It felt like seeing a behind the scenes "How it was made" documentary where I was seeing the actors out of character. Or like it removed the sense of being from another era. Soon the artifacts and warping became distracting, and then watching the original put it back in time. Maybe it was a sense of nostalgia but overall the original seemed better to me.

Someone posted a colorized[0] version and it had somewhat "jumpy" colors that change over time. I think one problem is you need temporal consistency for this to really work properly. If you pick a color for an object, or you upscale an object a certain way, it has to stay that way over a potentially unlimited number of frames. That's already hard to achieve because neural networks are typically bad at long-term dependency, but it's even more difficult if the object moves, because it implies that the neural network has to understand object identity and movement, which requires understanding physics, and what a train or a person is to begin with. At the moment, neural networks don't have sophisticated models of the world.

[0]: https://www.youtube.com/watch?v=EqbOhqXHL7E

>...because it implies that the neural network has to understand object identity and movement, which requires understanding physics, and what a train or a person is to begin with.

I agree with most of what you said, but disagree with the need to know underlying fundamentals: most 5 year olds can throw and catch a ball without needing to understand calculus or Newton's laws of gravity.

They don't need to know the formulas or mathematical logic behind physics, but 5 year olds do have a working model of physics in their heads. And AI wouldn't need to know the actual formulas either, but it would need some working model that would allow it to make those distinctions.

That's right. It has to understand how objects move, and the boundaries between objects, what's part of a given object and what isn't, as well as how a given object will be affected by light when moving, shadows, etc. This is difficult to learn from data, even if you have a lot of it.

> It has to understand how objects move[1], and the boundaries between objects[2], what's part of a given object and what isn't[3], as well as how a given object will be affected by light[4] when moving, shadows, etc. This is difficult to learn from data, even if you have a lot of it.

Each one of the items you've listed has been separately accomplished by ML to varying levels of success (baseline being "workable", and more progress is being made). I probably would have found better references had I spent longer than 5 minutes searching:

1. https://www.youtube.com/watch?v=AGm3hF_BlYM

2. https://www.theverge.com/2019/12/9/20999646/google-arcore-au...

3. Same as 2 :-). AR occlusion requires both. Additionally, many recent phones use ML models to fake depth of field; the boundary of foreground objects has to be detected.

4. Also see 2; Augmented Reality is hard. Bonus: https://www.awn.com/animationworld/how-klaus-uniquely-combin...

I work in deep learning research. In my opinion, pose estimation with one human centered in a frame is nowhere near the level of difficulty of constructing an accurate model of a complex scene with multiple people and moving objects of various sizes. For instance, it wouldn't be so hard to detect that there is a train in this video, but to do upscaling correctly, you need a 3D model of this specific train. Yes, there exist deep learning models that try to do 3D reconstruction, and so far, I haven't seen anything that works robustly. Combining this information with an upscaling model is another challenge. It's not as easy as just plugging components together.

The same approach is being used to enhance the textures of older games, including Doom, The Witcher 2, Final Fantasy 7, and others. People have turned them into mods [0,1].

I actually showed a few examples on day one of the deep learning course I'm teaching this semester as an emergent application of the technology.

[0] https://www.pcgamer.com/heretic-looks-stunning-with-neural-n...

[1] https://www.pcgamer.com/AI-upscale-mod-list/

See "AI Neural Networks being used to generate HQ textures for older games (You can do it yourself!)" https://www.resetera.com/threads/ai-neural-networks-being-us... (Dec 2018) - people create these for old games, and it is easy to set up on one's own computer, or on Colab.

Also, super-resolution is being used not only to statically enhance textures, but also to work on the fly. See the Atomic Heart RTX Tech Demo, with ray tracing and deep learning super-sampling (DLSS): https://mundfish.com/en/rtxtechdemo/.

I'm pumped about the Dark Souls Remastered up-scaling! Thanks for the links.

There are a number of speed-corrected vintage film clips on Youtube. Here’s an example from late-1890s France:


While not upscaled, they are still pretty impressive, and interesting to watch.

Somewhat OT: Can someone enlighten me why exactly we need to "speed correct" them in the first place?

It always struck me as odd that those old b/w clips were often shown playing too fast on TV. Isn't it more that they were simply recorded at lower fps? Then how can you call this "speed correcting", when you merely played them at the wrong frame rate before, just because TV is 25/30/50/60/59.94i frames/fields per second and you didn't want to convert them in any way? And then suddenly a few years ago these speed-corrected versions started to pop up on the internet, like "hey, we just fixed the past, people didn't actually move twice as fast back then!"

I guess I just don't like the term, but maybe there is more to it.

“While the illusion of motion works at 16 fps, it works better at higher frame rates. Thomas Edison, to whom we owe a lot of debt for this whole operation (light bulbs, motion picture film, Direct Current, etc.), believed that the optimal frame rate was 46 frames per second. Anything lower than that resulted in discomfort and eventual exhaustion in the audience. So Edison built a camera and film projection system that operated at a high frame rate.

But with the slowness of film stocks and the high cost of film, this was a non-starter. Economics dictated shooting closer to the threshold of the illusion, and most silent films were filmed around 16-18 frames per second (fps), then projected closer to 20-24 fps. This is why motion in those old silent films is so comical: the film is sped up.”


Another take on film speeds says that films were projected at a variety of speeds, depending on the original filming as well as the economics of fitting multiple showings into a schedule. https://web.archive.org/web/20110724032550/http://www.cinema...

Old film was less sensitive, so it needed longer exposure time. Film was therefore cranked slowly to avoid under-exposure. During playback, it was generally preferred to run the film slightly quickly than to have a lower frame rate (which would have made motion jerky).

It doesn't really appear that jerky after being slowed down. My taste apparently differs from people's in the past, since I'd rather have it that way than sped up...

Early film cameras were hand cranked, so the actual speed of the film was all over the place. I assume it was just too much of a hassle to manually correct the speed of these films, so they didn't bother.

It wasn't all over the place (in professional productions). Rather, it was an important part of the craft of cinematography to adjust the speed of cranking to fit the mood of a scene. Experienced cinematographers would vary the speed even during a take, envisioning how the film would look when projected at standard speed: slower crank, faster movement, and vice versa.

But that does make it all over the place even if intentional

Probably in the same way that a visitor from the past would find the volume all over the place in a modern movie, or the lighting.

You put a nice twist on it and I almost agree. I think "all over the place" is more closely related to ideas like disordered, indiscriminate, every which way, unsystematic -- you could intentionally create a work that looks unsystematic, but that's not the case for the people and films we were discussing.

I agree that "all over the place" doesn't imply intentional or not. In this case it had a human-artistry twist: the final product as a piece was not all over the place, but the frame rate itself was.

As an anecdote, I remember when I was a kid (early 80s), my uncle who collected old cameras and projectors traded me an old hand-cranked Pathé-Baby projector for a solar-powered calculator. The projector was working properly and came with a few film cassettes, which were supposed to be played back at variable hand-cranking speeds according to the scenes. It was an interesting experience to play those films back to my family in the only room that had absolute darkness: the kitchen.

That's a great anecdote, and now I will not be satisfied until I too own a Pathé-Baby projector. The curse of knowledge.


If I remember correctly, you could go backwards too. The film would collect in a glass-covered container inside the projector's body, and when the film was over you had to crank it back into the cassette.

I feel like others have probably replied well enough, but to add: you seem to be under the impression that they are only "too fast" because modern TVs play them at the wrong frame rate. On the contrary, contemporary audiences in theaters also saw them move "too fast."

This was a result of both the mechanics of the time, and the style, as other people have posted.

"Speed correcting," therefore, is simply an attempt to view these at natural speeds, even though this may not have been the expectation (or intent) of the filmmakers.

Peter Jackson made a whole WW1 documentary using these techniques.

Totally unrelated but I'm curious if anyone else experienced this phenomenon.

When I was a kid, people in old pictures and movies(say, pre 1920s) looked genuinely strange. The structure of their faces, the shape of their bodies... it just looked really weird to me. Now that I'm older, I find people in old media very familiar, like I know people who look enough like them and I can suddenly relate to them on a new level.

I'm guessing my younger brain just hadn't seen enough faces yet. In any case it always struck me as odd.

"Middle age is when you've met so many people that every new person you meet reminds you of someone else."

- Ogden Nash

How does he define old age?

The vast majority of his work was humorous poetry, but on rare occasions he got serious. Since I like this one and it's sort of appropriate to the subject, I'll share it:

    Old Men
    by Ogden Nash

    People expect old men to die,
    They do not really mourn old men.
    Old men are different. People look
    At them with eyes that wonder when...
    People watch with unshocked eyes;
    But the old men know when an old man dies.
(Copy/pasted from https://westegg.com/nash/, which has several other good ones.)

When everyone you meet reminds you of someone dead?

When no one reminds you of anyone, and you no longer recognize even the people you do know.

Unlike sibling comments, I don't think it's a diversity thing. I think it's just an age thing: Adults look different from kids, and as a kid you (and I, I experienced the same thing) were probably almost entirely around other kids in school. Most people recorded are adults, so they looked odd compared to child proportions, but look normal now that you (and I) are mostly around other adults.

My hypothesis: it wasn't really the structure of faces etc. that looked weird to you.

You were just baffled in general by the weirdness of watching such old footage, and "blaming" this on what people looked like was a way your brain rationalized this elusive sense of weirdness away.

When we're adults, we've seen our share of old pictures, the sense of wonder has weakened, and so the feeling of bafflement is gone. And along with it, the need to rationalize it away.

I second this. When I was young, these were obviously people from a different epoch, much like from a faraway continent. Now that impression has vanished. Where it does hold up is in differences in gait (e.g., in the S.F. Market Street film).

Unrelated: I find it fascinating in these early films how people are not reacting to the camera. They may notice it, or even look at the person operating this curious contraption, but they are not reacting to it (as being aware of its gaze). That is a quality totally lost ever since.

I think part of it is that people back then were far more likely not to have diverse ancestors: the people in the OP video probably all had parents from the same small town, very different from our modern era. This probably enhanced a lot of distinct body and facial features that have mostly washed away in modern times. However, as an adult you've met enough people to still be able to recognize some of them.

Or, the procedural content generator of the simulation has gotten significantly better.

Culture transforms us

I am the author of this video; feel free to ask anything. I covered the algorithms I used in the video description; you can find them there.

Is there any way you can upload the 4k original video somewhere downloadable? Happy to mirror it for you if bandwidth is an issue. The YouTube compression is terrible and doesn't really do it justice.

Why did you crop it? 25% of the picture is gone.

This is an age-old question when converting 4:3 footage to 16:9. Do you keep the original aspect ratio (OAR) in the converted footage to keep the purists happy (a small percentage of the audience), or do you fill the frame with as much picture as possible to avoid mattes and keep the average content consumer happy (a much larger percentage)? At the end of the day, the producers will make the decision that they feel will please the largest group of viewers.

I have personally been involved with these types of projects. The purists are the smaller section, but they are much more vocal on the internet. While reading forums and tweets and other soapboxes, these purists will eviscerate you publicly. However, using other metrics (sales figures, independent surveys, etc), the numbers show that the overall product is well received. There are so many people online that believe they know everything, but know nothing; yet these people are the most vocal. I was once accused of uprezing existing video tapes rather than re-scanning the film like the company claimed. Never mind that I was personally carrying the film to the transfer house. We took pictures of the employees with the film, took shots of the film on the shelves in the vaults, and wound up making "making of" type documentaries showing the process of the film on the telecine, interviewing the colorist transferring the film, showing the artists doing the restoration, and the sound department working with the audio from the original optical track, etc. Still, the know-it-alls kept on yelling.

Are there any merits to removing 25% of a film, other than higher sales?

Your choices are to keep the OAR as 4:3 so that you have ~240 pixels of pillar bars on the left and right of the image (for HD), or crop into the image to make it full frame. The average viewer tends to not like mattes of any type. They tend to prefer a full frame image to fill up the pixels on their screen. Because of this, producers will make the decision to crop into the picture to get a full image.

The same decision happens when making a 16:9 (1.78) image from content shot in the common 2.35/2.40 aspect for feature films. Getting those images to 16:9 so the image is full screen requires cutting pixels from the sides of the original. Content shot at 1.85 requires even less cropping. Even true 4K content is cropped slightly to fit the 16:9 aspect of UHD. It was even worse when television was only 4:3. With 2.35/2.40 material, so much would be cropped that a shot of 2 people could remove one of the people from the frame entirely. They would use the Pan & Scan technique to slide the image around in the frame to reveal the 2nd person. This was one of the reasons for the slate at the beginning of a movie saying that the video had been reformatted to fit the screen.
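For reference, the fraction of picture lost by these "full frame" crops follows directly from the aspect ratios. A small sketch of the generic arithmetic (not any particular mastering tool):

```python
def crop_loss(src_aspect, target_aspect):
    """Fraction of the source image discarded when center-cropping
    to completely fill a frame of a different aspect ratio
    (no stretching, no mattes)."""
    if src_aspect > target_aspect:
        # wider source: trim the sides (e.g. 2.40 scope -> 16:9)
        return 1 - target_aspect / src_aspect
    # taller source: trim the top and bottom (e.g. 4:3 -> 16:9)
    return 1 - src_aspect / target_aspect
```

Cropping 4:3 footage to fill 16:9 loses exactly 25% of the picture, matching the complaint upthread; a 2.40 scope frame cropped to 16:9 loses about 26% from the sides.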

Video containers (or the stream itself) should really have a "center of frame" coordinate for each frame, and let playback devices crop based on that center. Maybe this already exists in some standard and I'm unaware.

For digital files, ideally, the file only contains active pixels so that the file is the frame size of the image. However, for video formats with a clearly defined standard (TV/Broadcast/DVD/Blu-ray/etc), the video frame size must be what is defined in the spec. This is why the mattes exist to fill in the space to make the content fit the space required.

If some people want to crop video to fit different screen aspect ratios, it would be nice for the frame to advertise its "center" so the playback device can apply the appropriate cropping and matting.

Maybe I have my player set to "3:2 full screen, up to 30% crop". It's not something I would do personally, but it would be better than pan and scan being baked into the encode itself.
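A playback device honoring such hypothetical per-frame metadata could crop with something like this (the field names and semantics are invented for illustration; no existing container standard is assumed):

```python
import numpy as np

def crop_with_center(frame, cx, cy, out_w, out_h):
    """Crop `frame` (an H x W array) to out_h x out_w, keeping the
    advertised center of interest (cx, cy) in view, clamped so the
    crop window never leaves the image."""
    h, w = frame.shape[:2]
    x0 = min(max(cx - out_w // 2, 0), w - out_w)
    y0 = min(max(cy - out_h // 2, 0), h - out_h)
    return frame[y0:y0 + out_h, x0:x0 + out_w]
```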

> Your choices are to keep the OAR as 4:3 so that you have ~240 pixels of pillar bars on the left and right of the image (for HD), or crop into the image to make it full frame.

That's not true. You could keep the original aspect ratio and simply let the player decide. That way you aren't removing any information, and you aren't encoding redundant black bars into the video.

The YouTube player is perfectly capable of filling all available space it is given.

I did it just to make it nicer to watch on mobile phones :)

I was wondering if you tried to run something like warp stabiliser on it first to avoid the wavy effect?

I wanted to see what could happen with the video without any manual editing. The wavy effect could be removed in After Effects; it is not a hard task, just a matter of manual work :)

Do you plan on upscaling the Patterson Bigfoot footage? (Kinda joking... kinda not).

I don't know why this is being downvoted. I'm interested to see if a zipper could be noticed!

And what about the Zapruder film?

There's a sort of swimming/breathing effect in the video that's nauseating to me in the same way that TV motion smoothing is. It looks like artifacts from trying to perfectly crop the video when the source film likely has become distorted over time, but it's hard to watch.

Could be the frame rate; I felt a similar way when I first started watching 60fps. Now I feel like it's a more substantial jump than 4K to 8K.

Rolling shutter is one possible explanation of "wavy" effects if they digitized cinematic footage improperly. However, in this case I think we see aliasing effects of static Gigapixel AI.

You can see the problems in the side-by-side clip at the end. A guy walks through the divide and the upscaled side of his body does not move with the un-enhanced side of his body.

Come on guys, check the source video first.[0] (very end)

TL;DR: the waviness is actually part of the film. This is common in old movies, actually.


I wonder if we could use deep learning (or even something simpler) to correct for this waviness.

Yes it did seem weird.

For this very movie it would presumably be quite trivial to correct the varying frame rate caused by the hand-cranked camera: use the train's fairly steady deceleration and its position to determine a corrected timestamp for each frame, then interpolate from there.

It's fair to assume that at the time, a skilled presenter who knew the material was able to compensate for that too to some extent, since the Cinematograph was camera and projector in one, both manually operated. Stable frame rates were simply not a thing in the earliest films.
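The interpolation step could look something like this sketch. It assumes you have already estimated a true capture time per frame (say, from the train's deceleration); that estimation is the hard part and is not shown:

```python
import numpy as np

def resample_uniform(frames, true_times, fps_out=24.0):
    """Resample irregularly-timed frames to a constant frame rate.

    `true_times[i]` estimates when frame i was really captured; output
    frames are linear blends of the two nearest inputs. `frames` is a
    list of (H, W) arrays.
    """
    frames = np.stack(frames).astype(np.float32)
    true_times = np.asarray(true_times, dtype=np.float64)
    t_out = np.arange(true_times[0], true_times[-1], 1.0 / fps_out)
    # locate the input interval containing each output timestamp
    idx = np.clip(np.searchsorted(true_times, t_out, side="right") - 1,
                  0, len(frames) - 2)
    w = (t_out - true_times[idx]) / (true_times[idx + 1] - true_times[idx])
    w = w[:, None, None].astype(np.float32)
    return frames[idx] * (1 - w) + frames[idx + 1] * w
```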


Here is a colorized version:


Not as convincing when clothes change colors between frames

Makes it feel like a psychedelic trip without having to take anything though, so that's a positive. Maybe.

I think the frame rate is what makes it feel more vivid. I didn't experience that big of a difference with regard to resolution.

Yeah, it seems the NN in this particular implementation just does edge sharpening (almost like a "dumb" algorithm), the interior of textures and faces etc. still seem to lack detail.

Right. And it makes some things worse. In the original, with the lady who runs across at about :27, you can make out her nose and mouth (barely). When she appears again in the upscaled version at :38 (it's hard to stop it at the exact same spot), she is virtually featureless.

Can people in this thread genuinely not see a difference between the before and after or are you just being contrary? The change in resolution should be clear enough, but the FPS change is amazing and gives a great sense of presence to me when watching.

There are some jarring artifacts, such as the ladies' hats glitching at 10sec, but these artifacts are in the original and simply become clear in the upscaled version. It's very impressive.

Side note, They Shall Not Grow Old is just stunning, with its restoration pipeline. I can't wait until all media can be updated in such a fashion, and I assume it's just a matter of time before e.g. YouTube and Spotify have filters to offer restored VHS footage, old phonograph recordings, etc

I honestly couldn't see much difference. I did not study it in detail to find the differences, just a casual view of both clips. I did look at the comparison clips at the end of the 4K version but still didn't see much of a difference.

A problem may be that the source material is simply too high quality to begin with? I watched the original first and thought "wow, 1896, this enhancement is great", only it turns out that was the original clip. They should have chosen something with more room for improvement (which would also give them a lot more interesting material to choose from).

> A problem may be that the source material is simply too high quality to begin with? I watched the original first and thought "wow, 1896, this enhancement is great", only it turns out that was the original clip.

Funny, I did the same

One thing I've never understood is why people keep playing back old film at too high a frame rate. It seems that the film cameras of the day recorded at a lower-than-expected framerate, and the playback machines played at a higher frame rate, resulting in the telltale weird effect of everyone moving too fast. If they'd recorded audio along with it, everyone would be talking like chipmunks.

Correct. Hand-cranked cameras are obviously extra-problematic (though fun to work with), but the first automatically cranked cameras worked at 18fps. Most broadcast systems work at ~30, 25, or 24 fps (and there's a system for doing 24fps playback on 30fps TV). Retiming video used to be ridiculously tedious/labor-intensive, and it's only in the last 5 years or so that it's become sufficiently easy to pull it off and have it look good.

Well, before everybody had frame interpolation, early 16FPS film tended to be copied to new film run at 24FPS. There were horrible hacks like "3-2 pulldown" to convert 24FPS film to 30FPS TV. (30FPS TV is 60 interlaced half-frames, so one 24FPS frame was shown for 3 half-frames, then the next for 2 half-frames. This was done mechanically in the projector.) Then that got copied to VHS...

Most of the footage we're talking about is before synchronized sound, so speech doesn't really come into the equation.

But, as others have noted, adjusting film speed today in digital may just be a matter of hitting a few keys. However, with physical film stock, it's a lot harder. Something you might do for the restoration of a famous film, but not something you're going to do routinely.

Are there techniques to ensure temporal consistency when performing this kind of upscaling? Currently it seems like each frame was upscaled independently leading to weird artifacts.

> it seems like each frame was upscaled independently leading to weird artifacts

Yep, they used an off-the-shelf NN (Gigapixel) to upscale each individual frame. You could try to achieve your proposal using a Gaussian convolution windowed in 3D across frames. I suspect it may make a mess of the video's framerate, but the motion appears smooth enough that it might work.
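A very crude version of that Gaussian-window idea: treat the upscaled video as a (T, H, W) volume and blur along the time axis only. This is a post-hoc hack for illustration, not what was actually done; it damps per-frame flicker at the cost of ghosting on fast motion:

```python
import numpy as np

def temporal_smooth(frames, sigma=1.0, radius=2):
    """Convolve a normalized 1-D Gaussian along the time axis of a
    stack of frames, leaving each frame's spatial content otherwise
    untouched."""
    vol = np.stack(frames).astype(np.float32)
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    k /= k.sum()
    # replicate the first/last frame so output length matches input
    pad = np.concatenate([vol[:1].repeat(radius, axis=0), vol,
                          vol[-1:].repeat(radius, axis=0)])
    return sum(k[i] * pad[i:i + len(vol)] for i in range(2 * radius + 1))
```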

It’s not as if the source material’s got a steady frame rate, there’s a lot of places where it jumps about 2-4 frames.

Which is quite possibly due to lapses in the hand-cranking of the film through the camera back in 1896, or maybe due to frames getting lost when the printed reel of film got damaged and spliced in the hundred and twenty-odd years between then and now. Possibly both, one could probably work out some of the cranking speed variations by looking at how the train’s frame-to-frame motions change, at least until it’s come to a stop.

I like that the two sibling comments suggest "train a model" and "model a train", respectively.

You could also train a model that conditions on the previous frame in addition on the surrounding pixels.

That would just create a slightly better onion-skinning of overlaid frames; it wouldn't align them.

At 35 seconds, the face of the woman running with the child in the direction of the camera is almost completely blurred out. If you look really hard you can see an outline of her nose and so on, but overall it is interesting to see how the neural net performs in this region of the video.

Not surprising, but still drives the point home that good ML/AI algorithms have a bottleneck at the input data.

I think she is wearing a kind of veil, as are several other women.

I thought it was a veil also. This was France, so it could have been someone from one of their colonies at the time?

Old French (and Italian and Portuguese and etc) women commonly wear veils/babushkas, even today.

They're wearing a veil that covers the entire face though. Eyes included- I assume a very thin see-through veil.

Of course; sunscreen wouldn't be invented until 1936!


I would question the "even today" part...

It doesn't seem so to me in the original video.

Has anyone else experienced this: when watching old movies (or just black/white movies), it feels uncomfortable for only the first few minutes, and then you completely forget about the bad image quality. It's like the brain performs the upscaling/coloring for you.

I'm colour blind, and strangely when I start watching something in black and white, I don't know it's in black and white! - I only realise when someone else mentions it, and even then it takes a while for me to see it.

It's weird, like my brain is automatically filling in colours that aren't there.

I remember experiencing this with the stop-motion movie 'chicken run' from 2000 [0].

The movie has a rather low frame-rate (due to the stop-motion production) and when I started watching my immediate thought was: I can't watch this, this is terribly stuttery. But after about 10 minutes or so my brain adjusted and I no longer noticed the stuttering.

[0] https://en.wikipedia.org/wiki/Chicken_Run

What makes the difference for me is the sound quality.

If the sound track is synced to the video well, I don't care too much about the video quality/depth/colors/contrast.

Can someone point at specific things to see the upscaling at work? The two videos look the same to me, both when taken as a whole and when zoomed in to compare the same points in the videos...

Look at the ground in the thumbnail, for example, or the hill, for that matter, as well as the building in the background. And the fence, too.

The details are much more distinct in the upscale.

It actually surprised me, I was just expecting it to effectively be a de-speckle with edge sharpening, but it's exceeded my expectations.

It's sharper, but it's the type of sharper that happens in a dream. Look at the text on the left building (the quonset hut with a hole in the side): the text flutters as the net fever-dreams context for it. Like a dream, it has more detail the more you look, but it also isn't real. It's like how I can read things even when my eyes are very dry or something is beyond the extent of my vision: my brain tosses its guesses into a viewable object, but they're just extrapolations, like how I can tell the score in an NFL game across the bar. I know the game isn't 88 to 9; it's likely 28 to 7.

This doesn't make it any less cool, just putting it in context.

Zoom and enhance away.

The inference resolves to a high-detail and plausible result, but not necessarily one that has any fidelity with the ground truth. Which is somewhat like a dream, in ways.

Additional information -- say, a close-up image of the quonset structure's signage, or even knowing what the text was, might result in additional detail. A method that was specifically designed for extracting textual information out of blurred images (several exist, and they're frighteningly accurate -- any image-hiding based on only pixelation or blurring without additional noise introduction is not particularly secure) might also be able to identify a text from the original film source.

dream is a very apt word to describe this upscaling process. the AI just dreams up detail where there's none in the source material.

Using Deep Learning models trained on static images is usually not a good idea. But one can simply extend the model to use 3D convolutions instead of 2D, use time as 3rd axis and then feed a sequence of images for training, getting much better results out of it without the wavy effects.
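The shape change from 2-D to 3-D convolution is the whole point. Here is a deliberately naive reference implementation of the operation (real models use learned kernels in a GPU framework, not Python loops):

```python
import numpy as np

def conv3d(volume, kernel):
    """'Valid'-mode 3-D convolution over a (T, H, W) video volume.

    Where a per-frame model slides a (kh, kw) kernel over one image,
    a spatio-temporal model slides a (kt, kh, kw) kernel through the
    stack, so every output value sees a window of neighboring frames.
    """
    T, H, W = volume.shape
    kt, kh, kw = kernel.shape
    out = np.empty((T - kt + 1, H - kh + 1, W - kw + 1), np.float32)
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[t, i, j] = np.sum(
                    volume[t:t + kt, i:i + kh, j:j + kw] * kernel)
    return out
```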

And have the amount of training data and time scale from n^2 to n^3... That's why nobody has achieved good results with this yet...

Oh yes, but that might be comparable to attention-based models for NLP, complexity-wise. Using C3D has actually been quite common since 2015:



Training a great model is expensive these days, FB routinely pays 6 figures per training run.

I wonder if a similar technique could be used to turn old 4/3 videos to widescreen just by learning the surroundings of a scene through other shots, then mapping them back and forth to build a complete widescreen scene on all frames.

I've watched a few movies in what's called ScreenX[0]. It's a cinema experience where they project the movie on the main (front) screen, and project the surroundings on the two side walls. The result is an immersive viewing experience. Obviously the film is not recorded in the ridiculously wide aspect ratio, so usually the side images are computer generated. For landscape imagery, this is fairly seamless. You can't really tell that the images aren't real. However, during a scene with people, the side walls projected static images that drifted around slightly (as if they were cardboard cutouts). I guess the idea is that most things happening in your peripheral vision are lost anyway, so no need to make them detailed.

[0]: https://www.cineplex.com/Theatres/ScreenX

Off the top of my head, a reverse content-aware-fill type of system might work.


Depending on the specific shot, you can borrow data from frames before or after the current frame rather than recreating it from nothing. For instance, in a shot where the camera pans left to right, you can borrow frames from later in the shot to fill in content on the right half of the image. A shot that zooms in or out will likewise contain actual footage of the scene that can be reused. This kind of technique is already used frequently for other tasks, like removing objects from a scene or fixing/restoring film with scratches, holes, dust, etc.
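
A hedged sketch of that borrowing idea, assuming the per-frame pan offsets are already known (in practice they would come from motion estimation); frames here are just lists of pixel rows, and all names are illustrative:

```python
# Extend a frame's right edge using a later frame from a left-to-right pan.
# offsets[k] is how many columns the camera has moved right by frame k.

def extend_right(frames, offsets, frame_idx, extra_cols):
    base = [row[:] for row in frames[frame_idx]]
    width = len(base[0])
    # Find a later frame whose pan offset covers the columns we need.
    for k in range(frame_idx + 1, len(frames)):
        shift = offsets[k] - offsets[frame_idx]
        if shift >= extra_cols:
            for r, row in enumerate(base):
                # Column (width + c) of the base frame corresponds to
                # column (width - shift + c) of frame k.
                row.extend(frames[k][r][width - shift + c] for c in range(extra_cols))
            return base
    raise ValueError("no later frame pans far enough")

scene = list(range(10))                       # the full panorama, one pixel row
offsets = [0, 2, 4]                           # camera pans right 2 px per frame
frames = [[scene[o:o + 6]] for o in offsets]  # each frame sees 6 columns
wide = extend_right(frames, offsets, 0, 3)
print(wide)  # [[0, 1, 2, 3, 4, 5, 6, 7, 8]]
```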

Honestly, I watched the original and the 4k and don't really notice any big difference. The wider aspect ratio of the 4k is more pleasant, but this has nothing to do with the enhanced resolution.

Did you not notice the FPS boost?

The aspect ratio looks like it was just accomplished by cropping.

A thought I had while reading about this: what is the limit to upscaling via this method? By that I mean, if one were to just keep upscaling, and had a monitor capable of displaying the image, how long until the image became unrecognizable? Surely upscaling an already-upscaled image would result in some strange artifacts.

You'll end up seeing more and more of the training set. You might zoom in on a newspaper on a desk through a window across the street reflected off the windshield of a car and get perfectly legible text, but it's just going to have whatever lorem ipsum headlines were used to train the net.

That makes sense. Thanks.

There's no limit when you upscale with neural nets. Make one good enough and it can fill in the blanks properly.

> Make one good enough and it can fill in the blanks properly.

I would say it can fill in the blanks likely or plausibly. But one can't guarantee that it is the proper or real content, since the neural net is imagining / inventing the upscaled version.

Sure. I'm not saying it will represent reality 100%, but it could be good enough that you probably couldn't tell the difference.

Current neural nets might not be able to upscale without limit and still make it look good, but it's not like it can't be done. A good human artist could take a small, grainy image and, given enough effort, draw an upscaled version that would be a very reasonable representation of what a larger image would have looked like; there's no reason to believe that artificial neural networks can't do the same.


Could this be used to generate "missing" frames in stop-motion animation as well?

For example for this sequence from "The Terminator" (1984): https://www.youtube.com/watch?v=wSXl_XKXZAI

There is user-friendly, mature software for this already (which technically doesn't even use ML in the modern sense, because it's much older): the SVP project. You can connect it to many players and play any video of your choice in real time. There are also high-quality tools in video-editing software like Premiere or DaVinci Resolve that do this (pixel-flow frame-rate adjustment), and NVIDIA graphics cards have a hardware-accelerated version that can be used in some video players too.

This is mostly a solved problem; it has existed for many years, and there's no real need for ML here (except possibly for some quality improvements).
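
The naive version of frame interpolation can be sketched in a few lines. Note this is just midpoint blending; tools like SVP or the pixel-flow retimers mentioned above instead warp pixels along estimated motion vectors, which avoids the ghosting that plain blending produces on moving objects:

```python
# Double the frame rate by inserting a blended midpoint between each
# pair of frames. Frames are flat lists of pixel intensities.

def double_fps(frames):
    out = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        out.append([(x + y) / 2 for x, y in zip(a, b)])  # midpoint frame
    out.append(frames[-1])
    return out

frames = [[0, 0], [10, 20], [20, 40]]  # 3 frames of 2 pixels each
print(double_fps(frames))
# [[0, 0], [5.0, 10.0], [10, 20], [15.0, 30.0], [20, 40]]
```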

I can't help but think that we will (one day, probably soon) have enough 3D face models to simply replace the faces in these movies with ones that fit the original version.

The resulting faces won't be an exact match due to the low quality of the original, but unless you knew these people, no one will be able to tell.

Same with pretty much every other aspect of the scene: have good models that fit the original scene and re-create a full 3D version of the whole movie, at any level of definition.

Once this process can be automated, we'll be able to recreate all the classics, shot for shot.

It has become pretty standard in Hollywood for actors to have headshots taken from various angles at the beginning of a production, specifically for use in post. Enough actors have died during production to make this necessary to complete a project. I wouldn't be surprised if insurance companies have made it a mandatory practice (this is pure speculation on my part) to minimize payouts.

For what purpose?

VR and re-editing come to mind.

That's a lot worse than the original.

I agree. The upscaled version is both more smoothed and more mottled. This matches the "look" of a lot of low bitrate HD video, but I would argue that information has actually been lost in the upscaling process, not gained. It would be much easier to tell with before / after screenshots, I think, but at any rate I don't see a lot of reason to hope neural net upscaling will be useful for many "real world" purposes.

The HFR was pretty convincing to me, however.

I agree with your points, but I'd also argue that Nvidia's DLSS implementation is a real-world purpose, no?

That's a great point. Using approaches like this as a less computationally complex method of resampling for the purpose of anti-aliasing (and not upscaling) seems worthwhile, definitely.

I was thinking more of attempts to create neural nets that map from e.g. the set of 480p images to the set of 1080p images. The "best case" results seem to be trained on low bitrate HD video, which gives results that "look good" to many people (especially those who grew up watching Youtube) but in terms of real detail are worse than a simple upscale (with e.g. Lanczos). I haven't yet seen results where content aware upscaling provides a real improvement over "dumb" algorithms for this purpose.
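
For reference, the Lanczos resampling mentioned as the "dumb" baseline is just a windowed-sinc kernel, L(x) = sinc(x)·sinc(x/a) for |x| < a and 0 otherwise. A minimal 1D sketch (illustrative only; edges handled by clamping):

```python
import math

def lanczos(x, a=3):
    """Lanczos-a kernel: a*sin(pi*x)*sin(pi*x/a) / (pi*x)^2."""
    if x == 0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)

def upscale_1d(samples, factor, a=3):
    out = []
    for i in range(len(samples) * factor):
        x = i / factor  # position in source coordinates
        lo, hi = math.floor(x) - a + 1, math.floor(x) + a
        v = sum(samples[max(0, min(j, len(samples) - 1))] * lanczos(x - j, a)
                for j in range(lo, hi + 1))
        out.append(v)
    return out

# At integer source positions the kernel weights collapse to the
# original samples, so the upscaled signal passes through them.
out = upscale_1d([1, 2, 3], 2)
```

2D image resampling applies the same kernel separably along rows and columns; the point of the comparison above is that this fixed kernel invents nothing, unlike a learned upscaler.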

You can see a comparison at the end of the video.

You're right. Unfortunate that the comparison clip is so short; there's really no replacement for being able to flip between individual upscaled frames. From that very limited comparison, I think I still prefer the original.

Math upscaled it, and my internet connection downscaled it. I suppose it was math on either end.

I think it's great we're developing this capability, but I prefer the original: we're seeing the film as it was originally seen then. The captured scene isn't made better by higher fidelity; it's the opposite for me.

As with most advancements in technology and art, we'll see a proliferation of people using the new technique with mixed results at first. It's the shiny new toy. Then over time, people will figure out how to use it and apply it with intent.

The upscaling projects we've seen in gaming have generally produced spectacular results, probably because the intent in gaming graphics has always been for the sort of pixel density we're finally able to start achieving. Remember the huge disparity between the box art and the game? lol.

Gaming aside, I don't think we've left the toy phase just yet.

Seeing this little girl, born and dead long ago, jumping and smiling got me quite emotional...

Watching old movies is always a very melancholy experience for me for this reason. To think that everyone in the film you are watching is dead, when they seem so alive before your eyes.

(Interesting thing I read once about the Hanna-Barbera cartoons with the canned laugh track - it was using some old recording that had been knocking around since the 20s. So if you watch them now, the whole action is ringing with the laughter of the long dead. Now how's that for a macabre thought!)

It shows that humanity's standards for video have risen over 125 years.

Viewing the film, it seems that the standard for clothing to wear whilst going outside has done the opposite, quite drastically.

And if there were sound, you could argue the same about the quality of the language.


From a linguistic perspective this couldn't be further from the truth.

We only perceive older language as more polite, but it doesn't mean it was.

What matters is the level of rudeness perceived by the receiver at the time. If someone was called a "scoundrel", they might have been shaken, while today it's a laughable insult.

I completely disagree. If you look at speeches or interviews from the 30s or the 40s, in French or English, the quality of the spoken language is significantly better than today's: sentences are better constructed, with richer vocabulary. And if you look at 18th-century French literature, it is night and day compared with anything written today.

> If you look at speeches or interviews from the 30s or the 40s, in French or English, the quality of the spoken language is significantly better than today... And if you look at 18th-century French literature

There's probably some selection bias there. I'm not disagreeing that there might have been a reduction in vocabulary, etc., but we tend to see older words (and usages) as more erudite as well.

Compare the vocabulary/grammar on social media with current newspapers or a book and there's a big difference as well.

You scoundrel!

Call me crazy, but I prefer watching the original.

It is common in video player software these days to use neural-network-based filters to upscale and downscale; e.g., see https://artoriuz.github.io/mpv_upscaling.html
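
For example, mpv can load an external GLSL upscaling shader via its `glsl-shaders` option; FSRCNNX is one of the neural-network-based shaders commonly used this way (the shader path below is illustrative, a minimal mpv.conf sketch):

```ini
# Load an NN upscaling shader; mpv falls back to the built-in
# scaler for anything the shader doesn't handle.
glsl-shaders="~~/shaders/FSRCNNX_x2_8-0-4-1.glsl"
scale=ewa_lanczos
```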

If we're killing "garbage in, garbage out", that's going to take us to some weird places.

The upscaled version looks worse to me. Seems like it's lost information, not gained.

Strange to think none of these people knew what was coming their way 18 years hence...

This was filmed at La Ciotat, on the southern coast of France. This region actually survived WWI relatively unscathed. It did see a bit more action in 1944. But the war would have been a major event in these people's lives, even if the location wasn't directly in the middle of it.

The town did not see any action, but, as in all towns in France, all fit men were sent to the front lines, and many never came back, as illustrated by the monuments listing the dead of WWI and WWII in every French town and village.

See also: "A Trip Down Market Street," filmed just 4 days before the 1906 San Francisco earthquake.


Oh, this makes me happy. I'm stoked with the 4K results. The gravel under the tracks looks so good. I feel like it's a definite improvement, and I know it's only going to get better.

Tangentially related, a real-time video upscaler: https://github.com/bloc97/Anime4K

IMO in the v1.0 comparisons, v0.9 is sharper in every case.

Yeah, v0.9 has more aliasing artifacts, but it has much better contrast and is easier on my eyes.

Edit: Smooth gradients are much better in v1.0 though, look at the background differences here: https://raw.githubusercontent.com/bloc97/Anime4K/master/resu...

Anyone know of similar projects for audio? Music within a genre tends to be similar enough that you ought to be able to upsample. I tried doing this once myself but got nowhere.

I've seen lots of old sports and news footage from the 1970s that looks bad. The color looks particularly bad, for example. It would be nice to automatically fix that.

It may appear higher resolution but I don’t enjoy looking at the extra details knowing that they are probably fictitious.

This is pretty awesome!

But those dresses the women wore look awfully hot and uncomfortable.

Not least if you consider that it's very sunny and 30+ degrees (C) during summer in that area.

We can finally CSI pictures, then!? Yeah!
