There are digital artifacts all over the place, like the steam at the beginning, the people's faces, any place there is movement.
I see that some things don't move as "jerky" but I think the original is far superior to watch.
Edited to add: Here's a link to a quick comparison image. These aren't exactly the same frame or size, but close enough to see what I'm talking about.
It looks like most of the perceived improvement comes simply from the upgrade to 60fps, which is achieved trivially through interpolation.
I wonder if the video would have gotten as much attention had AI not been mentioned.
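To make "trivially through interpolation" concrete, here's a minimal sketch of the naive version: blending each pair of consecutive frames to roughly double the frame rate. (Real interpolation tools estimate motion via optical flow rather than blending; this is just the baseline idea, with made-up dummy frames.)

```python
import numpy as np

def interpolate_frames(frames):
    # Insert a 50/50 blend between each pair of consecutive frames,
    # roughly doubling the frame rate. Real interpolators estimate
    # motion (optical flow) instead of blending, but the idea is the same.
    out = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        blend = (a.astype(np.float32) + b.astype(np.float32)) / 2
        out.append(blend.astype(a.dtype))
    out.append(frames[-1])
    return out

# one second of dummy 20 fps grayscale footage -> ~40 fps
frames = [np.full((4, 4), i * 10, dtype=np.uint8) for i in range(20)]
doubled = interpolate_frames(frames)
print(len(doubled))  # 39: 20 originals plus 19 blended in-between frames
```

Pure blending produces ghosting on fast motion, which is why the serious tools track pixels between frames instead.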
I agree with most of what you said, but disagree with the need to know underlying fundamentals: most 5 year olds can throw and catch a ball without needing to understand calculus or Newton's laws of gravity.
Each of the items you've listed has been separately accomplished by ML to varying levels of success (the baseline being "workable"), and more progress is being made. I probably would have found better references had I spent longer than five minutes searching.
3. Same as 2 :-). AR occlusion requires both. Additionally, many recent phones use ML models to fake depth of field; the boundary between foreground and background objects has to be detected.
4. Also see 2; augmented reality is hard. Bonus: https://www.awn.com/animationworld/how-klaus-uniquely-combin...
I actually showed a few examples on day one of the deep learning course I'm teaching this semester as an emergent application of the technology.
Also, super-resolution is being used not only to statically enhance textures, but also to work on the fly. See the Atomic Heart RTX tech demo, with ray tracing and deep learning super-sampling (DLSS): https://mundfish.com/en/rtxtechdemo/.
While not upscaled, they are still pretty impressive, and interesting to watch.
It always struck me as odd that those old b/w clips were often shown playing too fast on TV. Isn't it more that they were simply recorded at lower frame rates? Then how can you call this "speed correcting" when, before, you merely played them at the wrong frame rate, just because TV runs at 25/30/50/60/59.94i frames/fields per second and you didn't want to convert them in any way? And then suddenly, a few years ago, these speed-corrected versions started to pop up on the internet, like "hey, we just fixed the past: people didn't actually move twice as fast back then!"
I guess I just don't like the term, but maybe there is more to it.
But with the slowness of film stocks and high cost of film, this was a non-starter. Economics dictated shooting closer to the threshold of the illusion, and most silent films were filmed around 16-18 frames per second (fps), then projected closer to 20-24 fps. This is why motion in those old silent films is so comical, the film is sped up.”
Another take on film speeds says that films were projected at a variety of speeds, depending on the original filming as well as the economics of fitting multiple showings into a schedule. https://web.archive.org/web/20110724032550/http://www.cinema...
As an anecdote, I remember when I was a kid (early 80s), my uncle, who collected old cameras and projectors, traded me an old hand-cranked Pathé-Baby projector for a solar-powered calculator. The projector worked properly and came with a few film cassettes, which were supposed to be played back at variable hand-cranking speeds according to the scene. It was an interesting experience to play those films back to my family in the only room that had absolute darkness: the kitchen.
This was a result of both the mechanics of the time, and the style, as other people have posted.
"Speed correcting," therefore, is simply an attempt to view these at natural speeds, even though this may not have been the expectation (or intent) of the filmmakers.
When I was a kid, people in old pictures and movies(say, pre 1920s) looked genuinely strange. The structure of their faces, the shape of their bodies... it just looked really weird to me. Now that I'm older, I find people in old media very familiar, like I know people who look enough like them and I can suddenly relate to them on a new level.
I'm guessing my younger brain just hadn't seen enough faces yet. In any case it always struck me as odd.
by Ogden Nash
People expect old men to die,
They do not really mourn old men.
Old men are different. People look
At them with eyes that wonder when...
People watch with unshocked eyes;
But the old men know when an old man dies.
You were just baffled in general by the weirdness of watching such old footage, and "blaming" it on what people looked like was a way for your brain to rationalize that elusive sense of weirdness away.
As adults, we've seen our share of old pictures; the sense of wonder has weakened, and so the feeling of bafflement is gone, and along with it, the need to rationalize it away.
Unrelated: I find it fascinating how people in these early films are not reacting to the camera. They may notice it, or even look at the person operating this curious contraption, but they are not reacting to it (as being aware of the gaze). That quality has been totally lost ever since.
I have personally been involved with these types of projects. The purists are the smaller group, but they are much more vocal on the internet. Reading forums, tweets, and other soapboxes, these purists will eviscerate you publicly. Yet by other metrics (sales figures, independent surveys, etc.), the numbers show that the overall product is well received. There are so many people online who believe they know everything but know nothing, and these people are the most vocal.

I was once accused of up-rezzing existing video tapes rather than re-scanning the film like the company claimed. Never mind that I was personally carrying the film to the transfer house. We took pictures of the employees with the film, took shots of the film on the shelves in the vaults, and wound up making "making of" documentaries showing the film on the telecine, interviewing the colorist transferring the film, showing the artists doing the restoration and the sound department working with the audio from the original optical track, etc. Still, the know-it-alls kept on yelling.
The same decision happens when making a 16:9 (1.78) image from content shot in the 2.35/2.40 aspect common for feature films. Getting those images to fill a 16:9 screen requires cropping pixels from the sides of the original; content shot at 1.85 requires much less cropping. Even true 4K content is cropped slightly to fit the 16:9 aspect of UHD. It was even worse when television was only 4:3: cropping 2.35/2.40 content removed so much of the frame that a shot of two people could lose one of them entirely. They would use the pan-and-scan technique to slide the image around within the frame to reveal the second person. This was one of the reasons for the slate at the beginning of a movie saying the video had been reformatted to fit the screen.
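The arithmetic behind those cropping decisions is simple. A small sketch of how much picture a center crop (height preserved) throws away for each target aspect:

```python
def crop_fraction(src_aspect, dst_aspect):
    # Fraction of horizontal picture lost when a wider source aspect
    # is center-cropped to fill a narrower destination (height kept).
    if src_aspect <= dst_aspect:
        return 0.0
    return 1.0 - dst_aspect / src_aspect

print(round(crop_fraction(2.39, 16 / 9), 3))  # ~0.256: a quarter of the scope frame gone
print(round(crop_fraction(2.39, 4 / 3), 3))   # ~0.442: nearly half lost on a 4:3 TV
print(round(crop_fraction(1.85, 16 / 9), 3))  # ~0.039: 1.85 content is barely cropped
```

The ~44% loss on 4:3 is exactly why pan-and-scan existed: with that much of the frame gone, a static center crop routinely lost whole characters.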
Maybe I'd have my player set to "3:2 full screen, up to 30% crop". It's not something I would do personally, but it would be better than pan and scan being baked into the encode itself.
That's not true. You could keep the original aspect ratio and simply let the player decide. That way you aren't removing any information, and you aren't encoding redundant black bars into the video.
The YouTube player is perfectly capable of filling all available space it is given.
And what about the Zapruder film?
Tl;dr: the waviness is actually part of the film; it's common in old movies.
It's fair to assume that, at the time, a skilled presenter who knew the material could compensate for that to some extent, since the Cinematograph was camera and projector in one, both manually operated. Stable frame rates were simply not a thing in the earliest films.
There are some jarring artifacts, such as the ladies' hats glitching at 10sec, but these artifacts are in the original and simply become clear in the upscaled version. It's very impressive.
Side note: They Shall Not Grow Old, with its restoration pipeline, is just stunning. I can't wait until all media can be updated in such a fashion, and I assume it's just a matter of time before e.g. YouTube and Spotify offer filters to restore VHS footage, old phonograph recordings, etc.
A problem may be that the source material is simply too high quality to begin with. I watched the original first and thought "wow, 1896, this enhancement is great", only it turns out that was the original clip. They should have chosen something with more room for improvement (which would also have given them a lot more interesting material to choose from).
Funny, I did the same
But, as others have noted, adjusting film speed today in digital may just be a matter of hitting a few keys. However, with physical film stock, it's a lot harder. Something you might do for the restoration of a famous film, but not something you're going to do routinely.
Yep, they used an off-the-shelf NN (Gigapixel) to upscale each individual frame. You could try to achieve your proposal using a Gaussian convolution windowed in 3D across frames. I suspect it may make a mess of the video's framerate, but the motion appears smooth enough that it might work.
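A rough sketch of that proposal, assuming the frames are stacked into a (time, height, width) array; the Gaussian window runs along the time axis only, so each output frame is a weighted average of its temporal neighbours:

```python
import numpy as np

def temporal_gaussian(frames, sigma=1.0, radius=2):
    # Smooth a (time, H, W) stack with a 1-D Gaussian along the time
    # axis only: each output frame is a weighted average of nearby
    # frames, which averages out frame-to-frame noise and flicker.
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(frames.astype(np.float32),
                    ((radius, radius), (0, 0), (0, 0)), mode="edge")
    out = np.zeros(frames.shape, dtype=np.float32)
    for k in range(2 * radius + 1):
        out += kernel[k] * padded[k:k + len(frames)]
    return out

frames = np.random.rand(10, 8, 8).astype(np.float32)
smoothed = temporal_gaussian(frames)
print(smoothed.shape)  # (10, 8, 8): same shape, noise averaged across frames
```

The suspected downside from the parent comment shows up here too: anything that actually moves between frames gets ghosted, since this blurs motion as readily as noise.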
Which is quite possibly due to lapses in the hand-cranking of the film through the camera back in 1896, or maybe due to frames getting lost when the printed reel of film got damaged and spliced in the hundred and twenty-odd years between then and now. Possibly both, one could probably work out some of the cranking speed variations by looking at how the train’s frame-to-frame motions change, at least until it’s come to a stop.
Not surprising, but still drives the point home that good ML/AI algorithms have a bottleneck at the input data.
It's weird, like my brain is automatically filling in colours that aren't there.
The movie has a rather low frame-rate (due to the stop-motion production) and when I started watching my immediate thought was: I can't watch this, this is terribly stuttery. But after about 10 minutes or so my brain adjusted and I no longer noticed the stuttering.
If the sound track is synced to the video well, I don't care too much about the video quality/depth/colors/contrast.
The details are much more distinct in the upscale.
It actually surprised me, I was just expecting it to effectively be a de-speckle with edge sharpening, but it's exceeded my expectations.
This doesn't make it any less cool, just putting it in context.
Zoom and enhance away.
Additional information -- say, a close-up image of the Quonset structure's signage, or even just knowing what the text was -- might result in additional detail. A method specifically designed for extracting textual information from blurred images (several exist, and they're frighteningly accurate; any image-hiding based only on pixelation or blurring, without additional noise, is not particularly secure) might also be able to identify the text from the original film source.
Training a great model is expensive these days, FB routinely pays 6 figures per training run.
I would say it can fill in the blanks plausibly. But one can't guarantee that the result is the proper or real content, since the neural net is imagining/inventing the upscaled version.
Current neural nets might not be able to upscale without limit and still make it look good, but it's not like it can't be done. A good human artist could take a small grainy image and, given enough effort, draw an upscaled version that would be a very reasonable representation of what a larger image would have looked like; there's no reason to believe artificial neural networks can't do the same.
For example for this sequence from "The Terminator" (1984): https://www.youtube.com/watch?v=wSXl_XKXZAI
This is mostly a solved problem; the techniques have existed for many years, so there's no real need for ML here (except possibly for some quality improvements).
The resulting faces won't be an exact match due to the low quality of the original, but unless you knew these people, no one would be able to tell.
Same with pretty much every other aspect of the scene: have good models that fit the original scene and re-create a full 3D version of the whole movie, at any level of definition.
Once this process can be automated, we'll be able to recreate all the classics, shot for shot.
The HFR was pretty convincing to me, however.
I was thinking more of attempts to create neural nets that map from e.g. the set of 480p images to the set of 1080p images. The "best case" results seem to be trained on low bitrate HD video, which gives results that "look good" to many people (especially those who grew up watching Youtube) but in terms of real detail are worse than a simple upscale (with e.g. Lanczos). I haven't yet seen results where content aware upscaling provides a real improvement over "dumb" algorithms for this purpose.
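For reference, the "dumb" baseline being compared against: Lanczos is just a windowed-sinc resampling kernel, sketched here in 1-D (image resizers apply it separably to rows and then columns):

```python
import numpy as np

def lanczos_kernel(x, a=3):
    # Lanczos windowed sinc: sinc(x) * sinc(x/a) for |x| < a, else 0.
    x = np.asarray(x, dtype=np.float64)
    out = np.sinc(x) * np.sinc(x / a)
    out[np.abs(x) >= a] = 0.0
    return out

def lanczos_upscale_1d(signal, factor, a=3):
    # "Dumb" Lanczos upscaling of a 1-D signal by an integer factor:
    # each output sample is a kernel-weighted sum of the 2a nearest
    # source samples (edges handled by clamping indices).
    n = len(signal)
    xs = np.arange(n * factor) / factor  # output positions in source coords
    out = np.zeros(len(xs))
    for j, x in enumerate(xs):
        lo = int(np.floor(x)) - a + 1
        offsets = np.arange(lo, lo + 2 * a)
        idx = np.clip(offsets, 0, n - 1)
        w = lanczos_kernel(offsets - x, a)
        out[j] = np.dot(w, signal[idx]) / w.sum()
    return out

sig = np.array([0.0, 1.0, 0.0, 1.0])
up = lanczos_upscale_1d(sig, 4)
print(len(up))  # 16 samples from 4; original samples are reproduced exactly
```

Unlike a learned upscaler, this can only redistribute the information already in the signal, which is precisely why it can't hallucinate detail but also can't hallucinate *wrong* detail.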
As with most advancements in technology and art, we'll see a proliferation of people using the new technique with mixed results at first. It's the shiny new toy. Then over time, people will figure out how to use it and apply it with intent.
The upscaling projects we've seen in gaming have generally produced spectacular results, probably because the intent in gaming graphics has always been for the sort of pixel density we're finally able to start achieving. Remember the huge disparity between the box art and the game? lol.
Gaming aside, I don't think we've left the toy phase just yet.
(Interesting thing I read once about the Hanna-Barbera cartoons with the canned laugh track - it was using some old recording that had been knocking around since the 20s. So if you watch them now, the whole action is ringing with the laughter of the long dead. Now how's that for a macabre thought!)
Viewing the film, it seems that the standard for clothing to wear whilst going outside has done the opposite, quite drastically.
We only perceive older language as more polite; that doesn't mean it was.
What matters is the level of rudeness perceived by the receiver at the time. Someone called a "scoundrel" back then might have been shaken, while today it's a laughable insult.
There's probably some selection bias there. I'm not disagreeing there might have been a reduction in vocabulary, etc, but we tend to see older (usage) words as more erudite as well.
Compare the vocabulary/grammar on social media with current newspapers or a book and there's a big difference as well.
Edit: Smooth gradients are much better in v1.0 though, look at the background differences here: https://raw.githubusercontent.com/bloc97/Anime4K/master/resu...
But those dresses the women wore look awfully hot and uncomfortable.