
This is interesting, but it's still not the same as a proper remaster. You can't get information out of SD video that wasn't there before with upscaling, so the algorithm is just interpolating. It's too bad that Paramount is so fan-hostile; they could just get the original film footage (assuming it was shot that way; ST:TOS and ST:TNG were, which is why we have excellent remastered versions today), scan it in as raw video, let the fans have at it, and they'd make the high-def FX for free.



No, it is not just interpolating. The underlying algorithm uses machine learning, applying a trained deep neural network, so there is value added beyond a mere upscale.

You're ultimately right, though, that a true HD version is only going to come from the raw film content. What the neural network gives us are essentially plausible higher-res hallucinations.

Edit: as per the other comment, if the original exists only on video and not film, perhaps this is the best we're going to get.


The neural network is applying what it "knows" about photos and inventing new data for the missing pixels. It's "creative interpolation" ;-)


> Edit: as per the other comment, if the original exists only on video and not film, perhaps this is the best we're going to get.

I don't think that's quite right, at least it doesn't jibe with what the DS9Doc people have been doing (which consists partly of remastering pieces of DS9 scenes):

https://www.indiegogo.com/projects/what-we-left-behind-star-...

I think the footage really was on film, but the issue was that it was composited with low-quality CGI effects, or something like that. So you can rescan the film, but you have to redo all the compositing (and probably with your own models because I'm guessing the original CGI didn't look that good). That's why a DS9 remaster is so expensive.


That's still interpolating, by the definitions I know.

The main difference here is that the interpolation algorithm in your TV runs online: it's handling 30 frames per second, over 9 million pixels per second. Doing the interpolation offline (ahead of time), you can take as long as you want, look at multiple frames to make better guesses, and try multiple things and use some fitness measure to pick a winner, even going frame by frame or pixel by pixel.
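
To make the "try multiple things and pick a winner" part concrete, a toy Python/OpenCV sketch (the function names are mine, and the Laplacian-variance "fitness" is just an illustrative stand-in, not what any real product uses):

    import cv2

    # Upscale one SD frame with several classic interpolation kernels and
    # keep whichever result scores best on a crude sharpness measure
    # (variance of the Laplacian). An offline pipeline can afford this kind
    # of per-frame search; a TV doing it live cannot.
    def best_upscale(frame, scale=3):
        candidates = [
            cv2.resize(frame, None, fx=scale, fy=scale, interpolation=flag)
            for flag in (cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_LANCZOS4)
        ]
        def sharpness(img):
            gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
            return cv2.Laplacian(gray, cv2.CV_64F).var()
        return max(candidates, key=sharpness)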

It's still interpolation.


No, if I interpolate the sequence 200, 400, 600, ... I might get 200, 300, 400, 500, 600, 700. I've not added information. But if I look at real-world data and find that, while the known figures fall on the even hundreds, the in-between values realistically fall 20 to 30 points below the odd hundreds, then I have added information, albeit statistically, and a resulting sequence like 200, 287, 400, 475, 600, 672 is no longer raw interpolation.

In this case they're using machine learning to add information about textures that isn't in the broadcast footage. They can add frames by interpolation, but the ML texturising and detailing is not interpolation.

Starting with a blob, interpolation gives you a smoother blob; with this process you get a more structured figure.
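
In code, the toy version of that distinction (numbers made up, purely to illustrate):

    # Plain interpolation: the midpoints carry no information beyond the sequence.
    known = [200, 400, 600]
    midpoints = [(a + b) / 2 for a, b in zip(known, known[1:])]   # [300.0, 500.0]

    # Now apply a prior learned from other, real-world data: in-between values
    # tend to sit roughly 25 points below the odd hundreds. The corrected values
    # contain information the original sequence never had.
    learned_offset = -25
    adjusted = [m + learned_offset for m in midpoints]            # [275.0, 475.0]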


It's more like hallucination than anything. You're just forward-projecting your assumptions about what things ought to look like and hallucinating detail that just isn't there.

It can still look nicer than naive upscaling though.


I see what you're getting at, but it still seems to fall within the definition of interpolation. From Wikipedia:

> ... interpolation is a method of constructing new data points within the range of a discrete set of known data points


Is there any evidence for this? Showing bad 480p DVD rips alongside 1080p upscaled video isn't really a fair comparison -- comparing it to a real TV upscaler's output would be fairer. And honestly, even the unfair comparison doesn't show a whole lot of benefit to me.


It's not upscaling. It's taking what it knows about other, non-ST pictures and creating new texture and information.


Right, but how much does it actually add over a decent upscaler?


Are you asking about this specific program, or the idea in theory?

http://screenshotcomparison.com/comparison/132311


If there's a group more obsessed with the specifics of what is and isn't quality than the anime community, I'd be surprised to find it.

There are entire catalogues of overlay comparisons of different releases, encodings, etc. [0].

Example: http://compare.bakashots.me/compare.php?setId=3896&compariso...

[0] http://compare.bakashots.me/


Interpolation looks bad enough without completely sandbagging its color correction. Why resort to that kind of nonsense?


Are you asking why I changed the exposure? I did it because I was playing around. It was the comparison I had already uploaded; I have a copy without the change on a different computer.


Wow! That's a bigger difference than I expected.


> You can't get information out of SD video that wasn't there before with upscaling, so the algorithm is just interpolating.

In the fully general case of arbitrary video this is true, but in practice it isn't.

You can gather information over time to do superresolution, and if you want to get super fancy you can build a world model (e.g. get more information about what an actor's face looks like from a close up shot, and apply that knowledge to less detailed shots).

I expect ML based upscaling to eventually produce some truly stellar results.
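
For the "gather information over time" part, a crude shift-and-add sketch in NumPy (assuming grayscale float frames of a roughly static shot; real temporal super-resolution, let alone the ML approaches, is far more sophisticated):

    import numpy as np

    def offset(ref, img):
        # Integer displacement of img relative to ref, via FFT cross-correlation.
        corr = np.fft.ifft2(np.fft.fft2(ref) * np.conj(np.fft.fft2(img))).real
        peak = np.unravel_index(np.argmax(corr), corr.shape)
        # Convert wrap-around peak indices to signed shifts.
        return tuple(p - n if p > n // 2 else p for p, n in zip(peak, corr.shape))

    def stack_frames(frames, scale=2):
        # Nearest-neighbour upscale each frame, align it to the first frame,
        # and average. Whole-pixel alignment on the upscaled grid is sub-pixel
        # alignment on the original grid, which is where extra detail comes from.
        ups = [np.kron(f, np.ones((scale, scale))) for f in frames]
        ref = ups[0]
        aligned = [np.roll(img, offset(ref, img), axis=(0, 1)) for img in ups]
        return np.mean(aligned, axis=0)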


A combination of a world model, along with non-source enhancements.

I imagine upscaling the Phantom Menace podrace to 4K and giving the model a bunch of NEW rock texture info to use to create new detail.

Kind of an automated way to combine these two ideas:

http://www.framecompare.com/image-compare/screenshotcomparis...

https://techcrunch.com/2019/03/18/nvidia-ai-turns-sketches-i...

What you don't want is for every upscale to start looking homogeneous, so it would be best for a design team to specifically map old to new texture sets, giving each upscale a unique look.


I would expect multi-view rendered models of ships/stations etc. to help significantly as well... versus re-rendering the SFX, it would be upscaling via AI with knowledge of detailed views/models for the actors... training stills could go a long way.


I was thinking particularly about the space battle shots, where you could provide 3D models of each of the ships, so that the software had the higher detail information, and would just need to match position, movement, and lighting.


AFAIK, STTNG has one other advantage. They used real models and DS9 mostly switched to CGI. Old CGI is harder to beautify than models.


Here is an example of what can be done with old Babylon 5 CGI just by re-rendering it at 1080p: https://www.youtube.com/watch?v=uHAuK_lDkk0

"They’re using the original Lightwave scene files for camera and model movement, lights, etc. It’s also the original 3D models and textures used on the show – and nothing has been updated in any way other than being rendered out at 1920x1080. It’s the raw CGI without any post work."


In theory at least, a neural network with sufficient training in the right domain should be able to add the slight imperfections and subtle light/texture cues our eyes expect to see, even to too-perfect 1990s CGI scenes. This would, however, require a network that can maintain continuity between frames so these details don’t appear to jump around.
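
A heavily simplified sketch of such a temporal-consistency term (PyTorch, assuming (B, C, H, W) tensors and a more or less static shot; real methods warp the previous output with estimated optical flow instead):

    import torch.nn.functional as F

    def consistency_loss(out_t, out_prev, in_t, in_prev, scale=2):
        # The upscaled output shouldn't change between frames any more than
        # the source itself does, which keeps invented detail from flickering.
        src_change = F.interpolate(in_t - in_prev, scale_factor=scale,
                                   mode="bilinear", align_corners=False)
        return F.l1_loss(out_t - out_prev, src_change)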


Um, that's not what I read at all. What I read was that TNG's FX was all rendered with computers at the time, at NTSC resolution, so it was all useless when they did the remastering, and they had to make all-new FX shots. The place where the film was invaluable was for all the live-action stuff (i.e., anything with actors and sets, or on-location). It's much like what they did with ST:TOS, except TOS actually did film real models, but they were so awful by today's standards they redid them with modern CGI.


That's true of some effects, but in TNG the core spaceship / space station models were shot on film, and then composited with other effects like phaser shots.

In DS9 / Voyager as I understand it, they transitioned from using models to directly generating the whole shot with CGI at NTSC resolution. See https://memory-alpha.fandom.com/wiki/CGI#Acceptance

This means that in TNG they just had to recomposite the film with a newly created high-res phaser shot; whereas for DS9/Voyager they would have to recreate the whole shot in high-res CGI.


While I can't disagree with your conclusions, I would add that some of those models were not necessarily built with HD in mind and could look just as crummy as old CGI when viewed in very high definition. Not all, of course, but I imagine experienced directors and model-builders would have constructed them for the duties they were expected to perform on the media of the day, not built/shot them to perfection on the off chance they might be remastered someday in the future. Small imperfections accruing from time-constrained model-building would get dramatically amplified in HD.


That's a reasonable supposition, but having seen the remastered TNG and the documentary about its creation, I don't think it's true. The models are very detailed and look great in HD. One could surmise that the model creators knew their models might be used in TNG feature films (as indeed some of them were). Or maybe they were just super-dedicated and liked making really detailed models.


They could generate higher detail model compositions to train the AI.


> You can't get information out of SD video that wasn't there before...

Well, that is the difference between regular upscaling and upscaling with neural networks. With a neural network, the additional information is being stored within the network during training and added to the video during the upscaling process.

Ultimately, you could argue that this is just interpolation too, but the quality of the interpolation depends on the training material. If you trained it on an original and used such a neural network to upscale a lower-resolution version, you could end up with the original (a perfect interpolation).

So it all comes down to the quality of the training material, and while AI Gigapixel seems to have quite good material, I wonder if the result could be improved by transforming the video as a whole and not just frame by frame, as that would give the NN even more information to interpolate on.
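
A minimal sketch of that train-on-the-original idea (toy SRCNN-style PyTorch; no relation to whatever AI Gigapixel actually does internally):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinySR(nn.Module):
        # Upscales naively first, then predicts a residual of "missing" detail.
        def __init__(self, scale=2):
            super().__init__()
            self.scale = scale
            self.body = nn.Sequential(
                nn.Conv2d(3, 64, 9, padding=4), nn.ReLU(),
                nn.Conv2d(64, 32, 5, padding=2), nn.ReLU(),
                nn.Conv2d(32, 3, 5, padding=2),
            )

        def forward(self, lowres):
            x = F.interpolate(lowres, scale_factor=self.scale,
                              mode="bicubic", align_corners=False)
            return x + self.body(x)

    def train_step(model, opt, highres):
        # Make the training pair by degrading the original, then learn to undo it.
        lowres = F.interpolate(highres, scale_factor=1 / model.scale,
                               mode="bicubic", align_corners=False)
        loss = F.l1_loss(model(lowres), highres)
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()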


> With a neural network, the additional information is being stored within the network

I've seen a few people say this in the thread, and it doesn't seem accurate to me. Information is being created/hallucinated/interpolated. Re-scanning film stock gets you new information; without that, it doesn't matter how sophisticated your algorithm is (naive upscaling or deep learning), you're still interpolating values.


In the blog post, it says:

"While the popular Original Series and The Next Generation were mostly shot on film, the mid 90s DS9 had its visual effects shots (space battles and such) shot on video.

While you can rescan analog film at a higher resolution, video is digital and can't be rescanned."


The Next Generation also had its visual effects shot on video, and this was a major hurdle for the remastering effort. They had to recreate all of the effects, so it was a slow and expensive process. On the positive side, the effects end up looking much better, since this isn't the 80s and we can throw a lot of CPU power at creating them.


Could you please expand on why CPU and not GPU?

Edit: why am I getting downvoted for asking a question? Sorry :-(


The parent was likely using "CPU" to just mean "processing power", not being prescriptive toward whether that processing is done on a CPU or a GPU.

I assume your downvoting is because people (uncharitably) believed you were being an annoying pedant instead of asking a genuine question.


For what it's worth, the implications are about the same... in practice it would likely be ray-tracing frame by frame, then compositing overlays... though the process in the article could be trained on very high-detail models and fill in the blanks from that.


> You can't get information out of SD video that wasn't there before with upscaling

But you can infer visual detail from the information that is already there. Especially because ML uses information from the training set to help make sense of the information that is already there.

e.g. if I show you a picture of a key then you can figure out what the lock looks like because the key contains that information, even if it’s not visible.



