Upscayl – Free and Open Source AI Image Upscaler (github.com/upscayl)
318 points by faebi 8 months ago | 71 comments



If like me you're using Real-ESRGAN-ncnn-vulkan [1] and are curious what upscayl-ncnn CLI [2] changed from it, there's not much and nothing substantial [3]. Not a criticism, just wanted to learn whether it's worth upgrading to for a CLI tool ($subj is a separate GUI app based on it).

[1] https://github.com/xinntao/Real-ESRGAN-ncnn-vulkan

[2] https://github.com/upscayl/upscayl-ncnn

[3] https://github.com/xinntao/Real-ESRGAN-ncnn-vulkan/compare/m...


Upscayl NCNN has several fixes (like latest vulkan, bug fixes, feature additions) and is coming up with more soon (we just haven't pushed the changes). Also working on CPU support.

If you're going to use Real-ESRGAN, you're going to have to deal with a lot of issues and a codebase that hasn't been updated in a long time. Afaik, Upscayl NCNN is the only project maintaining Real-ESRGAN NCNN as of now.


I've only used -i -o -n mode, but for me that just works. And when something just works, I tend to see "a codebase that hasn't been updated in a long time" as a good thing generally :)
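For anyone scripting that basic -i -o -n mode, here is a minimal sketch of a wrapper. The flags come from the comment above; the binary name `realesrgan-ncnn-vulkan`, the model name `realesrgan-x4plus`, and the file names are assumptions about a local install, so adjust them to whatever your build actually ships.

```python
import shutil
import subprocess


def build_upscale_command(src, dst, model="realesrgan-x4plus",
                          binary="realesrgan-ncnn-vulkan"):
    """Return the argv list for a basic -i/-o/-n invocation.

    The default binary and model names are assumptions, not guaranteed
    to match every release.
    """
    return [binary, "-i", str(src), "-o", str(dst), "-n", model]


def upscale(src, dst, **kwargs):
    """Run the upscaler if the binary is on PATH; return True on success."""
    cmd = build_upscale_command(src, dst, **kwargs)
    if shutil.which(cmd[0]) is None:
        return False  # binary not installed, so nothing is run
    return subprocess.run(cmd, check=False).returncode == 0


# Hypothetical file names, printed only to show the resulting command line.
print(" ".join(build_upscale_command("in.png", "out.png")))
```

Building the argument list separately from running it makes it easy to swap in the upscayl-ncnn binary later, provided it accepts the same flags.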

(Don't take it personally, it's my burnout from modern software cycles.)


I understand. With Upscayl it's a little more configurable.

For example, you can add compression to the images, change the format (which Real-ESRGAN can't always do), drag and drop images, have extra models not shipped by default and in future even have face enhancement.


I came to the comments just to ask this.

I have used RealESRGAN for some years now, and several pictures hanging in my house were enlarged using it.

I enlarged a lot of old pictures from family albums, and then fixed the bad spots with something like Photoroom, Snapseed, etc., and then sent them to be printed.

Works really great.

And not only that, if I find an old picture on the internet that is low res with no high res found after much google-fu, Duckduckgo, or Kagi, I use RealESRGAN to upscale, and then use it as my wallpaper, presentation background or whatever!


Yes! Real-ESRGAN is still the best algorithm for enhancing images. The only algorithm better than it is Topaz's, but that's to be expected from a paid product.

Real-ESRGAN can get better with a better trained model too, which is on Upscayl's roadmap.


Does anybody have experience training/finetuning Real-ESRGAN models? I downloaded one of their datasets to see what they are working with, and it has 800 images at ~3Mp resolution. I am curious if others have results based on higher resolutions + more/fewer images in the dataset? I would like to use it for fine art paintings by period / style, etc.


I mean... if people prefer a GUI that's not a bad thing...


Especially for something involving graphics.


I used to laugh so hard at those tv and movie scenes when they would "enhance" an image: https://youtu.be/LhF_56SxrGk

I guess yesterday's science fiction is now our reality.


I think I've got a twitch now when I think about this. Those stupid moviemakers would say, "enhance!" and I (along with many of my geek brethren) was like, "there are no more pixels, that can't be done!"

And now it exists. "Enhance" exists.


Or does it? If you enhance security footage to make a barely visible face recognizable, it will produce a face, but it won't be the person in the footage anymore.


It does not exist. The AI models generate "what could be" instead of "what was", i.e. they hallucinate heavily in upscale tasks.


> It does not exist.

The MRI scanner near you is potentially using AI to increase image signal, then to increase pixel count 4x (well, 2x in X direction, 2x in Y direction. Potentially also generating a slice between each slice in 3D datasets).

If you turn off the AI and actually acquire the information, the scan is long, the patient moves and it is blurry. However if the patient is still, the non AI images look like the AI images.

Disagreeing with it is sort of moot now - it’s in the machines and is running. It works well, I’ve been using it daily for a few years.

All vendors have it - I use the Siemens product ‘Deep Resolve’ [1]. Their PR department undersells it; in my view this is a bigger change than the 1.5T to 3T transition.

[1] https://www.siemens-healthineers.com/magnetic-resonance-imag...


The scanner is faster because it collects less data.

The image is sharper because it has more data.

Where did the missing data come from?

It came from the training data set, not the patient.

No matter how good the image looks, it is not something from the patient.


Yet, it can be "close enough" for horseshoes, hand grenades, and surgery.


Which may well be, but those detectives still aren't going to suddenly be able to read the license plate from an "enhanced" blurry photo, like their TV counterparts were doing back in the early 00s.


Not necessarily. As long as some mashed pixels of the license plate remain, and assuming different license plates would result in differently mashed pixels, it might be possible to restore the original from the highly compressed image data.


From a series of images, a.k.a. video, sure. From a single image? Not so much.

In video there is a lot of temporal information and even if the spatial resolution wasn't high enough in a single image, one would be able to accumulate a higher resolution version of the scene using multiple observations.
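A rough sketch of why multiple observations help, using a deliberately simplified stand-in: averaging perfectly aligned noisy frames of a static scene. Real multi-frame super-resolution also needs sub-pixel registration between frames, which is not modelled here; the frame count, noise level, and pixel values are all made-up assumptions.

```python
import random


def rmse(estimate, truth):
    """Root-mean-square error between two equal-length pixel lists."""
    return (sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth)) ** 0.5


def simulate(num_frames=64, num_pixels=100, sigma=1.0, seed=0):
    """Compare one noisy frame against the average of many noisy frames."""
    rng = random.Random(seed)
    truth = [rng.uniform(0.0, 10.0) for _ in range(num_pixels)]
    # Each frame is the same scene plus independent Gaussian noise.
    frames = [[t + rng.gauss(0.0, sigma) for t in truth]
              for _ in range(num_frames)]
    # Accumulate: average each pixel across all frames.
    averaged = [sum(column) / num_frames for column in zip(*frames)]
    return rmse(frames[0], truth), rmse(averaged, truth)


single_error, averaged_error = simulate()
print(f"single frame RMSE: {single_error:.3f}")
print(f"averaged RMSE:     {averaged_error:.3f}")
```

With independent noise the error of the average shrinks roughly with the square root of the number of frames, which is information a single image simply does not contain.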


Correct. And the extrapolation region for the MRI is reasonable, I reckon, versus extrapolation into pulling enhanced blurry photos!


If the model learns that it can get fundamentally the same information from a lower res scan then it’s moot though.


Some of this can be done without AI as well: https://en.wikipedia.org/wiki/Compressed_sensing


We have 7T at my old uni in Alabama


It would be really interesting to see what it upscales poor security footage to be when we know the people in the picture; how accurate was it?


The models usually bundled with ESRGAN don't enhance like CSI. They straighten lines and jpeg blocking a little and correct "subatomic" details, but cannot restore large areas like faces, objects, etc. It's more of an AI-flavored "Effects - Sharpen" than stereotypical "Enhance!".



The better example of "Enhance!" still remains this piece of software: https://www.youtube.com/watch?v=19wgu5GZDhk


It just hallucinates the details now.


Yes - I can't find it now, but a few weeks ago I saw a demo website for some other upscaler with an example image of a picture of food, and it was pretty clear that the upscaled "version" was different bread than the original lower-res version, and things like herb "sprinkles" (oregano?) in the original became pine nuts or something in the upscaled version....


Yes, I personally think it's false advertising. AI Re-painter is not AI Upscaling. I even received a review on Mac App Store saying Upscayl is worse because it does upscaling instead of repainting like Magnific.


That bread is from the front page here: https://magnific.ai/


> I guess yesterday's science fiction is now our reality.

That's actually what's written on https://upscayl.org, "From Science Fiction to Reality" haha


The correct phrase should have been "Hallucinate!"


Except that when you try to enhance the number plate of a suspected killer, you get a plate number which is based on average noise and may be different from reality.


I tried this out back in December. It is very straightforward. Would recommend for anyone who is testing the waters and just trying to start exploring the various tools.

From my understanding though, the quality is pretty far behind that of the cutting edge. A friend recently recommended Topaz to me, but that isn't open source.


The quality is actually comparable in some places. Upscayl being a free project though, does not have its own model yet and as such, we depend on community models (which are great in their own right).

I do have plans to create our own robust model though, once I collect enough money :)

https://ckovalev.com/midjourney-ai/guide/upscaling-ai-art-fo...


What models do you use to upscale images other than RealESRGAN?


We have a custom-models repository with several models. Upscayl by default ships with 6 models.


The Extras tab in a1111 really gives the best results I can get. You need to know what all those options mean, of course - it's a more complicated interface for sure


xz utils is open source, so what's your point?


UpScayl is great, I use it a lot for work. Upscaling low-res graphics and illustrations for use in graphics in a pinch, upscaling portrait photos of people for print and photoshop… upscaling old copies of things for editing purposes, you name it.

It’s not perfect but no alternative is. Bloody useful though.


I'm glad you like it! We're working on making it even better :D and with the introduction of Upscayl Cloud the possibilities for professionals will be endless (within reason of course, haha)!


Like it? My brother I was trying to hide my enthusiasm to look cool in front of people on YC.

I use UpScayl pretty much every day and doing so feels like magic every time. Your user interface tweaks here and there are always appreciated and UpScayl is a joy to use.

Feel proud, UpScayl is witchcraft and I love it


Haha thank you so much! Always a joy to see a happy user!

Mind if I showcase your comment on the Upscayl website? :D


Yeah boiii. I’ll email you a proper quote tomorrow if that works?


Yessir!


upscayl is very approachable, but lacked many features i needed. i ended up using https://github.com/AUTOMATIC1111/stable-diffusion-webui after upscaling became part of my regular workflow, but for someone who just needs a few images enhanced, it's an ideal tool.


This should have been called Enhanse (sticking with the misspelling theme)


I still find current upscalers surprisingly underwhelming compared to advances in other areas.


Are there any models out there for cleaning up an image, not just upscaling? I have a bunch of old photos taken on early low-res point-and-shoots that have JPEG artifacts etc and this seems like something a modern model could easily be fine-tuned to resolve, but every few months I look around and have yet to find anything


Check out these models: https://replicate.com/collections/image-restoration

Most of them can be run locally, but I’d recommend testing them with replicate before investing in understanding cog/docker/hf…


Oh, this sounds exactly like what I've been looking for, can't wait to give these a try - many thanks


Unfortunately, for video, nothing I've seen yet has matched the quality of Topaz Labs' (paid) tool. Clarity and consistency always seem to be an issue with other implementations. I'd love to be proven wrong because I have a project that's stalled due to the low quality/resolution of the source.


On a related note, what would be the equivalent app for watermark removal?


Is there something similar for video? Or is topaz still the only option?


I still have video upscaling planned for Upscayl but we're just 2 people, working on Upscayl in our free time.


Video2x/Waifu2x GUI. Unfortunately, the models they ship with are tuned towards anime.


Another upscaler, for images and video

https://github.com/k4yt3x/video2x


Is there an upscaler which looks at other similar photos we took?

Been looking for The Magic Photo App, reverses motion blur, can interpolate multiple photos to unblink some eyes and select best photos of a scene.


The upscaled image feels sharper but loses some detail.


Every model will give you different results. You can try the custom models as well to see what works for you :)


What's the state of the art?


SUPIR and other diffusion based models.


But doesn't that defeat the purpose of upscaling? I see a lot of people promoting Magnific AI (which costs $40 a month) but it's not really an upscaler. It's more of an AI repainter. It replaces most details by re-imagining them instead of guessing the original details. I think we should be clear about what an AI Upscaler is and what an AI re-painter is.


ESRGAN is the same thing. You cannot recover signal that is already lost without additional information (in the diffusion case, the model's prior). Upscalers are hallucinators by that definition.
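A tiny illustration of the "signal already lost" point, assuming a plain 2x box-filter downscale (real cameras and codecs lose information in messier ways): two different high-res patches collapse to the same low-res pixel, so nothing in the low-res image can tell an upscaler which one was there.

```python
def box_downscale_2x(image):
    """Average each 2x2 block of a grayscale image given as a list of rows."""
    out = []
    for y in range(0, len(image), 2):
        row = []
        for x in range(0, len(image[0]), 2):
            total = (image[y][x] + image[y][x + 1]
                     + image[y + 1][x] + image[y + 1][x + 1])
            row.append(total / 4)
        out.append(row)
    return out


# Two clearly different 2x2 patches (made-up pixel values).
checkerboard = [[0, 255],
                [255, 0]]
vertical_stripes = [[0, 255],
                    [0, 255]]

print(box_downscale_2x(checkerboard))
print(box_downscale_2x(vertical_stripes))
```

Choosing between the two candidates has to come from somewhere other than the pixel itself, which is exactly the role of the model's prior.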


But results differ in perception. ESRGAN doesn't hallucinate whole [sub]objects and that puts it into a different category. Latent/denoising i2i re-scale is different because its parameter range doesn't really intersect with what ESRGAN does.


What @cchance said. Also, diffusion models have much more capacity than ESRGAN. Being unable to hallucinate whole subjects is a limitation of model capacity and training data, not the reverse (i.e. a high-capacity model can mimic what you want in a low-capacity model if you steer it the right way). An unsuspicious drop of water can have a whole world inside it under a microscope.


It can only hallucinate if your i2i noise level is high enough. If you're adding enough noise that the latent is basically filling in giant swaths of noise areas, then of course it's going to make up new stuff. That's not hallucination as much as you asking the AI to draw that entire area as something new.


OK, but I do hope that the hallucinations disappear when you downscale (normal resize) an upscaled image (?)
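One way to test that hope is a round-trip check: downscale the upscaled result and compare it with the original. The sketch below uses toy stand-ins (nearest-neighbour upscaling, a 2x box downscale, and two made-up "hallucinations"), not any real model, so treat it as an assumption-laden illustration.

```python
def box_downscale_2x(image):
    """Average each 2x2 block of a grayscale image given as a list of rows."""
    return [[(image[y][x] + image[y][x + 1]
              + image[y + 1][x] + image[y + 1][x + 1]) / 4
             for x in range(0, len(image[0]), 2)]
            for y in range(0, len(image), 2)]


def nearest_upscale_2x(image):
    """Duplicate every pixel into a 2x2 block."""
    out = []
    for row in image:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_texture(image, amplitude):
    """Add invented detail that averages to zero over every 2x2 block."""
    return [[value + (amplitude if (x + y) % 2 == 0 else -amplitude)
             for x, value in enumerate(row)]
            for y, row in enumerate(image)]


low_res = [[10, 200], [90, 40]]
plain = nearest_upscale_2x(low_res)
textured = add_texture(plain, 5)                       # invented fine detail
brightened = [[v + 20 for v in row] for row in plain]  # invented content shift

print(box_downscale_2x(textured) == low_res)    # detail vanishes on downscale
print(box_downscale_2x(brightened) == low_res)  # this change survives
```

So invented detail disappears only when it happens to cancel out under the downscale filter; anything that changes the local average remains visible at the original size.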


I wonder if image upscaling is actually reflective of a greater disease in society: we are exceptionally obsessed with holding onto everything. I mean, my God, if the situation comes down to a desire to upscale an image, just let the image go.


There are lots of reasons someone might want to upscale something, not just "holding onto everything".

I painted something on my iPad over the last few weeks. I'll be upscaling it so it's high enough resolution to do a large print on my wall. It's not something I need to "just let the image go" for.

(And, of course, there's nothing wrong with holding onto things...)



