Is JPEG 2000 a preservation risk? (2013) (loc.gov)
79 points by userbinator on July 17, 2022 | 83 comments



Like many others, I don't think I will bother talking about JPEG2000 as a preservation risk. I guess it is, given the bugs and difficult implementation.

But for me, the problem is just how unwieldy the files are. Nothing seems to work well with them, except ImageMagick, which I use to convert the files to some more usable format.

I do a lot of Blender renders, and for a while I thought I'd use 12-bit JPEG2000 so that I could capture more details in the shadows for subsequent photo editing.

Big mistake. I ended up having to write a series of very complicated scripts to convert them all to regular JPEG for 99% of whatever I'm doing with the images. GIMP doesn't work well with them. My own image viewer, based on Kivy, doesn't work at all with them. macOS generates thumbnails for them at maybe 1/10th the speed of comparable JPEGs.

In the end I just decided if I really want to work with the darks in an image I will go find the original .blend and re-render the scene. Now I render straight to regular JPEG and I save massive amounts of time.


The arithmetic coder in JPEG2000 is often the culprit for slow encoding / decoding.

GIMP doesn't work well with high-precision files, although it's gotten better. Photoshop can deal with these, even very old versions of Photoshop.

Use OpenEXR or HDR output formats and do the tone mapping as a separate step.
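
As a rough sketch, assuming a hypothetical scene.blend: Blender can render straight to float OpenEXR from the command line, leaving the tone mapping to a later pass:

    # render frame 1 in the background, writing an OpenEXR next to the .blend file
    blender -b scene.blend -o //render_ -F OPEN_EXR -x 1 -f 1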

Also note that plain JPEG supports 12-bit in the first place.


Looks like the answer was 'yes', in something of a self-fulfilling prophecy. When was the last time you used a JPEG 2000? https://caniuse.com/jpeg2000 puts browser adoption today at 18% (!).


Fun fact: whenever you go to the movies, films (that are projected digitally) are encoded in JPEG2000 (wrapped in an MXF, in the XYZ colourspace).

We also use it for our 3D renders for archviz, and it is supported in our editing applications as well... lower file size compared to PNG, better quality compared to JPG, and it also supports alpha channels.


JPEG2000 is big in medical imaging as well.


I'll mention libraries and museums as well. Instead of having a tiff available of high quality record images, some institutions use jpeg2000 and a zoomable js library like openseadragon.

In this case, the jpeg 2000 is probably considered a derivative and thus not a big issue for preservation.


In other words, motion JPEG2000, and it's decoded by (no doubt extremely expensive) hardware. There's no way a general-purpose CPU can decode in realtime otherwise.


This benchmark is showing 4k decoding on commodity hardware from 2018 fast enough for movie theaters. https://www.fastcompression.com/benchmarks/decoder-benchmark...

So, presumably they could have swapped to commodity hardware a while ago.


The issue for commercial cinemas is not necessarily handling storage and playback of DCinema packages - it's making the DRM that the studios insist upon play nice with the projectors. Most cinemas converted their screens at least a decade ago, financed partly by distributors, and the industry is sort of stuck in time as a result.


The fastest decoder there uses the GPU, not the CPU.

The fastest one that runs on a CPU, Kakadu (proprietary), is far behind.


> The fastest decoder there uses the GPU, not the CPU.

How does that matter? It was a reply to your incorrect claim:

> it's decoded by (no doubt extremely expensive) hardware



CPU or GPU, it's still commodity hardware.


With a Windows Update banner and forced reboot right at the climax of the movie.


Commodity hardware doesn't require Windows...

Though let's be honest, they'd probably use Windows anyway, and that probably would happen.


A couple of months ago, I went to a 2020 anime feature film at a cinema here in Estonia. To my amusement, after the credits I was greeted with the VLC interface for a brief moment (though they turned off the projector very quickly after that).

Edit: obviously, that's not how most movies are shown here :D


I wonder if that was because that film's distributor is relaxed about DRM, or if the cinema just decided not to care about licensing it, assuming that an Asian (I guess) IP holder would be unlikely to find out or take legal action against a small cinema in Estonia?


It’s almost worse than that - the common commercial playback servers I’m familiar with run Linux 2.6.x and fail in all kinds of weird and wonderful ways!


The digital cinema projectors have hardware JPEG 2000 decoders. This IP is licensed from companies like IntoPIX:

https://www.intopix.com/JPEG2000

Most post-production software that reads/writes DCPs will be using a GPU JPEG2000 encoder & decoder, like this from Comprimato:

https://comprimato.com/products/comprimato-jpeg2000/


We use Comprimato, which can use the GPU. It's very performant.


Do they really use straight up XYZ? Where can I read more about this format?


I think people misestimate file sizes these days - two hours of completely uncompressed 4K video is under 5 TB. Fits on a basic consumer external hard drive no problem. Send it by mail.

https://toolstud.io/video/filesize.php?width=3840&height=216...
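
For reference, the arithmetic behind that figure (assuming 8-bit RGB at 3840x2160 and 24 fps, which lines up with the 597.2MB/sec rate in the reply below):

    # bytes per frame * frames per second * seconds in two hours
    echo $(( 3840 * 2160 * 3 * 24 * 7200 ))   # 4299816960000 bytes, roughly 4.3 TB (~597 MB/s sustained)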


Well, two hard drives in RAID-0 if you want to play it back in real-time without copying it first to faster storage, because a single SATA drive cannot sustain 597.2MB/sec. Unless your definition of "basic consumer external hard drive" is an NVMe SSD over 10Gbps USB 3.2 or better.


As the thread says - it’s compressed. I’m giving uncompressed sizes as an upper bound.


The GP is pointing out that storing an uncompressed film isn't useful if it can't be played back. Playing back an uncompressed 4K film is nearly 600MBps. That's a data rate you're not going to get out of any consumer HDDs.


But they're not uncompressed. They're compressed. You don't need to play them back uncompressed. You can even compress them losslessly.


I probably wouldn't call it "basic", but aside from that, https://www.amazon.com/Sabrent-Rocket-Thunderbolt-External-S... seems to fit the bill fine.


How common is 4K at 24fps though? I'd expect 4K to usually use higher framerates than that.


24fps is still standard for movies, in part because people have associated higher framerate video with amateur home video recordings so it looks cheap. :)


To elaborate on this, higher frame rate video registers the same way to people who aren’t aware that’s what they’re noticing. I was discussing this with my dad just a little over a week ago, where he noted some portion of a movie looked lower production than the rest of the movie. I had noticed it too, because it was visibly higher frame rate. Lower frame rates more closely match the visuals of film, and higher frame rates more closely match the production of daytime TV like judge shows and “soap operas”, and that’s what they visually evoke.


Thankfully this is changing with the latest generation, brought up on 60fps YT and Twitch.


> I'd expect 4K to usually use higher framerates than that.

They tried higher frame rates with, for example, The Hobbit, remember, and lots of people didn't like it for some silly reason.



Sadly, I was going to reply with "daily" because of this


In what way is it interesting to note that the format is used by someone, somewhere, for something?

That much is a given, and addresses none of the article's points.


It was interesting to me.


Are they not common in PDF documents, for example those distributed by the Internet Archive? I suspect this is why so many of them render so slowly, even on fast systems.


Yes. It's very noticeable when you're reading an ebook with J2K page images, just due to how long it takes each page to render. In contrast, regular JPEG of the same dimensions is practically instantaneous.

PDFs are probably where the majority of users will encounter J2K.


Do there currently exist tools for transcoding the images to something which renders faster?


I don't think it's common. I've wondered for a long time why, every time I try to process or edit an IA book scan - and only IA book scans - they come out bizarrely mangled, broken, and inconsistent across viewers (and have mostly given up using them and scanning books myself), and why they're slow and surprisingly bulky compared to normal JBIG2-compressed JPEG PDFs. OP's observations about the unreliability and inconsistency of JPEG2k across tools would explain all of this...


There seems to be a functional and open reference implementation for JPEG 2000. https://github.com/uclouvain/openjpeg


That is mentioned in the article, along with its deficiencies.


I think a distinction needs to be made between “risk” and “potential inconvenience”.

Given that there are free, open source implementations of the JPEG 2000 decoder, there is no plausible preservation risk. Even if, in 150 years, the contemporary version of ImageMagick has dropped support for JPEG 2000, it would be relatively trivial to spin up an ancient virtual machine and run a 100-year-old version of Linux.

Or, you know, pay an intern to port some old code to whatever modern language they’re using then. Rust, probably.


The article addresses this. The open source implementations are not complete, so there are compliant files they cannot render. They also have rendering bugs, which mean some content does not render properly, and, worse, some files they generate render fine now but are not compliant, so if the code is fixed those portions of the files may stop rendering correctly. Some of these issues only manifest at particular zoom levels, so you can never be sure what you are seeing is correct. This is all part of the problem with the spec being so complex.


Software rots, man. It can be really hard to run some open source software from the 80s and 90s. This is actually one of the big arguments for Open Source: access to the software is way less interesting than the support/community/ecosystem around it; there's no reason not to open source your software, and not doing it actually harms the users.


Yeah, Pixar literally had to spin up VMs of their old software stack when they remastered the Toy Story films, because it was less work than getting it running again on their current stack.


Actually, it seems to me VMs are an ideal way to preserve things like that. Perhaps it should be standard procedure for the archivists of such big films to build and preserve a fully functioning, self-sufficient rendering environment in the form of a virtual machine.


Until we switch to another architecture. It's pretty likely the x86 architecture will be gone in a few decades. Rebuilding VM software to run on new hardware is probably harder than fixing bugs in a JPEG decoder.


But you only have to rebuild the VM software once to get access to all kinds of software that ran on that architecture.


A few decades? The IBM System/360 architecture is still being shipped, and that first shipped in 1965!


Nobody ships a PDP11 or VAX anymore. 6809? 6502? I think even the once-ubiquitous 8080 isn't being produced anymore. And when IBM falls, System/360 will also be gone. There will be a transition period in which everybody scrambles to convert their systems, but after that it's EOL.


I know someone who got hired recently to get PDP11 software running in an emulator to control an industrial process.


I expect that to happen to /360 as well (there's more than one emulator; there's even support for the old consoles), but the hardware will be gone. And --in the case of JPEG2000-- once x86 is gone, it'll be an old decoder running on an old, unsupported OS, running on unmaintained VM software, running on a hardware emulator (which will also have limited support). Not a great outlook.


Dude, that's why we use Virtual Machines. We will run them thru QEMU or another multi-architecture solution.


We're building an iPad reader app that uses Apple's PDFKit, but we encountered JPEG2000s in the PDFs of the client's library, and PDFKit doesn't render them correctly. The frustrating thing is we're stuck with the issue since it's such a niche case that I don't think Apple will ever fix it. There is a 3rd-party commercial library PSPDF for iOS that renders it right, but I don't think they want to pay for it.


This may be fixed in the newest macOS betas. The manual for my car wasn't rendering before and now is.


Hey thanks for letting me know! That's great to hear. I'll keep an eye on it.


I decided to use JPEG 2000 to save disk space on my photo archive a decade ago... huge mistake.

[edit] I thought I had the originals backed up on CD, it turns out I didn't.

DigiKam and many other programs don't support them natively.

At some point I'm going to have to write a script and convert them all to normal jpegs.


ImageMagick and GNU parallel happily max out all cores on my machine and convert a few hundred images per second:

    find . -type f -name '*.jp2' | parallel convert {} {.}.jpg


Would be yet another lossy transcode, though.


You could convert to png or tiff if you wanted lossless. Jpg at high quality is pretty good for general photo saving if you aren't doing tons of modifications after the fact.

I think some editors now save the original and just store the modification steps (Lightroom does this for raw at least.)


Use JPEG XR, which has a lossless mode and better compression than you would get with .png. Though of course that puts you at risk of using yet another unproven format. Though it seems to be on track to become the JPEG successor, from what I understand.


Why was it a huge mistake? There are decent libraries that support it. Did it just not work with your cloud setup or OS libraries?


Check the article’s discussion of rendering bugs and implementation incompatibilities.


Sure, those were encountered there, but at least in my experience they are easily avoided. I have had to make test suites for this, and if you use reasonable libraries things work well.


Are you suggesting the parent set up test suites for a cloud solution and call it a day?

How would that guarantee safe storage, given the previous discussion of all the broken implementations?


Convert them to JPEG XL instead. It seems to be the widely supported format of the future, and it can losslessly convert back and forth with regular JPEG.


I don't think you can ever convert back to a regular JPEG and have the outcome be lossless. Like WebP, JPEG XL has a lossless setting, but it has to remain there.


I tried it on my computer. Converting from JPEG to JPEG XL and then back to JPEG gets you a bit-wise identical JPEG file.


I'm guessing the opposite (XL -> JPEG -> XL) does not work, i.e., "losslessly convert back and forth with regular JPEG".


I think the issue at hand is that behind the scenes, JPEG XL the image format actually combines three different kinds of image compression algorithms:

1. native JPEG XL lossy compression

2. JPEG lossy compression, but using a more space-efficient storage format than classic JPEG

3. lossless compression

The lossless conversion between JPEG XL and JPEG only works with 2).

Normally you'd only use 2) for existing JPEG images in order to avoid a lossy re-compression, and use 1) for any freshly-created images, because native JPEG XL compression will compress the image data even more efficiently than the JPEG compatibility mode. Though if you've got strong interoperability constraints with clients requiring JPEG data, you might possibly want to store even newly created images using sub-format 2) in order to get the ability to losslessly convert back and forth between JPEG XL and JPEG.
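
A rough sketch of that round trip using the libjxl command-line tools (filenames are placeholders; cjxl should pick the bitstream-preserving mode 2) by default when the input is a JPEG):

    cjxl photo.jpg photo.jxl        # JPEG -> JPEG XL, storing the original JPEG bitstream
    djxl photo.jxl roundtrip.jpg    # JPEG XL -> JPEG, reconstructing that bitstream
    cmp photo.jpg roundtrip.jpg     # no output means the two JPEGs are byte-identical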


Just tried it and JPEG -> XL -> JPEG -> XL is bit-wise identical between the two JPEGs and bit-wise identical between the two JPEG XLs.


I was thinking about using JPEG2000 to archive scan masters, because I hate the look of original JPEG artifacts. In the end I decided to go with FLIF to eliminate the "preservation risk" - by being lossless, I can run the output through sha256 to make sure I've gotten all the original bits back. In the event that I regret this decision, I can simply transcode and not feel too bad (as opposed to transcoding from a lossy format and losing even more quality).
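
As a sketch of that check with the flif tool (filenames are hypothetical, and the byte-for-byte hash comparison only works if the decoder writes the same PNM header bytes as the original; otherwise compare the decoded pixel data instead):

    flif -e scan.pnm scan.flif         # lossless encode
    flif -d scan.flif roundtrip.pnm    # decode back out
    sha256sum scan.pnm roundtrip.pnm   # matching hashes mean every original bit came back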


The Wikipedia page for JPEG 2000 has, for the past 16 years, used the same sample image[0], which arguably makes JPEG 2000 look worse than JPEG at the same file size. I wonder how many people that one image alone has put off.

[0]https://upload.wikimedia.org/wikipedia/commons/5/51/JPEG_JFI...


> OpenJPEG C 49,892

> libjasper C 26,458

> PIL C 12,493, Python: 9,229

But does PIL actually implement any of those image formats, or is it simply wrapping other libraries via the Python/C API?


The first count “PIL C” is for the C implementation of PIL including library code.


That's the 'C' code in PIL, but not including mandatory things like libjpeg or optional things like libtiff, libwebp, etc. Libjpeg itself is ~30k lines of C.


It certainly was a risk in 2013, when the open source libraries were slow and riddled with bugs. Today, there are two stable, well maintained and fast open source libraries available:

https://github.com/uclouvain/openjpeg and https://github.com/GrokImageCompression/grok
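
Both ship command-line tools, so a quick sanity check looks something like this sketch (placeholder filenames; PNG output assumes OpenJPEG was built with libpng):

    opj_decompress -i master.jp2 -o master.png   # decode a JPEG 2000 file with OpenJPEG's bundled tool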

JPEG 2000 is a niche codec, but in the niches that it occupies it is the gold standard, even after 20 years:

1. digital cinema (part of the digital cinema standard) and broadcast

2. medical imaging

3. satellite imagery

For memory institutions, it still vies with TIFF as the top preservation codec.


JPEG 2000 is used a lot in raster maps (satellite images, with jp2/ecw/sid containers)


AFAIK the fast J2K libraries are closed source and expensive, so you have to worry about licensing a 3rd-party component, which is a PITA.


Does it still use wavelet transform?


There is no such thing as preservation risk for digital formats. This has been studied.

Do whatever.

The only thing that matters is publishing it in the first place and backing up on hardware that's not at preservation risk.

This is a great example of fake science (is it just "science" now?) keeping the underclass constantly anxious about things that don't exist, so they keep paying academics' wages to do nothing but supply fear and disruption.


Did you even read the article? What's the point of perfectly backing up a file on whatever hardware, if the format of that file becomes obsolete and hard to read in the future?



