
Archiving JPEGs for long-term storage - turrini
https://github.com/danielgtaylor/jpeg-archive
======
gwern
The 'jpeg-archive' tool, if anyone was wondering, appears from the source code
to not produce an archive file of any kind but just a directory of lossily
compressed files, does no deduplication on the file or image level, stores no
hash sums, creates no forward error correction information, does not shard or
store across multiple storage services under the LOCKSS principle, and
otherwise doesn't do anything relevant for long-term archival storage. The
title of the submission and tool are more aspirational than descriptive.
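
For anyone wondering what the missing hash sums and file-level deduplication could look like, here is a minimal Python sketch. It is not part of jpeg-archive; the function names (`sha256_of`, `build_manifest`, `find_duplicates`) are made up for illustration, and it only catches byte-identical files, not visually similar images.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_duplicates(manifest: dict[str, str]) -> list[list[str]]:
    """Group paths that share a digest, i.e. byte-identical files."""
    by_digest = defaultdict(list)
    for rel_path, digest in manifest.items():
        by_digest[digest].append(rel_path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Storing the manifest next to the images and re-running `build_manifest` later gives a cheap way to detect silent corruption: any path whose digest changed needs to be restored from another copy.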

~~~
klodolph
Agreed with most of these points, but FEC / sharding / storing should be
handled at a different layer.

~~~
gwern
What layer of the filesystem it should be handled at can be debated (a tar
with separate PAR2s? a single archive split into multiple ZFEC shares?) but
that sort of thing is precisely what a long-term archiving tool should be
handling for you. I don't want to mess with par2create by hand any more than I
want to try ImageMagick on multiple settings to get an acceptable tradeoff.

------
chakalakasp
Heh. As a photographer I'm not sure what utility this would have for someone
like me. The last thing I want to do is smash my glorious high-bit-depth RAW
files with LR adjustments into 8-bit low-quality JPEGs. And if I did, ol'
Photoshop can do that with image macros, and Lightroom can do it with an
export. Storage is so cheap now, and prices continue to fall, that even a guy like
me with 5TB+ of images isn't panicking about how to archive this stuff. A 9TB
RAID5 is cheap as dirt compared to the rest of the costs of photography, a few
rotating externals for backup are also cheap, and cloud backup is mostly only
limited by network speeds. Glacier will archive stuff for $4 per month per
terabyte (and falling), so there is zero need in most circumstances to archive at
such a terribly lossy quality level, at least in my industry. Maybe there are
other applications out there, I dunno.

------
gribbly
Interesting stuff, although I'm not a fan of the lossy recompression. There are
good open source lossless JPEG compressors like packJPG and Lepton, giving ~22%
size reduction, and a yet-to-be-released one from the guy who made webp/brotli.

------
maxaf
So, this is a low-res facsimile of Flickr but without the original's battle-
tested infrastructure or a human-friendly UI.

------
faragon
Very interesting, especially the "jpeg-hash" tool, useful for finding quasi-
duplicates (different files, but quasi-identical at visual level).
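
To illustrate the general idea of matching near-duplicates, here is a small sketch of an average hash in plain Python. This is a generic perceptual-hashing technique and not necessarily what jpeg-hash does internally. It assumes the image has already been decoded to a grid of grayscale values (0-255) by some imaging library; `average_hash` and `hamming_distance` are illustrative names.

```python
def average_hash(pixels: list[list[int]], size: int = 8) -> int:
    """Compute a size*size-bit average hash of a grayscale pixel grid."""
    height, width = len(pixels), len(pixels[0])
    if height < size or width < size:
        raise ValueError("image must be at least size x size pixels")
    # Shrink the image to size x size cells by averaging each block.
    cells = []
    for row in range(size):
        y0, y1 = row * height // size, (row + 1) * height // size
        for col in range(size):
            x0, x1 = col * width // size, (col + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    # One bit per cell: 1 if the cell is brighter than the overall mean.
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")
```

Two images whose hashes differ in only a few bits are likely visual near-duplicates even if the files differ byte for byte; the cutoff for "a few" is a tuning choice.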

------
roywiggins
I'd love something like this but for video formats that come off consumer
cameras and cell phones. Especially old ones that used Motion JPEG, you ought
to be able to compress them quite a lot without losing much perceptual
quality, but I've no idea what the best way to do that is.

------
nsuser3
How to actually archive images:

1\. Convert to PNG

2\. Save image on multiple storage media
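
A rough sketch of step 2 in Python, assuming each storage medium is mounted as a directory. The `replicate` helper and its verification step are illustrative assumptions, not something the steps above spell out.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Hex SHA-256 of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def replicate(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy source into each destination directory and verify every copy."""
    expected = file_digest(source)
    copies = []
    for directory in destinations:
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / source.name
        shutil.copy2(source, target)  # copies data and timestamps
        # Read the copy back so a bad write is caught now, not in ten years.
        if file_digest(target) != expected:
            raise IOError(f"verification failed for {target}")
        copies.append(target)
    return copies
```

Calling `replicate(Path("photo.png"), [Path("/mnt/disk_a"), Path("/mnt/disk_b")])` with hypothetical mount points would leave one verified copy on each disk.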

~~~
monochromatic
I like that method better than:

1\. Re-encode lossily.

2\. ???

3\. Profit.

------
digianarchist
Slightly off-topic but does anybody know any services that you can pre-pay to
store images long term? Say 10+ years.

Flickr only allows you to buy an account for the next 2 years.

~~~
res0nat0r
Glacier is $0.004/GB per month

[https://aws.amazon.com/glacier/](https://aws.amazon.com/glacier/)

~~~
ktpsns
Just looked into the pricing, and the lion's share comes when you need the data
again (especially quickly).

Why not instead buy one or even several cheap but large USB hard disks and put
them on your shelf or at your friends' places? That gives you immediate access for
free. I don't get why one rents a service instead of staying independent.

~~~
willglynn
Data storage devices have a finite lifetime, so data storage is necessarily a
recurring expense.

Copies that can't be read aren't copies any more, which means regular access
is a requirement, which means another recurring expense.

Different requirements can reasonably result in different solutions, but a USB
hard drive on its own is simply not comparable to a storage service. A redundant
set of USB hard drives with a specified replacement schedule and testing
procedure would be comparable -- but if you amortize all that out, we're
talking about $X + Y hours per month for Z GB of storage, which compares
directly to $X/GB-month if time has a dollar value.

From this perspective, I argue that storage services like S3, Glacier, B2, and
DigitalOcean Spaces are priced fairly.
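
The amortization argument can be sketched as a few lines of Python. All the inputs in the usage note are made-up placeholders (drive price, lifetime, hours spent, hourly rate); only the $0.004/GB-month figure comes from this thread, and retrieval fees are deliberately left out.

```python
def diy_cost_per_gb_month(
    hardware_cost: float,    # total price of the redundant drive set, in dollars
    lifetime_months: float,  # months before the set is replaced
    hours_per_month: float,  # time spent on testing and rotation
    hourly_rate: float,      # dollar value of one hour of your time
    capacity_gb: float,      # usable capacity, counted once (not per copy)
) -> float:
    """Amortized do-it-yourself cost in dollars per GB-month."""
    monthly_hardware = hardware_cost / lifetime_months
    monthly_labor = hours_per_month * hourly_rate
    return (monthly_hardware + monthly_labor) / capacity_gb


def service_cost(rate_per_gb_month: float, size_gb: float, months: int) -> float:
    """Total storage fee for a service billed per GB-month (no retrieval fees)."""
    return rate_per_gb_month * size_gb * months
```

With hypothetical inputs such as `diy_cost_per_gb_month(300, 60, 0.5, 40, 5000)` the result is about $0.005/GB-month, the same order of magnitude as the quoted Glacier rate, and almost all of it is the value of the time rather than the hardware.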

------
gumby
Why choose a lossy compression scheme?

------
yread
You could make the compression comparison a bit more interesting by adding a
100% crop of, say, the lips or any area with intricate detail. Also, if you do a
recompression, wouldn't JPEG 2000 be an option?

------
pornel
It's a good encoder. It will give you more consistent quality and better
compression than your bash script launching ImageMagick.

