One format I'm missing: storage for conversations and social media posts. Both are complex media (text + images/videos + metadata), and one is actually a collection of such posts.
How would you go about storing those in a somewhat human-readable format? My goal is to archive my chats and social media activity.
Use a SQLite3 database. Have a table for the posts (or any other appropriate schema, depending on what metadata you have). Using SQLite3 has the advantage of future flexibility (new/different tables and schema as needed, full-text search, etc.).
You can have another table for attachments (images, videos, etc.). If they're small, store them directly in a BLOB. If they're not, store them alongside the database, and only store the relative path in the attachments table.
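As a rough illustration of the posts/attachments layout described above, here is a minimal Python sketch using the standard-library `sqlite3` module. The column names, the `media/` subfolder and the 1 MiB inline cutoff are all hypothetical choices, not anything prescribed in this thread; adapt them to whatever metadata your export actually has.

```python
import sqlite3
from pathlib import Path

# Hypothetical cutoff: attachments up to this size go into a BLOB,
# larger ones are written next to the database and referenced by path.
INLINE_LIMIT = 1 * 1024 * 1024  # 1 MiB, an arbitrary choice

SCHEMA = """
CREATE TABLE IF NOT EXISTS posts (
    id        INTEGER PRIMARY KEY,
    platform  TEXT NOT NULL,       -- e.g. 'mastodon', 'signal'
    thread_id TEXT,                -- groups chat messages / replies
    author    TEXT,
    created   TEXT NOT NULL,       -- ISO 8601 timestamp
    body      TEXT
);
CREATE TABLE IF NOT EXISTS attachments (
    id        INTEGER PRIMARY KEY,
    post_id   INTEGER NOT NULL REFERENCES posts(id),
    mime_type TEXT,
    data      BLOB,                -- set for small files
    rel_path  TEXT,                -- set for large files, relative to the DB
    CHECK ((data IS NULL) != (rel_path IS NULL))  -- exactly one is set
);
"""

def open_archive(db_path):
    """Open (or create) the archive database and ensure the tables exist."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn

def add_attachment(conn, db_path, post_id, filename, mime_type, content):
    """Store small content inline, larger content as a file beside the DB."""
    if len(content) <= INLINE_LIMIT:
        conn.execute(
            "INSERT INTO attachments (post_id, mime_type, data) VALUES (?, ?, ?)",
            (post_id, mime_type, content),
        )
    else:
        base = Path(db_path).parent
        (base / "media").mkdir(exist_ok=True)
        rel = Path("media") / f"{post_id}_{filename}"
        (base / rel).write_bytes(content)
        # Only the relative path is recorded, so the folder stays portable.
        conn.execute(
            "INSERT INTO attachments (post_id, mime_type, rel_path) VALUES (?, ?, ?)",
            (post_id, mime_type, rel.as_posix()),
        )
    conn.commit()
```

Keeping the path relative means the database and its `media/` folder can be moved or copied together without rewriting any rows.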
You may opt to convert images and videos to a single format (e.g. PNG and H.264 MP4), but you can lose information depending on the target format. It may be preferable to leave them in the original (or highest quality) format.
Depends on what you mean by human-readable. SQLite, as the other sub-comment mentions, is good, but you could also just use a CSV file (a single unnormalised table) and keep the original media in the same folder or a subfolder. Hell, convert that CSV to an HTML table and you can display the data as a human-readable local webpage. Throw in some JS and you can navigate/filter it too.
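A minimal sketch of the CSV-plus-HTML idea, using only Python's standard `csv` and `html` modules. The column set and the `media/` subfolder are hypothetical; the output is a plain static table (no JS) that any browser can open locally.

```python
import csv
import html

# Hypothetical flat (unnormalised) columns: one row per post, with the
# media file name pointing into a sibling "media/" folder.
FIELDS = ["created", "platform", "author", "body", "media"]

def write_csv(rows, csv_path):
    """Write a list of dicts (keys = FIELDS) to a UTF-8 CSV file."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)

def csv_to_html(csv_path, html_path):
    """Render the CSV as a static HTML table; media cells become links."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    out = ["<!DOCTYPE html>", '<meta charset="utf-8">', "<table>",
           "<tr>" + "".join(f"<th>{html.escape(c)}</th>" for c in FIELDS) + "</tr>"]
    for row in rows:
        cells = []
        for col in FIELDS:
            value = html.escape(row[col])  # escape post text so it displays literally
            if col == "media" and value:
                value = f'<a href="media/{value}">{value}</a>'
            cells.append(f"<td>{value}</td>")
        out.append("<tr>" + "".join(cells) + "</tr>")
    out.append("</table>")
    with open(html_path, "w", encoding="utf-8") as f:
        f.write("\n".join(out))
```

The CSV stays the source of truth; the HTML page can be regenerated from it at any time.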
The thing about archives is you either parse them now or parse them later. With how much JS and other crap is served in modern social media frontends, I'm not sure WARC is the best format for archiving from them.
But that is the point of WARC: otherwise, your archival method needs some sort of general intelligence (AI or a human behind the scenes) to store exactly what you need.
With WARC (and good WARC tooling like Browsertrix-crawler) you store everything the site sent over HTTP.
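To make "store everything the site sent" concrete, here is a hand-rolled Python sketch of a single WARC/1.1 `response` record: a small header block followed by the raw HTTP response bytes, untouched. The function name is hypothetical and this is not the API of Browsertrix-crawler or any other tool; for real archiving you would let such tooling (or a library like warcio) write the files, which also records requests, metadata and compression.

```python
import uuid
from datetime import datetime, timezone

def warc_response_record(target_uri, raw_http_response, when=None):
    """Wrap the raw bytes of one HTTP response (status line, headers, body)
    in a single uncompressed WARC/1.1 'response' record."""
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    headers = [
        "WARC/1.1",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {when.strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/http;msgtype=response",
        f"Content-Length: {len(raw_http_response)}",
    ]
    # Header block, blank line, the exact bytes received, then two CRLFs
    # separating this record from the next one in the file.
    return ("\r\n".join(headers) + "\r\n\r\n").encode("utf-8") + raw_http_response + b"\r\n\r\n"
```

Nothing is parsed or interpreted at capture time: the JS, HTML and API responses are kept byte for byte, which is exactly why you can defer the "parse it later" step.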