Hacker News

I've been wondering about this because of a unique business system I've been working on. Hypothetically, if you are not allowed to delete any prior business data (i.e., an append-only event log), what would stop you from using prior log entries directly as the basis for some sort of incremental/recursive compression dictionary?

Put differently, is there any way to take advantage of a practically unlimited dictionary size and a willingness to retain all data forever? Or am I applying some basic information-theory equation the wrong way in my head?
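One way to try this with standard tools is a preset dictionary. Python's `zlib` accepts a `zdict` argument, so prior log entries can act as shared history that both the compressor and the decompressor already know. A minimal sketch, assuming JSON-encoded entries (the entry fields are made up for illustration). The append-only rule is what makes this safe: decompression needs a byte-identical dictionary, and immutable history guarantees one.

```python
import json
import zlib

# Hypothetical prior entries; in practice, read back from the append-only log.
prior_entries = [
    {"event": "invoice_created", "customer_id": 1001, "currency": "USD", "status": "open"},
    {"event": "invoice_paid", "customer_id": 1001, "currency": "USD", "status": "paid"},
    {"event": "invoice_created", "customer_id": 1002, "currency": "USD", "status": "open"},
]
new_entry = {"event": "invoice_paid", "customer_id": 1002, "currency": "USD", "status": "paid"}


def encode(entry: dict) -> bytes:
    # Stable key order so identical fields produce identical bytes.
    return json.dumps(entry, sort_keys=True).encode()


# DEFLATE's window is 32 KiB, so only the tail of zdict is usable;
# the most recent entries therefore go last.
WINDOW = 32 * 1024
zdict = b"\n".join(encode(e) for e in prior_entries)[-WINDOW:]


def compress_with_history(data: bytes, zdict: bytes) -> bytes:
    c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(data) + c.flush()


def decompress_with_history(blob: bytes, zdict: bytes) -> bytes:
    # Must be given exactly the same dictionary bytes as the compressor.
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()


raw = encode(new_entry)
plain = zlib.compress(raw, 9)            # no shared history
primed = compress_with_history(raw, zdict)
assert decompress_with_history(primed, zdict) == raw
print(f"raw={len(raw)} plain={len(plain)} primed={len(primed)}")
```

This also shows where "unlimited dictionary" meets practice: DEFLATE can only reference the last 32 KiB, so older history is invisible to it. Formats with larger windows (e.g. zstd's long-distance matching mode) push that limit out, but the match finder still has to search the history, which costs memory and time.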




You can have an adaptive dictionary that favors recently seen entries. In practice the dictionary would have to be rebuilt periodically, but this has the advantage of coping with an ever-changing ("non-stationary") data distribution. Sometimes the dictionary doesn't even need to be stored and can instead be reconstructed as the data is decompressed; this is such a big advantage that many compression algorithms omit the dictionary entirely and build it adaptively instead.
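LZW is a classic example of the "dictionary doesn't need to be stored" point: the encoder and decoder each grow the same table from data they have already seen, so only the codes are transmitted. A minimal sketch of plain LZW; note that it grows the table without bound and does not favor recent entries, whereas real implementations cap or reset the table, and sliding-window LZ77 variants are the ones that favor recent data.

```python
def lzw_compress(data: bytes) -> list[int]:
    # Start with every single byte as a dictionary entry (codes 0-255).
    table = {bytes([i]): i for i in range(256)}
    w = b""
    out = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc                      # keep extending the current match
        else:
            out.append(table[w])
            table[wc] = len(table)      # learn a new entry from the data
            w = bytes([byte])
    if w:
        out.append(table[w])
    return out


def lzw_decompress(codes: list[int]) -> bytes:
    if not codes:
        return b""
    # The decoder rebuilds the same table; nothing but codes was sent.
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # Code refers to the entry the encoder created on this very step.
            entry = w + w[:1]
        else:
            raise ValueError(f"bad LZW code {code}")
        out.append(entry)
        table[len(table)] = w + entry[:1]   # mirror the encoder's new entry
        w = entry
    return b"".join(out)


data = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(data)
assert lzw_decompress(codes) == data
print(f"{len(data)} bytes -> {len(codes)} codes")
```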


What you're thinking of is a journaling archiver. Something like zpaq.
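A journaling archiver appends each update as a new version instead of rewriting the archive, and deduplicates new data against what it has already stored; zpaq works roughly along these lines, with its own fragmenting and hashing plus compression of the new fragments. The sketch below is a hypothetical toy of that idea, not zpaq's format: content-defined chunking with a Gear-style rolling hash (the seed, mask, and class name are arbitrary choices), SHA-256 to identify chunks, and a list of chunk hashes per version. It only deduplicates; it does not compress.

```python
import hashlib
import random

# Gear table for the rolling hash; seeded so boundaries are reproducible.
_rng = random.Random(42)
GEAR = [_rng.getrandbits(32) for _ in range(256)]
CUT_MASK = 0xFC000000  # top 6 bits zero -> average chunk of roughly 64 bytes


def split_chunks(data: bytes) -> list[bytes]:
    """Cut where the hash's top bits are all zero. The 32-bit hash depends
    only on the last 32 bytes, so an edit only moves nearby boundaries."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF
        if (h & CUT_MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


class JournalArchive:
    """Hypothetical append-only archive: nothing is deleted, every update
    becomes a new version, and each distinct chunk is stored once."""

    def __init__(self) -> None:
        self.chunks: dict[bytes, bytes] = {}   # SHA-256 digest -> chunk
        self.versions: list[list[bytes]] = []  # per version: chunk digests

    def add(self, data: bytes) -> int:
        """Append a new version; return how many new bytes were stored."""
        digests, new_bytes = [], 0
        for chunk in split_chunks(data):
            d = hashlib.sha256(chunk).digest()
            if d not in self.chunks:
                self.chunks[d] = chunk
                new_bytes += len(chunk)
            digests.append(d)
        self.versions.append(digests)
        return new_bytes

    def restore(self, version: int) -> bytes:
        # Any past version can be rebuilt because chunks are never removed.
        return b"".join(self.chunks[d] for d in self.versions[version])


v1 = b"".join(f"event {i}: invoice {i * 7 % 1000} paid\n".encode() for i in range(200))
v2 = v1[:2000] + b"event X: manual correction\n" + v1[2000:]
archive = JournalArchive()
archive.add(v1)
stored = archive.add(v2)
assert archive.restore(0) == v1 and archive.restore(1) == v2
print(f"v2 is {len(v2)} bytes, only {stored} new bytes stored")
```

Content-defined boundaries matter here: with fixed-size chunks, the inserted line would shift every later chunk and defeat deduplication.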



