
> For such small string, any statistical compression won't have a smaller size.

You can hardcode the expected symbol frequencies and throw arithmetic coding at it, and the average size will probably drop by a meaningful amount.
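To get a feel for the possible gain without writing a full coder: with a fixed model, an arithmetic coder spends roughly -log2(p) bits per symbol, so you can just sum that up and compare against a fixed-width encoding. Rough sketch below. The frequency table is made up for illustration, not measured from any real corpus, and the function names are mine.

```python
import math

# Hypothetical hardcoded frequencies (invented for this sketch, not measured).
FREQ = {c: 1 for c in "abcdefghijklmnopqrstuvwxyz."}
FREQ.update({"e": 8, "a": 6, "o": 6, "r": 5, "t": 5, ".": 5, "c": 4, "p": 4})
TOTAL = sum(FREQ.values())


def estimated_bits(s: str) -> float:
    """Ideal code length of s under the fixed model.

    An arithmetic coder lands within a couple of bits of this total
    for the whole string. Raises KeyError for symbols not in FREQ.
    """
    return sum(-math.log2(FREQ[c] / TOTAL) for c in s)


def fixed_width_bits(s: str) -> float:
    """Baseline: every symbol costs log2(alphabet size) bits."""
    return len(s) * math.log2(len(FREQ))


if __name__ == "__main__":
    name = "org.apache"
    print(f"model: {estimated_bits(name):.1f} bits, "
          f"fixed width: {fixed_width_bits(name):.1f} bits")
```

How much this actually saves depends entirely on how well the hardcoded table matches the real strings; a table that doesn't fit the data can make things worse than fixed width.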

And I can't easily find an example corpus, but the description of these strings sounds like they'd often have repetition, so another symbol to encode repetition and make this into a normal compression algorithm is probably worth it.
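One possible reading of "a symbol to encode repetition" is an LZ77-style back-reference: a reserved token that says "copy `length` characters starting `distance` back". Minimal sketch, assuming a greedy matcher and a minimum match length of 3 (both arbitrary choices of mine). It only produces tokens; turning them into bits is a separate step.

```python
MIN_MATCH = 3  # assumed threshold; shorter repeats are cheaper as literals


def tokenize(s):
    """Greedy pass: emit literal chars, or ('REP', distance, length)
    for a repeat of text seen earlier in the same string."""
    out, i = [], 0
    while i < len(s):
        best_len, best_dist = 0, 0
        for j in range(i):
            k = 0
            # Extend the match; the source may overlap the current position.
            while i + k < len(s) and s[j + k] == s[i + k]:
                k += 1
            if k > best_len:
                best_len, best_dist = k, i - j
        if best_len >= MIN_MATCH:
            out.append(("REP", best_dist, best_len))
            i += best_len
        else:
            out.append(s[i])
            i += 1
    return out


def detokenize(tokens):
    """Inverse of tokenize: expand back-references into text."""
    buf = []
    for t in tokens:
        if isinstance(t, tuple):
            _, dist, length = t
            for _ in range(length):
                buf.append(buf[-dist])  # one char at a time so overlaps work
        else:
            buf.append(t)
    return "".join(buf)


if __name__ == "__main__":
    s = "org.apache.commons.org.apache.foo"
    print(tokenize(s))
    assert detokenize(tokenize(s)) == s
```

The matcher is quadratic, which is fine for strings this short. Whether the extra symbol pays for itself depends on how often repeats of at least `MIN_MATCH` characters really occur in the data.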

I wonder how many of these strings start with org.apache.

