Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I had an idea for a file transfer system based on digits of pi. You'd give everybody DVDs of digits of pi (or they could calculate them themselves) , and then transfer files faster by just sending them the offset into pi.

At the time I thought it could work with a big enough bank of digits of PI on both sides. If transfer was expensive, and calculating digits was cheap then you could give everyone an infinite supply of digits of pi and have a nearly infinite compression system.

I discovered that often the offset into pi is much larger than the data you are sending. Turns out it's an expensive way to sent things.

Also, it turns out that this area was already well understood. There are no free lunches with entropy.

But it was a fun idea to kick around.



I think a lot of people had similar ideas, I know I did.

The idea of using Pi is essentially the same as having a shared dictionary as used in both regular compression and in lossy as waveforms (in layman's, terms - technically it's a best match) both ends have the dictionary and you simply index it.

In fact whole network protocols use this concept too such as protocolbuffers which are part of grpc.

The difference here being the dictionary is infinite and until we get quantum computers on the desktop indexing into pi will always be slow. Some of the address space may not be feasible too, e.g. your file may require a billion bit address/index.

It may still be feasible to do partial matches for a file, 50% one index, 50% another for performance improvement.

I guess the trade off is a balance between performance and number of indexes Vs the original file length, and where in Pi the address space becomes unfeasible.


It's interesting how zstd can use a shared dictionary to compress small amounts of data.

It's a more practical ( then my idea ) shared dictionary.


zlib also supports dictionaries, as does the deflate algorithm. The problem is that zlib doesn't provide any tooling to generate an optimal dictionary from a large sample, whereas zstd does provide such a utility.


> I discovered that often the offset into pi is much larger than the data you are sending.

Shouldn't the offset be roughly the same size? If you look for a 1000 digit sequence, you need to try about 10^1000 starting points, which makes your offset about 1000 digits.


Oh, is that true? Either way, certainly not an efficient compression scheme.


> Also, it turns out that this area was already well understood. There are no free lunches with entropy.

If I ever go mad and I breach the 'crankpot line' this will be it. There is something in entropy that I just can't fully grasp even though I understand (or I can convince myself that I do) Shannon, Kolmogorov etc.


I hear the compression tends to attract lots of 'out of the box' thinkers. So you'll have company.


technically "outside the box," meaning "their creativity is expansive, not within the narrow confines of a box".

"out of the box" means works right away, as soon as you take it out of the box, no assembly required.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: