Πfs: store your data in π (github.com)
105 points by rnhmjoj 1 hour ago | 23 comments





I've been working on an alternative implementation of this using tau instead. I'm not sure my calculations are correct, but I think it offers a 2x speedup over this method.

It's still in stealth mode but I'm hoping to unveil it sometime in late June.

reply


Since the code already exists within pi (or tau), surely all you have to do is find the working code within it.

I'm sure that you can do that before noon of the first of next month.

reply


Actually I was going to unveil another project that day. I've been excitedly working on this idea since I had the epiphany last night. So, I was checking our logs over the weekend and noticed that no jobs or processes ran between 2 and 3 AM early Sunday morning. I thought it was a fluke of our system, but I started poking around in unrelated logs and I found the same thing. I even looked at a couple of public systems and saw the same thing. Now I don't know why this is but apparently NO Unix systems (and I think maybe Windows too) ran any processes during that window. Is it some bug in the OS? Maybe, but I can exploit it.

So I've designed a once-a-year data processing system. It's going to queue up all your requests throughout the year and save them for this "quiet window" and then distribute all the jobs across all network connected computers running this new code. I think I'll make a pull request to Linux for maximum coverage. The best thing is, there is absolutely no performance hit or impact on everyone's servers because they're ALREADY not being used.

reply


It was a joke about "Tau Day," on June 28th.

reply


That was a joke about "April Fools", on April 1st.

reply


Supposing I found the index location of the next Star Wars movie in pi, would it be copyright infringement for me to mention that on a forum?

reply


This concept reminds me of https://libraryofbabel.info/

"At present it contains all possible pages of 3200 characters, about 10^4677 books." (https://libraryofbabel.info/About.html)

reply


https://libraryofbabel.info/bookmark.cgi?bjrvq,mgobexe314

reply


I wonder, assuming the computation was instantaneous, how effective it would be for compression.

reply


If computation is instantaneous, then you could recurse down to a single pointer, which points to a pair of pointers, which each point to another pair, and so on until you have an arbitrary amount of your data. Since computation is instantaneous, it is instantaneous to compute this first pointer for the contents of any hard disk. And since computation is instantaneous, rebuilding the files on the disk from that pointer is instantaneous.

Your compression ratio would approach infinity.

reply


Never mind, I neglected to take into account the length of that pointer:

https://news.ycombinator.com/item?id=13870098

reply


You mean, if there was an oracle for instant pi-index lookup? All finite-length strings can be expressed as natural numbers, so the pi-index is just a mapping from that number to another, pi: N -> N, with no guarantee of any more efficiency. Over the set of all finite-length strings, I do not think this would provide an advantage. But it might be fun.

reply


I think the offset is bigger than the actual information you want to store in most cases.

reply


> Now, we all know that it can take a while to find a long sequence of digits in π, so for practical reasons, we should break the files up into smaller chunks that can be more readily found. In this implementation, to maximise performance, we consider each individual byte of the file separately, and look it up in π.

So basically like a regular filesystem, except now you look up each byte every time you intend to use it?
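For anyone who wants to see the per-byte lookup concretely, here is a minimal sketch. It is not the project's actual code: it assumes decimal digits of π computed with Machin's formula, and it writes each byte as a three-digit decimal string; the names `pi_decimals`, `encode` and `decode` are made up for illustration.

```python
def arctan_inv(x: int, unity: int) -> int:
    """Return arctan(1/x) scaled by `unity`, via the Taylor series in integers."""
    power = unity // x            # unity / x**(2k+1), starting at k = 0
    total = power
    x2 = x * x
    k = 1
    while power:
        power //= x2
        term = power // (2 * k + 1)
        total += -term if k % 2 else term   # alternating signs
        k += 1
    return total


def pi_decimals(n: int) -> str:
    """First n decimal digits of pi after the leading 3 (Machin's formula)."""
    guard = 10                    # extra digits to absorb truncation error
    unity = 10 ** (n + guard)
    pi_scaled = 4 * (4 * arctan_inv(5, unity) - arctan_inv(239, unity))
    return str(pi_scaled // 10 ** guard)[1:]   # drop the leading "3"


# Assumption: 20,000 digits is enough for every three-digit string to occur.
DIGITS = pi_decimals(20_000)


def encode(data: bytes) -> list[int]:
    """Replace each byte with the first offset of its 3-digit form in pi."""
    offsets = []
    for b in data:
        pos = DIGITS.find(f"{b:03d}")
        if pos < 0:
            raise ValueError(f"byte {b} not found in the digits computed")
        offsets.append(pos)
    return offsets


def decode(offsets: list[int]) -> bytes:
    """Read three digits back at each offset to recover the bytes."""
    return bytes(int(DIGITS[o:o + 3]) for o in offsets)


if __name__ == "__main__":
    stored = encode(b"pi")
    print(stored, decode(stored))
```

The round trip works, but every byte is now an offset that itself has to be stored somewhere, and that offset is often wider than the byte it replaces.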

reply


I think this project might not be entirely serious.

reply


A joke, on the Internet? Surely not.

reply


This is a reference to a common situation that arises anywhere on the internet that compression is discussed. There is a segment of people who seem to find it personally, mathematically, or even perhaps morally offensive that there can exist no algorithm that can take all possible inputs and be guaranteed to emit something compressed by at least one bit. This seems to be true despite the fact that the proof of this statement is very, very simple; it can be sketched on a literal post-it note and you could easily convince an open-minded 10-year-old of its truth.
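The post-it-note proof is a counting argument, and it can be sketched in a few lines. This is an illustration only; `find_collision` and the toy "compressor" passed to it are hypothetical names, not anything from the project.

```python
from itertools import product


def count_exact(n: int) -> int:
    """Number of distinct bit strings of length exactly n."""
    return 2 ** n


def count_shorter(n: int) -> int:
    """Number of distinct bit strings of length 0 .. n-1 (always 2**n - 1)."""
    return sum(2 ** k for k in range(n))


def find_collision(compress, n: int):
    """Return two different n-bit inputs that `compress` maps to the same output.

    If every output is shorter than n bits, there are more inputs than
    possible outputs, so a collision has to exist (pigeonhole principle).
    """
    seen = {}
    for bits in product("01", repeat=n):
        s = "".join(bits)
        out = compress(s)
        if out in seen:
            return seen[out], s
        seen[out] = s
    return None


if __name__ == "__main__":
    print(count_exact(8), count_shorter(8))        # 256 255
    # Toy "compressor" that always saves one bit by dropping the first one.
    print(find_collision(lambda s: s[1:], 3))      # ('000', '100')
```

There is always one fewer shorter string than there are inputs, so any scheme that shrinks every input must send two inputs to the same output, and then it cannot decompress both.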

In those situations, you can bet money that some compression scheme will be proposed that fits into at least one of the two categories "did not check to see if it actually compresses data" or "works by hiding bits where you forgot to count, but they still count".

"I'll just give an offset into pi" is in the first category; it does, technically, work, in the sense that it can validly represent the original data, but it turns out that in order to represent the offsets into pi it takes more bits than the content had in the first place. (You can get lucky every once in a while, and store 14159265 as "0/8", but the vast, vast majority of sequences get expanded.) Everyone proposes this one. It's almost funny how often it gets proposed, given that it is actually a very slow method for making your data bigger. Don't hold your breath for your Fields Medal for that one.[1]

An example of the second case is "I'll just store the first few bytes in the filename of the file and then 'forget' that I have to count the filenames in the size of the compressed content". Vigorous, month-long debates about whether or not something "counts" as part of the content will follow, which can be resolved by pointing out that the challenge is basically to give a stream of bytes over the network to a computer that only has the algorithm on it that fully reconstructs all the content, but this rarely satisfies Our Hero who has discovered perfect compression.

I add the word "moral" not just as a jab, but as the only way I have to explain how these people react to things like pointing out "if that worked, I could compress all files to one bit/one byte" or "I can't actually reconstruct the original content with that information". They get offended.
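To put some numbers on the "offset into pi" scheme from the first category, here is a rough sketch that looks up every k-digit decimal string in the first 20,000 decimals of pi and counts how many of them get an offset with fewer digits than the string itself. The digit count, the use of decimal rather than binary, and the names `pi_decimals` and `survey` are all assumptions made for illustration.

```python
def arctan_inv(x: int, unity: int) -> int:
    """Return arctan(1/x) scaled by `unity`, via the Taylor series in integers."""
    power = unity // x
    total = power
    x2 = x * x
    k = 1
    while power:
        power //= x2
        term = power // (2 * k + 1)
        total += -term if k % 2 else term   # alternating signs
        k += 1
    return total


def pi_decimals(n: int) -> str:
    """First n decimal digits of pi after the leading 3 (Machin's formula)."""
    guard = 10                    # extra digits to absorb truncation error
    unity = 10 ** (n + guard)
    pi_scaled = 4 * (4 * arctan_inv(5, unity) - arctan_inv(239, unity))
    return str(pi_scaled // 10 ** guard)[1:]


DIGITS = pi_decimals(20_000)


def survey(k: int) -> tuple[int, int]:
    """Return (strings found, strings whose first offset has fewer than k digits)."""
    found = shorter = 0
    for i in range(10 ** k):
        offset = DIGITS.find(f"{i:0{k}d}")
        if offset < 0:
            continue              # not present in the digits we computed
        found += 1
        if len(str(offset)) < k:
            shorter += 1
    return found, shorter


if __name__ == "__main__":
    print(DIGITS.find("14159265"))   # the lucky "0/8" case: offset 0
    for k in (1, 2, 3):
        print(k, *survey(k))
```

Two different strings of the same length can never share a first offset, so at most 10^(k-1) of the 10^k strings can possibly get an offset with fewer than k digits. At least nine in ten stay the same size or grow, and that is before counting the length field you also have to store.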

[1]: Just for fun, let's implement the ideal "index into a sequence" compression algorithm. This is not written to proof standards, but to "give the reader the flavor of what is going on" standards. (Technically it assumes the proof, in an attempt to illuminate it from another direction.) The problem with pi is that it tends to repeat a lot. Let's chunk up 8 bits into a byte, since that's pretty common, and let's build the ideal sequence of bytes that we can index into easily. For completeness, let's just put all the 8-bit bytes in one sequence:

     00000000000000010000001000000011

and so on, 0, 1, 2, etc, up to 255. The most efficient way to index this list is to break it up along the 8-bit boundaries, because you'll find (if you try it) that any more clever attempts to exploit overlapping numbers will end up at best matching that performance and at worst taking more bits. Therefore, we can index into that list to get a 0 as... 0. And the index for 1 is... uhhh... 1. And 143 is... 143. And so on.

You'll find this approach, extended into as many consecutive bits in a row as you please, will wildly outperform pi as a sequence to index into to get values. Because this is, of course, just the identity transform, and pi in practice does way worse than that. This transform is also wildly more efficient than the pi indexing operation, having computational complexity O(zero), if you'll pardon me the abuse of notation.
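The "ideal sequence" from the footnote is small enough to write out in full. A minimal sketch, with `SEQUENCE`, `index_of` and `bit_prefix` as illustrative names:

```python
# The ideal sequence: every byte value exactly once, in order.
SEQUENCE = bytes(range(256))


def index_of(byte: int) -> int:
    """Index of `byte` in the ideal sequence, counted in 8-bit chunks."""
    return SEQUENCE.index(byte)


def bit_prefix(count: int) -> str:
    """The first `count` bytes of the sequence written out as bits."""
    return "".join(f"{b:08b}" for b in SEQUENCE[:count])


if __name__ == "__main__":
    print(bit_prefix(4))                  # 00000000000000010000001000000011
    print(index_of(0), index_of(1), index_of(143))   # 0 1 143
```

Every index equals the byte it points to, which is the identity transform the footnote describes: it costs exactly as many bits as the data and needs no lookup at all.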

reply


Haha, I love it.

To anyone confused: Look at today's date!

reply


They were talking about Pi day on the radio. "Today is the day the whole world celebrates Pi day". No, it really doesn't. It's just the US, the only country in the world that uses the MM/DD/YYYY order.

reply


Well, since you're being pedantic... this _is_ the only day the whole world celebrates Pi Day, because it only occurs in the US date format. So... their statement is correct.

reply


Other countries can celebrate Pi Approximation Day, on 22/7

reply


It's 14.3 over here in the UK, so I guess we need to increase the storage capacity 4.55 times.

reply


Is this inspired by "Hard-Boiled Wonderland and the End of the World" by Murakami?

A really great literary read btw.

reply



