Hacker News new | past | comments | ask | show | jobs | submit login
Calculating a record-breaking 31.4T digits of Pi with GCP (cloud.google.com)
129 points by ossama 12 days ago | hide | past | web | favorite | 113 comments

Meh, throwing raw power at the problem is not that impressive.

Bellard's [1] 2009 record was much more impressive, because he used a clever formula to break the existing record with a mere (albeit beefy) desktop computer: https://bellard.org/pi/pi2700e9/

The record he broke with his desktop PC was made using a supercomputer cluster.

[1] If somebody is not familiar with him, he is also the original author of QEMU, ffmpeg and the Tiny C Compiler.

And JSLinux - full PC emulation in your browser (this never ceases to amaze me no matter how often I play with it):


There are quite a bit of other very impressive projects under their belt as well. QEMU and FFMPEG to name some I use daily. What a legend.

According to the FAQ, the bottleneck on this process was hard disk speed, would running the algorithm today be noticeably faster on a NVME SSD? Could it be modified to run on a computer with a sufficient amount of RAM, perhaps in batches?

Have there been other improvements in single-core clock speeds, multi-core performance, or any other CPU hardware components to make running this noticeably faster on high-end consumer hardware?

Wow, I had no idea that the same guy is behind both QEMU and FFMPEG! That's really impressive.

I struggle to see any complexity into google approach - unlike bellard's one that is an amazing feat.

They basically pulled more machine to compute more. nothing really impressive. pretty much any dev with that computing power could have done it

It's fairly impressive.

Keeping a single server online for 111 days straight at full CPU and RAM usage over 96 cores and 1.4TB of RAM is a good start. The fact that such a machine exists is already mind-blowing. Then add 25 more nodes running iSCSI, all out 24/7 for 111 days. Hell, just mounting 240TB on a single system is a good stunt, go ahead and try it and let me know it's not "complex".

And your last point kind of IS the point of their marketing: any dev could do it if they have the skill, and they'll rent you the hardware.

While those are interesting systems administration tasks, I believe the point is that the previous record was held by someone who used smarter maths (than the previous previous record) and vastly fewer computing resources.

I think that's the point they are trying to make; that doing this with GCP is super boring and easy.

Exactly. It's a matter of K8s configuration, which a devops intern can do in a few weeks probably.

Compare this with the Chudnovsky brothers, who built their pi calculating "supercomputer" from commodity parts in their NY apartment, back in 1992: https://www.newyorker.com/magazine/1992/03/02/the-mountains-...

(and being mathematicians, they also discovered novel formulas for speeding up their calculation)

I don't think it's about the PI tbh, more like a pretty stunt to me

They did it so I could say,

There are three kinds of mathematicians, those that can count and those that can not.


If you keep posting flamebait we're going to ban you again.

How Many Decimals of Pi Do We Really Need? (NASA/JPL): https://www.jpl.nasa.gov/edu/news/2016/3/16/how-many-decimal...

I don't think anyone is claiming this is useful in any way, it's basically an advert for the stability of Google Cloud Services (apparently the VM was migrated thousands of times over the course of the computaton). It's also just kinda fun.

In short 40 is already crazy too many digits for most if not all applications. Yet in the original article they say «Granted, most scientific applications don’t need π beyond a few hundred digits, …». Is there scientific applications where they would really need more than 40? Or is it just the author making some guess?

The linked NASA article points out that with 40 digits of pi you could compute the circumference of the visible universe to an accuracy equal to the diameter of a hydrogen atom. I'm gonna say there's no practical application that would require even 40 digits, never mind a few hundred

You need more digits than that to accurately compute double-precision trigonometric functions (if the input is close to pi, you need enough accurate digits left after performing range reduction).

This paper claims you need 2/pi accurate to 1144 bits which is about 345 decimal digits: https://www.csee.umbc.edu/~phatak/645/supl/Ng-ArgReduction.p...

As a counterpoint, no real computation I've performed on a computer needed to compute the cos of 2^1023 radians. I can't imagine such a scenario either.

You can either implement the functions accurately or inaccurately. Implementing them inaccurately is a slippery slope. Intel botched the hardware implementations in their processors not only for large inputs but also for inputs nearish to multiples of pi:


>Is there scientific applications where they would really need more than 40? Or is it just the author making some guess?

It is still not clear whether Pi is a normal number or not [0]. Such calculations could in principle give an insight here.

[0] https://en.wikipedia.org/wiki/Pi#Irrationality_and_normality

Whether there's a use for 40 digits or 40 trillion digits, needing more accuracy is not why we find these numbers. There's no 'need' for this at all. The same way there's no 'need' to find bigger prime numbers. We're just seeing how far we can go, maybe seeing a new pattern emerge.

> The same way there's no 'need' to find bigger prime numbers.

From cryptography standpoint, there is always a need to find bigger prime number. I wouldn't compare this with a Pi.

Finding bigger prime numbers has no impact on cryptography at all.

The large prime numbers needed for cryptography are a few hundred digits long. The are generated by picking a random numbers and checking it’s neighbors for primality.

The largest prime numbers that have been discovered have millions of digits. Finding a larger prime would have no effect whatsoever on our ability to quickly generate primes with a few hundred digits.

How does finding a bigger prime help or hinder cryptography? And with many forms of cryptography (e.g. DH kex, ECC) primes only enter as the modulus of the modular arithmetic where being larger is usually not particularly helpful.

Finding a change in the behavior of Pi that emerges at high precisions would be a significant discovery.

But it's only 9s after the 762nd digit?

Absolutely! I'm not disputing that it's not "interesting", just that I wouldn't compare Pi number search to prime number search.

This! Leaving a decimal here .

Alexander Yee's writeup is interesting – CPU utilisation was only about 12%, they encountered quite nasty I/O bottlenecks (particularly for writes.)


Contradicts the Google blog a little, especially where he points out that they hit performance issues with live migration (Google said it worked fine without impact on the application.)

He's also the author of one of the most famous Stack Overflow answers of all time, 'Why is it faster to process a sorted array than an unsorted array?'


This is a time someone must cite 3Blue1Brown: [(183) The most unexpected answer to a counting puzzle - YouTube](https://www.youtube.com/watch?v=HEfHFsfGXjs)

The piano music for each digit at the A-π (API) page is particularly beautiful


Similar, but metal. This is the extended verison that goes to 110 digits.

"The numbers and rests in the formula translate to 16th notes on the kick drum, and 16th note rests. There is no kick drum beats where there are snare drums.

With the decimal point BEFORE the number, and starting with the first number, move that many decimal points to the right and insert that many 16th note rests. Use one 16th note rest to divide the numbers you passed (when applicable). Continue on throughout the rest of the figure. No repeats."

The details of the video have the full explanation


Martin Krzywinski has produced some brilliant Pi posters. http://mkweb.bcgsc.ca/pi/piday/

I have the "dots" one. https://fineartamerica.com/featured/768-digits-of-pi-up-to-f...

Literally this should contain all the songs written, and the ones haven't written yet.

I'm curious, did someone estimate how much it would cost to replicate this experiment on GCloud with the same infrastructure?

Using GCP's pricing calculator and the specs described in the blog, it's around 170,000 USD to run it for 3 months.


If you can break it into appropriate jobs that can handle interruptions, you should use interruptible instances to save money. And you really need to be able to handle that anyway in case an instance fails midway.

Don't miss the video on how Emma did this:

- https://www.youtube.com/watch?v=JvEvTcXF-4Q

(length 3:14)

this is perfect for mounting the Pi filesystem :)


Whenever I see the ridiculous number of places to which pi has been calculated, I wonder if anyone has checked to see if there is a repeating pattern. I mean, 31 trillion places leaves a lot of possibilities for repetition of a couple of billion digits.

Or is my understanding of what constitutes an irrational number outdated? Is there another definition that precludes even looking for repetition in hopes of finding a denominator?

Not consecutively repeating patterns.

But if you take any length pattern of digits, it would repeat an infinite number of times.

Let's take a one digit pattern, say '5'. Since the digits of pi continue forever, there would be an infinite number of '5's.

Now consider a longer pattern '53'. Since the digits of pi continue forever, there would be an infinite number of '53's. In fact, each 53 will be from one of the infinite number of '5's in the previous pattern '5'.

Now consider a longer pattern '537' . . .

. . . to continue . . .

It was long ago when I read Contact (the book), so I hope I don't misremember this too badly. At the end of the book the main character was given a budget, lab, resources, etc. They were working on looking for a message in the digits of Pi. Eventually her beeper beeped and they had found one! It must be woven into the fabric of the universe.

I think any sequence of digits that had any kind of message you are going to eventually find in Pi. Just like, if you look long enough you'll find a 5. If you keep looking you'll eventually find a 53. Keep looking, you'll eventually find a 537. Etc.

This is the concept of [a "normal" number][0] (which is a strange choice of word, I think). Apparently pi is not proven to be normal. But the implication as I understand is that, yes, any given sequence that you'd care to look for is there somewhere.


Almost all numbers are normal, so it seems like a perfectly reasonable choice of terminology =)

[On the other hand, almost none of the numbers someone on the street might name are normal, so ...]

Offhand, I would guess that “normal” is the most overloaded word in mathematics. We should probably use it less.

Ah, I didn't realize that, thanks! That makes sense now that you say it.

Almost all real numbers are a lot of things, though:

   - uncomputable
   - irrational
   - transcendental
   - ...
So it's not really that good of a justification. =)

Pi can be proven to be an irrational number: https://en.wikipedia.org/wiki/Proof_that_%CF%80_is_irrationa...

Quick question. Is 22/7 an approximate value of pie? What is the correct formula and why?

22/7 is an approximation for pi. I prefer to use 355/113, which has an error of about 2.7x10^-7. The reason these are good approximations is that they’re convergents for the continued fraction for pi. See https://en.wikipedia.org/wiki/Continued_fraction

Perhaps they subliminally hope the final pages of the novel "contact" to be real

Are the some PI-like constants, but much harder to calculate? Like even a million digits would be hard to calculate?

That's actually most numbers, but we don't know any yet. https://youtu.be/5TkIe60y2GI numberphile (math YouTube series) describing some related concepts here.

"we have lots of compute resources and access to software, aren't we awesome?"

Awesome, now we are a game of life which has generated Pi to 31.4T digits at least once :D

What is the file size?

> March 14 (represented as 3/14 in many parts of the world)


I wanted to comment the same part, but I wanted to let them the benefit of the doubt, so I went to Wikipedia «date format by country» page [1], and summed up from the table how many people have "MD" vs "DM" in their date format:

DM -> 3392/5550 ~= 61.1% MD -> 2158/5550 ~= 38.9%

Note: I ignored both green and red regions that have both "DM" and "MD" in their format.

So it is definitively not the majority of people. Using the word "some" instead would have been better, but "Many" is not totally wrong… I guess.

[1] https://en.wikipedia.org/wiki/Date_format_by_country#Usage_m...

But that is population not a number of countries. If you look at it from a country perspective, the percentage is tiny.

Probably. The statement talk about "parts of the world", and this concept of "parts" can be seen as countries, continents, surface of earth, atoms, etc. I chose to reduce it to the smallest meaningful entity concerning the concept of dates: humans. It is arbitrary, but justifiable :)

You seem to imply it’s wrong to refer to 40% of the population as “many” — which seems odd.

I treat "some" as an informal expression of relative quantity and "many" as an informal expression of absolute quantity. So it is valid to say many countries use "MD" and many countries use "DM", though most countries use "DM" and only some use "MD".

The US and Philippines use month-day-year.

The rest of the world doesn't. The most popular format is Day-Month-Year, followed by Year-Month-Day.

Yes. ISO chooses year-month-day, which puts largest component first and smallest last. This has the nice benefit that treating it as a string and doing alphanumerical sort matches the actual day sort. Ref. https://en.wikipedia.org/wiki/ISO_8601

The main benefit of Y-M-D, is that no-one uses Y-D-M, and the 'Y' component is easily recognisable. So if you use Y-M-D, then everyone knows it's Y-M-D and there is no ambiguity.

I found out recently the x509/tls certa are YYMMDD


For example. Amazing.

There are benefits to both little endian and big endian

Yes, but the US uses middle endian.

My two favourite date formats are: (1) yyyy-MMM-dd and (2): yyyyMMdd.


(1) eliminates any ambiguity regarding interpretation. You can have an instance of a server of an unknown provenience and/or regional settings and still count on yyyy-MMM-dd to be correctly recognized.

(2) on the other hand has a nice feature of being sortable and can also be stored as an integer value if needed.

The US standard of MM/dd/yyyy is ridiculous. But it's just one of many and I'm not going to fix the world ;)

On the main topic: if anyone needs to calculate the circumference of the Universe (radius: 50bn lightyears ish) with the accuracy of the Planck distance (approximately 1.6 x 10^-30 meters), they need less than 65 digits of Pi to do so. So, as already stated, it's just a PR stunt, nothing more.

yyyy-MM-dd is sortable (2019-03-14)

yyyy-MMM-dd is not (2019-MAR-14)

Yeah, but we can't celebrate on the other way around, because there's no 14th month.

There's the 22nd of July (22/7), also known as Pi Approximation Day, which in any case is more accurate than 3.14.

We can celebrate Pi day every month at 3 14:15:92.65359.

And April only has 30 days.

How can you tell if the results are accurate?

Something quite interesting about pi is that you don't need to start at the beginning to calculate any particular digit, so you can check if the 31.4trillionth digit is correct yourself. The algorithm you'd need to use is the Bailey–Borwein–Plouffe formula - https://en.wikipedia.org/wiki/Bailey%E2%80%93Borwein%E2%80%9... (I think there are others too.)

This is explained in the second paragraph:

> Yee independently verified the calculation using Bellard's formula[0] and BBP formula[1]

[0] https://en.wikipedia.org/wiki/Bellard%27s_formula

[1] https://en.wikipedia.org/wiki/Bailey%E2%80%93Borwein%E2%80%9...

Disappointed the Wikipedia article didn’t explain how in the world they arrived at that formula, why it even works. This type of stuff is pure magic to me.

There is a proof linked from the Wikipedia article: https://www.davidhbailey.com/dhbpapers/digits.pdf

Why would you need to do that? It's not like those digits are useful for anything at all.

I guess it depends on your definition of "useful", but one potential way to use pi would be as a way to transmit compressed data incredibly efficiently. If you could find the data you want to transmit in pi somewhere you'd only need to transmit an offset and a length to send anything than can be represented numerically.

The hard part would be finding what you want to send though...


Unless it's not clear, the size of that offset would be about the same size as whatever you wanted to encode.

Offset and a length in Pi are not going to be shorter than original data to transmit.

Not necessarily. A double gets you pretty far into pi for a cost of just 8 bytes. A little bit of rounding, or checksumming, or other tomfoolery in principle should make it possible to reach absurdly far into pi for a byte or two more.

Given that I personally have to have seen this proposal at least a dozen times myself, and I'm not even particularly "in the field", if it was as easy as people supposed, we'd be using it.

I will point out you're thinking of it incorrectly, though. The challenge with compression isn't to "reach far into pi". The challenge is to reach correctly into pi. A double can identify 2^64 places to reach into pi. It doesn't really matter how you specify those 2^64 locations, that's all it can do, by the simple argument that one double can't specify more than one location.

The location of the US Constitution in pi may be "far", but it's not the "far" we have in our fuzzy human brains where things sort of logarithmically just pile together until there's no meaningful difference to us humans between the 1,839,837,237,938,739,837,954th position of pi and the 1,839,827,237,938,739,837,954th position. But a compression algorithm off by that much is useless; indeed, even being off by one digit (in your choice of base) is going to produce garbage. So it's not just about being able to reach "far", it's about being able to reach far and precisely. It doesn't matter how you arrange the possible 2^64 locations a machine word can point to; specify it as 1,739,837,237,938,739,837,954 + the binary as a 64-bit int for all it matters. That reaches "far" into pi. But you won't find anything useful enough to pay off the 64bits you spent getting there.

(I mean, you want to reach far into Pi, I give you "Go BusyBeaver(64-bit int) digits into Pi". That's reaching in pretty darned far. But it's still useless as a compression algorithm, even with the mighty power of BusyBeaver there.)

(Amusingly, at BB(42) digits into Pi, you get "42" as the next digits, proving that 42 really is the answer. Prove me wrong!)

I would expect that about zero of the propositions have ever been serious. It's just a well-known, long-running amusement among programmers.

An IEEE 754 double type has a range from about 10^−324 to about 10^308. You start losing precision once you exceed 2^51, but beyond that it should be possible to add an extra byte or a few as an offset.

So a handful of bytes would get us addresses far in excess of the current storage capacity of the internet (estimated around 10^24 bytes a few years ago) ... if only pi were certain to contain every possible number sequence up to some arbitrary length (which last I heard isn't known yet) and if we had some reasonable way to search and index all of that content (which we don't).

If your offset is a million digits long you need to be sending more than a million digits of data to make it worthwhile, but the chance of your chosen million digits being available in that space is effectively zero. :)

Regardless of the actual distribution of values in pi, it seems to me that the best case scenario can't compress all data on average when you include the offset and length, because of the pigeonhole principle. Same as any compression method, except practically worse/useless because pi isn't constructed to have the values that occur in practical situations come first.

You've kind of re-invented, in a way, using DCT transforms to encode music more efficiently...

> Why would you need to do that? It's not like those digits are useful for anything at all.

People complain about that.

But . . . why do we have athletic competitions? It's not like it is useful for anything at all.

Or Reality TV for that matter.

Did they find the patterns (circle?) that was referend at the end of Carl Sagan’s Contact book?

Does pi compress? It must if it is only numbers, but only by a half?

Trillions of digits of Pi compresses insanely well - to Pi, of course.

The decompression just takes a while.

pi is effectively randomly distributed (I don't know if this is proven) which means it would be effectively impossible to compress is using conventional entropy or dictionary based techniques.

However, there are compact formulas which can generate pi to arbitrary precision, trading off compute time with space. So it;'s effectively compressible, I guess this relates to Kolomogorov complexity in some way.

If it's stored as ASCII/UTF8 text, then it will be compressible because it's just numbers. On the other hand, if they somehow have a number format that spans that many digits then it'll already be perfectly efficient.

Um, sure. But I don't think anybody really cares if pi is compressible because it's ASCII representation of numbers. At best, you'd be getting some constant factor improvement due to the unused bits, but there's nothing specific to pi in that.

If you consider compression to be 'representing information in a way that you can recover with some processing (from less information)', then any programmatic definition of Pi is an enormous compression of the value.

Nobody wants to steel man this question? The cute trivial++ answers are missing OPs point...

You don't get those gigajoules of heat waste back at the end

Why did she stop calculating? A time limit or resource issue?

They calculated π * 10^13 digits for Pi day (14th March, 3/14 in month/date format) so it was an artistic/aesthetic choice.

Absolute madmen

That could explain the slight trouble they had recently

"31,415,926,535,897 to be exact, or π * 10^13"

Uhm no, you mean, approximately π * 10^13.

Exactly ⌊π * 10^13⌋

I wonder if the blockchain (or contract?) approach could break the record after combining all that gpu power.

That’s not quite how it works.

Why not? It is. Calculating pi is trivially parallelizable. If you wrote good kernels, calculation distribution could be exactly the same.

Blockchains that perform computation (e.g Eth), aren't doing parallel computation as we think of it.

They perform the same computation on every node, there's no way to split up the work between nodes

But what role would a Blockchain have in the parallel computation of pi?

Besides Block chain being useless for this purpose, calculations of pi are disk io limited at the current time.

Applications are open for YC Summer 2019

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact