
A previously unnoticed property of prime numbers - tcoppi
https://www.quantamagazine.org/20160313-mathematicians-discover-prime-conspiracy/
======
dantillberg
I almost overlooked this article because I got turned off by the opening
description in base 10, as there is a lot of math trivia out there that is
specific to base 10 which holds little general significance.

But a little further down, the article discusses how this was originally
discovered in base 3, and I think it's much simpler to understand in that
context, since all primes except 3 (aka 10 in base 3) end in either 1 or 2:

"Looking at prime numbers written in base 3 — in which roughly half the primes
end in 1 and half end in 2 — he found that among primes smaller than 1,000, a
prime ending in 1 is more than twice as likely to be followed by a prime
ending in 2 than by another prime ending in 1."
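The base-3 claim is easy to check directly. A quick Python sketch (mine, not from the article):

```python
# Tally mod-3 transitions between consecutive primes below 1000.
def primes_below(n):
    sieve = [True] * n
    sieve[0] = sieve[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i, is_p in enumerate(sieve) if is_p]

ps = [p for p in primes_below(1000) if p > 3]  # skip 2 and 3
counts = {}
for a, b in zip(ps, ps[1:]):
    counts[(a % 3, b % 3)] = counts.get((a % 3, b % 3), 0) + 1
print(counts)  # the (1, 2) count dwarfs the (1, 1) count
```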

~~~
vessenes
The article is too vague to assess how interesting the claims are, sadly.

It's too bad; I think it wouldn't have detracted from the article to put some
more math in. It's not, on the face of it, at all surprising that sequential
primes are more likely to be close to each other modulo any number (3, or 10,
or what have you) than they are to be far apart.

By way of analogy: a train comes at 1:09pm. Trains come about every 5 minutes
between 1 and 2 pm, and only on odd minutes. If you simulate a bunch of random
'next trains', a last digit of 1 is much more likely than 9, because
P(9) ≈ 1 - P(1, 3, 5 or 7). This is true in every base.

I think what you'd need to do to say something interesting is: 1) calculate
the odds of finding the next prime; 2) randomly generate numbers with a
distribution similar to that of prime occurrence in that range, using at the
very least the Prime Number Theorem (roughly 1/log(n) probability); 3) check
final digits and compare to the actual distribution of final digits.

If those numbers are very different, then you have in fact found some
underlying structure. But the article doesn't hit very hard on this angle, and
it's hard for (probably) any of us to say, just thinking about it with minimal
data, whether or not there's structure.
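Steps 2 and 3 can be sketched in a few lines of Python (my sketch, seeded for reproducibility, using 1/log(n) as the per-number "prime" probability). The below-uniform same-digit rate already shows up in the purely random model:

```python
import math
import random

random.seed(0)

# Step 2: a random set with prime-like density 1/log(n).
pseudo = [n for n in range(100, 1_000_000)
          if random.random() < 1 / math.log(n)]

# Step 3: tally last-digit transitions between consecutive members.
trans = {}
for a, b in zip(pseudo, pseudo[1:]):
    key = (a % 10, b % 10)
    trans[key] = trans.get(key, 0) + 1

total = sum(trans.values())
same = sum(v for (i, j), v in trans.items() if i == j)
print("same-last-digit fraction: %.3f (uniform would be 0.1)" % (same / total))
```

Comparing those counts against the real primes' transition counts is the interesting part; the paper's claim is that the real bias is larger than a model like this predicts.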

~~~
haberman
This is addressed in the article:

> Lemke Oliver and Soundararajan’s first guess for why this bias occurs was a
> simple one: Maybe a prime ending in 3, say, is more likely to be followed by
> a prime ending in 7, 9 or 1 merely because it encounters numbers with those
> endings before it reaches another number ending in 3. For example, 43 is
> followed by 47, 49 and 51 before it hits 53, and one of those numbers, 47,
> is prime.

> But the pair of mathematicians soon realized that this potential explanation
> couldn’t account for the magnitude of the biases they found. Nor could it
> explain why, as the pair found, primes ending in 3 seem to like being
> followed by primes ending in 9 more than 1 or 7. To explain these and other
> preferences, Lemke Oliver and Soundararajan had to delve into the deepest
> model mathematicians have for random behavior in the primes.

~~~
vessenes
They did mention this, but they didn't talk real numbers. And, my second point
is (I think) slightly more subtle -- the probability distributions need to be
considered, not just the counting upward angle.

As I'm writing this out, I'm a little less sure that this would matter, but
I'll leave the comment up for the sake of discussion. :)

~~~
jlarocco
I don't know why you're nitpicking. The article's written for a more general
audience that may be interested in the property, but not necessarily the
nitty-gritty math behind it.

The first paragraph of the article links to the paper, so people who want more
detail can get it.
[http://arxiv.org/pdf/1603.03720v1.pdf](http://arxiv.org/pdf/1603.03720v1.pdf)

For your second point, I don't think there's anything wrong with a paper
announcing they found something interesting, even if they haven't completely
analyzed every aspect of it. Getting the info out early lets a wider audience
look at it, and opens their current research up to scrutiny.

------
mjs
"If Alice tosses a coin until she sees a head followed by a tail, and Bob
tosses a coin until he sees two heads in a row, then on average, Alice will
require four tosses while Bob will require six tosses (try this at home!),
even though head-tail and head-head have an equal chance of appearing after
two coin tosses."

How does this work?

~~~
filleokus
> Intuitively, first, both have to get a head. After that, if Alice "fails" by
> getting a head, then she still needs only one tail. Her first head doesn't
> get "reset" by failing her second try. But after getting a head, if Bob
> fails by getting a tail then he does get reset -- he has to start all over.

[https://www.reddit.com/r/math/comments/4abm4k/expected_numbe...](https://www.reddit.com/r/math/comments/4abm4k/expected_number_of_coin_flips_different_for/)

~~~
hellofunk
I thought the article was saying Alice must get a head followed immediately by
a tail. If that's not the case, then it makes total sense, but it would seem
the article is a bit vague about that.

Actually, the details in the article say:

>even though head-tail and head-head have an equal chance of appearing after
two coin tosses.

That implies that the tail is expected immediately after the head for Alice's
goal.

~~~
thefreeman
Correct, but the point is if Alice doesn't get a tail, that means she got a
head, so she is still in the same position as she was before the flip, only
needing a single tail to complete the sequence. If Bob gets a head, then a
tail, he now needs two consecutive heads to complete his sequence.
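A seeded Monte Carlo check of the 4-vs-6 claim (my sketch, not from the article):

```python
import random

def tosses_until(pattern, rng):
    """Flip a fair coin until `pattern` (e.g. 'HT') appears; return the flip count."""
    history = ""
    while not history.endswith(pattern):
        history += rng.choice("HT")
    return len(history)

rng = random.Random(42)
n = 100_000
avg_ht = sum(tosses_until("HT", rng) for _ in range(n)) / n
avg_hh = sum(tosses_until("HH", rng) for _ in range(n)) / n
print("HT: %.2f  HH: %.2f" % (avg_ht, avg_hh))  # close to 4 and 6
```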

------
crnt2
The results are particularly striking in base 11 - looking at primes below 100
million, only 4.3% of primes ending in 2 are followed by another prime ending
in 2 (compared to the roughly 10% you would naively expect), with similar
numbers for other pairs.

A prime ending in 2 (in base 11) is also unlikely to be followed by a prime
ending in 5, 7 or 9, whereas it is particularly likely to be followed by a
prime ending in 4 or 8.

It would be interesting to know what structure there is (if any) in this NxN
"transition matrix" for various bases.

    
    
       1: ( 1,  4.3%) ( 2, 13.0%) ( 3, 14.3%) ( 4,  7.7%) ( 5, 11.5%) ( 6,  6.3%) ( 7, 18.0%) ( 8,  9.0%) ( 9, 10.7%) (10,  5.2%) 
       2: ( 1, 10.0%) ( 2,  3.7%) ( 3, 11.3%) ( 4, 14.1%) ( 5,  7.5%) ( 6, 12.1%) ( 7,  5.3%) ( 8, 17.5%) ( 9,  7.8%) (10, 10.7%) 
       3: ( 1,  6.1%) ( 2, 10.3%) ( 3,  3.7%) ( 4, 12.5%) ( 5, 14.0%) ( 6,  9.2%) ( 7, 12.1%) ( 8,  5.6%) ( 9, 17.5%) (10,  9.0%) 
       4: ( 1, 11.1%) ( 2,  6.1%) ( 3,  9.9%) ( 4,  4.1%) ( 5, 11.5%) ( 6, 14.5%) ( 7,  7.7%) ( 8, 12.0%) ( 9,  5.3%) (10, 18.0%) 
       5: ( 1,  9.6%) ( 2, 12.7%) ( 3,  6.3%) ( 4, 11.5%) ( 5,  4.0%) ( 6, 13.6%) ( 7, 14.5%) ( 8,  9.2%) ( 9, 12.1%) (10,  6.4%) 
       6: ( 1, 17.9%) ( 2,  8.5%) ( 3, 10.6%) ( 4,  5.0%) ( 5,  9.6%) ( 6,  4.0%) ( 7, 11.4%) ( 8, 14.0%) ( 9,  7.5%) (10, 11.5%) 
       7: ( 1,  6.0%) ( 2, 19.1%) ( 3,  8.8%) ( 4, 11.1%) ( 5,  5.1%) ( 6, 11.6%) ( 7,  4.1%) ( 8, 12.5%) ( 9, 14.1%) (10,  7.7%) 
       8: ( 1, 12.0%) ( 2,  5.5%) ( 3, 17.5%) ( 4,  8.8%) ( 5, 10.6%) ( 6,  6.3%) ( 7,  9.9%) ( 8,  3.7%) ( 9, 11.3%) (10, 14.3%) 
       9: ( 1,  8.8%) ( 2, 12.4%) ( 3,  5.5%) ( 4, 19.1%) ( 5,  8.6%) ( 6, 12.7%) ( 7,  6.0%) ( 8, 10.3%) ( 9,  3.7%) (10, 13.0%) 
      10: ( 1, 14.3%) ( 2,  8.8%) ( 3, 12.0%) ( 4,  6.0%) ( 5, 17.8%) ( 6,  9.6%) ( 7, 11.1%) ( 8,  6.1%) ( 9, 10.0%) (10,  4.3%)

~~~
julien-c
It looks symmetric along the anti-diagonal. Strange.

~~~
Steuard
Wow, that's really interesting. The same seems to be true for base 7; I
haven't tried any other prime bases yet (and I don't know how it might extend
to non-prime bases). Anyone have an idea why this seems to hold?

Edit: This basically works for base 10, too. I feel like the reason must
either be very obvious or very deep.
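One way to probe it numerically (my sketch, seeded by nothing but the primes themselves): reflecting across the anti-diagonal sends the pair (a, b) to (q-b, q-a), so if the symmetry is real, count(a → b) should roughly equal count(-b → -a) mod the base.

```python
# Measure how far the last-digit transition counts are from the apparent
# anti-diagonal symmetry count(a, b) == count(-b mod q, -a mod q).
def primes_below(n):
    sieve = bytearray([1]) * n
    sieve[0] = sieve[1] = 0
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(sieve[i * i :: i]))
    return [i for i in range(n) if sieve[i]]

q = 10
ps = [p for p in primes_below(2_000_000) if p > q]
counts = {}
for a, b in zip(ps, ps[1:]):
    counts[(a % q, b % q)] = counts.get((a % q, b % q), 0) + 1

worst = max(abs(c - counts[((-b) % q, (-a) % q)]) / c
            for (a, b), c in counts.items())
print("worst relative asymmetry: %.3f" % worst)
```

With ~150,000 primes the asymmetry comes out at the percent level, i.e. consistent with noise.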

~~~
julien-c
I also tested in base 10, with primes under 1e9, and got this:

    
    
                 1       3       7       9
      1       0.0458  0.0746  0.0756  0.0540
      3       0.0599  0.0439  0.0707  0.0755
      7       0.0638  0.0677  0.0438  0.0747
      9       0.0805  0.0638  0.0599  0.0458
    

It appears to have the same transition probabilities as in base 5 (with the
two center rows and columns swapped):

    
    
                 1       2       3       4
      1       0.0458  0.0756  0.0746  0.0540
      2       0.0638  0.0438  0.0677  0.0747
      3       0.0599  0.0707  0.0439  0.0755
      4       0.0805  0.0599  0.0638  0.0458
    

I share your feeling about it being either obvious or deep.

~~~
Steuard
Well, whether it's obvious or not, I think it's in the paper. Immediately
under equation (1.1) of
[http://arxiv.org/abs/1603.03720](http://arxiv.org/abs/1603.03720), the
authors are discussing the second correction term to the distribution of
primes mod q, and they state:

"We can also show that c_2(q; (a, b)) = c_2(q; (−b, −a)) for any two reduced
residue classes a and b (mod q)."

I'm not 100% certain this is responsible for the phenomenon that we're seeing,
but it seems exceedingly likely. I think I'd need to stare at their formula
for c_2 for a long time to understand where this relation comes from, though.

------
crnt2
Here is my attempt to work through the math and figure out how "surprising"
this result is.

Clearly, we should expect that for small primes (< 100e6) it is less likely
that a prime ending in k (in base b) will be followed by another prime ending
in k - because for that to happen, none of the b-1 numbers in between can be
prime.

A (very naive) model of the distribution of primes says that every number n
has probability p(n) = 1/log(n) of being prime. Assume that a number n ends
with a k in base b. Define p = 1/log(n). Then the probability that the next
prime ends in k+j is, roughly,

    
    
      q(j) = p * (1-p)^(j-1) * sum_{i=0}^{infinity} (1-p)^(i*b)
           = p * (1-p)^(j-1) / (1 - (1-p)^b)
    

In this formula, j takes values 1 to b (where j = b represents another prime
ending in k).

For n ~ 1,000,000 and working in base 10, under this model we would expect
around 6.97% of primes ending in k to be followed by another prime ending in
k, whereas we expect 13.7% to be followed by a prime ending in k+1 (which
shows how naive the model is: in fact we never see a prime ending in k
followed by a prime ending in k+1, except for 2,3, since the two would differ
by an odd number and so one of them would be even). It would not be hard to
extend the model to rule out even numbers, or multiples of 3 and 5, but I have
not done this.
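Plugging the formula in for n = 1,000,000 and b = 10 reproduces the two percentages above (my check of the numbers):

```python
import math

def q(j, n, b):
    """Naive-model probability that the next prime's ending is j digits ahead."""
    p = 1 / math.log(n)
    return p * (1 - p) ** (j - 1) / (1 - (1 - p) ** b)

n, b = 1_000_000, 10
print("same ending (j=b): %.4f" % q(b, n, b))  # ~0.0697
print("ending k+1  (j=1): %.4f" % q(1, n, b))  # ~0.1370
```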

Around n ~ 10^60 the distribution starts to look more equal, as the primes are
"spread out" enough that you expect to have long sequences of non-primes
between the primes, which blurs out the distribution to be roughly constant.

I think this is what the article is getting at when it quotes James Maynard as
saying "It's the rate at which they even out which is surprising to me". With
a naive model of 'randomness' in the primes, you expect to see this phenomenon
at low numbers (less than 10^60) and for it to slowly disappear at higher
numbers. And indeed, you do see that, but the rate at which the phenomenon
disappears is much slower than the random model predicts.

I think _that_ is why it is surprising.

------
c3534l
> This conspiracy among prime numbers seems, at first glance, to violate a
> longstanding assumption in number theory: that prime numbers behave much
> like random numbers.

I don't think this is true at all. Take a look at the famous Ulam Spiral:
[http://scienceblogs.com/goodmath/wp-content/blogs.dir/476/fi...](http://scienceblogs.com/goodmath/wp-content/blogs.dir/476/files/2012/04/i-8e4cdfc0a83e388851408bd4b44fd1e4-Sacks%20spiral.png)

You can see that while prime numbers are difficult to predict, they're
anything but random. I'm not sure why the article claims that mathematicians
used to think the primes were evenly distributed, which is complete and utter
nonsense.

~~~
mjn
At a high level it's not that far off, in the sense that most mathematicians
think nontrivial patterns in the primes are at least unusual, and they mostly
behave randomly. But it's true that there is some structure, which is in
jargon terms called a "conspiracy" among the primes when it's found or
hypothesized. As Terence Tao summarizes it,

> We believe that the primes do not observe any significant pattern beyond the
> obvious ones (e.g. mostly being odd), but we are still a long way from
> making this belief completely rigorous.

That's from this set of slides on structure and randomness in the primes,
which has some other relevant bits in it:
[https://terrytao.files.wordpress.com/2009/07/primes1.pdf](https://terrytao.files.wordpress.com/2009/07/primes1.pdf)

Especially relevant are slides 10-11 on treating the primes as a pseudorandom
set, and then slides 14-15 on using pseudorandom models of the primes to
rigorously (vs. heuristically) prove theorems. That's done by classifying and
ruling out all possible ways nonrandom structure in the actual primes (the
"conspiracies") could sink the specific theorem being proven.

------
valine
Can anyone say what the security implications of this are? Intuitively, it
would seem the less 'random' primes appear to be, the easier it would be to
factor the composite of two prime numbers.

~~~
stromgo
You have it backwards. Their result holds for _any_ set of numbers that behave
randomly, so it holds for primes _because_ primes behave somewhat randomly.

~~~
stromgo
Why the downmods? Everyone can easily verify that there are similar biases if
you replace "isPrime(n)" with "random() < 0.1" in the various code snippets
floating in the thread. The article even admits that the biases are explained
by the prime k-tuples conjecture, which is a model of randomness in primes
from 1923. So primes are not less random than we thought -- they are still
exactly as random as we thought.
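For example, a seeded Python version of that substitution (my sketch): a random set with 10% density shows the same aversion to repeated last digits.

```python
import random

random.seed(1)
# "isPrime(n)" replaced by "random() < 0.1", as suggested above.
members = [n for n in range(1_000_000) if random.random() < 0.1]

counts = {}
for a, b in zip(members, members[1:]):
    counts[(a % 10, b % 10)] = counts.get((a % 10, b % 10), 0) + 1

total = sum(counts.values())
same = sum(v for (i, j), v in counts.items() if i == j)
print("same-digit fraction: %.3f (uniform would be 0.1)" % (same / total))
```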

------
dr_zoidberg
For those willing to try this over toy code, I did a (horrible, horrible, I'm
terribly ashamed of it) quick Python snippet to check it out:

    
    
        def primer():
            p = 3
            while True:
                is_prime = True
                for x in xrange(2, p):
                    if p % x == 0:
                        is_prime = False
                        break
                if is_prime:
                    yield p
                p += 2
        
        give_prime = primer()
        primes = [1, 2]  # had to separate this into 2 lines because Python
        primes.extend([give_prime.next() for x in xrange(9998)])  # so we get 10,000 primes
        primes_dict = {}
        for i in xrange(len(primes) - 1):
            p0 = str(primes[i])[-1]
            p1 = str(primes[i + 1])[-1]
            key = "".join([p0, "-", p1])
            try:
                primes_dict[key] += 1
            except KeyError:
                primes_dict[key] = 1
        # let's delete the 4 outliers from the beginning
        del(primes_dict["1-2"])
        del(primes_dict["2-3"])
        del(primes_dict["3-5"])
        del(primes_dict["5-7"])
        

So long story short, my results over 10,000 primes:

    
    
        In [57]: primes_dict
        Out[57]:
        {'1-1': 365,
         '1-3': 833,
         '1-7': 889,
         '1-9': 397,
         '3-1': 529,
         '3-3': 324,
         '3-7': 754,
         '3-9': 906,
         '7-1': 655,
         '7-3': 722,
         '7-7': 323,
         '7-9': 808,
         '9-1': 935,
         '9-3': 635,
         '9-7': 541,
         '9-9': 379}
    

And you can clearly see that the tendency to avoid the same last digit is
starting to show, though those that end in 1 are still not showing it
completely. I tried with 100,000 primes, but the (horrible) algorithm kinda
got stuck, so I settled for 10,000 to make this a "quick test".

Before you go, please believe me I'm sorry for _primer()_ and _give_prime_.
I'll try to never do those kind of things again.

Edit: I've edited this like 5 times already over little typos and bad
transcription mistakes I did all over the place. Should work now.

~~~
chucksmash
In the spirit of "every programmer loves to fizzbuzz", I rewrote this in Rust.
Aside from rewriting it in a different language, the biggest change I made was
only doing trial division against known primes <= the square root of a number
we are checking for primality. Able to get the first 1,000,000 primes in 20
seconds:

    
    
        use std::collections::HashMap;
    
        pub fn first_n_primes(n: u64) -> Vec<u64> {
            let mut primes = Vec::new();
            let mut candidate = 3;
            let mut count = 0;
            if n >= 1 {
                primes.push(2);
                count += 1;
            }
            while count <= n {
                let candidate_sqrt = ((candidate as f64).sqrt().ceil() + 1.0) as u64;
                let mut is_prime: bool = true;
                for prime in &primes {
                    if candidate % prime == 0 {
                        is_prime = false;
                        break;
                    }
                    if prime > &candidate_sqrt {
                        break;
                    }
                }
                if is_prime {
                    primes.push(candidate);
                    count += 1;
                }
                candidate += 2;
            }
            primes
        }
    
        fn main() {
            let mut last_digit_pair_counts: HashMap<String, u64> = HashMap::new();
            let primes = first_n_primes(1000000);
    
            for i in 0..(primes.len() - 1) {
                let last_digit0 = primes[i] % 10;
                let last_digit1 = primes[i+1] % 10;
                let digit_str = format!("{}-{}", last_digit0, last_digit1).to_string();
                let counter = last_digit_pair_counts.entry(digit_str).or_insert(0);
                *counter += 1;
            }
            last_digit_pair_counts.remove("2-3");
            last_digit_pair_counts.remove("3-5");
            last_digit_pair_counts.remove("5-7");
            let mut ordered_keys: Vec<String> = last_digit_pair_counts.keys().cloned().collect();
            ordered_keys.sort();
            for key in &ordered_keys {
                println!("{}: {}", key, last_digit_pair_counts[key]);
            }
        }
    

which outputs:

    
    
        1-1: 42853
        1-3: 77475
        1-7: 79453
        1-9: 50153
        3-1: 58255
        3-3: 39668
        3-7: 72828
        3-9: 79358
        7-1: 64230
        7-3: 68595
        7-7: 39603
        7-9: 77586
        9-1: 84596
        9-3: 64371
        9-7: 58130
        9-9: 42843

~~~
Houshalter
Oh this is fun. I tried it in Lua:

    
    
        primes = {}
        function inPrimes(n)
        	for _, v in ipairs(primes) do
        		if n%v == 0 then return false end
        		if v > math.ceil(math.sqrt(n)) then break end
        	end
        	return true
        end
        for i = 3, 1.6e7, 2 do
        	if inPrimes(i) then table.insert(primes, i) end
        end
        last = '7'
        totalDigits = {}
        for i = 4, #primes do
        	c = last..tostring(primes[i]):sub(-1)
        	totalDigits[c] = totalDigits[c] and totalDigits[c] + 1 or 1
        	last = c:sub(-1)
        end
        for k, v in pairs(totalDigits) do print(k, v) end
    

Gets just as many primes and runs in 7 seconds in LuaJIT.

~~~
eddyb
In a sibling answer, steveklabnik suggests the Rust version was compiled
without optimizations -
[https://news.ycombinator.com/item?id=11285569](https://news.ycombinator.com/item?id=11285569)

Could you try running both on the same machine? I'm curious if LuaJIT can
still beat Rust if both have optimizations working.

I know it can beat native code _sometimes_, which is pretty impressive (it
finds common cases and specializes to them, AFAIK - almost like the
"sufficiently advanced optimizing compiler" fairy tales).

~~~
Houshalter
I don't know how to Rust, but you are welcome to try it. LuaJIT is amazingly
fast. It shouldn't be faster than native code in general, but it's still not
orders of magnitude behind like interpreted languages are. As I understand it,
JITs can sometimes do better by doing statistics on code paths and optimizing
them.

And of course, it takes 0 seconds to compile, if you factor in that time : )

~~~
eddyb
LuaJIT 2.0.4:

3.67user 0.01system 0:03.68elapsed 99%CPU

rustc 1.9.0-nightly (74b886ab1 2016-03-13) (-C opt-level=3):

5.18user 0.00system 0:05.20elapsed 99%CPU

Switching to BTreeMap gives me:

4.36user 0.00system 0:04.38elapsed 99%CPU

Using u8 as the key (last_digit0*10 + last_digit1) instead of a string:

4.18user 0.00system 0:04.18elapsed 99%CPU

I tried preallocating the vector of primes and it didn't help, strangely
enough.

Replacing the floating-point sqrt with squaring in the comparison does bring
it a bit lower:

4.04user 0.00system 0:04.05elapsed 99%CPU

I don't know how to bring that number lower without using a sieve, perf
reports that most of the time is spent in:

86,31 │ div %rbx

I've also just noticed that the Lua and the Rust code don't give the same
results, but I can't easily tell why.

Oh! The largest prime is 0x00ec4bab, so they can be stored as u32. Final Rust
result:

2.33user 0.00system 0:02.33elapsed 99%CPU

Code:
[https://gist.github.com/eddyb/51a92fa2edf20d6e23fe](https://gist.github.com/eddyb/51a92fa2edf20d6e23fe)

~~~
Houshalter
Nice. I suppose I should try optimizing the Lua code some more. There are some
nasty branches in there that might slow it down.

The Lua code is not exactly identical to the Rust code. I test all numbers
less than n, as opposed to counting n primes. I set n so it got slightly more
primes than the Rust code, though.

------
grandalf
Primes seem to me to be more of an information theoretic concept than a number
concept.

Primes are the simplest way to encode specific kinds of graphs that
_unambiguously_ encode all sub-graphs.

If you try to come up with a bit-representation that is equivalently rich it
becomes difficult to think of one that is as simple yet preserves the
semantics of the factorization tree.

So I guess my point is that the factorization tree of numbers is the
fundamental concept, and it's information theoretic. Primes happen to be an
encoding of that fundamental concept into integers, but if we found an
equivalently rich representation using a different encoding, we might
understand primes better. I doubt that the quirks of the encoding have
anything to do with the fundamental concept, however.

------
Houshalter
I once was really interested in finding patterns in prime numbers. I got a
long csv file of prime numbers from the internet. I used symbolic regression
on it, to try to predict the next prime in the list.

Symbolic regression basically uses genetic algorithms to fit mathematical
expressions to data. The program I was using, Eureqa, tries to find the
_simplest_ expressions that fit, with only a handful of elements. To prevent
overfitting, and give a human understandable model.

Anyway this actually worked. Far from perfectly of course, but it was able to
get much better than random predictions. It was definitely finding some
pattern.

Unfortunately I used up Eureqa's free trial forever ago, and I'm not going to
pay thousands of dollars to buy a subscription. But I am now thinking of
writing my own software to do this, and then running it on a dataset of
mathematical sequences like the primes.

------
Jabbles
I'm shocked that such a simple pattern was previously unknown.

[https://play.golang.org/p/ajn-wMo_3V](https://play.golang.org/p/ajn-wMo_3V)

~~~
jessaustin
You might want to sort that by frequency before printing it. The repeats sort
of get lost when they're just mixed in with everything else.

------
personjerry
Wrote some code to compare random numbers to the primes for this property. To
generate the random numbers, I apply the Prime Number Theorem as the
probability of selecting each candidate, and then compare the stats to those
of the actual primes.
[https://gist.github.com/personjerry/c58483daaf372acbe1fa](https://gist.github.com/personjerry/c58483daaf372acbe1fa)

    
    
        cumulative:
        1 to 1: 30768 rand, 28289 prime
        1 to 3: 53573 rand, 51569 prime
        1 to 7: 44306 rand, 53263 prime
        1 to 9: 36968 rand, 32816 prime
        ratios:
        1 to 1: 0.18578027352594872 rand, 0.17048036302934247 prime
        1 to 3: 0.323479153458322 rand, 0.3107745710721539 prime
        1 to 7: 0.26752407692539926 rand, 0.3209832647330011 prime
        1 to 9: 0.22321649609032998 rand, 0.19776180116550257 prime
    
        cumulative:
        3 to 1: 37015 rand, 38455 prime
        3 to 3: 31015 rand, 25900 prime
        3 to 7: 53377 rand, 48596 prime
        3 to 9: 44594 rand, 53082 prime
        ratios:
        3 to 1: 0.22298058445431052 rand, 0.23161058343823215 prime
        3 to 3: 0.18683622387816942 rand, 0.15599308571187656 prime
        3 to 7: 0.3215462557454473 rand, 0.2926888028283534 prime
        3 to 9: 0.2686369359220728 rand, 0.3197075280215379 prime
    
        cumulative:
        7 to 1: 44412 rand, 42590 prime
        7 to 3: 36923 rand, 45728 prime
        7 to 7: 30588 rand, 25886 prime
        7 to 9: 53404 rand, 51800 prime
        ratios:
        7 to 1: 0.26863125805222376 rand, 0.25656008288956894 prime
        7 to 3: 0.2233331518747694 rand, 0.275463241849594 prime
        7 to 7: 0.18501515179008873 rand, 0.15593600154213152 prime
        7 to 9: 0.3230204382829181 rand, 0.3120406737187056 prime
    
        cumulative:
        9 to 1: 53453 rand, 56602 prime
        9 to 3: 44489 rand, 42837 prime
        9 to 7: 37022 rand, 38259 prime
        9 to 9: 30902 rand, 28144 prime
        ratios:
        9 to 1: 0.322266166664657 rand, 0.3413007561413876 prime
        9 to 3: 0.2682225410873838 rand, 0.2583000687401261 prime
        9 to 7: 0.22320427332907286 rand, 0.23069548124118136 prime
        9 to 9: 0.18630701891888632 rand, 0.1697036938773049 prime
    

Unless I'm doing something wrong, it honestly doesn't seem like the actual
prime numbers have statistics that deviate from random numbers with a
prime-like distribution. Hence it looks to me like just the result of a)
specifying the "next" number, which naturally favors the digits right after
it, and b) the probability of a given number being prime (prime number
theorem).

------
JoeAltmaier
It's supposed to be true in every base. But of course in binary it's not true.
Every prime in binary ends in a 1; it's followed by another prime that ends in
a 1.

~~~
eterm
Not every prime, 10 is of course prime and ends in 0.

~~~
JoeAltmaier
Likewise in base ten, 2 and 5 are prime. But this is a statistical argument.
So that's cute but not significant?

~~~
jessaustin
"100% of the base-2 primes ending in '0' are followed by a prime ending in
'1'."

------
Terr_
> This conspiracy among prime numbers seems, at first glance, to violate a
> longstanding assumption in number theory: that prime numbers behave much
> like random numbers.

I wonder if this is really an artifact like Benford's Law, which _also_
involves first-digit-frequency (in any base) and _also_ involves certain kinds
of "random" numbers.

To recycle a past comment:

> If you have a random starting value (X) multiplied by a second random factor
> (Y), most of the time the result will start with a one.

> You're basically throwing darts at logarithmic graph paper! The area covered
> by squares which "start with 1" is larger than the area covered by squares
> which "start with 9".
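The darts picture is easy to simulate (my sketch, seeded; the factor range and count are arbitrary choices): multiply enough random factors together and the leading digit settles toward Benford's law, with 1 far ahead.

```python
import math
import random

random.seed(7)

def leading_digit(x):
    """First significant digit of a positive number."""
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

trials = 50_000
freq = [0] * 10
for _ in range(trials):
    x = 1.0
    for _ in range(10):              # product of 10 random factors
        x *= random.uniform(0.1, 10)
    freq[leading_digit(x)] += 1

for d in range(1, 10):
    print(d, round(freq[d] / trials, 3), "(Benford: %.3f)" % math.log10(1 + 1 / d))
```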

~~~
etruong42
Applying the ideas from Benford's law to these findings seems very promising
to me, as primes follow a lognormal distribution[1], and that is where we
expect Benford's law to apply[2].

[1]
[https://en.wikipedia.org/wiki/Prime_number_theorem](https://en.wikipedia.org/wiki/Prime_number_theorem)

[2]
[https://en.wikipedia.org/wiki/Benford%27s_law#Multiplicative...](https://en.wikipedia.org/wiki/Benford%27s_law#Multiplicative_fluctuations)

------
taf2
Does this have any ramifications for security? I vaguely understand that we
rely on prime numbers to create secrets that are hard to guess... So does this
in some way make them easier to guess?

~~~
contravariant
Shouldn't be a problem. From what I understand, in cryptography you randomly
generate numbers and check if they're prime; the result is a uniformly random
prime. Even if you know exactly which numbers are prime, that still doesn't
help you figure out which prime was used.

Unless the random number generator was flawed of course, but that's a
different issue.

~~~
chippy
I wonder, do random number generator use primes?

~~~
contravariant
Quite a few of them do. For example: Mersenne twisters, linear congruential
generators. Not sure about cryptographic random number generators though, but
most of them probably use primes one way or another.

------
arghbleargh
It should be noted that, from the original paper, the asymptotic formula that
Lemke Oliver and Soundararajan conjecture still says that each possibility for
the last digits of consecutive primes should occur about the same number of
times in the limit. It's just that the amount by which the frequencies vary is
more than you would expect from the most naive model of primes as being
"random".

------
silveira
I created an Ulam spiral visualization for this article using JavaScript and
HTML5 canvas. The demo and source code are at
[http://silveiraneto.net/2016/03/14/the-prime-conspiracy-visu...](http://silveiraneto.net/2016/03/14/the-prime-conspiracy-visualized-over-an-ulam-spiral-in-html5/)

------
ms013
For those who have Mathematica and want to experiment with this, here's a
quick function to generate the data:

    
    
        f[n_, base_] :=
         Module[
          {m, d, dpairs},
          d = Table[Last[IntegerDigits[Prime[i], base]], {i, 1, n}];
          dpairs = Table[{d[[i]], d[[i + 1]]}, {i, 1, Length[d] - 1}];
          Map[#[[1]] -> #[[2]] &, Tally[dpairs]]
         ]
    

For the first n primes in a given base, it returns the mapping {i,j}->count
for the all pairings of digit i followed by digit j. E.g., for the first
million base 5 primes

    
    
        {2, 3} -> 68596
        {3, 0} -> 1
        {0, 2} -> 1
        {2, 1} -> 64230
        {1, 3} -> 77475
        {3, 2} -> 72827
        {2, 4} -> 77586
        {4, 3} -> 64371
        {3, 4} -> 79358
        {4, 1} -> 84596
        {1, 2} -> 79453
        {4, 2} -> 58130
        {4, 4} -> 42843
        {1, 1} -> 42853
        {3, 3} -> 39668
        {2, 2} -> 39603
        {1, 4} -> 50153
        {3, 1} -> 58255

~~~
Steuard
It took me embarrassingly long to notice that Last[IntegerDigits[x,base]] is
just Mod[x,base] (which ought to be faster).

I guess the author wanted to avoid discussing modular arithmetic in an article
for general audiences?

------
hellofunk
>If Alice tosses a coin until she sees a head followed by a tail, and Bob
tosses a coin until he sees two heads in a row, then on average, Alice will
require four tosses while Bob will require six tosses (try this at home!),
even though head-tail and head-head have an equal chance of appearing after
two coin tosses.

Now that is particularly interesting to think about.

~~~
dhbradshaw
That's really cool. An easy way to understand it is by thinking about
bunching. Since you're only flipping until you hit the first matching
sequence, on average you'll hit the more evenly distributed sequence more
quickly than the bunched sequence.

Multiple heads in a row are more bunched than transition sequences because,
for example, a sequence of three heads in a row will include two sequences
with two heads in a row. You can't do that with a transition sequence--it
takes at least four tosses to get two identical transition sequences.

------
kordless
I just spent 5 minutes looking for a chart showing the distribution of
reserved commands in Python. Didn't find much.

A while back, I read something about different number bases' ability to help
find additional primes. The base itself was prime, so maybe 7 or 13. Can't
find the article ATM. I hypothesized that prime numbers are "code" provided by
this universe to allow us to access other data stored in other primes. Quines
of a sort, if you will. One way to invalidate this hypothesis would be to do a
mean distribution of basic operators in a simple programing language and
compare it to what we are seeing in primes.

------
wallacoloo
As a non-mathematician, this is a pretty neat read. I was distracted by the
personification of the numbers though (they have 'likes' and 'preferences',
which in my day-to-day vocab are concepts applicable only to things that
possess the ability to think). Is this common in mathematical writing, or is
this paper an abnormality in that sense?

(I don't mean to nitpick - I'm genuinely curious. I recall seeing the same
thing in high-school chemistry, but never in physics, for example, and I'm
curious if entire fields see this effect or if it's a product only of the
audience being written to).

------
jamieb007
"If Alice tosses a coin until she sees a head followed by a tail, and Bob
tosses a coin until he sees two heads in a row, then on average, Alice will
require four tosses while Bob will require six tosses (try this at home!),
even though head-tail and head-head have an equal chance of appearing after
two coin tosses."

Counter-intuitive at first but makes sense - the outcomes as a whole converge
towards the average (50% heads, 50% tails). Nonetheless, it shows that each
toss is related to the others. One can expect that primes are even more
related - or at least to the primes that came before.

~~~
jastr
This is actually not quite the reason that HT tends to show up sooner than HH
- that line of reasoning is the Gambler's Fallacy [0].

The reason HT is likely to occur sooner is: after a failed attempt, HH needs
two more flips to win (a 1/4 chance), while HT can still win on the very next
flip (a 1/2 chance).

[0]
[https://en.wikipedia.org/wiki/Gambler%27s_fallacy](https://en.wikipedia.org/wiki/Gambler%27s_fallacy)
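That state-by-state reasoning can be made exact by solving two small recurrences for the expected number of flips (a sketch; the variable names are mine):

```python
from fractions import Fraction

h = Fraction(1, 2)  # probability of either face of a fair coin

# Waiting for HH. States: "start" (no progress) and "H" (one head seen).
#   E0 = 1 + h*EH + h*E0   (a tail from the start leaves us at the start)
#   EH = 1 + h*0  + h*E0   (a head finishes; a tail resets us to the start)
# Substituting EH into the first equation and solving for E0:
e_hh = (1 + h) / (1 - h - h * h)  # = 6

# Waiting for HT. From state "H", another head keeps us at "H"; a tail wins:
#   EH = 1 + h*EH + h*0   =>  EH = 1/(1 - h) = 2
#   E0 = 1 + h*EH + h*E0  =>  E0 = (1 + h*EH)/(1 - h)
e_ht = (1 + h * (1 / (1 - h))) / (1 - h)  # = 4
```

The asymmetry lives entirely in the EH equation: for HT a failed flip keeps you in state "H", while for HH it throws you all the way back to the start.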

~~~
jamieb007
Right, so the coin toss example is not about all the previous outcomes, just
the most recent one. In contrast, the OP seems to show that a prime is related
to the previous prime - and that previous prime to the one before it - so by
extension, they are all related.

------
aaronchall
This phenomenon feels trivial - think of divisibility by 3. Take the
pseudonumbers X11, X13, X17, X19, X21, X23, X27, X29, X31, X33, X37, X39:
how many of them will be divisible by 3? For each of X = 0, 1 and 2, I count
4 - one each among the numbers ending in 1, 3, 7 and 9.

Just based on this knowledge, I know that a prime number is guaranteed not to
be immediately followed by another one with the same ending one time in two.

I'm not sure these fellows have found anything particularly interesting, but
if so, and I have missed something, kudos to them.

------
mjevans
It would be interesting to know how well this holds up over different scales.

Does a prediction based on base 3 hold up better over primes under 100 than
over primes under 1,000, and better under 1,000 than under 10,000?

Does how predictive a given final-digit sequence is scale in some predictable
way with the base in which the primes are viewed?

Just thinking about what might be happening, I would imagine that the answer
is yes, but that a lot of crunching would be needed to graph and deduce a
relationship to an actual predictive property statement.

------
CarolineW
Secondary discussion:
[https://news.ycombinator.com/item?id=11282749](https://news.ycombinator.com/item?id=11282749)

------
jeffdavis
I am surprised this took so long to discover -- wouldn't this be one of the
first things to examine when looking for non-randomness?

~~~
vorg
Perhaps it has been discovered before, maybe many times. But the only way to
know whether a discovery in math, or any science, is the first is to search
all the academic journals - an activity that's only feasible if you're part of
the university mathematics research industry, especially considering how much
some academic journals charge for subscriptions. Then you need to announce
your discovery, after peer review of course.

------
Too
Isn't this similar to the property that _any_ number taken from a natural
distribution will tend to contain more low digits than high ones, regardless
of unit and base?

Because 100x becomes >200 "slower" than 200x becomes >300, etc. - with
"slower" meaning a lower value of x. x in this case is usually a random
variable centered around 1.
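This is essentially Benford's law for leading digits (a related but distinct bias from the last-digit one in the article). A tiny illustration, using powers of 2 as a stand-in for a "natural" sequence (the choice of 2**k is mine, purely for convenience):

```python
from collections import Counter

# Leading decimal digit of 2**k for k = 1..1000. Benford's law predicts
# P(d) = log10(1 + 1/d): about 30.1% for d = 1 but only 4.6% for d = 9.
leading = Counter(int(str(2 ** k)[0]) for k in range(1, 1001))
```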

------
undoware
So, the million dollar question is: how does this affect my security and
privacy? Does this pattern mean encryption based on the assumption of the
inherent randomness of primes is now less secure? E.g. is there now less
entropy in a given set of primes?

I have a premonition of Quite a Bit of Trouble coming down the pipe.

------
baby
> Looking at prime numbers written in base 3

I HAD THE EXACT SAME IDEA. But I would probably have reached no conclusion.

------
lohankin
Possible generalization (example):

23 = (7)*3 + (2);

7 = (2)*3 + (1);

2 = (0)*3 + (2);

0 = (0)*3 + (0);

Take only the remainders and form a vector a = (2 1 2 0). What can be said
about the components of this vector for the prime next to p? E.g., do the
i-th components repel, the way the 1st ones do?

------
synred
So does this anti-correlation of last digits exist in other bases? Clearly
not in base 2.

If we had 8 fingers, would we have noticed something similar? Would it be
even stronger?

If we used base 60, would it even be there?

-Traruh

------
girkyturkey
This is absolutely incredible. This is why mathematics is so amazing, that
something so small can be missed for centuries. All about how to look at
things!

------
synred
Does this anti-correlation exist for other bases?

Clearly, not for base 2.

How about base 8, 16 or 60? How about the unpopular odd-numbered bases?

------
caf
So I wonder if a similar pattern is observable for Prime_i and Prime_i+n with
some n > 1?

------
callesgg
It is not quite clear to me how, but it seems to me that it has to do with
the fact that we use a number system that has a base.

------
porcodio
Interesting

------
Kenji
_Soundararajan showed his findings to postdoctoral researcher Lemke Oliver,
who was shocked. He immediately wrote a program that searched much farther out
along the number line — through the first 400 billion primes._

This is how modern computers revolutionized even the most theoretical fields
like number theory. Remarkable, I love it!

------
CarolineW
That's bizarre - I tried to submit this four hours ago and was told it was a
duplicate. I searched, and couldn't find the original submission to upvote it,
and now it's submitted again, _after_ my submission was declined.

I don't understand.

But it's a great result, so I've upvoted it, despite being confused.

~~~
jsnell
Dup detection applies to deleted posts, but you can't find them using search.
So what might have happened is that somebody submitted this link, deleted it,
and then you tried to submit it.

~~~
mod
The question would be how it got submitted after that by someone else.

~~~
jedberg
After a while if an article doesn't get a lot of traction it's removed from
the duplicates list so it can be submitted again.

------
mikek
> Quanta Magazine NUMBER THEORY Mathematicians Discover Prime Conspiracy A
> previously unnoticed property of prime numbers seems to violate a
> longstanding assumption about how they behave.

By: Erica Klarreich, March 13, 2016

Two mathematicians have
uncovered a simple, previously unnoticed property of prime numbers — those
numbers that are divisible only by 1 and themselves. Prime numbers, it seems,
have decided preferences about the final digits of the primes that immediately
follow them.

Among the first billion prime numbers, for instance, a prime ending in 9 is
almost 65 percent more likely to be followed by a prime ending in 1 than
another prime ending in 9. In a paper posted online today, Kannan
Soundararajan and Robert Lemke Oliver of Stanford University present both
numerical and theoretical evidence that prime numbers repel other would-be
primes that end in the same digit, and have varied predilections for being
followed by primes ending in the other possible final digits.

“We’ve been studying primes for a long time, and no one spotted this before,”
said Andrew Granville, a number theorist at the University of Montreal and
University College London. “It’s crazy.”

The discovery is the exact opposite of what most mathematicians would have
predicted, said Ken Ono, a number theorist at Emory University in Atlanta.
When he first heard the news, he said, “I was floored. I thought, ‘For sure,
your program’s not working.’”

This conspiracy among prime numbers seems, at first glance, to violate a
longstanding assumption in number theory: that prime numbers behave much like
random numbers. Most mathematicians would have assumed, Granville and Ono
agreed, that a prime should have an equal chance of being followed by a prime
ending in 1, 3, 7 or 9 (the four possible endings for all prime numbers except
2 and 5).

“I can’t believe anyone in the world would have guessed this,” Granville said.
Even after having seen Lemke Oliver and Soundararajan’s analysis of their
phenomenon, he said, “it still seems like a strange thing.”

Yet the pair’s work doesn’t upend the notion that primes behave randomly so
much as point to how subtle their particular mix of randomness and order is.
“Can we redefine what ‘random’ means in this context so that once again, [this
phenomenon] looks like it might be random?” Soundararajan said. “That’s what
we think we’ve done.”

Prime Preferences

Soundararajan was drawn to study consecutive primes after hearing a lecture at
Stanford by the mathematician Tadashi Tokieda, of the University of Cambridge,
in which he mentioned a counterintuitive property of coin-tossing: If Alice
tosses a coin until she sees a head followed by a tail, and Bob tosses a coin
until he sees two heads in a row, then on average, Alice will require four
tosses while Bob will require six tosses (try this at home!), even though
head-tail and head-head have an equal chance of appearing after two coin
tosses.

Can someone explain this?

~~~
mikek
Sorry, this site has some JavaScript that copies the entire article when you
try to copy and paste a few sentences. I didn't catch this before submitting
on mobile.

~~~
ghurtado
Eh? That is not something a webpage can do, for all sorts of reasons. HN can
only paste what you have copied before, so I'm afraid this is most likely a
case of user error.

~~~
stordoff
I agree that this is probably just an error, but I have seen sites that
manipulate copies before (usually to add "Read more at URL" or something)

~~~
ghurtado
You're right, I didn't explain myself correctly. What I meant to say is that
Javascript from one site is forbidden by default from operating on the
contents of another site. So if you didn't copy the text from site A in the
first place, it is not possible (at a basic security level) for site B code to
access the text from site A which never made it into the clipboard.

~~~
ghurtado
It would be much more helpful for everyone if you explained where you think
I'm incorrect, rather than mindless downvoting. Having worked professionally
with Javascript for almost 20 years, I would hate to miss an opportunity to
learn something more about it.

------
sparrish
While fascinating, I fail to see how this qualifies as a "conspiracy". Are
there definitions of "Conspiracy" in mathematics that I'm unaware of?

~~~
knughit
It's just figurative language.

------
tdsamardzhiev
I've thought about that when I was a little kid. True story!

------
learnstats2
Perhaps I have missed something, but the introductory example seems to follow
from simple probability and therefore I do not find it mathematically
remarkable.

Say, there is a fixed and equal probability that each number ending with 9 and
1 is prime. I could go along with that assumption, although the fact that
primes get less likely as you go higher is potentially relevant.

What the authors consider here is starting with a prime ending in 9. So the
next potential prime ends in 1. If only because 1 is the next number to be
checked, a 1-prime is more likely to appear next than a 9-prime. The
probability of that can be calculated, depending on your assumptions, as a
geometric sequence. In any case, P(next prime is 1) > P(next prime is 9).

"Most mathematicians would have assumed, Granville and Ono agreed, that a
[known] prime should have an equal chance of being followed by a prime ending
in 1, 3, 7 or 9" So - I'm a definite nope on that.

This result appears to be exactly what I would have assumed was the case.

~~~
crnt2
This is explicitly ruled out in the article -

> Lemke Oliver and Soundararajan’s first guess for why this bias occurs was a
> simple one: Maybe a prime ending in 3, say, is more likely to be followed by
> a prime ending in 7, 9 or 1 merely because it encounters numbers with those
> endings before it reaches another number ending in 3. For example, 43 is
> followed by 47, 49 and 51 before it hits 53, and one of those numbers, 47,
> is prime.

> But the pair of mathematicians soon realized that this potential explanation
> couldn’t account for the magnitude of the biases they found. Nor could it
> explain why, as the pair found, primes ending in 3 seem to like being
> followed by primes ending in 9 more than 1 or 7.

~~~
stromgo
Ok so the random model "1,3,7,9 mod 10" doesn't fully work, but let's look at
what happens mod 30. Large primes have the following possible remainders mod
30: 1, 7, 11, 13, 17, 19, 23, 29. We see that when a prime ends with a 3 then
p + 6 (ending in 9) is always an option, but p + 4 (ending in 7) is an option
only half of the time. I think that this fully explains why a prime ending
with 3 is more likely to be followed by a prime ending in 9. So basically the
OP is on the right track, and his random model just needed to be refined a
bit.
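The mod 30 account is easy to sanity-check empirically. A quick sieve-based tally (the two-million cutoff is an arbitrary choice of mine) of the final digit of the prime that follows a prime ending in 3:

```python
from collections import Counter

def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i, is_p in enumerate(sieve) if is_p]

ps = primes_up_to(2_000_000)
# For each consecutive pair (p, q) with p ending in 3, record q's last digit.
after_3 = Counter(q % 10 for p, q in zip(ps, ps[1:]) if p % 10 == 3)
```

In this range the count for 9 does come out ahead of both 1 and 7, consistent with the bias the article describes and with the p + 6 admissibility argument above.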

~~~
justin_vanw
So you figure two respected mathematicians, and of course everyone they have
told about this result so far, missed a completely trivial observation?

~~~
stromgo
No I don't. The article continues:

> The primes' preferences about the final digits of the primes that follow
> them can be explained, Soundararajan and Lemke Oliver found, using a much
> more refined model of randomness in primes, something called the prime
> k-tuples conjecture.

So I guess that my observation is just a special case of this "prime k-tuples
conjecture".

~~~
justin_vanw
What makes you think that?

~~~
stromgo
Are you contesting it, or just curious? You already know that my observation
explains the "3 followed by 9" bias. You already know that the mathematicians
call the conjecture "a much more refined model of randomness in primes" which
is similar to how I described what I was doing. In addition, MathWorld's
article on the k-Tuple Conjecture talks about residues mod q, which is similar
to what I'm doing when I look at primes mod 10*3. All these elements point at
some connection between the k-tuple conjecture and my observation.

~~~
justin_vanw
Well, 'points to a connection' is not the same as 'is a special case of'. I'm
not an expert on this subject, but from looking at it, the conjecture is about
the _asymptotic_ distribution of certain patterns in prime numbers. I don't
think an example like the one you are giving is 'related' except in the
hand-wavy, vague way that anything dealing with prime numbers and patterns is
related to everything else dealing with prime numbers and their patterns.

~~~
stromgo
All I meant by "special case" was that "mod 30" isn't the whole story -- more
like the most significant correction on top of what the OP said, with other
smaller corrections possible, and the entire set of corrections being
described by the k-tuple conjecture.

It's amazing how people can be picky and negative on HN. Someone positive
would instead congratulate me for making the gist of what the prime k-tuple
conjecture says about the biases easily understandable. Oh well.

~~~
justin_vanw
So you are making comments about pure mathematics. If you want to use
imprecise language and not be corrected, you should probably go write a book
review or something. In math, precise language and correcting someone or
forcing someone to give justification for something is expected and completely
usual. It would be bizarre when talking to a mathematician about mathematics
if they didn't immediately correct or demand clarification and justification
when you say something vague or incorrect or unjustified.

