Hacker News new | past | comments | ask | show | jobs | submit login
Is the largest root of a random real polynomial more likely real than complex? (mathoverflow.net)
203 points by jjgreen 15 days ago | hide | past | favorite | 121 comments



Wow that is crazy that it is between chance and 1/phi.

In the linked MSE post, it has now been proved that the probability that the largest root is real is at least 6.2%

And! at least 1/10th of 1/phi. Hahaha! :)

I think the link of phi with number theory is natural in the sense that the primes are not random (as is commonly mistaken) but arise recursively from prior primes: they are exactly the 'gaps' not hit by multiples of prior primes, so there's a kind of 'natural growth' pattern that we would expect to reflect some manner of e and phi

It's a very fundamental pattern of magnitude. Truth beauty et all


You get e from the primes by averaging the set size of only growing gaps (2,3,5)(7,11)(13,17)... or only non-shrinking gaps (2,3,5,7,11)(13,17)(19,23,29)...

Not a property of primes per se, just a property of growth. It works better with a set of randoms.


I thinking that works for primes because primes are approximately randomly distributed.

But primes are more directly "e" if you look at π(N), the count of primes less than N. That is ~n/ ln(n)

https://en.m.wikipedia.org/wiki/Prime-counting_function


I think that's also another way you can get e from primes.

Can you explain more what you mean by "set size of only growing gaps" - I like this!


(2,3,5) because 2 to 3 = 1 and 3 to 5 = 2. 5 to 7 is also 2 so the gap does not grow, so 7 starts the next set.


Hey that’s cool I like that! Haha :) Thanks for sharing this. You have any GitHub repo or blog talking more about this??? :)


Beautiful! Phi always seems to show up when there is crowding.


Thanks! Let's collab. We should start a "primes investigation" github repo!!!! :)


I love number theory in the context of waves. Here’s an example:

https://arxiv.org/pdf/1910.10751


Thanks. When you mention waves I'm wondering how we can leverage FFT to make a function of the next prime? But it has to include all primes, I guess? Or is this Reimann, already??


Yes, high-level descriptions of the RH are often vague about the connection between the zeta function and the prime numbers, but the short version is that, for every point on on the complex plane, there is an associated real-valued "Riemann harmonic function"; if you sum up the Riemann harmonic functions of the non-trivial zeros of the zeta function, you get the exact prime-counting step function, and this function is much more predictable if the RH is true.


Cool, thank you for confirming what I knew!!! It's nice to have the connection between intuition and formal :)

It is strange that there's such a tight analogy between the zeroes (gaps*) and the primes (gaps in multiples). Suggests some patterning even more fundamental than just gaps of integer magnitude multiples.

Although maybe it's just another description of the same thing for now and can't pierce through to yield more insight...Unsure! I suspect there's something more fundamental out there waiting to be discovered that will illuminate the whole thing :)

* i guess you can call zeroes gaps of a function in a few ways: simply they are just a gap in the magnitude, as they zero it out; or, in the sense that they are 'gaps in the curvature of whatever it integrates to / is the differential of'; but also I suppose a nicer analogy is that the zeroes are 'factors' of the function [Y(x) = 0 at x = z, then (x-z) is a factor of Y right?] maybe that's the tighter connection, but would RH then imply some close connection between factors of complex functions or polynomials and prime factors?? Anyway, just explorin I totally understand if you don't wish to reply no probs! :)


What do you get with discrete convolutions of the prime series, and what is the negative space called?


Interesting question. If you Mean it like intending to provoke more thinking, I answer: I don’t know. Would be interesting!

But if you mean it like you’re asking me then: discrete/or-not convolution of “sine waves” with wavelength p for each prime (and optionally prime powers), to give … well… I don’t know! Haha :) it would be interesting to try to create something.

Presumably, it leads to some function that can give information as to the next prime. I mean, you don’t need the FFT analogy or processing for that, but the hope is that somehow, through that process in the natural analog with convolving sine waves, you get, a sort of advantage that let you predict the next prime with less information, somehow or somehow advantageous. Haha! :)

The nodes are multiples, the and the spots where none of waves go to zero are primes (because they haven’t been landed on by any of the previous waves), and at any spot the prime-power waves that land at zero are the factors. You can only get unique factors not the factorization with powers if you don’t make waves/multiples of the prime powers.

But I don’t think I know what you mean by negative space - it sounds interesting. What do you mean by that?


Two questions that immediately jump at me are:

1. What is random? They seem to have run numerical experiments, which suggests they used (something equivalent to) uniform integer coefficients from a bounded set. And that could affect findings a lot!

2. Did they look at odd degrees (which always guarantees a real root), even or both?

Not trying to knock this fun post.


As stated in the definition, the distribution is that of independent uniform coefficients in (-1,1).


True, but they give a justification for that distribution, and the justification is incorrect.

> The number of real roots of a random polynomial with real coefficients is much smaller than the number of complex roots. Assume that the coefficients are independently and uniformly random in (−1,1) for if not then we can divide each coefficient by the coefficient with the largest absolute[] value to scale each coefficient to (−1,1).

It's true that, if you have a known polynomial, you can normalize the coefficients in this way... assuming that (-1, 1) is a typo for [-1, 1].

But it's definitely not true that you can produce a random polynomial as the input to this idea. That can't be done. The concept of "a random polynomial with real coefficients" is known not to exist; you would have to specify some kind of nonuniform distribution over the reals for the coefficients to be drawn from. Once you have done this, the distribution of coefficients in the normalized equivalents that have coefficients in [-1, 1] will not be uniform.

In other words, the question says "assume that the coefficients [of a random polynomial] are independently and uniformly random in (-1, 1), for if not, we can exhibit equivalent polynomials whose coefficients are dependently and nonuniformly random in [-1, 1]". But this observation makes no sense.


>The concept of "a random polynomial with real coefficients" is known not to exist;

The concept of "a random polynomial with real coefficients" means what we want it to mean in each particular context; mathematicians don't care too much about semantics, but interesting questions. And this is a very interesting question.


He's right that this question isn't fundamentally about the space of polynomials but rather about this particular arbitrary-looking distribution. But yeah, the question is interesting nonetheless.


I don't get your objection. (-1,1) uniform distribution is equivalent to uniform distribution over (-inf, +inf) when it comes to roots. Because you can map one to one between those intervals without affecting roots.


It's impossible to have a uniform distribution over (-inf, +inf). Since the distribution is uniform, each digit has an equal chance of being zero through nine, but that's true of all digits or else it's no longer uniform. At that point 100% of the values drawn in any sampling of the distribution are infinite.

If scaling the coefficients down to [-1, +1] gives a uniform distribution, then the original distribution must have been uniform in [-x, +x] where x is the largest coefficient in the original polynomial. That, too, is a contradiction! It means that either -x or +x are guaranteed to be one of the random coefficients, which is very much not a uniform distribution.

(That said, I think the (-1, +1) version of the problem still has interest!)


Typically you would do a uniform (-x, x) and then take the limit.

This isn’t necessarily the same as the (-1, 1) version though


Its the same because the uniform distribution over (-x,x) can be obtained by taking a random variable uniformly distributed over (-1,1) and multiplying it by x.

This doesn't change anything for the question about roots because the roots of a polynomial don't change when you multiply it by x.


> Typically you would do a uniform (-x, x) and then take the limit.

But this is completely uninformative for the interval (-inf, +inf). Taking limits is a tool you can use, and it works well when the infinite behavior matches the limit of the finite behavior. The opposite is the case here and the infinite behavior is radically unlike the finite behavior. You might as well try to determine the value of an impulse function at zero by seeing what the limit is as you approach zero.


There is no such thing as a uniform distribution over (-inf, +inf) for the simple reason that the Lebesgue measure of (-inf, +inf) is +inf and you can’t make the integral of the whole set be equal to 1. For it to be a valid probabilistic distribution, the integral of 1 over the whole space relative to the distribution should be 1. But it won’t be. In U(a,b) a and b need to be bounded, so that you can divide the Lebesgue measure by (b - a).


You can't do it directly, that's what the scaling is for. Another trick would be to answer the question for (a,b) and then see what the answer is when b-a -> inf.

For some questions neither of those tricks will work. For this question scaling trick seems to work perfectly.


You're using a shorthand that captures an important point but does not quite dispose of the objection. Let R(X) be an indicator function that's 1 when largest root is real and 0 otherwise; then we want E[R(X)]. I think your point is that E[R(aX)]=E[R(X)], so the choice of Uniform(-1,1) rather than Uniform(-LargeNumber, LargeNumber) is a 'without loss of generality' simplification. Likewise we could take any scale mixture E[R(AX)] for A real, almost surely nonzero, and independent of X. That's a broad class of distributions, but it doesn't cover every 'natural' way of generating coefficient vectors; for example, you could sample X from a hypersphere instead of a hypercube.


> but it doesn't cover every 'natural' way of generating coefficient vectors;

But it covers one of the 'natural' ways. It seems like reasonable question to ponder.


Oh yeah, definitely reasonable. I'm a statistician by trade, so if I see R^n, I want to put a multivariate normal measure on it, which means hyperspheres. Hypercubes are cool too. Just as long as we don't treat the limit from growing the hypercube to cover R^n as being equivalent to a uniform measure on R^n. Unlike, say, in probabilistic number theory, where they shamelessly get away with defining the equivalent of a uniform probability measure on infinitely many integers, much to the consternation of every other kind of mathematician.


There are ways to 'approximate' a uniform distribution by just choosing greater and greater ranges and then normalising the result.

One problem with this is that there's no one way to do it. You could argue equally well that the result should be a uniformly random point on the S100 sphere.

What it definitely cannot be is a random point on the [-1,1]^100 cube because if you divide by the greatest absolute value then one of the coefficients will be -1 or 1


Right. The problem as originally stated is paradoxical. You can't uniformly sample an unbounded set. I assume OP used a random floating point, which are bounded.

But the important part of the question can be asked for any possible distribution over the reals.


> You can't uniformly sample an unbounded set.

You definitely can, as long as you are accepting the use of limits (and without that, sampling uniform over any subsegment over the reals would also be pretty much impossible). “Sample a uniformly chosen real” just is shorthand for “Sample over a real from [-N, +N], then we are interested in the answer then N goes to infinity”. In this case, then just divide all the coefficients by N, which obviously doesn't affect the answer. So uniformly over [-1,+1] is as good as uniformly over R.

This is very common. See e.g. number theory, where you can ask questions like “what is the probability that two randomly chosen natural numbers are coprime”; the set of natural numbers is also unbounded, but it doesn't prevent this question from being well-defined, and the answer is 6/π² (about 0.608).


This comment opens perhaps a bigger can of worms than intended.

The original statement:

>> You can't uniformly sample an unbounded set. [of real numbers]

is true, provided that by "sample" we mean "sample in a way that obeys Kolmogorov's axioms". (I'm adding the qualifier "of real numbers" to keep this somewhat grounded, so that we don't get distracted with abstract metric spaces, or worse.)

In the above comment, by placing a limit on the set ([-N, +N]), you have made the set bounded -- thereby contradicting the premise of the original statement!

*

Your last paragraph is interesting. Suffice to say, the situation is more complex than your summary would indicate, precisely because of the above issue -- that is, "what does sample mean in this context".

It led me to this paper by Lei and Kadane (the latter, the well-known Bayesian theorist) --

https://arxiv.org/pdf/1806.00053

It is well worth a read, if you're into such things.

See especially section 1, and section 7. In essence, the above statement ("You can't sample uniformly") is true, if you interpret "sample" as "following a scheme that obeys Kolmogorov's well-known axioms."

However, if you are willing to abandon countable additivity of your measure, and drop down to finitely-additive measures, there are several classes to choose from that support a notion of uniformity.

(Again, just thinking about measures on integers, not real numbers. But these are not measures in Kolmogorov's sense.)

These measures support a notion of uniformity, but they may not have some properties that you might hope for, such as that the measure of a set A of natural numbers is the same as A + i, where adding "i" shifts the set by "i" units left or right.

For some of these measures, the result about co-prime numbers is as you say, and for others, the result is indeterminate. That is, the limit exists, but it can be any number between 0 and $6/\pi/\pi$.


>It led me to this paper by Lei and Kadane (the latter, the well-known Bayesian theorist)

Not to be confused with J-P Kahane, who also studied this area (of the posted article) via Fourier series


> You can't uniformly sample an unbounded set.

I was asked to do this as an interview question at google! It was about log file processing not math (sample an infinite log file), and I thought my answer was ok? But that's why I don't work there so I am really enjoying this whole thread.


If I had to guess, what they wanted was to sample from a finite set you're only allowed to stream ("unbounded" just means you don't know the bound). Or if it really is unbounded, they still just want a uniform sample up to that point (i.e., still bounded).


What's the correct answer (and what was yours?)


The usual trick to sampling one line at random with uniform probability from an arbitrarily large file or stream is worth knowing because people have been asking this as an interview question for >20years at this point[1].

1. Store the first line with probability p=1

2. When the second line arrives, either keep the first line or with p=.5 drop it and replace it with the second line

3. Continue in this manner with each line. That is to say on the n-th line you either keep the line you have or with p=1/n you drop the line you have and replace the stored line with the new n-th line[2]

4. When you reach the end of the stream or the user quits the program or whatever, return the line you have stored. This will have been selected with uniform probability from the lines you have received. You can sketch the grid for a few lines you can see at every point you have all the lines with uniform probability.

It's worth noting that by doing the algorithm in this way you are not in fact ever selecting with uniform probability from an unbounded set so noone is going to revoke your maths license or anything. You are only ever choosing between two things (keep the thing you have or drop it and replace it with a specific new thing). An intuition behind the proof that this does not violate any laws is that in the case of a truly infinite stream this algorithm would never terminate.

[1] I know because over 20 years ago I was asked this as an interview question.

[2] It's something like this probability. If I was actually answering the question I would work it out properly to be sure.


Yep that was my answer too. I didn't get the job so I'm just guessing but I'm pretty sure that's it.

It was definitely a dorky question, and a glaring sign of the interviewer's lack of maturity to blithely burn through your precious time in this manner.

But if you're going to play Google's game, this what you're going to get apparently. So what was your answer? One guesses that they weren't so much interested in a correct answer (since it's a bullshit question anyway) but that you were willing to give it a college try (and reference some concepts about limits, and perhaps the geometric layout of this "file system" in the Doctor Who universe they were apparently hoping to roll out this system in, the speed of light, etc).

In order to, you know, pass their dipshit filter.


Given the description about "sampling from an infinite" something, it's likely this was intended as a variation on reservoir sampling.

https://en.wikipedia.org/wiki/Reservoir_sampling


Yes, I'm sure that's what they were looking for. I hadn't heard of it before but I improvised my way in that direction until the interview was over. :)

The "guess my Easter egg" interview pattern.

As if this what they do all day: ask each other fundamentally useless questions with no meaningful answer (that anyone would actually care about), and which you aren't expected to provide an actual working answer to, anyway. But which have a cute partial (shibboleth) answer embedded in them. Which if you manage to come up with (under interview time constraints and pressure) -- and most importantly: if you ignore the request to actually solve the problem as stated -- will tell the person asking "how you think".


"I refuse categorically to entertain fun maths problems."


They are also discrete, coming from a finite set. I remember at the undergrad differential equation class the teacher asked for some arbitrary coefficients so we can classify the result.

She was very surprised that 3 differential equations she got all turned out to be parabolical, which form a set of probability 0 in the full space. But of course the probabilities for equations with small integer coefficients (which is what people wrote) are totally different.


If you're looking to run some numerical experiments with this, there is built-in support in R for this sort of thing:

    plot(polyroot(runif(101,-1,1)))
will give you a visualization of the roots of a polynomial of degree 100.


> Assume that the coefficients are independently and uniformly random in (−1,1) for if not then we can divide each coefficient by the coefficient with the largest absolutely value to scale each coefficient to (−1,1).

Not sure about my intuition here, but won’t scaling this way create a non-uniform distribution for all but the largest coeffs, since the scaling is a divide?


The important thing is that multplying a polynomial by a constant doesn't change its roots.


Ah right, of course. The original coefficient space would still be uniform. I guess this means that a bi-unit [-1,1] interval for the random numbers is equivalent to any space of coefficients [-n,n] when it comes to answering the probability question?


Yes


Yeah, but you can't pick a random real number with uniform distribution. Something's gotta give.


As there's no formula for degree-5 polynomials and above, how do you distinguish between a real root vs. a real number + epsilon*i?


You use Budan's theorem https://en.wikipedia.org/wiki/Budan%27s_theorem to certify that there's exactly one (or exactly zero) real root in the interval (r - ɛ, r + ɛ]. As long as your guess r is within ɛ of a root, this allows distinguishing between real and complex roots even if you don't have the exact value. (There are instances where Budan's theorem cannot give an answer, e.g. most obviously if there's more than one root in the interval.)


Just because of the impossibility of an exact formula for the roots of a high-degree polynomial that does not rule out the possibility of figuring out, say, the distribution of roots of such polynomials. The question is not about any polynomial in particular (hence Abel's theorem is no barrier).

edit: Think of the following example: take a polynomial a_n x^n + ... + a_0, where the coefficients a_i are i.i.d. Bernoulli random variables. Even though the degree n might be large (> 4) I can say with confidence that such a polynomial has a real root (x = 0) with probability 1/2. Similar though more sophisticated arguments are at work in the linked question.


If we have polynomial with integer coefficients, we can bound the largest and smallest root by an expression that is maximum of abs. value of coefficients to some power. There should be similar bounds for the smallest imaginary value a root can have. If your polynomial have real numbers for coefficients having a formula doesn't help - we have the same problem of being not able to tell if number is equal zero. For example in the quadratic formula, we can have discriminant equal minus epsilon. If epsilon was zero there is no imaginary root but if it isn't we have an imaginary root.


I could imagine that one uses the derivative here. If x+iy and x-iy are complex roots, and y is really small, then the derivative at x should be small as well (if y = 0, you have a double zero, so p'(x)=0). So if p'(x) is far away from zero, you have a single real zero. And I guess double zeroes don't really happen with random coefficients.


The estimations are not based on numerics, but rather reasoning. For example, since real-valued polynomials are stable under complex conjugation, each complex root must have a complex conjugate. For polynomials with three distinct roots then, one must be real. This is just an EXAMPLE of the kind of reasoning that can be used to infer whether a root is purely real or complex.

As for formulas: no, there is no GENERAL formula for the quintic polynomial or higher. General formulas only exist for polynomials of degree four or lower.

Of course, there are SPECIFIC formulas for specific KINDS of higher-degree polyomials. But for the general quintic and higher, nothing. That's already been proved classically.


> As for formulas: no, there is no GENERAL formula for the quintic polynomial or higher.

Note that there is no general formula for quintics (and higher) in radicals.

But it is a bit arbitrary to draw the line there. There is also no 'general formula' using basic arithmetic operations (addition, subtraction, multiplication, division) that gives roots for x^2 = 2.

However it is quite useful to be able to talk about those roots, so we added a dedicated symbol for them (sqrt), and more generally extended the definition of exponentiation to fractional powers.

If it were generally useful to talk about the roots of quintics we also would've added common notation for their roots.



[flagged]


Hi, I read the post. The numerical approach gives the data. But the proof of the result uses analysis. I was talking about the proof. Thanks.


The roots of a polynomial vary smoothly as the coefficients vary. This is intuitively obvious. This is an important part of Abel's proof of the unsolvability of the quinitic. The probability of a random number being ambiguously real or complex has measure 0 (measure < epsilon, for all epsilon > 0), so you can ignore all computationally ambiguous cases. You don't need an exact calculation of any specific polynomial's root, to get the right answer of the measurement over all polynomials.


I think the others answered your question.

In addition, I suggest you to forget this "degree-5 polynomials" thing. They are not different from degree 2 polynomials in any meaningful way, if you talk about computing their roots or reason about their roots.


There's an incredibly simple result of Sturm that allows one to compute intervals bounding the real roots of polynomials. https://en.wikipedia.org/wiki/Sturm%27s_theorem

That said, if you're working with finite precision, then there's an interesting question somewhere here -- given a polynomial with a certain number of real roots, how precisely do you need to know the coefficients before the answer changes?


There is a formula for arbitrary degree polynomials, it’s just difficult to apply: https://en.m.wikipedia.org/wiki/Thomae%27s_formula Apparently it’s the integrals involved that make it a challenge to use?

You’re probably thinking of the theorem that these can’t always be expressed with just exponents and fractions and other plain algebraic operations.


Sligthly OT, but I always have fun reading these maths posts here: I enjoyed maths in University very much (I did CE) and the joy from my professors always inspired me. I'd love to learn more and dip into solving things with it (maybe with a hang towards numerics?).

But been out of Uni for 2 years now and haven't done much, so probably gotta relearn quite a bit. Does anyone have some good ideas where to start and find fun ideas? Not numerics, but I did quite a few projecteuler problems during my bachelors - maybe there is more like this? Any ideas welcome.

Or should I redo all tasks in my textbooks first? ;-)


Mathematics and Its History (https://link.springer.com/book/10.1007/978-1-4419-6053-5) is a really fun book for someone who already knows some university-level math and is just curious.

You may also enjoy Proofs from THE BOOK (https://link.springer.com/book/10.1007/978-3-662-57265-8).


A bit unrelated to your question, but if you love Maths you can't miss The Math Sorcerer: https://www.youtube.com/@TheMathSorcerer


I went to University to do maths 6-7 years after leaving school (with decent maths results there) so had forgotten everything. Nothing beats doing all of the exercises that you did before, and actually you find that you've not quite forgotten everything. Good luck!


I assume this is one of these things where it is just like ... Oh I don't know this kind of math.

Because in my mind I take a "random" polynomial and only consider the largest two roots. Then I consider doing two reflections: across the bottom/top of the curve, and across the x axis. If those top two roots aren't degenerate the combination of the four curves using the reflections have two curves with a largest real root, and two curves with a largest imaginary root (I think). If it is degenerate then the largest root is real.

So I would then conclude that (1) there are more real largest roots than imaginary and (2) the "advantage" is vanishingly small since there are infinitely more curves with a largest non-degenerate root than degenerate root.

As one can tell I barely know the proper nomenclature. I assume I'm mainly missing some consideration about the uniformity of random coefficients not resulting in a uniform distribution in space? Or possibly my reflections can violate one of the conditions (real coefficients?)? Or I'm just plain wrong.


I don't follow your reflections. Reflecting the graph of a polynomial p(x) along the x-axis means replacing p(x) by p(-x). What do you mean by reflecting "across the bottom/top of the curve"? Reflecting along the y-axis? This would mean p(x) is replaced by -p(x).


What do you mean with reflection across the bottom/top of the curve?

With combination you presumably mean taking (polynomial+reflected polynomial)/2?


Ah there we go. At the very least I was only considering the subspace where the polynomial's derivative has only real roots?

Think of a simple quadratic. The reflection is across the min(f(x)) axis. But for higher order polynomials that doesn't always work so that is certainly an issue. Thanks


I am not sure why the point that reals appear to be more likely would be counterintuitive. If instead of uniform random real coefficients in an interval one picked uniform random roots in a circular disk centered at the origin of the complex plane as the way to construct polynomials, there is a fat chance that one almost never gets a polynomial with real coefficients back, whereas if the random roots were real then the polynomial is guaranteed real. So a priori neither of the possible answers would be surprising and getting reals to be more likely seems a bit more intuitive to me, though not necessary.


A polynomial of order n has n roots (real and complex). A random polynomial has roughly log(n) of those be real and n - log(n) be nonreal.

For large n, log(n) is tiny compared to n, so it's very surprising that over half the time the largest root is one of the tiny number of real roots


Thanks. Finding the right combo to balance the largest root if it were non-real and end up with real coefficients seems harder than balancing a smaller root, ie the polynomial with the larger real root also has an n-1 degree polynomial with real coefficients and smaller maximal root as a factor, whereas for the one with a non-real largest root this n-1 polynomial factor has to have non-real coefficients in a way that exactly balance the largest term to make all coefficients real again and now has less space in the complex plane in order to do so. I am not explaining it well as I am away from a keyboard and I am sure it is very complicated to prove but I am not a priori very surprised either way.


I think you're thinking about it in a slightly weird way, if I am understanding you correctly.

Picking random roots would be a very different thing. Here were only dealing with polynomials whose complex roots come in conjugate pairs (I.e. if a+ib is a root then so is a-ib) this gives you real coefficients "for free" since

(x - (a+ib))(x-(a-ib)) = x^2 -2ax + a^2 + b^2

Theres no coincidental balancing act in choosing the roots to obtain real coefficients. It happens because of the type of polynomials we've chosen.


Say that it is true that for degree N-1 polynomials the largest root is real with more than 0.5 probability. Then for degree N, either you multiply a degree N-1 with real coefficients by a linear (first order) polynomial with a real root that may also be larger than the one in the N-1,, or you multiply it with a first order polynomial of a complex root that has to be conjugate to an existing one in the (non-real coefficient) N-1 degree which can only be as large as the one already in the degree N-1; that particular degree N-1 polynomial has to have a distribution that is similar to those of the ones with real coefficients except for one additional unbalanced root. For a degree 1 polynomial with real coefficients the probability that the root is real is trivially 1, so my natural guess is that with a careful framing of a (double?) induction one could work out a proof that the probability remains larger than 0.5 for all N. (The second induction would be over polynomials that are products of those with real coefficients times a complex first order polynomial). I am guessing it is not that simple because somebody would have blurted it out, but my own (perhaps slightly weird) way to think about it doesn’t lead to any prior surprise.


This reasoning doesn't quite work.

If you're multiplying a polynomial P by a linear factor L = (x-a) with a being a nonreal complex number then at least one of these statements is true

1. P does not have real coefficients

2. PL does not have real coefficients

I think you try to avoid this in what you say by multiplying P by a linear term with a complex root whose conjugate is already in P, but if the conjugate is in P then P wasn't real to start with.

The term with the complex root has to "balance" with its conjugate for things to work out. E.g. (x-a)^n (x-a*)^k has real coefficients only when n=k or a is real.


That’s why you have to do a double induction (I’m not a mathematician so my terminology may be wrong) and also keep proving that those special polynomials that are a product of a complex linear term with a polynomial of real coefficients also have a higher (or equal) to 50% chance of having the largest root be real. For degree 2, I think one can prove this directly, and the rest is also an induction.


I don't think this extra claim about polynomials which are a product of a complex linear term and a polynomial with real coefficients is true.

Let a be distributed uniformly in (-1,1) and let b be distributed uniformly in the unit disk (i.e its a complex number of absolute value less than or equal to 1).

The roots of (x-a)(x-b) are (obviously) a and b. The average absolute value of b is 2/3 while the average absolute value of a is 1/2, you can also work out that the probability that |a|<|b| is 2/3, so with probability 2/3 the largest root will be complex.

This just gets worse if you let b be uniformly distributed in the unit square instead of disk, so you need some quite funky distribution on b to make this work which seems to be unjustified.


Thanks. The funky distribution of b has to be exactly the one that results in uniform (or the desired) random distribution of the real coefficients of the 3rd order real polynomial that is the resulting polynomial when you multiply your example polynomial by the linear term with conjugate b. And if this hold for the resulting 3rd order polynomial with real (random) coefficients then I think that the double induction works.

(Edit: the distribution of b has be to such that the distribution of the resulting 3rd order polynomials together with the distribution of 3rd order polynomials with 3 real roots have the desired random distribution of the coefficients.)


Is it obvious that such a distribution even exists?

We know the desired random distribution of the 3rd order real polynomials (a product of uniform random distributions say), and I believe we can work out the distribution of the subset that have linear roots only and the fraction of random real polynomials that have real roots, so the distribution for the polynomials with roots of type a, b, and conj b, comes from the (weighted) difference of these two, and leads to the joint distribution of a, b (because conj b is fixed given b).

I didn’t want to claim that a full proof is not going to be messy in the details, I just think that this type of construct makes me feel not surprised that the largest root is more likely to be real.

It feels like this problem must have been solved previously, probably in some physics context (maybe in areas related to applications of random matrix theory or dynamical systems).


It is possible that this second recursion doesn’t hold at N=3 (because there is enough of the density covered by the fraction of real polynomials), but then only starts at higher degrees (it will eventually be easier because the N-2 degree will have a higher chance to have a root higher than the new complex b). So this makes is certainly more complicated to prove but still seems intuitively correct to me.

Looking through the links from here one of the answers by Boris Hanin may have resolved a question I had for a while.


Does random polynomial mean with real or complex coefficients, though?


If you allow complex coefficients then the distribution is rotationally symmetric in the complex plane. So a root is no more likely to be in the real line than in any other line through the origin. There would be vanishingly few polynomials with largest root real.


The question posed specifies real coefficients, looks like experimentally tested on random coefficients selected from [-1, 1] from uniform distribution and a sort of scaled normal distribution.


This stuck out to me, and might be a place where our intuition breaks down. As a human if you tell me, "give me a polynomial with degree n with coefficients drawn uniformly from [-1, 1]," I will likely produce one with a lot of zero coefficients, just because a lot of the polynomials you encounter in the wild have a lot of zero coefficients.

In reality, if you grab a coefficient uniformly from [-1,1], the probability that it is 0 is 0.


And how random sparse polynomials behave would also be an interesting question!


Likely (can’t find the test code) tested on coefficients selected from the IEEE doubles in [-1, +1], so from a subset of numbers of the form m/2^n with m and n integers and m between 0 and 2^53 or thereabouts.

For this problem, I don’t see how that would give different results as picking any number in [-1, +1], but that’s not a priori guaranteed. Mathematicians are well-trained as nitpickers.


Aren't roots continuous functions of the coefficients?


Why would that matter? I was thinking “If you’d use 4-bit ‘IEEE’ floats, it wouldn’t be very surprising to see an anomaly like this. Would it be for 5-bit ones? 6-bit? What’s a large enough number of bits?”

I also said “For this problem, I don’t see how that would give different results as picking any number in [-1, +1]” because I think 64 is (more than) large enough, but still, I think you’d need an argument if you want to draw a mathematically sound conclusion “this is worth looking into” from such a simulation.


True, the function could in principle have a structure such that sampling it at m/2*n gives a misleading impression of its behavior outside that set. I can't think of an actual function which behaves like that, though (IANAM).

The first line of the question says "real".


But that first line is a statement about something related but different (the question is about largest root, this statement is about any root), and the term "random polynomial" without specifying anything is used throughout the rest of the content


Now it surprised me as a guy who studied a lot of math. I would think the complex plane is much larger than the real line so the probability would go towards 0.


That would probably be true if all the coefficients were complex numbers. The biais here comes from the fact that all the coefficients are real numbers.


The real line and the complex plane have the same cardinality.


This reminds me of another of those random polynomial questions where a good answer showed why the question asker's chosen distribution of coefficients lead to the result they were seeing. Anyone recognize that and have a link? It seems like it's relevant here too.



There's many of these then!

I think I found the one I was thinking of:

https://mathoverflow.net/questions/182412/why-do-roots-of-po...

See answer by Will Jagy


For those not well versed in the Stack Exchange network, what's the difference between Math Stack Exchange and Math Overflow?


MSE is for noobs, think HS and uni students. MO is for pros, think PhDs, researchers etc.


Adults, who might or not have graduated or attended high school or University, also use MSE.


There are probably some very clever people who are not PhDs nor professional researchers using MO too, GP was just giving examples ('think [...]').


This is not a helpful comment.


MO is for research-level discussions, questions are often bumped from one to the other (more often from MO to MSE).


Math Overflow domain and name are owned by a nonprofit organization, see https://meta.mathoverflow.net/questions/969/who-owns-mathove...


> The MathOverflow corporation is completely independent from Stack Exchange and its mission is to ensure the continued operation of the site in a manner that meets the needs and expectations of the community.

Does it make it more or less hostile than Stack Exchange?


They claim "The MathOverflow corporation is completely independent from Stack Exchange"

and then when you click Legal notice, you see that all conditions and legal of Stack Exchange applies.

In addition:

> This letter is to confirm that, per our communications, that MathOverflow ("MathOverflow") agrees that the website, MathOverflow (located at www.mathoverflow.net, www.mathoverflow.com and www.mathoverflow. org (the "MathOverflow Domains"), the assets of which are owned and controlled by MathOverflow, will be upgraded to Stack Exchange, Inc.'s ("Stack Exchange") version 2.0 software model and, in conjunction with this upgrade, MathOverflow accepts Stack Exchange 2.0's Terms of Service (http://stackexchange.com/legal/tems-of-service, the "Terms"

===

"completely independent" is perhaps exaggerated claim.


The corporation is independent, and has the right to unilaterally leave the website at any time. The current website is not independent.


MSE is sometimes perceived as hostile to beginners because it often closes off newbie questions that lack context or are thought to be of low quality. MO is not suffering from this because almost all such questions are migrated to MSE. In this sense, yes, MO is less hostile than MSE because its hostility is outsourced to MSE.


> Thievery corporation


My experience with MathOverflow was funny, the first time I posted a question it was almost immediatly downvoted and criticized by the moderators. Later the same day, I was surprised when my question was answered by Terence Tao, then the moderators quickly changed their opinion and people started to upvote the question.


Mathematics even worse than Software Engineering because lay practitioner can miss significance of question. If you have link to question, would appreciate. Just out of curiosity to see if I would realize why question is interesting. That in itself is a skill.


To be fair the question was kind of naive, about language, but T. Tao bother to answer and made my day. https://mathoverflow.net/questions/206544/why-do-people-use-...


Thank you for responding. Assuaged my curiosity :)


I believe one was for professionals and the other one for students. Don't remember which one, I guess Overflow is for professionals since we use it for coding.


Honestly it’s Stack Overflow that’s really the "noob" level programming discussion, with 95% of questions incredibly elementary, whether "professional" or not. cstheory.stackexchange.com is more analogous to mathoverflow.com in that it’s about research-level questions.

I think Math Overflow was the first overflow instance after SO, so it has a sort of a special status and its own domain. *.stackexchange.com came later to meet the growing demand for more instances.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: