
The Unreasonable Effectiveness of Quasirandom Sequences - panic
http://extremelearning.com.au/unreasonable-effectiveness-of-quasirandom-sequences/
======
mturmon
Anyone who does Monte Carlo simulations would at least be _interested_ in the
notion of Quasi Monte Carlo, in which points are sampled "more evenly than
random."

As noted in OP, this can lead to O(1/n) convergence of averages to underlying
target values, rather than the expected O(1/sqrt(n)) convergence ("n" is the
number of probes of the underlying system). This is shown, strikingly, in Fig.
2 of the OP.

There is a conference, tutorial papers etc. E.g.:
[http://mcqmc2018.inria.fr](http://mcqmc2018.inria.fr)

~~~
extremelearning
Absolutely! This is actually the original reason I began exploring
quasirandom numbers. I wanted to see if I could replace some of the stochastic
processes in current machine learning architectures with quasirandom ones.

In principle, the curse of dimensionality kicks in after a dozen or so
dimensions, as convergence is O( log(N)^D / N). But in practice I have found
that in a surprising number of applications and circumstances, quasirandom
sequences can offer a substantial improvement even when the number of
dimensions is in the hundreds or even thousands. (Some authors have suggested
this works because the solution space is of low dimension but embedded in a
much higher-dimensional space...)

However, to get the full benefit I needed to find a low-discrepancy
quasirandom sequence that did not suffer from the many parameter-degeneracy
problems that conventional ones exhibit for very large D. For me, the new
R-sequence nicely solves this parameter-selection problem by not having any
parameters to optimize!
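A minimal sketch of the construction (function names are my own illustration; the d-dimensional constant is the unique positive root of x^(d+1) = x + 1, which reduces to the golden ratio for d = 1):

```python
def phi(d, iterations=30):
    """Approximate the unique positive root of x**(d+1) = x + 1
    via the fixed-point iteration x <- (1 + x)**(1/(d+1))."""
    x = 2.0
    for _ in range(iterations):
        x = (1.0 + x) ** (1.0 / (d + 1))
    return x

def r_sequence(n, d, seed=0.5):
    """First n points of the d-dimensional R sequence in [0, 1)^d."""
    g = phi(d)
    alpha = [(1.0 / g) ** (j + 1) for j in range(d)]
    return [[(seed + alpha[j] * i) % 1.0 for j in range(d)]
            for i in range(1, n + 1)]
```

Here phi(1) recovers the golden ratio (about 1.6180) and phi(2) the plastic number (about 1.3247).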

~~~
jacobolus
Can you explain how you figured out that these were the right generalization
of the golden ratio? Were you doing a literature search? Just experimenting?

BTW, in the past couple months I have directed several people who were
interested in spiral phyllotaxis and the like at your page. I think your
sequences could make for some interesting art projects.

Personally I plan to try them out for dithering, picking well-spaced colors
for diagrams, picking points in a region when trying to optimize a map
projection, etc.

~~~
extremelearning
Thanks for the kind words and promotion! ;) These findings were the product
of a lot of literature searching, perseverance, decades of mathematical
intuition, and computer modelling of various hypotheses. I had a problem to
solve, so I was determined to find a solution. Combine this with my obsession
with simplicity, and it meant I didn't just want to tweak a million
parameters to get a result.

------
extremelearning
Author here. OMG! Someone submitted my post here and it's got to the front
page!! Happy to try to answer any questions that people have. Enjoy!

~~~
nestorD
Very nice blog (I subscribed to the newsletter). I am looking forward to the
"quasirandom stochastic descent" and "extreme learning machines" sections.

------
leni536
Very cool! My only gripe with low-discrepancy sequences is how they are used
in Quasi Monte Carlo integration:

    
    
       1/N \sum_{n=1}^N f(a_n),
    

so each point gets equal weight. One can generalize this so that every point
gets its own weight:

    
    
       \sum_{n=1}^N w_{N,n} f(a_n).
    

I have never seen this generalization elsewhere. I only study Quasi Monte
Carlo as a hobby, so maybe I just missed it. Once I obtained significantly
improved discrepancies (close to the theoretical minimum) with a modified Van
der Corput sequence and linearly diminishing weights. It's all in one
dimension though.
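For reference, the classic (unweighted) base-b Van der Corput sequence mentioned above can be generated like this; the diminishing-weight scheme is the parent commenter's own modification and is not shown:

```python
def van_der_corput(n, base=2):
    """n-th term of the base-b Van der Corput sequence:
    reflect the base-b digits of n about the radix point."""
    q, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        q += digit / denom
    return q

# First few base-2 terms: 0, 1/2, 1/4, 3/4, 1/8, 5/8, ...
```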

My other idea, for two dimensions, was to use the Hilbert curve (the infinite
limit) to map a one-dimensional sequence to a square area, but AFAIK this was
done by others before.

~~~
extremelearning
Yes, using an open (infinite) one-dimensional low-discrepancy sequence in
conjunction with a space-filling curve such as the Hilbert curve has been
explored; see, for example, Owen [1].

Niederreiter himself has a beautiful paper that includes a section on weighted
low-discrepancy sequences [2], where he also cites several references that
might be of interest to you.

[1]
[http://statweb.stanford.edu/~owen/reports/extgrid.pdf](http://statweb.stanford.edu/~owen/reports/extgrid.pdf)
[2]
[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.614...](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.614.2202&rep=rep1&type=pdf)

------
SeanLuke
Forgive me if you make this clear in the text (I may have missed it), but you
have various pull quotes which say: "The new R2 sequence is the only
2-dimensional low discrepancy quasirandom sequence where..." Are you saying
that it's the only _known_ sequence? Or the only one _possible?_ If the
second, do you have formal proofs for those claims?

~~~
extremelearning
I somewhat stumbled on this during my efforts to improve a machine learning
architecture I was working on, so unfortunately there are no formal proofs as
yet. This is one reason why I went down the route of a technical blog post
rather than a more academic one, which would no doubt require more rigour and
more precise language.

Although in many cases I have reason to strongly believe that it is not
possible for other sequences to have similar properties, until such proofs are
established I agree that in a technical paper it would be prudent to preface
most of those quotes with “only known...”

~~~
jacobolus
One thing I wondered is if there’s a way to tweak this (e.g. by skipping
values in one direction) to have better region filling (with respect to
Euclidean distance) when the aspect ratio is not square.

Try sliding the top slider at [https://beta.observablehq.com/@jrus/plastic-
sequence](https://beta.observablehq.com/@jrus/plastic-sequence) to see what I
mean.

~~~
extremelearning
Unfortunately, I haven’t found a nice solution to this problem yet. For those
requiring quasirandom Monte Carlo integration over, say, a 2x1 rectangular
grid, there is always the option of scaling the unit-square points by 2 and
then rejecting half of them. Not super elegant, and not very efficient in
high dimensions, but at least it is unbiased and maintains the low-discrepancy
characteristics.
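A sketch of that scale-and-reject idea, assuming `points` is any list of (x, y) pairs in the unit square (the helper name is illustrative):

```python
def to_rectangle(points, scale=2.0, height=1.0):
    """Scale unit-square points by `scale` on both axes, then keep only
    those landing inside the scale x height rectangle.  Unbiased, and the
    kept points inherit the low-discrepancy structure, but a fraction
    1 - height/scale of the points is thrown away."""
    kept = []
    for x, y in points:
        sx, sy = x * scale, y * scale
        if sy < height:
            kept.append((sx, sy))
    return kept
```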

~~~
DoctorOetker
While your R2 sequence is obviously better than the others for a hyper-uniform
distribution on the sphere, I can't help but notice that there are still
spiral patterns. (Are we looking at a pole of the coordinate transform, btw?)

Is this fundamentally the same or a similar problem as the rectangular stretch
mentioned here, but with a "Jacobian" stretch as opposed to a linear stretch?

Could you please read the short section
[https://en.wikipedia.org/wiki/Orthogonal_matrix#Randomizatio...](https://en.wikipedia.org/wiki/Orthogonal_matrix#Randomization)
?

It is very short: first it defines what is meant by a random orthogonal
matrix, then it describes the problem with a naive (incorrect) approach, and
then it describes a recursive procedure to generate a random orthogonal matrix
of dimension D using D * (D+1) / 2 _uniformly_ distributed numbers:

>Stewart (1980) replaced this with a more efficient idea that Diaconis &
Shahshahani (1987) later generalized as the "subgroup algorithm" (in which
form it works just as well for permutations and rotations). To generate an (n
+ 1) × (n + 1) orthogonal matrix, take an n × n one and a uniformly
distributed unit vector of dimension n + 1. Construct a Householder reflection
from the vector, then apply it to the smaller matrix (embedded in the larger
size with a 1 at the bottom right corner).

I predict this will give a better uniform distribution on that sphere, if you
share your code for generation and visualization I _might_ try myself :)
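A sketch of that subgroup-algorithm construction with numpy (using ordinary Gaussian sampling to get the uniform unit vectors, purely for illustration; a quasirandom source could be substituted):

```python
import numpy as np

def random_orthogonal(d, rng=None):
    """Grow a d x d orthogonal matrix via the subgroup algorithm:
    start from the 1 x 1 matrix [1], and at each step apply a
    Householder reflection (built from a uniform random unit
    vector) to the previous matrix embedded one size larger."""
    rng = rng or np.random.default_rng(0)
    q = np.array([[1.0]])
    for n in range(1, d):
        v = rng.standard_normal(n + 1)
        v /= np.linalg.norm(v)                    # uniform unit vector on S^n
        q_big = np.eye(n + 1)                     # embed q with a 1 at the
        q_big[:n, :n] = q                         # bottom-right corner
        h = np.eye(n + 1) - 2.0 * np.outer(v, v)  # Householder reflection
        q = h @ q_big
    return q

# Rotating a fixed reference point gives a point on the sphere:
point = random_orthogonal(3) @ np.array([0.0, 0.0, 1.0])
```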

Also, I applaud you for the extremely clear exposition of the problem, thought
process and analogies presented. The one section I found harder to grasp was
the one on dithering, as I don't fully understand the process from input
picture to output pixel locations and intensities...

~~~
extremelearning
Thank you for your kind words! Regarding the mapping of R_2 to the surface of
the sphere, I totally agree that although it is better than other options, it
is far from optimal. Also, yes, these visualizations are looking from the
North pole downwards, as I thought this viewpoint highlighted the differences
and discrepancies the best. It was this clearly suboptimal mapping that
spurred me to write a prior blog post on improved ways to place N points on a
sphere [1].

I think that much of the problem is not with the particular unit square point
distribution but rather with the fact that we are using a mapping (lambert)
that has a singularity at each pole.

For constant/given n, there is an excellent reference by Saff [2] on mapping
points to a sphere. However, if n is not known in advance (i.e. you need an
open sequence of points), I am not aware of any method that is better than
simply/naively mapping a low-discrepancy sequence from the unit square to S2.

I think that the ideal approach will work directly on the surface of the
sphere and consist of rotations from one point on the surface to another.
Thus, although this is not my area of expertise, I wouldn't be surprised if
methods such as the one you cite could help someone find a better spherical
distribution.

And finally regarding the section on dithering. Thanks for the feedback. I
will add some more background explanations to ensure that it is clearer for
those who do not directly work in the graphics rendering space.

[1] [http://extremelearning.com.au/evenly-distributing-points-
on-...](http://extremelearning.com.au/evenly-distributing-points-on-a-sphere/)
[2]
[http://www.math.vanderbilt.edu/saffeb/texts/262.pdf](http://www.math.vanderbilt.edu/saffeb/texts/262.pdf)

~~~
DoctorOetker
>I think that much of the problem is not with the particular unit square point
distribution but rather with the fact that we are using a mapping (lambert)
that has a singularity at each pole.

I agree that the mapping is causing the remaining issue, and that the
singularity is _emphasizing_ it, but I _think_ it is any stretching of the
mapping that causes the spiral clustering towards the equator.

>For constant/given n, there is an excellent reference by Saff on mapping
points to a sphere. However, if n is not known (ie you need an open sequence
of points) I am not aware of any method that is better than simply/naively
mapping a low discrepancy sequence from the unit square to S2.

I understand the advantage of your R2 method regarding open sequences, and the
quasi-magical property that you can just keep appending points to the
collection without any need to "remember" the points already added in order to
position the fresh one. This is clearly a desirable property.

>I think that the most ideal situation will work directly on the surface of
the sphere and consist of rotations from one point on the surface to the
other. Thus, although this is not my area of expertise I wouldn't be surprised
if methods such as what you cite may assist someone in finding a better
spherical distribution.

It seems you misunderstood me, thinking I was proposing an alternative to an
R_2-based method. On re-reading my comment I see I should have been clearer
about what I was proposing:

I proposed using R_2 for a sequence of 6-dimensional tuples (or perhaps just
5-dimensional if the first random number should be 1). Each tuple generated by
your R_2 would be used to construct a 3-dimensional random rotation matrix as
described. Each rotation matrix then rotates the same reference point (0,0,1).
I was proposing an alternative to the Lambert mapping, but still using your
R_2 sequence, so you keep having an open sequence of points.

I believe you have discovered something very fundamental here, and I can
identify with the person who asked about the 1x2 rectangle: perhaps by
modifying R_2 for enough simple variations on the problem we can generalize to
arbitrary shapes, manifolds or metrics. I am unsure what the best approach is:
modify the core logic for generating the next 2-dimensional point in the 1x2
rectangle such that it remains hyper-uniform, or treat the R_2 sequence as the
fundamental primitive, generate higher-dimensional tuples for the uniform
hyper-cube, and then use math to somehow project them down to 2 dimensions.

Thanks for sharing your insights!

~~~
extremelearning
1\. Stretching. I think you and Jacob are on to something when we realise that
the R_2 sequence (and presumably R_D) loses some of its properties when it is
naively stretched. I think if we solve this problem then we can make some real
progress on the sphere problem.

2\. Regarding rotations. Yes, I agree. When I wrote "rotations" I meant it
very generally: any direct transformation from one point on the sphere to
another without requiring intermediate mappings such as on a torus.

3\. 6-tuples. This is a really interesting idea, and sorry I missed it the
first time. I will definitely explore this idea more.

~~~
DoctorOetker
Regarding point 1: the reason I'd prefer solving the rectangular version first
is that I see two potential sources of the problem: on one hand an anisotropic
but still _flat_ metric (i.e. no curvature), and on the other hand the shape
or topology of spaces like the sphere (which admits elliptic metrics) or the
double torus (hyperbolic geometry), which force curvature on the metric. That
the problem already arises for a flat metric is why I think it deserves
attention.

Although the problem can be seen with visual inspection, it takes some effort
to inspect. I propose the following visualization: after generating, say,
N=1000 points, for each point select its closest neighbour and plot the
displacement vector relative to the origin; ideally one would get a circular
band centered on the origin. Alternatively, plot the angle of each
nearest-neighbour vector; ideally one should see a uniform distribution.
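A brute-force sketch of that diagnostic for the 2-D R sequence (the constant is from the original post; plotting is omitted, only the vectors are computed):

```python
import math

def r2_points(n, seed=0.5):
    """First n points of the 2-D R sequence (plastic-number constant)."""
    g = 1.32471795724474602596090885447809
    a1, a2 = 1.0 / g, 1.0 / (g * g)
    return [((seed + a1 * i) % 1.0, (seed + a2 * i) % 1.0)
            for i in range(1, n + 1)]

def nearest_neighbour_vectors(points):
    """For each point, the displacement vector to its closest neighbour.
    Plotted around the origin these should ideally form a thin circular
    band, and their angles should be roughly uniformly distributed."""
    vectors = []
    for i, (x, y) in enumerate(points):
        best, best_d = None, float("inf")
        for j, (u, v) in enumerate(points):
            if i == j:
                continue
            d = math.hypot(u - x, v - y)
            if d < best_d:
                best_d, best = d, (u - x, v - y)
        vectors.append(best)
    return vectors

vecs = nearest_neighbour_vectors(r2_points(500))
dists = [math.hypot(dx, dy) for dx, dy in vecs]
angles = [math.atan2(dy, dx) for dx, dy in vecs]
```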

It's unclear which part(s) need to be generalized for the rectangular version:
the equation for the phi's? some rescaling of the phi's? the update rule
itself? or some function on the output of the update rule?

Another approach may be to change the concept of hyper uniformity and
generalize it to "hyper convergence of [arbitrary function]" i.e. can we have
fast convergence to a hat or tent function? If we take the 2-dimensional R_2
and compute the sum of the pair of pseudo-independent numbers, does it
generate a tent function, still faster than a random sequence? Instead of
looking at the distance between adjacent points for the non-uniform tent
function, we might multiply this distance by the distance to 0 or 2 (whichever
is closer). I.e. points near the floor (0 or 2) should be further apart but
will have a lower multiplier since they are close to 0 or 2, while points near
the top of the tent function (1) will have small distances between them but a
greater distance from 0 or 2.

For a couple of moments I incorrectly conjectured that spaces have "magic
geodesic steps", i.e. for a torus, moving straight up or straight to the right
does not create a dense point set on the unit square, while the R_2 step does.
The reason I think there is no general "magic geodesic step", or at least that
it is less related to geodesics than I thought, is that _every geodesic_ on
the unit sphere turns back on itself.

------
tlb
I'm going to try this instead of Halton sequences in my current project.

One detail of the implementation that may be worth improving:

    
    
            z[i] = (seed + alpha*(i+1)) %1
    

When i gets large, say 2^30, you're losing 30 bits of precision in the %1, so
(with IEEE doubles) it can only take one of 2^23 values.

The usual implementation of Halton sequences gets slower for large values of
i, but doesn't suffer from quantization.

~~~
extremelearning
Thanks tlb. You're totally right. I wrote the example code to match the
notation used in the post, which reflects my maths background. I suspect that
from a programming perspective, using the recurrence relation:

z[i+1] = (z[i] + alpha) % 1

is better practice in terms of both speed and accuracy. I have updated my post
to make this clearer. Thanks.
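For illustration, both forms side by side (using the golden-ratio constant from the post's one-dimensional example; note the recurrence also accumulates roughly one ulp of rounding error per step, so it is not exact either):

```python
g = 1.618033988749895    # golden ratio
alpha = 1.0 / g
seed = 0.5

# Closed form: direct, but (seed + alpha*(i+1)) grows with i, so the
# fractional part loses absolute precision for large i.
def closed_form(i):
    return (seed + alpha * (i + 1)) % 1.0

# Recurrence: the running value stays in [0, 1), so full double
# precision is retained at each step, at the cost of sequential
# evaluation.
def by_recurrence(n):
    z = seed
    for _ in range(n + 1):
        z = (z + alpha) % 1.0
    return z
```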

------
infogulch
> The methods of creating fully deterministic low discrepancy quasirandom
> sequences in one dimension are extremely well studied, and generally solved.

Are you referring to something like quadratic residues [1]? It's basically
that the sequence x^2 mod P, where P is prime, appears to be somewhat random,
and it runs through its whole cycle before repeating. But it's obviously
finite, and it's not perfect [2]. I'm mostly just curious how it relates.

[1]:
[https://en.wikipedia.org/wiki/Quadratic_residue](https://en.wikipedia.org/wiki/Quadratic_residue)
[2]:
[https://arxiv.org/ftp/arxiv/papers/1612/1612.05852.pdf](https://arxiv.org/ftp/arxiv/papers/1612/1612.05852.pdf)
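For the curious, the quadratic-residue sequence is easy to play with; note that x^2 mod P takes only (P+1)/2 distinct values for an odd prime P (the quadratic residues, plus 0), so it does not visit every residue:

```python
P = 23  # any odd prime
squares = [(x * x) % P for x in range(P)]   # the "somewhat random" sequence
residues = sorted(set(squares))             # (P+1)/2 distinct values
```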

~~~
extremelearning
My intent when I wrote that phrase was two-fold. Firstly, to emphasise the
large body of work by Weyl, Kronecker, Hurwitz et al relating to the
equidistribution theorem, badly approximable numbers and Diophantine
approximation. Armed with this knowledge, it is easily proved that the
recurrence relation using the golden ratio mod 1 is optimal by almost any
measure.

Although some other readers on HN may be able to see a connection between
one-dimensional quasirandom sequences and quadratic residues, unfortunately I
can’t immediately see an explicit one. Sorry!

My second intent was to contrast this with how little is _provably_ known for
higher dimensions. To my knowledge, thanks to the phenomenal work of Halton,
Sobol, Niederreiter et al, we know that many of the contemporary sequences are
optimal in the limiting big-O sense, but there are no proofs, even for d=2, of
optimality for any of the sequences for general finite values of n.

------
targafarian
I'm curious why Sobol outperforms the R-sequence for numerical integration of
an 8-dimensional Gaussian and whether that result holds in lower dimensions
(and whether that's tied to the infinitely-long-tailed nature of the Gaussian;
maybe there's a bias in Sobol that is beneficial in this particular case...?).

~~~
extremelearning
Sobol's sequence is really quite amazing. However, there are three reasons why
it doesn't get as much attention as it probably deserves.

First, the maths is far more complex than that of other sequences. Here is the
easiest description I have seen so far [1].

Secondly, this complexity, as well as the non-trivial requirement to carefully
select the basis parameters, means that the computing side is very complex. To
give an indication: the code for Sobol generation in Mathematica and
WolframAlpha is not written by Wolfram itself but rather
"comes courtesy of the Intel MKL libraries,... Specifics of the
implementation, such as choice of initial values, are not documented.
Evaluation of this implementation in terms of discrepancy, projections, and
performance in application remains for future work." [also 1]

Thirdly, as measured by d* discrepancy (which is by far the most common formal
way of quantifying discrepancy), the Sobol sequence does not have an
asymptotic discrepancy as low as many other sequences.

Despite this last point, it has been found that in practice Sobol generally
yields performance as good as or better than the others when used for
numerical integration.

I personally have a suspicion that this is because of the special property,
called "Property A" [2] that all Sobol point sequences possess. However, I
will let people much smarter than me comment on whether this belief is
justified or not...

In one of my other posts [3], Sobol sequences outperform all of the other low-
discrepancy sequences (including my new R2 sequence) hands down.

Finally, please note that for brevity and clarity, my blog only shows one
example for most topics. I will leave it to other authors, or future posts, to
show more rigorously how consistent the comparisons are....

[1] [http://www.mathematica-journal.com/2011/12/a-toolbox-for-
qua...](http://www.mathematica-journal.com/2011/12/a-toolbox-for-quasirandom-
simulation/) [2]
[http://www.andreasaltelli.eu/file/repository/HD_SobolGenerat...](http://www.andreasaltelli.eu/file/repository/HD_SobolGenerator.pdf)
[3] [http://extremelearning.com.au/how-to-generate-a-sequence-
of-...](http://extremelearning.com.au/how-to-generate-a-sequence-of-evenly-
balanced-combinations/)

------
Tomte
[I failed to reply to the author, but I guess you'll see it here, as well]

Please remove the plea for upvotes from your site.

The mods usually react harshly to this and bury the submission.

Your article doesn't need that to stay some time on the front page, anyway.
It's high-quality and people have recognized it as such.

~~~
extremelearning
Thanks for this advice. I am new to blogging and certainly new to HN front
page etiquette. I think my excitement got the better of me! I have now taken
it off.

~~~
Tomte
Great, thanks!

------
charlesism
Neat! Looks like a pretty decent way to generate colored noise. Wish there
were an audio demo.

~~~
extremelearning
I have now inserted some sound demonstrations into my blog post! You can now
listen to what 1-dimensional (mono) and 2-dimensional (stereo) quasirandom
sound might be like.

I will leave it to you to decide which ones you prefer, and maybe even think
about why some of the quasirandom sequences are more pleasant than others.

For those who have already visited the site, you may need to clear your
browser cache to see/hear this update.

~~~
charlesism
Thanks! Interesting. At some point I want to play around with this myself. I'm
curious to see what it would sound like to combine the x and y values together
(eg: if the frequency is 10 samples between each x, randomly choose between
x=0 to x=10 for the first, between x=10 to x=20 for the second). That would
probably make it sound less like an FM tone, and more like hissing. I'm not
much for math, but a lot of tutorials on colored noise are based on multiple
filters. If this algorithm works, it would just need a single low-pass filter
for smoothing.

------
bzzzzzt
This looks similar to the 2D "random number generators" at
[http://pippin.gimp.org/a_dither/](http://pippin.gimp.org/a_dither/) but is
even more evenly distributed.

~~~
extremelearning
Yes. I think that a dither matrix based on the R_2 sequence should be
competitive with any of the dither masks mentioned at that link.

Thus, in situations like video and/or ray tracing, where computing speed is of
the essence, a better mask such as the R2 dither mask may offer an improvement
on existing methods.

However, in terms of end quality I am not aware of any dither masks that come
anywhere near the state-of-the-art error-diffusion algorithms. Thus, for a
small number of images, such as our favourite animated GIFs, I would still
recommend an error-diffusion method. :)

------
strainer
I would like to be able to make this sequence, it looks nice. ~~Unfortunately
the included python code is not valid~~. ( * ) It will take me quite a while
to grok the math discussion well enough to implement it.

* - pasted it wrong.

~~~
extremelearning
That's strange. To double-check that I typed it into my blog correctly, I
copy-and-pasted the code from the post into my Python IDE, and it compiles and
runs properly in my environment. For what it is worth, I am using Python 2.7
on a 64-bit Windows machine. Maybe someone else reading has some ideas?

Note that after my example code I have included two other people's code demos,
so that you can see the same algorithm from a different perspective. Maybe
these will help you understand what's required.

In summary, for 2 dimensions, x and y coordinates of the n-th term (n =
1,2,3,....) are defined as:

    
    
      g = 1.32471795724474602596090885447809 
      a1 = 1.0/g
      a2 = 1.0/(g*g)
      x[n] = (0.5+a1*n) %1
      y[n] = (0.5+a2*n) %1
    

where %1 is the mod 1 operator and takes the fractional part of the argument.
Hope that helps.
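For anyone who wants to paste something directly runnable, an equivalent snippet (N is just an example length):

```python
N = 100
g = 1.32471795724474602596090885447809   # plastic number
a1 = 1.0 / g
a2 = 1.0 / (g * g)
x = [(0.5 + a1 * n) % 1 for n in range(1, N + 1)]
y = [(0.5 + a2 * n) % 1 for n in range(1, N + 1)]
```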

~~~
strainer
I pasted it wrong _blush_

And thank you, that seems as clear as can be. When I get my head screwed on
straight I will have a better read through your fine insights into the
workings.

I will be trying to include your sequence in my quirky collection of random
helpers for Javascript, illustrated here:
[http://strainer.github.io/Fdrandom.js/](http://strainer.github.io/Fdrandom.js/)

And will be sure to attribute you and message when I get it in.

~~~
extremelearning
That's great. I'm glad you got it working. ;)

------
PavlikPaja
Please note that the article only displays correctly in Chrome.

~~~
PavlikPaja
It seems to be fixed now.

