
Random Solutions are Often Good Enough - luu
http://www.sultanik.com/blog/RandomSolutions
======
ajtulloch
This seems to be a specific case of the more general principle of
"concentration of measure". Terence Tao at [1] has some useful notes. From a
quick scan of the paper at
[http://arxiv.org/pdf/1402.0423v1.pdf](http://arxiv.org/pdf/1402.0423v1.pdf),
Theorem 1 seems to essentially be a CoM result for the order statistics of
symmetric RVs.

This same principle can be applied in other contexts [2] - e.g. Johnson-
Lindenstrauss is essentially a statement that random projections are "often"
nearly isometries.

[1]:
[http://terrytao.wordpress.com/2010/01/03/254a-notes-1-concentration-of-measure/](http://terrytao.wordpress.com/2010/01/03/254a-notes-1-concentration-of-measure/)

[2]:
[https://news.ycombinator.com/item?id=7971188](https://news.ycombinator.com/item?id=7971188)
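
To make the Johnson-Lindenstrauss point concrete, here's a small numerical sketch (mine, not from the paper or Tao's notes): project a random high-dimensional point cloud through a scaled Gaussian matrix and compare pairwise distances before and after. The dimensions and target dimension k are just illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 5000, 1000          # 100 points in 5000 dims, projected down to 1000

X = rng.normal(size=(n, d))
P = rng.normal(size=(d, k)) / np.sqrt(k)   # scaled so squared norms are preserved in expectation
Y = X @ P

def pairwise_dists(A):
    """All pairwise Euclidean distances via the Gram matrix."""
    sq = (A ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * A @ A.T
    return np.sqrt(np.maximum(d2, 0.0))

mask = ~np.eye(n, dtype=bool)
ratios = pairwise_dists(Y)[mask] / pairwise_dists(X)[mask]
print(ratios.min(), ratios.max())   # ratios cluster tightly around 1.0
```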

~~~
thanatropism
Where is the "programming isn't math" crowd's God now, eh?

------
ISL
"Offered the choice between mastery of a five-foot shelf of analytical
statistics books and middling ability at performing statistical Monte-Carlo
simulations, we would surely choose to have the latter skill." \-- Numerical
Recipes in C++, 2nd ed. p.696.

------
jwecker
In the context of clever startup solutions, this brings to mind the quote from
George S. Patton: "A good plan, violently executed now, is better than a
perfect plan executed next week." Even if they didn't approach optimal
solutions, the fact that random feasible solutions can be much easier to
generate and lead to more tractable mental models (and therefore
easier-to-maintain code, etc.) means you can be 'violently executing' on the
field of battle much earlier.

The fact that they approach optimality is gravy.

------
bshimmin
Slightly worthless comment, but I think that is the first web site I've ever
seen typeset in Computer Modern. Strangely, it looks rather nice to me (not
that I think Computer Modern is a particularly attractive font, but my
massive exposure to it over the years has given it a certain quality of
reassurance).

~~~
zwieback
It's not very readable, but it lends instant gravitas to the material, sort of
like the anti-Comic Sans.

------
iandanforth
This page is _awesome_, I love it. The author presents the material clearly
and carefully. So many authors dealing with mathematical concepts are lazy and
just throw up equations. Yuk! Here we see a careful progression of complexity,
supported with complete visual examples and numerical values worked out using
the equations. I wish this were not so unusual, but huge thanks for having
done it!

------
swayvil
Get the depth (distance from the edge to the deepest point) of an irregular
polygon. At first I was doing all kinds of crazy geometry and the future looked
bleak. Then I was like "fuck it" and did a random-sample, reduce,
random-sample, reduce thing. It's fast and precise.
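
Not swayvil's actual code, but here's a rough Python sketch of one way to read the sample-then-reduce idea: scatter random points, keep the one deepest inside the polygon, shrink the sampling box around it, and repeat. The helper names, round count, and sample count are made up for illustration.

```python
import math
import random

def point_in_polygon(p, poly):
    """Ray-casting test: toggle on each edge crossed by a ray from p."""
    x, y = p
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def dist_to_segment(p, a, b):
    """Euclidean distance from point p to segment ab."""
    px, py = p; ax, ay = a; bx, by = b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def depth(poly, rounds=6, samples=2000):
    """Approximate the largest distance from an interior point to the boundary."""
    xs = [v[0] for v in poly]; ys = [v[1] for v in poly]
    lo, hi = (min(xs), min(ys)), (max(xs), max(ys))
    best_p, best_d = None, -1.0
    for _ in range(rounds):
        for _ in range(samples):
            p = (random.uniform(lo[0], hi[0]), random.uniform(lo[1], hi[1]))
            if not point_in_polygon(p, poly):
                continue
            d = min(dist_to_segment(p, poly[i], poly[(i + 1) % len(poly)])
                    for i in range(len(poly)))
            if d > best_d:
                best_d, best_p = d, p
        # "reduce": shrink the sampling box around the best point so far
        if best_p is not None:
            w = ((hi[0] - lo[0]) / 2, (hi[1] - lo[1]) / 2)
            lo = (best_p[0] - w[0] / 2, best_p[1] - w[1] / 2)
            hi = (best_p[0] + w[0] / 2, best_p[1] + w[1] / 2)
    return best_d, best_p

# depth([(0, 0), (4, 0), (4, 3), (0, 3)]) -> roughly (1.5, (2.0, 1.5))
```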

~~~
lifeformed
I had a similar experience. I was writing a polygon intersection algorithm,
heavily optimized for the domain of inputs, but kept having issues when
vertices were coincident or edges were collinear. I spent days trying to fix
it, until I decided to just truncate the vertex values to some good-enough
level of precision and add a random unique jitter to each value (past the
truncated precision). This made sure that there were never coincident or
collinear situations. Then I just truncated the resulting intersection points
again to remove the jitter.

It was trivial to add, and it was faster than all the extra checks I'd have
had to do for the edge cases. It also gave me peace of mind, since I knew it
worked in theory, unlike my other feeble attempts, which worked most of the
time but always failed in some new edge case.
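
A minimal sketch of that jitter-then-truncate trick (the constants and helper names are mine, not lifeformed's): snap inputs onto a coarse grid, perturb each coordinate by an amount far below that grid, run the intersection, then snap the results back.

```python
import random

PRECISION = 1e-6   # the precision we actually care about keeping
JITTER    = 1e-9   # perturbation several orders of magnitude below it

def truncate(v, step=PRECISION):
    """Snap a coordinate onto the coarse precision grid."""
    return round(v / step) * step

def perturb(v):
    """Snap, then add a tiny random offset so processed coordinates are
    (almost surely) never exactly coincident or collinear."""
    return truncate(v) + random.uniform(-JITTER, JITTER)

def prepare(polygon):
    """Jitter every vertex before running the intersection routine."""
    return [(perturb(x), perturb(y)) for x, y in polygon]

def finish(points):
    """Truncate the intersection output, which strips the jitter back out."""
    return [(truncate(x), truncate(y)) for x, y in points]
```

Strictly speaking the jitter makes exact degeneracies vanishingly unlikely rather than impossible, which is usually the trade-off being accepted here.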

------
pdonis
I wonder about one statement in the article: he says that Sudoku is NP-hard.
This surprised me, since Peter Norvig has written a Sudoku solver:

[http://norvig.com/sudoku.html](http://norvig.com/sudoku.html)

Of course, this solver is for the standard Sudoku game on a 9 x 9 grid.
Perhaps the claim in the article refers to "Sudoku" in the sense of the
generalized problem of solving n x n Sudoku, for any n?

~~~
zwegner
As a general rule, when anybody refers to a problem being NP-hard or whatever,
they're talking about a generalized version of the problem, with at least one
"size" variable--the time complexity has to be polynomial/exponential/whatever
in _something_.

9x9 Sudoku has a constant size, which means that it can be evaluated in O(1)
time, if such notation makes sense.

~~~
fenomas
Perhaps useful to add that Sudoku reduces to the more general exact-cover
problem[0], which is indeed NP-hard.

Also, if you've never run across Dancing Links[1], do yourself a favor and go
take a look. It's a method (by Knuth) for implementing an algorithm (ditto)
for generally solving any exact-cover problem (including Sudoku of course).
Basically it amounts to constructing a matrix that encompasses all of the
game's constraints and possible inputs, after which all solutions can be found
through mechanical operations on the matrix. It's both beautiful and very fun
to implement.

[0]
[http://en.wikipedia.org/wiki/Exact_cover](http://en.wikipedia.org/wiki/Exact_cover)
[1]
[http://en.wikipedia.org/wiki/Dancing_Links](http://en.wikipedia.org/wiki/Dancing_Links)
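
If you want to play with the idea without implementing the full dancing-links pointer structure, here's a well-known compact formulation of Knuth's Algorithm X using dicts of sets (a sketch, not Knuth's code): X maps each constraint to the rows that can satisfy it, and Y maps each row to the constraints it covers.

```python
def solve_exact_cover(X, Y, solution=None):
    """Yield every exact cover: a set of rows covering each constraint exactly once."""
    if solution is None:
        solution = []
    if not X:
        yield list(solution)
        return
    # choose the constraint with the fewest candidate rows
    c = min(X, key=lambda c: len(X[c]))
    for r in list(X[c]):
        solution.append(r)
        cols = select(X, Y, r)
        yield from solve_exact_cover(X, Y, solution)
        deselect(X, Y, r, cols)
        solution.pop()

def select(X, Y, r):
    """Remove row r's constraints and every row that conflicts with r."""
    cols = []
    for j in Y[r]:
        for i in X[j]:
            for k in Y[i]:
                if k != j:
                    X[k].remove(i)
        cols.append(X.pop(j))
    return cols

def deselect(X, Y, r, cols):
    """Undo select(), restoring the removed constraints and rows."""
    for j in reversed(Y[r]):
        X[j] = cols.pop()
        for i in X[j]:
            for k in Y[i]:
                if k != j:
                    X[k].add(i)

# Tiny example: cover constraints {1, 2, 3, 4} with rows A..D
Y = {'A': [1, 4], 'B': [2, 3], 'C': [1, 2], 'D': [3, 4]}
X = {j: {r for r in Y if j in Y[r]} for j in (1, 2, 3, 4)}
for s in solve_exact_cover(X, Y):
    print(sorted(s))   # prints ['A', 'B'] and ['C', 'D'], in some order
```

Dancing Links is essentially this same backtracking search, but with the set removals and restorations done via doubly-linked list splices, which is what makes the undo step so cheap.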

~~~
StephenFalken
Now that you mention Donald Knuth, I can't resist sharing a really interesting
answer he gave recently to one question from a set of twenty, related to
algorithm design and analysis [1]:

15. _Robert Tarjan, Princeton_: What do you see as the most promising
directions for future work in algorithm design and analysis? What interesting
and important open problems do you see?

_Don Knuth_: My current draft about satisfiability already mentions 25
research problems, most of which are not yet well known to the theory
community. Hence many of them might well be answered before Volume 4B is
ready. Open problems pop up everywhere and often. But your question is, of
course, really intended to be much more general.

In general I'm looking for more focus on algorithms that work fast with
respect to problems whose size, n, is feasible. Most of today's literature is
devoted to algorithms that are asymptotically great, but they are helpful only
when n exceeds the size of the universe.

In one sense such literature makes my life easier, because I don't have to
discuss those methods in TAOCP. I'm emphatically not against pure research,
which significantly sharpens our abilities to deal with practical problems and
which is interesting in its own right. So I sometimes play asymptotic games.
But I sure wouldn't mind seeing a lot more algorithms that I could also use.

For instance, I've been reading about algorithms that decide whether or not a
given graph G belongs to a certain class. Is G, say, chordal? You and others
discovered some great algorithms for the chordality and minimum fillin
problems, early on, and an enormous number of extremely ingenious procedures
have subsequently been developed for characterizing the graphs of other
classes. But I've been surprised to discover that very few of these newer
algorithms have actually been implemented. They exist only on paper, and often
with details only sketched.

Two years ago I needed an algorithm to decide whether G is a so-called
comparability graph, and was disappointed by what had been published. I
believe that all of the supposedly "most efficient" algorithms for that
problem are too complicated to be trustworthy, even if I had a year to
implement one of them.

Thus I think the present state of research in algorithm design misunderstands
the true nature of efficiency. The literature exhibits a dangerous trend in
contemporary views of what deserves to be published.

Another issue, when we come down to earth, is the efficiency of algorithms on
real computers. As part of the Stanford GraphBase project I implemented four
algorithms to compute minimum spanning trees of graphs, one of which was the
very pretty method that you developed with Cheriton and Karp. Although I was
expecting your method to be the winner, because it examines much of the data
only half as often as the others, it actually came out two to three times
worse than Kruskal's venerable method. Part of the reason was poor cache
interaction, but the main cause was a large constant factor hidden by O
notation.

[1]
[http://www.informit.com/articles/article.aspx?p=2213858](http://www.informit.com/articles/article.aspx?p=2213858)

------
swehner
Another angle: totally inefficient solutions can be useful too. I put this
sample up a while ago, Ridiculous Adding,
[https://gist.github.com/stephanwehner/11227766](https://gist.github.com/stephanwehner/11227766)

------
nicholas73
I built my sudoku site using a pseudo-random puzzle generator. I don't know
the best algorithm for generating puzzles, so with some basic parameters for
good puzzles, I let them fill out randomly. I generate puzzles, then test them
for unique solutions and difficulty level. If a puzzle is invalid, I toss it.
What I lack in algorithmic knowledge, I make up for with server resources :)

[http://sudokuisland.com](http://sudokuisland.com)

I'm not a CS guy so reading this article made me feel a bit better, especially
since it mentions sudoku.
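
nicholas73's actual generator isn't shown, but a generate-and-test approach looks roughly like this sketch (function names, clue count, and attempt limit are mine): fill a grid by randomized backtracking, punch random holes, and keep the puzzle only if a solution-counting solver reports exactly one solution.

```python
import random

def candidates(grid, i):
    """Digits that can legally go in cell i of a flat 81-cell grid (0 = empty)."""
    row, col = divmod(i, 9)
    box = (row // 3) * 27 + (col // 3) * 3
    used = {grid[row * 9 + c] for c in range(9)}
    used |= {grid[r * 9 + col] for r in range(9)}
    used |= {grid[box + r * 9 + c] for r in range(3) for c in range(3)}
    return [d for d in range(1, 10) if d not in used]

def count_solutions(grid, limit=2):
    """Backtracking solution counter that stops as soon as `limit` is reached."""
    if 0 not in grid:
        return 1
    i = grid.index(0)
    count = 0
    for d in candidates(grid, i):
        grid[i] = d
        count += count_solutions(grid, limit - count)
        grid[i] = 0
        if count >= limit:
            break
    return count

def random_full_grid():
    """Fill an empty grid with randomized backtracking."""
    grid = [0] * 81
    def fill(i):
        if i == 81:
            return True
        digits = candidates(grid, i)
        random.shuffle(digits)
        for d in digits:
            grid[i] = d
            if fill(i + 1):
                return True
        grid[i] = 0
        return False
    fill(0)
    return grid

def random_puzzle(clues=30, attempts=10000):
    """Generate-and-test: most attempts fail the uniqueness check and get tossed."""
    for _ in range(attempts):
        puzzle = random_full_grid()
        for i in random.sample(range(81), 81 - clues):
            puzzle[i] = 0
        if count_solutions(list(puzzle)) == 1:
            return puzzle
    return None
```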

~~~
Terr_
My gut feeling is that Sudoku puzzles shouldn't be too hard to generate,
because every random choice you make reduces the space you need to pull from
for the next choice.

Whether they're "hard enough to solve", though....

~~~
nicholas73
The difficulty comes from multiple solutions to your puzzles (which would
force the player to guess). As it turned out, a purely randomly selected
puzzle would 99.9%* of the time have multiple solutions. I would have settled
for 99%, but any more would have hogged too many server resources!

And then add the screening for difficulty levels.

*Not an exact figure =)

~~~
PeterisP
Sudoku difficulty can vary a lot, but it doesn't come from multiple solutions
- a grid where multiple solutions are possible simply isn't a valid sudoku
puzzle. A minimal puzzle is one where no given number can be removed without
losing uniqueness of the solution; adding extra numbers to that minimal puzzle
makes it easier.

~~~
nicholas73
All my puzzles have unique solutions. I said the difficulty of generating them
comes from multiple solutions, not that the puzzles are difficult because of
them.

~~~
PeterisP
Ah, I misunderstood you - true enough, the multiple-solution cases slow down
random generation, as the candidates have to be checked and eliminated.

------
leeoniya
i'm currently working on a box-packing algo that attempts to find likely-to-
be-used boxes to pack an order for shipping cost estimation (a variant of the
knapsack problem [1]). "close enough" is the name of the game here :)

[1]
[http://en.wikipedia.org/wiki/Knapsack_problem](http://en.wikipedia.org/wiki/Knapsack_problem)
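
leeoniya's algorithm isn't described in detail, but a generic "close enough" approach to this kind of packing is randomized first-fit with restarts, roughly like this sketch (the one-dimensional volumes and capacity are placeholders, not the real problem):

```python
import random

def pack_once(volumes, capacity):
    """First-fit over one random item order; returns a list of boxes (lists of volumes)."""
    order = volumes[:]
    random.shuffle(order)
    boxes = []
    for v in order:
        for box in boxes:
            if sum(box) + v <= capacity:
                box.append(v)
                break
        else:
            boxes.append([v])
    return boxes

def pack(volumes, capacity, tries=200):
    """Keep the best of several random packings (fewest boxes used)."""
    return min((pack_once(volumes, capacity) for _ in range(tries)), key=len)

print(pack([6, 5, 4, 3, 2, 2, 7, 1], 10))   # e.g. [[7, 3], [6, 4], [5, 2, 2, 1]]
```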

------
brucehauman
The last line is priceless.

"So the moral of the story is: Sometimes quickly and mindlessly choosing a
random solution isn't half bad!"

~~~
JacobEdelman
Actually, with the constant of approximation being two, it's almost exactly
half good.

~~~
coldcode
As long as it isn't half-ass.

------
brockrockman
a good reminder that the TSP optimization problem is only NP-hard, not NP-
complete --
[http://www.nomachetejuggling.com/2012/09/14/traveling-salesman-the-most-misunderstood-problem/](http://www.nomachetejuggling.com/2012/09/14/traveling-salesman-the-most-misunderstood-problem/)

~~~
sidww2
"only" NP-hard? You know that NP-hard problems are equal to order harder than
NP-complete ones, right?

~~~
Strilanc
Probably referring to the definition, not the difficulty.

------
LeicaLatte
Totally tangential, but Marco points out in the latest ATP how he randomizes
the list of podcasts every time it's shown. Neat!

