
First improvement of fundamental algorithm in 10 years - phreeza
http://web.mit.edu/newsoffice/2010/max-flow-speedup-0927.html
======
mds
Their "represent it as a matrix" approach reminded me of this Richard Feynman
story:

    
    
      [...] Richard had completed his analysis of the behavior
      of the router, and much to our surprise and amusement, he
      presented his answer in the form of a set of partial
      differential equations. To a physicist this may seem
      natural, but to a computer designer, treating a set of
      boolean circuits as a continuous, differentiable system is
      a bit strange.
      Feynman's router equations were in terms of variables
      representing continuous quantities such as "the average
      number of 1 bits in a message address." I was much more
      accustomed to seeing analysis in terms of inductive proof
      and case analysis than taking the derivative of "the
      number of 1's" with respect to time.
    

<http://www.longnow.org/essays/richard-feynman-connection-machine/>

~~~
Maro
Great story. I'm a physicist and I've read most of the Feynman stuff,
including many of his papers, but I wasn't aware he _worked_ at Thinking
Machines Corp.

I submitted it as a stand-alone link:

<http://news.ycombinator.com/item?id=1732952>

------
milkshakes
from <http://theory.csail.mit.edu/toc-seminars/> (can't find the actual
paper...)

 _Electrical Flows, Laplacian Systems, and Faster Approximation of Maximum
Flow in Undirected Graphs

The maximum flow problem and its dual, the minimum s-t cut problem, are two of
the most fundamental and extensively studied problems in Operations Research
and Optimization. They have many direct applications and are also often used
as subroutines in other algorithms.

In this talk, I'll describe a fundamentally new technique for approximating
the maximum flow in capacitated, undirected graphs. I'll then use this
technique to develop the asymptotically fastest-known algorithm for solving
this problem. For graphs with n vertices and m edges, the algorithm computes
epsilon-approximate maximum flows in time \tilde{O}(m^{4/3}) poly(1/epsilon) and
computes epsilon-approximate minimum s-t cuts in time \tilde{O}(m + n^{4/3})
poly(1/epsilon).

We compute these flows by treating our graph as a network of resistors and
solving a sequence of electrical flow problems with varying resistances on the
edges. Each of these may be reduced to the solution of a system of linear
equations in a Laplacian matrix, which can be solved in nearly-linear time.

This is joint work with Paul Christiano, Aleksander Madry, Daniel Spielman,
and Shanghua Teng._
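
To make the "network of resistors" idea concrete, here is a toy NumPy sketch of
mine (not the authors' code, and nowhere near their full algorithm): build the
weighted graph Laplacian, solve one Laplacian system for the vertex potentials,
and read off the electrical flow with Ohm's law. The paper solves a whole
sequence of such systems with adjusted resistances, using nearly-linear-time
Laplacian solvers rather than the dense pseudoinverse used below.

    import numpy as np

    # Made-up toy graph: undirected edges (u, v) with conductances 1/resistance.
    edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 2.0), (2, 3, 1.0)]
    n, s, t = 4, 0, 3                      # number of vertices, source, sink

    # Weighted Laplacian L = D - A, with conductances as edge weights.
    L = np.zeros((n, n))
    for u, v, c in edges:
        L[u, u] += c
        L[v, v] += c
        L[u, v] -= c
        L[v, u] -= c

    # Inject one unit of current at s, extract it at t, and solve L * phi = b
    # for the vertex potentials (L is singular, so use the pseudoinverse here).
    b = np.zeros(n)
    b[s], b[t] = 1.0, -1.0
    phi = np.linalg.pinv(L) @ b

    # Ohm's law: flow on an edge is its conductance times the potential drop.
    for u, v, c in edges:
        print(f"flow on ({u},{v}) = {c * (phi[u] - phi[v]):+.3f}")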

------
T_S_
The article refers to the max flow problem, but about all it does is imply that
the new method involves the adjacency matrix of the graph; it provides little
else in the way of details. A better article is needed, not this oversimplified
press release.

~~~
superjared
Kelner is set to give a talk about it tomorrow[1]. Anyone know if they create
videos for these?

[1]: http://www.csail.mit.edu/events/eventcalendar/calendar.php?show=event&id=2711

~~~
danger
The method sounds very interesting, but the title seems to overstate the work
a bit. This is an approximation algorithm, meaning that it's an improvement on
an _approximate variant of_ a fundamental problem, right?

~~~
schumitsch
I think the title is appropriate. The approximate solution is the interesting
one, as the exact solution is believed to be intractable.

~~~
cdavidcash
This is absolutely not true. The Wikipedia page for the max flow problem lists
several (slower) poly-time algorithms for solving exact max flow. Most
theory-101 classes cover at least Ford-Fulkerson.

------
Groxx
Very little info, but I'd argue that the parallel qualities of the new
algorithm are at least as important as the improvement in complexity.

Old speed: (N + L)^(3/2). New speed: (N + L)^(4/3). (N = nodes, L = links)

> _For a network like the Internet, which has hundreds of billions of nodes,
> the new algorithm could solve the max-flow problem hundreds of times faster
> than its predecessor._

Granted, it's massively useful for Internet-scale calculations. But it's also
massively useful that this is just operations on a matrix - throw the sucker
at a video card, and watch it finish _thousands_ of times faster on cheaper
hardware.

edit: oh, and:

    
    
      100,000,000^(3/2) / 100,000,000^(4/3) ≈ 21.5
      1,000,000,000^(3/2) / 1,000,000,000^(4/3) ≈ 31.6
      1,000,000,000,000^(3/2) / 1,000,000,000,000^(4/3) ≈ 100
    

Anyone know the scale of N+L for the internet? I somewhat doubt it's in the
trillions. Which is not in any way to suggest there are not significantly
larger applications - just nitpicking at journalism's grasp of numbers.

~~~
jedbrown
_throw the sucker at a video card, and watch it finish thousands of times
faster on cheaper hardware_

This is nonsense even for dense operations. But this matrix is sparse, in
which case GPUs are within a modest factor (2-5 or so depending on the matrix,
whether you use multiple cores on the CPU, and whether you ask NVidia or
Intel). And if the algorithm does not expose a lot of concurrency (as with
most multiplicative algorithms), the GPU can be much slower than a CPU.

~~~
jakozaur
The right answer is in the middle. Of course the speedup depends on what you
are comparing, but if you benchmark a GPU against a decent four-core CPU the
speedup is about an order of magnitude.

For dense matrices it is currently about eight times
(<http://forums.nvidia.com/index.php?showtopic=172176>) on CUDA 3.1. That is
before CUDA 3.2, which claims a 50-300% performance increase.

For sparse matrices the speedup against a multi-core CPU is about ten times
(<http://forums.nvidia.com/index.php?showtopic=83825>).

All of this is available as public libraries
(<http://developer.nvidia.com/object/cuda_3_2_toolkit_rc.html>).

jedbrown, please provide a source for your estimates. Of course I'm interested
in highly optimized libraries like BLAS; hand-written code would be several
times slower on both systems.

~~~
jedbrown
"On the limits of GPU Acceleration" (short summary from Richard Vuduc):
<http://vuduc.org/pubs/vuduc2010-hotpar-cpu-v-gpu.pdf>

"Understanding the design trade-offs among current multicore systems for
numerical computations" (somewhat more technical):
<http://dx.doi.org/10.1109/IPDPS.2009.5161055>

"Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on
CPU and GPU" (from Intel, but using the best published GPU implementations):
<http://doi.acm.org/10.1145/1815961.1816021>

BTW, you may as well cite CUSP (<http://code.google.com/p/cusp-library/>) for
the sparse implementation; it's not part of CUDA despite being developed by
Nathan Bell and Michael Garland (NVidia employees).

~~~
wingo
Really great comment, thank you.

------
kwantam
This sounds cool, but how is this not a solution that's already been
considered? It seems intuitively obvious to me that you can use a circuit
simulator to solve this problem pretty quickly by assigning conductances to
the edges of the graph. Modern circuit simulators are pretty good at
converging on a solution in the presence of strong nonlinearities, so in
principle you could even have edges that are more than simple conductances.

(I have a CS degree, but I'm a circuit designer, so it's been a while since
I've thought seriously about problems like this and I'm clearly not in touch
with the state of the art. This question isn't snark, it's really a question.)

~~~
jsolson
> This sounds cool, but how is this not a solution that's already been
> considered? … I have a CS degree

Well… did you think of it?

~~~
kwantam
Fair enough :)

(In my defense, I've been solving other problems instead...)

------
rndmcnlly0
The theme with these MIT announcements seems to be HCA: Hope, Certainty, and
Assuredness, by analogy to FUD, and with a similar use of misrepresentation,
simplification, and appeal to authority.

------
shadowmatter
Actually, another case: Vladimir Yaroslavskiy recently came up with a dual-
pivot quicksort method where the number of swaps is reduced from 1·n·ln(n)
to 0.8·n·ln(n). Yes, he's only twiddling the constant factor, but the
benchmarks have been promising, with some sorts being reduced to 25% to 50% of
their previous running time. Josh Bloch, one of the shepherds of Java, took
the time to implement it as the default sorting algorithm in Java 7 (whenever
that gets released). More details at:
<http://permalink.gmane.org/gmane.comp.java.openjdk.core-libs.devel/2628>

How timsort from Python stacks up, I don't know offhand. But that's a
different beast rather than an improvement to a fundamental algorithm.
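
For anyone curious, here is a rough Python sketch of the dual-pivot
partitioning idea (my illustration only; Yaroslavskiy's actual implementation
is in Java and far more tuned): two pivots p <= q split the array into
elements below p, between the pivots, and above q, and the three regions are
then sorted recursively.

    def dual_pivot_quicksort(a, lo=0, hi=None):
        """Sort the list a in place using two pivots (illustrative sketch)."""
        if hi is None:
            hi = len(a) - 1
        if lo >= hi:
            return
        if a[lo] > a[hi]:
            a[lo], a[hi] = a[hi], a[lo]
        p, q = a[lo], a[hi]              # the two pivots, p <= q
        lt, gt, i = lo + 1, hi - 1, lo + 1
        while i <= gt:
            if a[i] < p:                 # belongs in the left region
                a[i], a[lt] = a[lt], a[i]
                lt += 1
            elif a[i] > q:               # belongs in the right region
                while a[gt] > q and i < gt:
                    gt -= 1
                a[i], a[gt] = a[gt], a[i]
                gt -= 1
                if a[i] < p:
                    a[i], a[lt] = a[lt], a[i]
                    lt += 1
            i += 1
        lt -= 1
        gt += 1
        a[lo], a[lt] = a[lt], a[lo]      # move the pivots into place
        a[hi], a[gt] = a[gt], a[hi]
        dual_pivot_quicksort(a, lo, lt - 1)
        dual_pivot_quicksort(a, lt + 1, gt - 1)
        dual_pivot_quicksort(a, gt + 1, hi)

    xs = [5, 2, 9, 1, 7, 3]
    dual_pivot_quicksort(xs)
    print(xs)                            # [1, 2, 3, 5, 7, 9]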

------
kenjackson
Here's a good background paper by Spielman that seems to lay the foundation
for the methods used:
<http://www.cs.yale.edu/homes/spielman/PAPERS/icm10post.pdf>

It's not the actual result itself, but it looks like a really good read.

------
jongraehl
It's an approximate algorithm (nothing wrong with that).

I'm confused at how they claim O(m^{4/3}) (m is the number of edges) when even
initializing an explicit n by n matrix is O(n^2), unless they assume that the
graph isn't sparse (m>=k*n^1.5).

Or perhaps the matrix is sparsely represented.

~~~
kenjackson
On page 11 of the paper I referenced above, Spielman makes the following
statement:

"The most obvious way that sparsification can accelerate the solution of
linear equations is by replacing the problem of solving systems in dense
matrices by the problem of solving systems in sparse matrices. Recall that the
Conjugate Gradient, used as a direct solver, can solve systems in
n-dimensional matrices with m non-zero entries in time O(mn). So, if we could
find a graph H with O(n) nonzero entries that was even a 1-approximation of G,
then we could quickly solve systems in LG by using the Preconditioned
Conjugate Gradient with LH as the preconditioner, and solving the systems in
LH by the Conjugate Gradient. Each solve in H would then take time O(n^2), and
the number of iterations of the PCG required to get an ǫ-accurate solution
would be O(log ǫ−1). So, the total complexity would be O((m + n^2) log ǫ−1)."

I'm not really sure how you get O(n) 1-approximations of G (or even really
what a 1-approximation is yet, as I haven't really read the paper), but that
seems to be the idea, I think.
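
To illustrate the mechanism Spielman describes, here is a generic SciPy sketch
of mine (not his sparsifier construction; it uses an ordinary
symmetric-positive-definite matrix rather than a singular graph Laplacian, and
a plain Jacobi preconditioner in place of the sparsifier's Laplacian L_H):

    import numpy as np
    from scipy.sparse import diags, random as sparse_random
    from scipy.sparse.linalg import LinearOperator, cg

    n = 500
    rng = np.random.default_rng(0)

    # Made-up SPD sparse matrix with a badly scaled diagonal, standing in for L_G.
    R = sparse_random(n, n, density=0.01, random_state=0)
    D = diags(np.logspace(0, 2, n))
    A = (D @ (R @ R.T + diags(np.ones(n))) @ D).tocsr()
    b = rng.standard_normal(n)

    def cg_iterations(M=None):
        count = [0]
        cg(A, b, M=M, callback=lambda xk: count.__setitem__(0, count[0] + 1))
        return count[0]

    # Cheap "preconditioner": a solve with diag(A) only (Jacobi). In the quoted
    # passage that role is played by solving systems in L_H by Conjugate Gradient.
    M = LinearOperator((n, n), matvec=lambda r: r / A.diagonal())

    print("plain CG iterations:         ", cg_iterations())
    print("preconditioned CG iterations:", cg_iterations(M))

The point is only that each PCG iteration applies an approximate solve, and a
good preconditioner cuts the number of iterations dramatically.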

~~~
CountHackulus
A 1-approximation is an approximation within a factor of 1. Basically that
means that the solution is nearly exact. A 2-approximation would be no more
than twice as bad as the optimal solution, and an n-aproximation would be no
more than n times as bad as the optimal solution.

This is a common thing to use in randomized or approximate algorithms. For
example, solving min-cut by assigning to 2 sets randomly is provably a
2-approximation.

~~~
JosefK
I believe you mean Max-Cut can be 2-approximated with the trivial random
algorithm. Min-cut can be solved exactly, and that algorithm would give an
expected n^2 approximation factor (for min-cut).
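
For reference, a tiny Python sketch of that random-assignment Max-Cut heuristic
(my own illustration, not from the paper under discussion): each vertex goes to
side 0 or 1 with probability 1/2, so every edge is cut with probability 1/2 and
the expected cut weight is at least half the optimum.

    import random

    def random_cut(n, weighted_edges, seed=None):
        # Assign each vertex to a random side; return the sides and cut weight.
        rng = random.Random(seed)
        side = [rng.randrange(2) for _ in range(n)]
        cut = sum(w for u, v, w in weighted_edges if side[u] != side[v])
        return side, cut

    # Made-up example graph: (u, v, weight) edges on 4 vertices.
    edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 1.0), (2, 3, 4.0)]
    print(random_cut(4, edges, seed=0))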

------
jey
Anyone have a link to the actual paper(s)?

~~~
Natsu
I looked around, but the only links I could find appear to be selected
textbook chapters that describe old ways of doing it. Even Dr. Dobb's just
reprinted the MIT press release. So far, the only useful link I can find is to
the scheduled talk and someone already posted that.

If anyone can find anything more than this link, it would be interesting to
me, too:
<http://www.csail.mit.edu/events/eventcalendar/calendar.php?show=event&id=2711>

EDIT: Incidentally, that page _does_ tell us a bit more about the solution than
the press release everyone is reprinting. Specifically, it says: "We compute
these flows by treating our graph as a network of resistors and solving a
sequence of electrical flow problems with varying resistances on the edges.
Each of these may be reduced to the solution of a system of linear equations
in a Laplacian matrix, which can be solved in nearly-linear time."

------
miguelpais
As far as the article goes, this new solution could very well be just a
different version of the Simplex method, which can also solve this kind of
problem.

As some have already said: a better article is needed.

------
gwilliams
Couldn't this problem be simplified by considering the nodes and paths to be a
resistor network, and solving for the effective resistance between the source
and drain nodes?

------
PHLITE
How about the processing overhead of converting the complete graph to a matrix
in the first place? Wouldn't this offset (some of) the gains of the new algo?

