
Linear Programming: How Does It Work? - luu
http://jadagul.tumblr.com/post/145201618713/linear-programming-how-does-it-work
======
ColinWright
Here is a brief summary that gives some context both for the article, and for
the discussion here.

In linear programming the solution space is a convex[0] polytope[1] - the
optimum is always at a vertex.[2]

The Simplex Method proceeds by:

* Finding a vertex

* While there exists a neighbour who is better, move to it.

Done.

The usual implementation doesn't even bother trying to find the best
neighbour; any one will do. The math tells us that you never get into a
dead end. In other words, if you are not at the optimum, at least one of your
neighbours is guaranteed to be better, and the pivot step of the Simplex
Method will find one.

You can refine the technique slightly by checking several neighbours and
choosing the one that gives the greatest improvement, but in practice it
doesn't save a lot of work. There is no search, no back-tracking, and no queue
of points to consider. It's not Dijkstra's algorithm in disguise - it's not
any kind of search.

In pathological cases it can take exponential time, but there are two ways to
avoid that. One is to add a little noise, the other is to choose, from among
those neighbours that are better, one at random. Other comments in this
discussion give some references that go into the detail of ellipsoidal and
interior point methods.

Simplex method for the win - it's simple to implement, robust, efficient in
practice, and applicable to a wide range of problems.
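The find-a-vertex, move-to-a-better-neighbour loop above can be sketched on a toy problem. This is an illustration only, not a real simplex implementation (which works on the algebraic constraint representation rather than an explicit vertex list); the polygon and objective below are made up:

```python
# Toy sketch: walk the vertices of a convex polygon, moving to any
# better neighbour until none exists. Polygon/objective are made up.

def vertex_walk(vertices, objective, start=0):
    """vertices: list of (x, y) in cyclic order; objective: callable."""
    i = start
    n = len(vertices)
    while True:
        current = objective(vertices[i])
        # On a polygon, the neighbours are the two adjacent vertices.
        for j in ((i - 1) % n, (i + 1) % n):
            if objective(vertices[j]) > current:
                i = j       # any better neighbour will do
                break
        else:
            return vertices[i]  # no better neighbour: this is the optimum

# Unit square, maximize x + 2y -> optimum at (1, 1)
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(vertex_walk(square, lambda p: p[0] + 2 * p[1]))  # (1, 1)
```

Note there is no queue and no backtracking: convexity guarantees the walk never gets stuck at a non-optimal vertex.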

[0]
[https://en.wikipedia.org/wiki/Convex](https://en.wikipedia.org/wiki/Convex)

[1]
[https://en.wikipedia.org/wiki/Polytope](https://en.wikipedia.org/wiki/Polytope)

[2] This depends crucially on the fact that this is "linear" programming. As
you sweep a hyper-plane across a polytope, the last point of contact will
contain at least one vertex.

[3] If you've read this far and found it useful, but want more, contact me,
because I've been thinking of writing this up properly for some time.

~~~
mathgenius
> It's not Dijkstra's algorithm in disguise - it's not any kind of search.

It's a kind of superficial statement to make: "this is not that". Obviously
these things are different on the surface, but probing a bit deeper can
sometimes bring similarities out. This is more interesting to me, although
perhaps I am just getting old and too many neurons are interconnected and this
is what makes me seek the similarities between things, instead of just saying
"this is not that".

As has been noted by arghbleargh [0], the simplex method can be used to
implement Dijkstra. How sure are you that the converse is not also true? Will
you at least admit this is an interesting question? Or even the question of
how to ask this question in such a way that it is provable/disprovable. I
think this is worthwhile.

Continuing to probe: Dijkstra (and dynamic programming, etc.) hinges crucially
on the idea of a semiring, or generalized distributivity [1], [2]. In the case
of a minimum weight path this boils down to noticing that a + min(b, c) =
min(a+b, a+c). The plus distributes over the min (think of the usual
distributivity: a*(b+c) = a*b + a*c).

Convexity also has a kind of distributivity, which can be most easily seen by
taking averages: a + (b+c)/2 = (a+b)/2 + (a+c)/2. So addition distributes
additively over "averages". And this formula extends to arbitrary convex
combinations. My spider sense is telling me that something interesting is
going on here, and there may be some more ways to link linear programming up
with dynamic programming.
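The semiring view can be made concrete in a few lines: replacing (+, ×) with (min, +) turns repeated matrix "multiplication" into an all-pairs shortest-path computation. A pure-Python sketch on a made-up three-node graph:

```python
# Shortest paths via min-plus ("tropical") matrix powers: the semiring
# (min, +) replaces (+, *) in ordinary matrix multiplication.
INF = float('inf')

def min_plus(A, B):
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Made-up weighted adjacency matrix (INF = no edge, 0 on the diagonal).
W = [[0, 5, INF],
     [INF, 0, 2],
     [1, INF, 0]]

D = W
for _ in range(len(W) - 1):   # (n-1) min-plus "multiplications" suffice
    D = min_plus(D, W)
print(D[0][2])  # 7: the shortest 0 -> 2 path goes 0 -> 1 -> 2 (5 + 2)
```

The distributivity a + min(b, c) = min(a+b, a+c) is exactly what makes this factored computation agree with enumerating all paths.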

[0]
[https://news.ycombinator.com/item?id=11845375](https://news.ycombinator.com/item?id=11845375)

[1] Aji, Srinivas M., and Robert J. McEliece. "The generalized distributive
law." Information Theory, IEEE Transactions on 46, no. 2 (2000): 325-343.

[2] Gondran, Michel, and Michel Minoux. Graphs, dioids and semirings: new
models and algorithms. Vol. 41. Springer Science & Business Media, 2008.

~~~
DannyBee
"How sure are you that the converse is not also true? Will you at least admit
this is an interesting question? Or even the question of how to ask this
question in such a way that it is provable/disprovable. I think this is
worthwhile."

It's an interesting question, but one which has a sad answer: integer linear
programming (and other restricted forms, such as 0-1 programming) are NP-Hard.

If you could use Dijkstra's algorithm to solve them in less than exponential
time, you would have proved P=NP.

So I would go with "pretty sure".

(Now, could you do it in an exponential number of Dijkstra invocations? Maybe?
But that is, to me, not an interesting question :P Same with whether you could
expand the linear programming problem into an exponential-size graph that is
then solvable by Dijkstra's algorithm - we already know that expspace is a
superset of p, etc.)

------
swanson
This is a really cool technique to have in your toolbelt. When you are able to
model a problem in this way, you can get results that seem almost magical.
I've found these kinds of constraint solvers to be a nice middle ground between
naive algorithms and full blown machine learning.

A couple examples:

* Building a scheduling system for work-shifts -- add constraints for number of hours per week, double shifts, minimum roles (manager/supervision) then use wages as the cost function to minimize

* Exercise recommendations -- add constraints for calories burned, time, etc and then minimize a heuristic for "difficulty"

* Fantasy sports optimizer -- this is very niche but something I built that optimizes a fantasy team projected points based on position limits and player cost (for FanDuel)

The trouble I had was that almost all of the material and tools are very
academic focused. It was difficult for me to grok and the libraries were not
well documented.

The best I have found is Google's or-tools. The documentation is still
lacking, but I was able to understand the basics from their examples. The best
part is that there are bindings for a handful of languages so you aren't
necessarily forced to also learn something like R or C++ to work with it.

Start here:
[https://developers.google.com/optimization/lp/glop#output](https://developers.google.com/optimization/lp/glop#output)

and be sure to read through the Stigler Diet example to get a taste for how to
transform real-world problems into solvable linear systems.
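As a hedged sketch of what such a formulation looks like, here is a tiny Stigler-style diet problem. It uses scipy's `linprog` rather than or-tools, and all the numbers (costs, nutrient contents, floors) are made up for illustration:

```python
# Minimal diet LP: choose quantities of two foods to meet nutrient
# floors at least cost. All numbers are invented example data.
from scipy.optimize import linprog

cost = [0.5, 0.3]                      # $ per unit of food A, food B
# Nutrient floors: 10A + 5B >= 20 (protein), 100A + 200B >= 400 (calories).
# linprog expects <= constraints, so negate both sides.
A_ub = [[-10, -5], [-100, -200]]
b_ub = [-20, -400]

res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2)
print(round(res.fun, 4))  # ~1.0667: buy 4/3 units of each food
```

The whole art, as the parent comments suggest, is in writing down the right rows of `A_ub`; the solving itself is one call.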

------
mathgenius
Wait, so this is straight up gradient ascent on the vertices of the feasible
set? I always thought the simplex method was more complicated than that. I
guess you do need to maintain some kind of queue of current-best vertices.

If we squint a bit do we get Dijkstra's algorithm?

~~~
catnaroek
> Wait, so this is straight up gradient ascent on the vertices of the feasible
> set?

The simplex method literally crawls the edges of the feasible region, which is
always a convex polytope. Sadly, the simplex method can take exponential time
in the worst case. A more efficient alternative for large LP problems
is interior point methods:
[https://en.wikipedia.org/wiki/Interior_point_method](https://en.wikipedia.org/wiki/Interior_point_method)
. As the name suggests, interior point methods traverse the interior of the
feasible region.

To see why interior point methods can be more efficient, suppose your feasible
region is a convex n-gon in the XY plane, for some very large n. Your initial
point is the lowest vertex (least Y coordinate), and the objective function to
maximize is Y, so you want to reach the highest vertex. The simplex method
will literally crawl the vertices one by one until you reach the top. Interior
point methods will take a more straightforward path through the interior of
the polygon.

> If we squint a bit do we get Dijkstra's algorithm?

No. Dijkstra's algorithm operates on fundamentally discrete structures (finite
graphs), whereas the simplex method solves linear programming problems with
continuous variables (conceptually ranging over the reals, although of course
on a computer one uses floating-points or some other approximate
representation).

Integer linear programming, where all or some of the variables range over the
integers, is also possible, but it requires extending some base LP-solving
algorithm with a means to “discretize” the feasible region, such as branch and
bound methods:
[https://en.wikipedia.org/wiki/Branch_and_bound](https://en.wikipedia.org/wiki/Branch_and_bound)
.
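A hedged sketch of the branch-and-bound idea: below, a 0-1 knapsack is solved by branching on each variable and pruning with the LP (fractional) relaxation as the bound. The data is made up and there is no real LP-solver integration; it is only meant to show the "relax, bound, branch" structure:

```python
# Branch and bound for a 0-1 knapsack, with the fractional (LP-relaxed)
# knapsack as the upper bound. Illustrative sketch; made-up data.

def lp_bound(items, cap, fixed):
    """Upper bound: honor fixed 0/1 choices, fill the rest fractionally."""
    value, room = 0.0, cap
    rest = []
    for i, (v, w) in enumerate(items):
        if i in fixed:
            if fixed[i]:
                value += v
                room -= w
        else:
            rest.append((v, w))
    if room < 0:
        return float('-inf')          # fixed choices already infeasible
    for v, w in sorted(rest, key=lambda t: t[0] / t[1], reverse=True):
        take = min(1.0, room / w)     # LP relaxation allows fractions
        value += take * v
        room -= take * w
        if room <= 0:
            break
    return value

def branch_and_bound(items, cap):
    best = 0.0
    def rec(i, fixed):
        nonlocal best
        if lp_bound(items, cap, fixed) <= best:
            return                    # prune: relaxation can't beat incumbent
        if i == len(items):
            best = max(best, lp_bound(items, cap, fixed))
            return
        rec(i + 1, {**fixed, i: 1})   # branch: take item i ...
        rec(i + 1, {**fixed, i: 0})   # ... or leave it
    rec(0, {})
    return best

items = [(60, 10), (100, 20), (120, 30)]  # (value, weight)
print(branch_and_bound(items, 50))  # 220.0: take the last two items
```

Real ILP solvers do the same dance at far greater sophistication, re-solving an LP at every node of the tree.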

~~~
repsilat
>> If we squint a bit do we get Dijkstra's algorithm?

> No. Dijkstra's algorithm operates on fundamentally discrete structures
> (finite graphs), whereas the simplex method solves linear programming
> problems with continuous variables (conceptually ranging over the reals,
> although of course on a computer one uses floating-points or some other
> approximate representation).

I think you've misunderstood the grandparent. Try to think of the vertices of
the simplex as vertices in a graph, and edges on the simplex as edges on the
graph. The (primal) simplex method chooses an entering variable by saying,
"which edge from this vertex improves the objective most quickly?"

The reason the graph search doesn't turn into Dijkstra's algorithm is that we
would never have any reason to backtrack. The next basis is always at least as
good as the last, so we don't "get stuck" and we don't ask whether we would
rather have gone another way earlier on.

EDIT:

> To see why interior point methods are more efficient...

I know they exist, but I've never seen a problem for which interior point
methods are actually faster in practice. My experience isn't too deep with
very large LPs, mind -- mine tend to have tens or hundreds of thousands of
variables, not millions.

~~~
mathgenius
> The reason the graph search doesn't turn into Dijkstra's algorithm is that
> we would never have any reason to backtrack. The next basis is always at
> least as good as the last, so we don't "get stuck" and we don't ask whether
> we would rather have gone another way earlier on.

I wonder if there is a sharper way of putting this.

Dijkstra's algorithm doesn't backtrack either, it is always merely propagating
a "wave-front" of current best vertices (and perhaps how we got there). And
the simplex method will also do this when presented with a connected component
of vertices all of which have the same objective value.

I feel there are some deeper connections to "dynamic programming", of which
Dijkstra's algorithm is a good example. Having trouble putting my finger on
it.

One obvious difference is that the Simplex method doesn't know the target
vertex, whereas we do know this in Dijkstra. So perhaps the (possibly
non-existent) analogy I am searching for links vertices of the Simplex method
with
_paths_ in Dijkstra.
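For reference, the "wave-front" can be made concrete with a minimal Dijkstra (pure stdlib, made-up graph): the heap holds the frontier of tentative-best distances, and settled vertices are never revisited.

```python
# Minimal Dijkstra to make the "wave-front" concrete.
import heapq

def dijkstra(graph, source):
    dist = {source: 0}
    frontier = [(0, source)]          # the advancing wave-front
    while frontier:
        d, u = heapq.heappop(frontier)
        if d > dist.get(u, float('inf')):
            continue                  # stale entry; u was settled cheaper
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(frontier, (nd, v))
    return dist

# Made-up graph: adjacency lists of (neighbour, edge weight).
g = {'a': [('b', 5), ('c', 1)], 'c': [('b', 2)], 'b': []}
print(dijkstra(g, 'a'))  # {'a': 0, 'b': 3, 'c': 1}
```

As the parent notes, there is no backtracking here either: the wave-front only ever propagates outward, which is what makes the comparison to the simplex walk tempting.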

~~~
arghbleargh
There is a linear programming formulation of the shortest path problem: see
[https://en.m.wikipedia.org/wiki/Shortest_path_problem#Linear...](https://en.m.wikipedia.org/wiki/Shortest_path_problem#Linear_programming_formulation).
Indeed the vertices of the simplex method would correspond to paths, but I
think running the simplex method on the dual formulation is more akin to
Dijkstra's algorithm. Actually, I think the simplex method on the dual and
Dijkstra's are equivalent, although I did not work through the details
carefully.
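A minimal sketch of that LP formulation, assuming scipy is available (made-up three-node graph): minimize total edge cost subject to flow-conservation equalities, shipping one unit from source to sink. With this structure the LP optimum agrees with what Dijkstra would return.

```python
# LP formulation of shortest path on a tiny made-up graph:
# edges 0->1 (cost 1), 1->2 (cost 2), 0->2 (cost 4); ship 1 unit 0 -> 2.
from scipy.optimize import linprog

c = [1, 2, 4]          # costs of x01, x12, x02
A_eq = [[1, 0, 1],     # node 0: x01 + x02 = 1 (one unit leaves the source)
        [1, -1, 0]]    # node 1: inflow x01 minus outflow x12 = 0
b_eq = [1, 0]          # (conservation at node 2 is then implied)

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 3)
print(res.fun)  # 3.0: the route 0 -> 1 -> 2 beats the direct edge
```

The constraint matrix here is totally unimodular, which is why the LP optimum lands on an integral vertex, i.e., an actual path.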

~~~
mathgenius
Aha, yes, this is what I was looking for. Thank you!

Reminds me also of using linear programming to do belief propagation and
decode sparse codes.

------
andars
Fun fact: recent notable rocket landings seem to use convex optimization to
solve the soft landing problem, AFAICT.

See here for more info:
[http://web.mit.edu/larsb/www/iee_tcst13.pdf](http://web.mit.edu/larsb/www/iee_tcst13.pdf)

> This convexification enables the planetary soft landing problem to be solved
> optimally by using convex optimization, and is hence amenable to onboard
> implementation with a priori bounds on the computational complexity and
> realtime execution time.

------
onurcel
For further reading there is an excellent resource here
[http://web.mit.edu/15.053/www/](http://web.mit.edu/15.053/www/) with a
leitmotif example and both theoretical and practical approaches. It also
explains in detail the theory behind the real-world implementation of the
simplex method (aka the revised simplex method) with matrix factorization and
inverse-update techniques, etc. This is a quite old book, but as far as I know
even modern linear solvers use those techniques (at least the open-source ones
for sure).

This is a very specific and little subset of convex optimization and yet very
powerful (and exciting).

Convex optimization methods can also be used in many non-convex situations,
for example when you have quasi-convex or log-convex functions as objective or
constraints. I recommend Boyd's excellent course
[https://lagunita.stanford.edu/courses/Engineering/CVX101/Win...](https://lagunita.stanford.edu/courses/Engineering/CVX101/Winter2014/about)
and the online free book
[https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf](https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf)
(warning : very addictive)

------
oselhn
When I hear linear programming I always remember this method:
[https://en.wikipedia.org/wiki/Ellipsoid_method](https://en.wikipedia.org/wiki/Ellipsoid_method).
It is a nice example of an algorithm that is useful not for programming but
only for mathematicians (I believe it was used to prove that linear
programming is in P). I think most programmers would have a really hard time
implementing it, and it is also on average slower than the simplex method,
despite the fact that this algorithm is polynomial and the simplex method is
not known to be.

~~~
mschulze
It's been a while since I studied mathematics (and I wrote my thesis in
optimization), but at the time it was not clear whether the simplex method is
polynomial or not. I think no one has found a pivot rule for selecting
variables that is always polynomial.

~~~
oselhn
See this
[https://en.wikipedia.org/wiki/Simplex_algorithm#Efficiency](https://en.wikipedia.org/wiki/Simplex_algorithm#Efficiency).
But the simplex method is still widely used because real problems are usually
solved in polynomial time. You practically have to design your problem on
purpose to make it not finish in polynomial time.

But I am not a mathematician; this opinion is just how I understand it.

------
signa11
Does anyone know of a good introduction to Karmarkar's algorithm? It
supposedly finds optimal solutions way faster than the simplex method...

~~~
sevensor
This is supposed to be true only for very large linear programs. The simplex
method is the fastest thing in town for most LPs, even though it is worst-case
exponential time where Karmarkar's algorithm is polynomial.

------
blahi
The OP and the discussion are around the technical implementation, but I am
interested in the analysis part of LP.

I wanted to use LP to optimize a budget, but my senior colleague said that LP
is just too damn hard in the real world. Is this true? Is there a textbook
that goes over practical applications of LP?

~~~
ColinWright
I am surprised that your senior colleague says it's "too damn hard." Are they
referring to the conversion of your data into the right format, or are they
referring to getting an implementation running? The latter is fairly
straightforward; the former depends on how clean and convertible your existing
data are.

If your data are currently messy, you need to clean them up anyway. There will
be political issues with doing work your senior colleague says is not
possible, but if you can solve - or avoid - the political problems, learning
how to do this, and then doing it, will be good for your personal development.

Warning: I suspect the political problems will actually be the most difficult
part.

~~~
blahi
The data is squeaky clean, accounting-like data. Getting it into the right
format is a few lines of R. What he says is too difficult is putting in place
all the
right constraints. He says that it is very easy to miss a necessary constraint
and this will obviously blow up your solution. How can I examine the solution
to make sure it would be valid? The data consists of tens/hundreds of
thousands of line items which compete for the same budget and need to meet
some (fairly simple) metrics. Line items have a hidden dependency between
them, so presumably the solver can scrap some items which will in turn make
others (possibly the ones with the best metrics) tank too.

~~~
garethrees
The concern about missing necessary constraints seems reasonable, but then how
does the current budgeting process avoid the same concern? Perhaps it does so
by changing in small steps, so that each period's budget is similar enough to
the previous period's budget (which was known to have worked reasonably well)
that the differences can be analyzed and the consequences understood.

So something you might try is to locate the current budget in your linear
system and then find a nearby point that improves the objective function. It
should then be simple to explain the proposed improvement ("move $A from B to
C and $D from E to F and the objective function goes up by $G") and not too
complex to examine it for missing constraints ("hang on, you can't reduce B by
$A because the supplier has a minimum order").

The idea is that instead of using linear optimization to replace the current
budgeting process (which is risky), to use it to generate proposals for
submission to the current budgeting process, and to capture the constraints
that are revealed when proposals are rejected.

(But note also that it's quite likely that a real objective function will be
nonlinear due to economies of scale, in which case linear optimization methods
won't suffice.)

~~~
blahi
Things like minimum orders are obvious enough that I will manage to figure
them out.

My colleague's concern (and I see his point) is about what happens when the
solver says "stop selling coffee machines", but when we stop, consumables drop
off a cliff and consumables is where we make the profit.

~~~
ColinWright
So there's a missing constraint that M >= S/1000 or similar, where M=machines
and S=supplies. And yes, that is an issue that needs to be considered. Over
10s or 100s of thousands of items that becomes difficult, but it's still the
case that the system you describe now is not really under control.

But perhaps your senior colleague is right, and that's just the way it has to
be. However, one person's time spent investigating this properly might be
money well spent.

~~~
blahi
Yeah, but that's the problem. There is no way to know all the relationships.
This was an obvious one, there are a lot more subtle ones. In any case, the
comments have been very helpful. A gradual implementation is the only sane way
to go. Politics are not too big of an issue, if I can make a concise case and
what's a better way to do it than a small project.

~~~
ColinWright
Documenting these relationships would potentially be a really, _really_
valuable thing to do "going forward". It helps to make explicit things that
are currently implicit, and possibly unknown. It then opens up the
possibilities of making optimisations like using Linear Programming, but at
the very least it reduces risk and helps to share knowledge.

Good luck!

------
graycat
How does it work?

For the set of real numbers R and some positive integer n, the problem starts
with a set F that is the intersection of finitely many closed half spaces in
R^n. Each such half space is from a _constraint_ that is a linear inequality
with relation either >= or <=.

Then the set F is a closed, convex subset of R^n. Set F is called the
_feasible region_.

In addition, we are given a linear function z: R^n --> R, the _objective_
function. Then the problem is to find a point x in F that makes z(x) as large
as possible. Such a point x is said to be _optimal_. Even if an optimal
point x exists, it may not be unique. Indeed, if there is more than one
optimal solution, then there are infinitely many, and the set of all optimal
solutions forms a closed, convex subset of feasible region F.

If set F is empty, then the problem is _infeasible_ and there is no optimal
solution. Else set F is non-empty and the problem is _feasible_. Any point in
set F is said to be a _feasible_ point.

If the problem is feasible, then it may be that function z is unbounded above
on F. Then the problem is _unbounded_ and there is no optimal solution. If the
problem is feasible and not unbounded, then it is feasible and _bounded_.

It is a theorem, not trivial to prove, that if the problem is feasible and
bounded, then it has an optimal solution.

Given points x, y in R^n and real number t in the interval [0,1], the point

w = tx + (1-t)y

is a _convex combination_ of the points x and y. Easily the set of all convex
combinations of points x and y is closed and convex.

Suppose C is a convex subset of R^n, x is in C, and x is not a convex
combination of two other points in C. Then point x is an _extreme_ point of
convex set C.

It is a theorem that if the linear program has an optimal solution, then it
has at least one optimal solution that is an extreme point of the feasible
region F. So, in looking for optimal solutions, it is sufficient to look only
at extreme points of the feasible region F.

It is a theorem that the feasible region F has only finitely many extreme
points.

The classic algorithm for finding optimal solutions is the Dantzig _simplex_
algorithm.

Intuitively and geometrically the algorithm starts at an extreme point of the
feasible region and moves, one _iteration_ at a time, to _adjacent_ extreme
points (and adjusts what it regards as the _basic_ variables) until it
satisfies a sufficient condition for optimality. With appropriate refinement
of the simplex algorithm, it is always able to achieve this sufficient
condition.

Algebraically, the simplex algorithm is a small but clever modification of
Gauss elimination for solution of systems of linear equations. Each iteration
of the algorithm corresponds to some elementary row operations on the system
of linear equations.

There are more technical details, but the above is a good outline and overview
of the core math.
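To make the outline concrete, here is a minimal dense-tableau simplex sketch in Python, with the Gauss-Jordan pivot playing the role of the elementary row operations described above. Illustrative only: no degeneracy, cycling, or unboundedness safeguards, and the refinements a production solver needs are omitted.

```python
# Minimal dense-tableau simplex for: max c.x s.t. A x <= b, x >= 0, b >= 0.

def simplex(c, A, b):
    m, n = len(A), len(c)
    # Tableau: constraint rows with slack variables appended, then the
    # objective row (negated costs); last column is the right-hand side.
    T = [A[i] + [1 if j == i else 0 for j in range(m)] + [b[i]]
         for i in range(m)]
    T.append([-cj for cj in c] + [0] * m + [0])
    basis = list(range(n, n + m))     # which variable is basic in each row
    while True:
        # Entering variable: any column with negative reduced cost will do.
        piv_col = next((j for j in range(n + m) if T[-1][j] < -1e-9), None)
        if piv_col is None:
            break                     # sufficient condition for optimality
        # Leaving variable: the minimum ratio test.
        ratios = [(T[i][-1] / T[i][piv_col], i)
                  for i in range(m) if T[i][piv_col] > 1e-9]
        _, piv_row = min(ratios)
        basis[piv_row] = piv_col
        # One iteration = elementary row operations (Gauss-Jordan pivot).
        p = T[piv_row][piv_col]
        T[piv_row] = [v / p for v in T[piv_row]]
        for i in range(m + 1):
            if i != piv_row and abs(T[i][piv_col]) > 1e-12:
                f = T[i][piv_col]
                T[i] = [a - f * r for a, r in zip(T[i], T[piv_row])]
    return T[-1][-1]                  # optimal objective value

# Classic textbook example: max 3x + 5y s.t. x <= 4, 2y <= 12, 3x + 2y <= 18.
print(simplex([3, 5], [[1, 0], [0, 2], [3, 2]], [4, 12, 18]))  # 36.0
```

Each pass of the loop is one _iteration_ in the sense above: the basis changes by one variable and the tableau is updated by row operations, just as in Gauss elimination.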

Some of the consequences of linear programming and some of the properties
of the simplex algorithm yield a nice collection of inequalities, theorems of
the alternative, the saddle point theorem and optimal strategies of two person
game theory, etc. E.g., in the game paper, scissors, and rock, play each of
the three moves with probability 1/3 and independently of all the past. Then
in the long run you will break even.

Is the simplex algorithm a _polynomial_ algorithm? Apparently not: There is a
classic example problem of Klee and Minty that shows that at least one version
of the algorithm is exponential. In practice, however, the simplex algorithm
runs in time linear in the number of half spaces. There are more details in
some now classic work of K.-H. Borgwardt.

Is there a polynomial algorithm for linear programming? Yes.

In practice, the simplex algorithm has many refinements and extensions.

If we ask that the components of an optimal solution x be integers, then the
problem is _integer linear programming_ and is NP-hard. One solution
approach is via branch and bound where we solve an ordinary linear program at
each node of a large tree. But that approach is not as bad as it might sound
since usually one more solution is just a little work from what was left over
from the last solution -- the simplex algorithm has lots of such
modifications.

Another problem is least cost flows on a directed graph (network) with on each
arc a maximum flow and a cost per unit of flow. There the simplex algorithm
takes on a special form that commonly permits astoundingly fast solutions of
astoundingly large problems. Moreover, if the arc capacities are all integers,
then, starting with an integer feasible solution, the simplex algorithm
maintains integer solutions and will find an optimal integer solution -- we
get integer programming for no extra effort.

We can also use linear programming as a tool in solving convex problems,
nonlinear problems, make use of Lagrange multipliers, achieve the Kuhn-Tucker
necessary conditions, etc.

Some of the promise of linear programming comes from how common linear
expressions are, especially in business, accounting, budgeting, business
planning, etc.

Dantzig worked out his simplex algorithm in the late 1940s while working for
the USAF on the logistics problem of how best to deploy a military force
rapidly to a distant location.

An early commercial success was feed mixing for livestock: Given a list of
feeds and the nutritional content and prices of each and given what total
nutritional content want for the livestock, find how much of each feed to buy
to feed the livestock at least cost. There are rumors that Ralston Purina runs
linear programs daily when mixing feeds.

Another problem of early high interest was how to ship from some factories to
some warehouses to get each warehouse what they needed, take from each factory
only what they had, and minimize total shipping cost. Well, now we know that
this is a special case of the least cost flow problem that we can do so well
solving. An early statement of this problem, the _transshipment_ problem,
resulted in a Nobel prize in economics.

Another early success was operating an oil refinery: So, given crude oil
supplies, which differ in chemistry and price, and refinery products, which
differ in price, find what crude oil supplies to use and what products to make
to maximize profit. Apparently eventually this problem was refined to make use
of nonlinear programming, e.g., by C. Floudas.

One use of linear programming is for scheduling. E.g., consider the original
FedEx: Given 90 US cities with loads to be picked up and loads in Memphis to
be delivered. Given 33 airplanes, say, all Dassault DA-20 Fanjet Falcons. Now,
way which airplanes go to which cities in what order to move all the loads,
obey all safety and engineering constraints, arrive within specified time
windows at each of the airports, and minimize direct operating costs.

So, it looks like a huge, non-linear, integer problem. But, can attack the
problem in two steps where the first is just some enumeration likely doable
with some nonlinear arithmetic and the second is just some 0-1 integer linear
programming which likely has an okay chance of good results.

So, the first step is just to get a list of what a single airplane might do,
that is, what cities it might visit and in what order. So, just write the
code: Try all the reasonable sets of cities in reasonable order (e.g., don't
waste time on evaluating absurd cases like having one plane serve NYC,
Seattle, Miami, Chicago, and LA, in that order). Then for each reasonable
candidate for what one plane might do, do the nonlinear arithmetic to find the
costs. For a given set of cities served, keep only the one order of the cities
that minimizes the costs. The result will be a list of some 100,000 or so
cases for what one plane might do, the cost of the case, and the cities served
(right, cheat a little -- argue that if a plane stops at a city, then it
delivers everything the city gets and picks up everything the city has to
ship, etc.). So, with 90 cities and 100,000 cases set up a linear program with
100,000 variables and 90 constraints. That is, for each of the 90 cities, want
some one of the 100,000 cases to serve that city. So, each of the 100,000
columns has just 0s and 1s, a 0 if that case does not serve that city and a 1
if it does.

This is an old approach and is called 0-1 integer linear programming set
covering. Yes, likely it is NP-hard. But there is also a good chance of saving
5, 10, maybe 15% of direct operating costs over other approaches.
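A toy version of the 0-1 set-covering model, brute-forced at tiny scale just to show the structure (the cases, costs, and cities are all made up; a real instance of this size class needs an ILP solver, not enumeration):

```python
# Toy 0-1 set covering: each "case" is one plane's route, given as
# (cost, set of cities it serves); pick cases covering every city at
# least cost. Brute force is fine only at this tiny, made-up scale.
from itertools import combinations

def cheapest_cover(cases, cities):
    best = None
    for r in range(1, len(cases) + 1):
        for combo in combinations(cases, r):
            covered = set().union(*(served for _, served in combo))
            if covered >= cities:                 # every city served
                cost = sum(c for c, _ in combo)
                if best is None or cost < best:
                    best = cost
    return best

cases = [(7, {0, 1}), (5, {1, 2}), (6, {0, 2, 3}), (4, {3})]
print(cheapest_cover(cases, {0, 1, 2, 3}))  # 11: cases 2 and 3 cover all
```

In the LP/ILP formulation, each case becomes a 0-1 column and each city a covering constraint, exactly as in the 90-cities description above.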

Can do a lot of scheduling problems with such approaches.

For linear programming software, can consider, say, R. Bixby and his work at
Gurobi.

At times I have had good success with the old IBM Optimization Subroutine
Library (OSL). E.g., once with some _Lagrangian relaxation_, I got a feasible
solution within 0.025% of optimality to a 0-1 integer linear programming
problem with 40,000 constraints and 600,000 variables in 905 seconds on a 90
MHz PC!

