 An Intro to Integer Programming for Engineers: Simplified Bus Scheduling 122 points by dget on Nov 17, 2016 | hide | past | favorite | 23 comments Integer programming often seems magical. I remember early grad school and seeing "Optimal and Near-optimal Global Register Allocation Using 0–1 Integer Programming" by Goodwin et al, where the very crunchy problem of register allocation was solved largely by punting it to a ILP solver. Gotchas abounded but it was eye opening to see how many hard problems could be solved (or at least adequately approximated) in this fashion.I think this is one of those techniques that every serious computer scientist should have in their toolbox. Someone who knew some statistics, a smattering of something like SMT (enough to drive Z3) and some linear programming could routinely resemble Gandalf at any shop that has people struggling with difficult problems.The main thing is not to turn into a cookbook guy. I've known a few people whose ability to just build an actual algorithm to solve some interesting problem has atrophied since they reach for a ILP solver the minute anything gets difficult. Agreed that it seems magical. This is one of the areas where the open-source tools are way behind commercial solvers. Cplex and Gurobi are really impressive pieces of software. Doubly magical now that computers (and solvers) are so fast. A few times I've thought, "Oh, I could reduce this to a max-flow problem, and write/dig up an algorithm to solve it" before realising it'd just be easier to write it as a linear program (or an integer program "just in case".)And then if weird constraints come along that broke the max-flow reduction, I could usually shoehorn that into the formulation.Can't remember how to write Dijkstra's algorithm? Meh, integer programming it is. Want to write a sudoku solver and you can't be arsed with dancing links or backtracking or prolog or... Whatever, CBC will do it. It'll be nasty and a hell of a lot slower than doing it "the right way" in a fast language, but it might not be slower than doing it in Python or Javascript. The latter is what scares me. I solved a 'bit twiddling' problem using Z3 and felt a little ashamed. It is reminiscent of Brewster's letter to Martin Gardner "Feeding a recreational puzzle into a computer is no more than a step above dynamiting a trout stream". It's very easy to let skills atrophy and resort to using black boxes we don't really understand (well, I speak for myself, anyhow). When Z3 or cplex or whatever tool du jour gets wedged on a problem, I am not sure I have the background in these tools to get unstuck. With Z3, unfortunately, the issue is that we are really bad at gauging what makes for a hard SMT problem. Often, knowing that a particular problem is hard is the same as solving it! That said, you can do some really interesting things with SMT solvers if you can formulate the problem the right way.Gurobi, CPLEX, and friends are less magical if you have enough mathematical maturity. Typically, if you are comfortable with linear algebra (a subject woefully ignored in most undergraduate curricula), you can work with them. In my experience, working with optimization engines is a two step process. First, you need to figure out how to formulate the problem in a smart way (typically this means that it's convex with constraints in a certain format), which is normally the hard part. Then you scroll through the list of available algorithms that the engine has at your disposal and pick one with properties that you like (they often have suggestions if you have no strong preference). Normally, if you can formulate the problem so it's convex, you will at least have a good shot of doing well with your optimization. I have a Phd in optimization and I still think Cplex and Gurobi seem magical. The devil is in the details. Open source solvers are comparatively not very good. Oh for sure. You could go down a real rabbit hole if you had to implement that stuff yourself. I wrote a sudoku solver in my first year of college using some heuristsics and some sort of branch-and-cut algorithm, in java. It used bit masks to keep track of the state and possiblities, I was pretty proud, spent like weeks on that.Recently I re-implemented in Pulp in an afternoon using ILP.It's similarly fast, both can solve similar sets of problems. But the ILP solution was so much easier and shorter. > I wrote a sudoku solver in my first year of college using some heuristsics and some sort of branch-and-cut algorithm, in java.What class of cutting planes did you use in your B&C algorithm (full disclure: I do my PhD on cutting planes, thus I'm interested)? I mean branch-and-bound - i.e. just reject when solution becomes infeasible and back-track. No cutting planes in first year college. ;)I think the heuristics may have been based on selecting cells that had the least possible solutions, in order to quickly 'zoom in' on the areas where you can quickly prune infeasible solutions. C-PLEX and Gurobi -- R. Bixby. I had the opportunity of interacting with Gu and Bob Bixby of GuRoBi(& CPLEX). Some of the most talented OR scientists. Thanks. You are ahead of me. I'd heard of Bixby's role but not Gu's. Whats the distinction between Integer Programming and Constraint Programming? From a cutlery glance they appear to be the same solution to the same problem... Integer programming is effectively searching the edge of a convex polygon (polytope for higher dimensions) for an optimal value function, defined in terms of the coordinates of the points in space. The inequalities are planes that subdivide the solution space into permitted and non-permitted domains.Constraint programming generates lots of potential solutions (combinatorially) and prunes the search tree to make the large numbers tractable.The intuitions behind the two techniques are quite different. > Integer programming is effectively searching the edge of a convex polygon...Great definition of linear programming. In a sense the thing that makes integer programming hard is that the feasible region is not convex -- for two feasible solutions x and y, ax + (1-a)y for 0<=a<=1 is not necessarily feasible.I'd also say that integer programming is a kind of constraint-based programming extended with the addition of an objective function -- we're not just looking for any satisfactory answer, but an answer with a cost or value that cannot be improved upon. You're definitely right that "things called constraint-based programs" tend to be solved in different ways, though (and the languages they're expressed in tend to be nicer, too.) I cant think of any serious constraint solvers without some sort of optimization support, are there any? I think your definition of Integer programming is not correct. Integer programming or Mixed Integer Programming assumes that the solution has the all or some variables, integers. Most of the Integer programming models are usually using binary variable to indicate decisions.Further, I think you tried to describe Linear programming in the beginning of your post. But Linear programming searches the vertices of the polytope and not the edges.The biggest issue with Integer programming is the lack of convexity of the solution space when compared to Linear programming. So the approach for solving them is very different compared to Linear programming. If I made any mistakes in my post, I welcome to be corrected. Constraint programming can be viewed as looking for feasible solutions to optimization problems.So, what is optimization? For positive integers m and n, the real numbers R, and functionsf: R^n --> Rg: R^n --> R^mfind x in R^n to (problem MP1)maximize f(x)subject tog(x) <= 0where g(x) is regarded as an m x 1 matrix and the 0 is also m x 1 but with all its components 0.The constraints are theg(x) <= 0We say that we have m constraints.The subject is essentially the same usingg(x) >= 0org(x) = 0instead or minimizing instead of maximizing.Point x in R^n is feasible provided it satisfies the constraints, that is,g(x) <= 0The feasible region is the set F of all feasible x.A point x is optimal if it is feasible and makes f(x) as large as possible. That is, x is optimal provided for any feasible y we have f(x) >= f(y).If one or more of the functions f or g is non-linear, then we have non-linear programming. Here we may use the Kuhn-Tucker conditions to seek a point that satisfies necessary conditions for optimality. For an algorithm, we may take gradients of the constraints to set up a linear program, solve that, do a line search, and try again.If functions f and g are linear, easily written using matrix algebra, then the optimization problem is linear programming (here programming is used in the British sense of operational planning).Function f is convex provided for any x, y in R^n and any t in [0, 1] we havet f(x) + (1 - t) f(y) >= f( t x + (1 - t) y )That is, in a graph of f, a straight line between (x, f(x) and (y, f(y) ) is on or over the graph and never under the graph. So, the straight line over approximates the function.If f is convex, then it is continuous (proof is not very difficult but is in Fleming, Functions of Several Variables). IIRC convex implies differentiable almost everywhere (maybe proof in R. T. Rockafellar, Convex Analysis). The epigraph is the graph of f and everything above it, and it is a convex set so has supporting hyperplanes and is the intersection of all the closed half spaces from its supporting (tangent) hyperplanes (IIRC, proof in Fleming). Such a supporting hyperplane is called a subgradient and can be quite useful in optimization, e.g., Lagrangian relaxation. There is a closely related, really nice result in Hilbert space: Every non-empty, closed convex subset has a unique element of minimum norm. That can be very nice to know!To be more clear, a convex function, while continuous, need not be differentiable. Why not? Intuitively, at a point, the function makes a sharp turn. Similarly, a cube is a convex set but has 10 sharp edges and 8 sharp points that have supporting (tangent) planes, but these planes are not unique. So, a subgradient at a point would be a gradient and a tangent plane if at that point it was the only tangent plane. So, intuitively, a subgradient is nearly a tangent plane.Function f is concave provided that -f is convex. If f is convex and -f is convex, then f is linear -- so, intuitively, a function that is convex (or concave) is half of being linear.We have some astoundingly powerful methods for solving linear programming problems.A significant fraction of the Nobel prizes in economics have been from economic models based on linear programming.Some special cases, e.g., least cost network flows, maximum matchings, are special cases with even better results for the algorithms.If in MP1 function f is concave and funtion g is linear, then again we have some good methods.If in MP1 function f is quadratic and concave and function g is linear, then we have the problem of Markowitz in investment portfolio management and his Nobel prize in economics.If we are maximizing a convex function subject to linear constraints, then we have, IIRC, a problem in NP-complete.If in linear programming we require in addition that some or all of the components of x are required to be integers, then we have some integer constraints and a problem in integer linear programming (ILP). ILP was one of the earliest problems shown to be in NP-complete.So, that's a nutshell version of optimization.Can also extend to accommodating randomness and doing much the same over time.The over time part is control theory, e.g., how to fly an airplane from one point to another in least time -- there are books, e.g., by Athans and Falb. Mike Athans was long at MIT and Falb, at the Brown Division of Applied Mathematics. Control theory had a flurry of interest in the space program. There is an extension, differential game theory, which is like how best for a missile to chase an airplane where the airplane is best trying to avoid the missile. Typically in control theory and differential game theory we look for necessary conditions for optimality which usually turn out to be in terms of Lagrange multipliers. Optimization over time with randomness is essentially stochastic optimal control, and the main technique is a version of dynamic programming. An early proponent was Richard Bellman.The Black-Scholes Nobel prize work in option pricing is a relatively simple case of stochastic optimal control.So, in terms of MP1, constraint programming is, say, find x in R^n so thatg(x) <= 0that is, just find a point in the feasible region R.An example might be, yesterday killed 5000 hogs and hung and chilled the results. Today cut the hogs and put the pieces in boxes in 18 wheel trucks, about 40,000 pounds of load per truck, for delivery to customers about 900 miles away.Have only three bays in the loading dock. So, want to know the order in which to load the trucks to get them all loaded in time.So, have a constraint satisfacation problem and not really an optimization problem.Fundamentally, i.e., in the sense of computational complexity, finding a feasible point is as difficult as finding an optimal point. So, really, fundamentally, constraint programming is no easier than optimization.IIRC at one time SAP and ILOG were interested in CPLEX as means of solving constraint satisfaction problems. So, to solve a constraint satisfaction problem, start with some optimization software. I strongly encourage using Julia for anyone applying mathematical optimization. It is (through JuMP, for example) one of the areas in which it really stands out. What the OP is describing is, except for the coding, some now classic applied math, i.e., operations research. That applied math got used much less often than one might have expected because the (A) data gathering was too much pain, expense, and botheration, (B) there was too much software to write, and writing it was too clumsy and expensive, (C) the computing was too slow and too expensive, and (D) even when got a good solution ready for production, the production situation commonly changed so fast that the work of (A)-(C) could not change and revise fast enough to keep up. And, of course, the custom, one-shot software was vulnerable to bugs. Net, in practice, a lot of projects failed. Mostly successful projects needed big bucks and lots of unusually insightful sponsorship high in a big organization.But, now (A)-(D) are no longer so difficult. This should be the beginning of a new Golden Age for such work.Sure, since the results of such optimization can look darned smart, smarter than the average human, might call the work artificial intelligence. Really, though, the work is mostly just some classic applied math now enabled in practice by the progress in computer hard/software.The OP is a nice introduction to vehicle routing via integer linear programming (ILP) set partitioning. For the linear programming and the case of ILP, I'll give a quick view below. But, now, let's just dig in:Here is an explanation of the secret approach, technique, trick that can work for vehicle routing and many other problems: The real problems can have some just awful non-linear cost functions, just absurdly tricky constraints, e.g., from labor contract work rules, equipment maintenance schedules, something can't do near point A near lunch, even handle some random things, even responding to them in real-time (dynamically), etc. yet can still have a good shot at getting a least cost solution or nearly so. The "nearly so" part can mean save a lot of money not available otherwise. When there is randomness, then try to get least expected cost.So, first, the trick is to do the work in two steps.The first step call evaluation and the second, optimization.From 50,000 feet up, all the tricky, non-linear, goofy stuff gets handled by essentially enumeration in the first part leaving some relatively simple data for the optimization in the second part.In practice, this first step typically needs a lot of data on, say, the streets of a city and requires writing some software unique to the specific problem. The second step, the optimization, may require deriving some math and/or writing some unique software, but the hope is that the step can be done just by routine application of some existing optimization software.The OP mentions the now famous optimization software Gurobi from R. Bixby and maybe some people from Georgia Tech, e.g., from George Nemhauser and Ellis Johnson (long at IBM Research and behind IBM's Optimization Subroutine Library (OSL) and its application to crew scheduling at American Airlines).First Step.Suppose you are in Chicago and have 20,000 packages to deliver and 300 trucks. Okay, what trucks deliver what packages to make all the deliveries on time, not overload any trucks, and minimize the cost of driving the trucks? You do have for each package the GPS coordinates and street address. And you have a lot of data on the streets, where and when traffic is heavy during the day, etc.Okay, let's make some obvious, likely doable progress: Of those 20,000 packages, maybe have only 15,000 unique addresses. So, for each address, bundle all the packages that go to that address. Then regard the problem as visiting 15,000 addresses instead of delivering 20,000 packages.So, you write some software to enumerate. The enumeration results in a collection of candidate routes, stops, and packages to be delivered for a single truck. For each of those candidates, you adjust the order in which the stops are made to minimize cost -- so here get some early, first-cut, simple optimization. You keep only those candidates that get the packages delivered on time, meet other criteria, etc. You may have 1 million candidate single truck routes. For each of the the candidates, you find the (expected) operating cost.So, suppose you have n = 1 million candidate single truck routes.Also you have m = 15,000 addresses to visit.So, you have a table with m = 15,000 rows and n = 1 million columns. Each column is for some one candidate route. Each row is for some one address. In each column there is a 1 in the row of each address that candidate route visits and a 0 otherwise. One more row at the top of the table is, for each column, the operating cost of that candidate route.So, you have a table of 0's and 1's with m = 15,000 rows and n = 1 million columns. You have a row with 1 million costs, one cost for each column.Again, you have 300 trucks. So, you want to pick, from the n columns, some <= 300 columns so that all the m addresses get served and the total costs of the columns selected is minimized. That is, if add the columns as column vectors, then get all 1's.Second Step.Well, consider variables x_i for i = 1 to n = 1 million. Then we want x_i = 1 if we use the route in column i and 0 otherwise. Let the cost of the route in column i be c_i. We want the total cost (TeX notation):z(x) = sum_{i = 1}^n x_i c_ito be minimized. So, right, we take the big table of m = 15,000 rows and n = 1 million columns and call it m x n matrix A = [a_{ij}]. We let m x 1 column vector b have all 1's. We regard x as n x 1 where in row j = 1 to n is x_j. Then, we get linear programminimize z(x)subject toAx = bx >= 0So, this is a case of linear programming.Except in our problem we have one more constraint -- each x_i is 0 or 1, and in this case our problem is 0-1 integer linear programming.Linear Programming.In linear programming with n variables, with the real numbers R, we go into the n-dimensional vector space R^n. TheAx = bx >= 0are the constraints, and the set of all x that satisfies those is the feasible region F, a subset of Rn.In R^n, a closed half space* is a plane and everything on some one side of it.Then F can be regarded as an intersection of m closed half spaces. So, F has flat sides, straight edges, and some sharp points (extreme points).Well, if there is an optimal solution, then there is an optimal solution at at least one of those extreme points. So, the famous Dantzig simplex algorithm looks for optimal solutions in iterations were each iteration starts at an extreme point, moves along an edge, and stops at the next extreme point. That's for the geometric view; the algebraic view is a tweak of the standard Gauss elimination algorithm.Linear programming and the simplex and other algorithms have a huge collection of nice properties, including some surprisingly good performance both in practice and in theory.But, asking for each x_i to be an integer is in principle and usually in practice a huge difference and gives us a problem in NP-complete -- at one time, this was a huge, bitter surprise.Warning: At one time, the field of the applied math of optimization in operations research sometimes had an attitude, that is, placed a quasi-religious importance on optimal solutions and was contemptuous of any solutions even 10 cents short of optimal. Well, that attitude was costly for all concerned. Instead of all the concentration on saving the last 10 cents, consider saving the first \$1 million. Commonly in practice, we can get close to optimality, may be able to show that we are within 1% of optimality, so close the rest wouldn't even buy a nice dinner, and see no way in less than two weeks more of computer time to try to get an optimal solution.So, concentrate on the big, fat doughnut, not the hole.How to solve ILP problems is a huge subject -- e.g., can start with George Nemhauser -- but a major fraction of the techniques exploit some surprisingly nice properties of the simplex algorithm. Right, likely the best known approach is the tree search technique of branch and bound. I took a number of Operations Research courses in undergrad, and one course was almost entirely on Linear Programming and Integer Programming. I strongly recommend engineers to get familiar with them because they are powerful tools for solving a whole class of really tricky problems. Nice post but sorry guys, never been a fan of integer programming. Too involved with genetic algorithms Search: