Using mixed integer programming to assign air cargo to flights (flexport.engineering)
199 points by mattmarcus on Jan 27, 2020 | 56 comments



If anyone is interested in learning more about MIP and other discrete optimization topics, there is a great MOOC on Coursera (link: https://www.coursera.org/learn/discrete-optimization/). The lectures are quite engaging, delivered by the enthusiastic and a bit eccentric Prof. Van Hentenryck. Also there are no stupid quizzes, just 5 problems (with 6-8 instances each, ranging from comparatively small to huge) that you need to solve (in any programming language) in order to pass. The only drawback is the significant investment of time it requires.


@dunkelheit, thank you for the link to the course. I'll give it a try!


In my experience, each sufficiently complicated model becomes non-linear in some respect. For example, you might want to work with margins for time-slots that are non-linear but smooth.

So, even though I've worked with constrained linear programming in the past, I tend to prefer meta-heuristic algorithms, such as simulated annealing or Tabu search. Although this might not provide the 'best' solution, it offers a wider range of modelling tools.

To elaborate a bit on a use-case: let's say we want to plan a high-school roster. Teachers might have a maximum of 8 hours per day of work, but if we make a hard cut-off at that time, we might miss an interesting 8:15 schedule that gives a teacher more time to have a proper lunch. If, furthermore, we do not break the 40 hr./week rule, it might be a workable schedule. This approach also gives the search algorithm a usable gradient, so the search space becomes smooth and easier to navigate. Finally, if we provide several top solutions, we give a human planner information on how the problem is (over-)constrained.
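
A minimal sketch of what I mean, in Python with made-up weights (the smooth penalty replaces the hard 8-hour cut-off, while the 40 hr./week rule stays hard):

  # Soft-constraint cost for one teacher's day, suitable for use inside
  # a simulated-annealing objective. Hours past the 8 h limit are merely
  # expensive, not forbidden, so an 8:15 schedule stays reachable.
  def day_cost(hours_worked, limit=8.0, weight=10.0):
      overrun = max(0.0, hours_worked - limit)
      return weight * overrun ** 2  # quadratic: gentle gradient near the limit

  def schedule_cost(days_hours):
      if sum(days_hours) > 40.0:    # the weekly rule remains a hard constraint
          return float("inf")
      return sum(day_cost(h) for h in days_hours)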


There are also ways around this by suitable generalizations of MILPs, namely mixed integer comic programs (MICPs), which are about as fast to solve in practice. This generalization allows you to use nonlinear (convex) functions in the objective and in constraints. Coupled with the integrality constraints, almost all nonlinear problems I’ve encountered can be written as MICPs.


mixed integer convex programming, not comic (unfortunately; although I hope someone takes this as a challenge). Or maybe you meant conic programming? As in MISOCP


I did mean to write mixed integer convex optimization; on the other hand, conic programming includes convex programming as a special case (and vice versa in the case where the cones are convex). Most convex problems we solve (e.g., when written in a high-level DSL like CVXPY) are reduced to conic forms (such as SOCPs, but also SDPs, etc.) before they are solved.
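
As a concrete (toy) illustration of the MICP idea in CVXPY, assuming a mixed-integer-capable solver such as SCIP or MOSEK is installed:

  # Convex (quadratic) objective plus integrality: a tiny MICP sketch.
  import cvxpy as cp
  import numpy as np

  target = np.array([1.2, -0.7, 3.4, 0.1, 2.8])
  x = cp.Variable(5, integer=True)
  prob = cp.Problem(cp.Minimize(cp.sum_squares(x - target)),
                    [cp.sum(x) == 7, x >= -5, x <= 5])
  prob.solve()   # requires an installed mixed-integer solver
  print(x.value)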


Oh, oops, I just realized I actually wrote "comic," oops! (My phone really does not like the word conic for whatever reason...) Would be a fun (and uhh, interesting?) challenge indeed, whatever that would entail :)


It fits in well with your username! There have been a number of (computer) programming environments that used a comic-strip-like format to depict a sequence of events; I think some are profiled in Watch What I Do and/or Your Wish Is My Command, which I don't have with me at the moment.


Could you handle your example in MIP through the objective function, making time outside normal hours expensive, but balancing that with a positive value for lunchtime? Possibly with an integer value limiting the number of days where normal hours can be violated?
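
Something like this is what I have in mind, as a hedged PuLP sketch (all data and weights invented for illustration):

  # Hours past 8 are expensive rather than infeasible, and a binary per
  # day caps how many days may run over at all.
  import pulp

  days = range(5)
  prob = pulp.LpProblem("roster", pulp.LpMinimize)
  hours = pulp.LpVariable.dicts("hours", days, lowBound=0, upBound=10)
  over = pulp.LpVariable.dicts("over", days, lowBound=0)
  ran_over = pulp.LpVariable.dicts("ran_over", days, cat="Binary")

  prob += pulp.lpSum([3 * over[d] for d in days])      # objective: overtime is costly
  for d in days:
      prob += over[d] >= hours[d] - 8                  # overtime beyond 8 h
      prob += over[d] <= 2 * ran_over[d]               # big-M link (max overtime is 2 h)
  prob += pulp.lpSum([ran_over[d] for d in days]) <= 2 # at most 2 long days
  prob += pulp.lpSum([hours[d] for d in days]) >= 38   # demand to cover
  prob.solve()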


In addition to the remarks in the sibling comment, we don't always have a well-understood objective function. For example, how much is lunchtime worth compared with time outside normal hours? What we can say is that there is a certain 'badness' to it, which should increase exponentially or polynomially as we tread further outside of our preferred domain. These are known as soft constraints.

A fundamental problem with soft constraints in MIP is that we cannot create cuts in the conflict graph. Technically, everything conflicts with everything else, but at very high badness. So, we then have to decompose the problem in a preconceived way, such as on geographical boundaries, time boundaries or using heuristics. This engineering can be challenging, especially given that MIP is often hard to debug.

So, like the sibling comment: I prefer meta-heuristics over constraint logic programming in many cases, but I do not deny that CLP/MIP can be very useful as well.


Aren't CLP/MIP very different mathematically/programmatically?

I know they are more similar to each other than to meta-heuristics, but I'm not sure why.

Assuming your linearization is good, at least you'll get a global optimum and the MIP gap. With meta-heuristics you have no clue where you could end up, right?


It is sort of trivially true that any nonlinear model in practice can be approximated by a sufficiently complicated linear model. So yes, probably. However, that is not without cost: it reduces the amount of computer time required but increases the amount of human interpretation and attention required (plus the extra time to linearise the original model). On the face of it that is a questionable trade.

My anecdotes are somewhere close to the grandparent's, where I'd prefer to see the heuristic method tried first before a linear model is attempted. If nothing else, implementing a GA or simulated annealing algorithm is generally very cheap and debuggable in a way that moderately complicated linear models aren't. I'd guesstimate that there are 5-10x more programmers who can implement a genetic algorithm than could tackle mixed integer optimisation. Debugging AMPL code is not a straightforward or satisfying experience.


> I'd guesstimate that there are 5-10x more programmers who can implement a genetic algorithm than could tackle mixed integer optimisation.

You don't need 5–10× more wizards, though; you only need one wizard. If her ILP optimal solution is 0.1% better than the GA heuristic solution — which it might not be! — that might mean 3% higher profits for your company for the year. We are not talking about riveting sheet metal here, where productivity is roughly proportional to the number of workers.


:o That is cheating! You can't assume that one solution is better than another then use that assumption to support the conclusion that it is better. There is no particular reason to believe the ideal solution to the linear approximation is better than the approximate solution of a non-linear model.

And shrugging off the people who understand the model as 'wizards' shows a pretty neat cultural problem: people start to believe they can't and won't understand why they are getting the solutions that they get. Any fool can understand GAs and get a feel for the risks involved.


The meaning I intended was not the meaning you read.


Yes, as I explained in https://news.ycombinator.com/item?id=22157419 you can model literally anything in MIP if it's in NP, but the structure of the problem may or may not peek through enough for your solver to run efficiently. Some MIP solvers are effectively alien technology from the future, so this can be worthwhile, but it isn't always.


Hi all, author here. I'm very humbled that this made it to the front page! A few things I want to say:

1) Most of this work is a simplified copy of the papers I linked to. Special thanks to James Bookbinder and his team at University of Waterloo.

2) I'm an OR novice, and this is my first optimization project. I feel now that I can roughly model the flight assignment domain, but what the solvers do is a mystery to me, and when I read about Lagrangian relaxation and column generation, I'm lost. Fortunately I haven't needed those techniques in this project.

3) The mathematical model, when written down, looks mystifying even to me, the author, and that's unfortunate. My goal in writing this post was to reduce the mystery and explain the model in plain English (plus math notation), but in the end I'm afraid it still looks eye-glazingly complicated. Data science can be all too happy to cloak itself in mystery by writing down what are actually pretty basic equations. I don't have a good solution to this.

Most of all, I got such joy out of building the model one constraint at a time and seeing the solver follow my directions and spit out optimal solutions. I couldn't believe it worked. It was like I discovered electricity. Lastly, much credit goes to Flexport leadership for allowing me, an OR novice, to embark on this optimization project.


Heya! Very nice and congratulations on the front page :)

On (2): that's the great part about solvers; they're essentially quite incredible black boxes. Most people who actually do optimization theory also don't really know how they work. (My money is on black magic for a number of cases.) Kidding aside, writing one is always informative and interesting (and, with languages like Julia, surprisingly not complicated).

3) My suspicion is that matrix notation would actually improve the end result quite a bit. A lot of the constraints you've written down have very common structure (e.g., the total weight constraint can be written as y = X'g, where g is the vector of weights, or the required assignment constraint, X1 = 1, where 1 is the all-ones vector).

Writing this out and the corresponding descriptions next to it would make it much easier to parse. E.g., for the above cases

  min tr(CX) + c'y
  s.t. X1 = 1    (all items need to be shipped)
       X'g = y   (y is the total weight on each ULD)
       etc.
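And for what it's worth, this form translates almost line-for-line into CVXPY (shapes and data invented here; cp.sum(cp.multiply(C, X)) plays the role of tr(CX)):

  import cvxpy as cp
  import numpy as np

  n_items, n_ulds = 4, 2
  C = np.random.rand(n_items, n_ulds)   # assignment cost per (item, ULD)
  c = np.random.rand(n_ulds)            # per-unit weight cost per ULD
  g = np.random.rand(n_items)           # item weights

  X = cp.Variable((n_items, n_ulds), boolean=True)
  y = X.T @ g                                    # total weight per ULD
  prob = cp.Problem(cp.Minimize(cp.sum(cp.multiply(C, X)) + c @ y),
                    [cp.sum(X, axis=1) == 1])    # every item shipped
  # prob.solve()  # needs a mixed-integer-capable solver installed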
Congrats again on the front page and the article!


> Most of this work is a simplified copy of the papers I linked to

To me, those are probably the most interesting category of blog posts. Some interesting technical content that isn't accessible to a layperson in the literature.

> My goal in writing this post was to reduce the mystery and explain the model in plain English (plus math notation), but in the end I'm afraid it still looks eye-glazingly complicated.

In places maybe, but not overall. You'll find that as you learn, the more deeply you understand the subject, the more simply you'll be able to explain it. That's not true the other way round - not everyone with deep understanding of a subject can explain things well!


Congrats on the post. Enjoyed it.

I've watched with interest as INFORMS has tried to rebrand OR as data science, but this is the first time I've seen anyone refer to it as such in the wild.

I hope you're able and encouraged to develop your interest in this area. It's actually all over the place (I'm currently working in rail), usually cloaked by a less sexy job title :)


I love MIP. It's super useful.

I like to think that MIP is a framework that meets halfway between a natural modeling language and a natural solution language. The reason it's good is that the framework naturally lends itself to abusing linear programming as a subroutine, which can cut out a large amount of search state space. And you tend to find you can encode many problems without too much difficulty with just a handful of tricks.

You can encode a great deal into MIP: piecewise-linearized versions of non-linear problems, or logical constraints.
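
For instance, the standard big-M encoding of "if the binary switch z is off, x must be zero" (M is any valid upper bound on x):

  x <= M z,    z in {0, 1}

Piecewise linearization is the same game: one binary (or SOS2 weight) per segment, plus linking constraints.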

Some interesting applications I've seen or played with:

- Verifying neural nets https://github.com/vtjeng/MIPVerify.jl

- Solving the classical Ising Model http://www.philipzucker.com/solving-the-ising-model-using-a-...

- Global Robot arm kinematics http://www.philipzucker.com/2d-robot-arm-inverse-kinematics-...

- model predictive control with collisions constraints http://www.philipzucker.com/flappy-bird-as-a-mixed-integer-p... See Russ Tedrake's group https://groups.csail.mit.edu/locomotion/pubs.shtml


MIP is NP-complete; you can reduce any problem in NP to it. But there is a large and interesting set of MIP problems where the continuous relaxation to ordinary linear optimization ("linear programming" in the jargon, although that's as misleading a term as "analog computer" or "military intelligence") gives you enough interesting information about the MIP problem to solve it enormously more efficiently than you can with a generic SAT solver.

One of the notes in Dercuano is an overview I put together of the field last year: http://canonical.org/~kragen/dercuano-20191230.tar.gz file notes/linear-optimization-landscape.html. It seems that the best free-software solver is COIN-OR CBC, and the best free-software modeling language is GMPL, aka GNU MathProg, which is compatible with the popular proprietary linear-optimization modeling language AMPL and comes with its own somewhat weaker solver, GLPK; it's much easier to get GLPK solving a problem than CBC, but the relevant incantations are in the note. But some of the proprietary solvers are much better; most of them are available on the NEOS Server.

I didn't evaluate embedded DSLs like Pyomo in any depth, unfortunately.

I'm somewhat disappointed with the Flexport post, which I feel is badly formatted and uses needlessly obscure notation and then never gets around to actually writing down a model in MathProg or Pyomo or anything similar. But I guess my own note is only a little better, and the Flexport article at least gives a MIP model of a nontrivial problem. (There are many more such examples in the GNU MathProg distribution, and MIPLIB has a wealth of extremely nontrivial ones.)

Mathematical optimization in general (optimization in the sense of minimizing a possibly constrained function, not in the sense of making code run faster) amounts to programming at a higher level; I think it was Norvig that described it as "the ultimate in agile software development", because you basically just write the tests. And linear optimization is the best-developed subfield of mathematical optimization: linear solvers can solve enormously larger problems than more general solvers.

(There's also an inferior PDF rendering of Dercuano for cellphones that can't handle tarballs: http://canonical.org/~kragen/dercuano.20191230.pdf )


Embedded DSLs in Python sometimes have a lot of overhead because of the way large expressions are built with operator overloading (x + y + z + ...), creating lots of intermediate objects.

But there are solver-specific alternatives, such as those for Gurobi or MOSEK, that provide good performance, as well as vector-oriented modeling such as CVXPY.

Myself, I'm partial to Julia's JuMP as an embedded DSL: it has good performance, a general-purpose host language, and "nice" syntax comparable to algebraic modeling languages.


> Embedded DSLs in Python sometimes have a lot of overhead because of the way large expressions are built with operator overloading (x + y + z + ...), creating lots of intermediate objects.

Is this really a concern in practice? My mental model is that the algebraic model is, you know, half a page to 10 pages of code without any loops or recursion, and you evaluate it to get an MPS file, and you feed that MPS file to your solver, which then chews on it for the next 15 seconds to 15 days. It's hard to imagine that Python's dynamic dispatch overhead would contribute more than a few milliseconds to this multi-day process. What am I missing?


There are low-latency applications of MIPs where such overheads matter. Sometimes one needs to solve small MIPs very often under strong latency requirements.


There's always rewriting in other languages or transpilation. If a framework is so crucial, not just for prototyping, perhaps the runtime components (as opposed to model building) could be ported to something like C, C++ or Rust.


True. For some classes of problems, code generation is actually used here: https://cvxgen.com/docs/index.html


Indeed there are, but you don't need to re-run the Python code expressing the model to do that. You build the model once, then tweak the numbers in the constraints and weights.


I can't find a benchmark right now, but the specific issue with Python in this case was not dynamic dispatch, but memory allocation and GC.

And yes, that led to the model creation time dominating the solve time by an order of magnitude. So it was a large-but-easy problem. Happens in particular when you use LP or QP, as opposed to MIP.


Actually, see the issue described here: https://stackoverflow.com/questions/38434300/why-is-quicksum...

Using Python's sum() is quite slow, so the package provides a kind of lazy quicksum().
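
A sketch of the difference (gurobipy syntax; the model is trivial on purpose):

  import gurobipy as gp

  m = gp.Model()
  x = m.addVars(100_000)
  fast = gp.quicksum(x[i] for i in range(100_000))  # builds one expression in place
  # slow = sum(x[i] for i in range(100_000))        # allocates a fresh LinExpr per '+'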


Edit: formatting

> badly formatted

I apologize for this. Flexport uses the Medium.com platform, and I couldn't find any better way to write math notation than embedding LaTeX gists.

> needlessly obscure notation

I agree 100%! I would like to hear ideas about how to eliminate this notation. I feel it's keeping too many people away from the field.

> never gets around to actually writing down a model in MathProg or Pyomo or anything similar

Again, spot on. The post was long enough without explaining how to do it in Pyomo, so I'm saving that for part 3.


> I agree 100%! I would like to hear ideas about how to eliminate this notation. I feel it's keeping too many people away from the field.

I didn't mean you shouldn't use the standard mathematical notation, which seems to be what you're talking about. I meant you should use it in the standard way. Don't use superscripts that are neither powers nor variables; for example, don't use U^M to mean "max weight capacity" and U^V to mean "max volume capacity". UM and UV, M and V, or even U_M and U_V would be an improvement. Don't typeset your display math with the subscripts to the right of the Σ as if it were part of a paragraph; put them underneath. Don't put extra spaces in the middle of your subscripts. If you only have two subscripts (with numeric values, not things like "M" and "V"), it's okay and probably better to make one of them a superscript, especially if you parenthesize it to reduce the chance that people will mistake it for a power. Don't leave out your variable of summation when you write Σyⱼcⱼ or, as you did in another case, write it in upper case. Don't write sets as comma-separated sequences of elements without {} around them.
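
In LaTeX terms, that mostly means writing, e.g.,

  U_{M},\ U_{V} \quad\text{rather than}\quad U^{M},\ U^{V},
  \qquad \sum_{j \in F} c_j y_j \quad\text{in display style}

so that labels are subscripts and summation indices sit under the sigma.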

Alternatively you could write the whole model down in GNU MathProg, which is more verbose but not that much, arguably more precise, and universally understood by people who work in the field in anything other than pure theory. Or Pyomo, which I've never written anything in, but assume can't be that much worse or nobody would use it. Either way it would be easier to use multi-letter names for your variables and parameters if you thought that made things clearer.

I don't think the standard mathematical notation is keeping people away from the field; students who already know basic algebra don't seem to have much trouble with the notation. Figuring out how to make reasonable MILP models or how to diagnose the problem when the solver says your model is infeasible, those seem to take a lot more work.

> Flexport uses the Medium.com platform,

Why? You want to screen people out of your recruiting process that haven't signed up for Medium accounts? Or you want to signal that you're a nontechnical company that can't figure out how to install Wordpress? There are Docker images for that now, you know.


On the last note, I'm guessing the engineers or analysts working on MIP models want to work on that and not also go learn and maintain their own blog site where Medium suffices. The article seems to be free to read without an account. I've read dozens of Medium blog posts over the past few years and learned all about companies without making an account. It is proprietary and closed off like FB and other sites are, but that's still a lot easier than WordPress + Docker woes if that is not part of your core business.


What's neat is the tackling of NP and NP-complete problems with tools that can do better than humans, even given infinite time and infinite Excel-fu.

I wonder about how to go further to attack NP-hard problems, like Traveling Salesman, because it may be that P != NP, and existing approximate heuristic solutions lead to suboptimal results, as anyone using Google Maps knows. At what point do we seriously consider letting AI/ML do algorithm design itself? (Self-programming systems.)


It's pretty cool that these guys get to put 1950s math to work in real life. In my experience, I was never able to sell this idea to anyone, based on the (potentially justified) excuse that it was completely unmaintainable without a math guy around.

Edit: I can personally attest that potential bootstrappers would find fertile ground making a service to do this stuff for transportation companies. I know of many who basically don't try to solve this problem because they don't have the talent and don't trust their ability to maintain something bought from a consultant. They need a service they can offload it to.


Many of the fundamental algorithms used were developed during the 1980s and 1990s, as were modeling languages like AMPL. So it's not 1950s math, even if the "simplex algorithm" was discovered in 1947.

It occurs to me that such a "service" could to a significant extent pick winners and losers among transportation companies; whoever they chose to plan for would have lower costs by several percent, and so would have profits several times higher than the competition.


Lots of logistics and truck companies use this behind the scenes (source: my friend writes MIP models for a trucking company). Also...many industries have been using LP/MIP models for ages. The entire US grid commits generators and dispatches them with MIP solvers running some MASSIVE models 24/7 (source: I work with that software).


We're seeing here the debate over model-based vs. method-based approaches to optimization. My argument for the model-based approach is reflected in these two sets of slides:

https://ampl.com/MEETINGS/TALKS/2019_09_Bolzano_Model-Based....

https://ampl.com/MEETINGS/TALKS/2018_08_Lille_Tutorial1.pdf

A recording of the first presentation will be available soon. The first uses a simpler example, but both were prepared for audiences not committed to the model-based approach. Their purpose was to familiarize people with model-based optimization and the circumstances in which it has clear advantages.


They did a poor job of explaining what Integer Programming is. Here's the wikipedia intro:

"An integer programming problem is a mathematical optimization or feasibility program in which some or all of the variables are restricted to be integers."

https://en.wikipedia.org/wiki/Integer_programming


CVX solves a relaxed problem trivially in MATLAB, and then you can heuristically round to integer. http://stanford.edu/class/ee364b/lectures.html under "L1 convex cardinality".


If I may quote from the Integer Programming chapter in Vohra's book, "Before proceeding you should convince yourself that no ‘simple’ scheme based on solving the underlying linear program and rounding the resulting solution can find the optimal integer solution."


Depends on the problem (in max-flow, min-cut the trivial rounding scheme gets you the optimal point immediately :).

Kidding aside, generally this is very true, but for a surprising number of practical problems, the LP relaxation and some redundant constraints will often have zero integrality gap. (In many other problems, you’re hosed, so there’s also that.)

EDIT: For context, this is how LDPCs are decoded in robust cases. https://people.eecs.berkeley.edu/~wainwrig/Papers/FelWaiKar0...


I suppose that we could also state that solving with a totally unimodular matrix absolves us from the trouble of going through branch and bound, but that's a pretty niche case.

I guess I react strongly whenever I hear this kind of sentiment because the result of rounding the continuous solution can be arbitrarily bad, but now we have a false sense of confidence that we did some kind of optimization, when we really didn't. My experience has been that rounding continuous solutions to integer gives some pretty interesting and incredibly bad solutions.

For posterity's sake, Wolsey has a good example of how things can go poorly in the introduction of the book Integer Programming:

  max 1 x1 + 0.64 x2
  st  50 x1 + 31 x2 <= 250
      3 x1 - 2 x2 >= -4
      x1, x2 >=0
      x1, x2 integer
The linear programming solution is (376/193,950/193), which is approximately (1.9482,4.9223). The integer optimal solution is (5,0), which is far away.
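
You can reproduce this in a couple of lines of SciPy (linprog minimizes, so the objective is negated):

  from scipy.optimize import linprog

  res = linprog(c=[-1, -0.64],
                A_ub=[[50, 31], [-3, 2]],   # 50x1+31x2 <= 250; -(3x1-2x2) <= 4
                b_ub=[250, 4],
                bounds=[(0, None), (0, None)])
  print(res.x)  # ~[1.9482, 4.9223]
  # Rounding up to (2, 5) is infeasible (50*2 + 31*5 = 255 > 250); rounding
  # down to (1, 4) is feasible but scores 3.56 against the true optimum 5.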

I'll also contend that the integer programming solvers nowadays are really good. If we can get away with rounding the solution, then the solver will find the solution in only a few iterations, because it often does precisely that in order to go through the branch, bound, and cut algorithm. The difference is that a good solver can get a certificate of optimality when it finds the solution, so that we know we're right.


Is a specialized MIP solver mandatory, or would using a general SMT solver like Z3 usually be good enough?


In my experience SMT solvers tend to be an order of magnitude slower than MIP solvers, even when looking at the exact same constraints. But yes, given a small enough problem, SMT solvers like Z3 are an excellent choice and provide additional functionality like propositional logic (if A then B else C).
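
A toy example of that in Z3's Python bindings (a sketch mixing an if-then-else with optimization):

  from z3 import Bool, If, Int, Optimize, sat

  a, x = Bool("a"), Int("x")
  opt = Optimize()
  opt.add(x == If(a, 10, 3), x <= 8)  # x is 10 when a holds, else 3
  opt.maximize(x)
  if opt.check() == sat:
      print(opt.model())              # a = False, x = 3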


A specialized solver is necessary due to business requirements.

User requests aren't always "here's a problem + the data, solve this to optimal". Here are other examples:

1. "here's the problem + data, solve it well enough within 10s" 2. "here's the problem + data, solve it well enough with a subset of the data" 3. "here's the problem + data, solve it partially, and then when near real-time updates come in, update solution to take new information into account"


Oh lucky you, your optimization problem is linear. I do something similar, but we have to use heuristics because our problem can't be modelled as a linear problem.


See, that's what I thought before I audited an operations research class, but it turns out that the reduction from SAT to ILP is trivial, so there are no problems in NP that can't be modeled as integer linear optimization problems.
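
For reference, the textbook reduction maps each SAT variable to a 0-1 ILP variable and each clause to one inequality, e.g.

  (x OR NOT y OR z)   becomes   x + (1 - y) + z >= 1,   x, y, z in {0, 1}

and the formula is satisfiable iff the ILP is feasible.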


Yeah, but the problem is that the reduction might be large in practice (in the sense that reducing the original problem to SAT or MIP might require the introduction of a large number of variables).

While theoretically polynomially reducible, if the rewriting multiplies the number of variables by 1000 then it’s likely not good in practice.


I don't have experience reducing problems to SAT but I think the usual inflation factor is a lot less than 1000. I think the trivial reduction I found from SAT to ILP uses 3 ILP variables per SAT variable, but there might be a better one, and I might be misremembering; it might be 4 or something.


Sure, the SAT -> ILP reduction is small, but the reduction to SAT (or MILP directly) can sometimes (but not often) be large.

Either way, I agree that there is likely some way in which the original commenter’s problem can be solved via MILP/MICP methods (indeed, I’ve found very few problems in practice that cannot be easily reduced) :)


I’d be curious to hear more... often a lot of nonlinear problems can be turned into mixed integer cone programs (MICPs) which are also often fast to approximately or exactly solve in practice.


What happens if/when shipments miss their ready times?


Presumably there are contractual penalties associated with missing delivery times, that then map to the cost of doing things. In the real world if you can miss the deadline on one delivery, and that allows everything else to work much more simply and cheaply then that's probably the right call.


Typically you model this kind of thing in linear optimization as a hard constraint rather than a penalty, but you can model it as a penalty in mixed integer linear programming. It costs you two extra decision variables, one of which is binary.
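
A sketch of that encoding, with t the arrival time, d the promised time, M a horizon-sized constant, and p a penalty weight (names illustrative):

  late >= t - d,  late >= 0      (continuous lateness: the first extra variable)
  late <= M * is_late            (is_late binary, the second; big-M link)
  minimize ... + p * is_late     (or p * late for a per-hour penalty)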


^ yes, penalties in the solver. Above post is correct.

IRL they just get rolled and we treat it as a new problem with a different set of schedules.



