
Show HN: My Python Solver for the On-Time Arrival Problem in Traffic Congestion - mehrdadn
https://github.com/mehrdadn/SOTA-Py/tree/master
======
n4r9
I'm very interested in this. Thanks for sharing.

Having coded up a semi-solid implementation of time-dependent contraction
hierarchies (which might not even get used, because of the very same data
accessibility issues you outline), my first impression is that it would be
quite straightforward to adapt CH preprocessing to this model. To contract a
vertex you just look at whether each potential witness path has a _nonzero
probability_ of being a shortcut. I suppose you could cut preprocessing time
down by replacing "nonzero probability" with "probability greater than
epsilon" to get a sort of approximate hierarchy.
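To make that concrete, here's a toy sketch of the witness test I have in mind (all names invented, nothing from SOTA-Py), using per-edge [min, max] travel-time intervals as a crude stand-in for full distributions: a witness path only rules out the shortcut u-v-w if its worst-case time provably beats the shortcut's best-case time.

```python
# Toy sketch of an epsilon-witness check for CH contraction under
# uncertain edge weights. graph[a] = {b: (t_min, t_max)} maps each
# edge to its [min, max] travel-time interval. All names are invented.
import heapq

def needs_shortcut(graph, u, v, w, eps_slack=0.0):
    """True if contracting v requires the shortcut (u, w), i.e. no
    witness path u->w avoiding v is *certainly* at least as fast."""
    sc_min = graph[u][v][0] + graph[v][w][0]  # best case via v
    # Dijkstra on worst-case (max) times from u, skipping v,
    # pruned at sc_min since longer witnesses can't dominate.
    dist = {u: 0.0}
    heap = [(0.0, u)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist.get(x, float('inf')):
            continue  # stale heap entry
        if x == w:
            # shortest worst-case witness found; does it dominate?
            return not (d <= sc_min - eps_slack)
        for y, (_, t_max) in graph.get(x, {}).items():
            if y == v or d + t_max > sc_min:
                continue
            if d + t_max < dist.get(y, float('inf')):
                dist[y] = d + t_max
                heapq.heappush(heap, (d + t_max, y))
    return True  # no dominating witness exists
```

With real distributions you'd replace the interval comparison by something like P(witness beats shortcut) > 1 - epsilon, which is where the approximate hierarchy would come in.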

You refer to the edge weights as an "hmm", which I guess stands for hidden
Markov model. It doesn't seem like they use that term in the paper, although
if you made the data time-dependent, would the edge weights then be a genuine
Markov process? Would something like the Viterbi algorithm become a useful
subprocess of the overall algorithm in that case?

The current big open-source routing engines tend to be written in C++ or
Java. Do you think using Python makes a significant difference in query or
preprocessing times? *

*Edit: Just noticed the Numba dependency, which probably answers this question.

~~~
mehrdadn
Glad to hear from someone who's worked on routing!

Yes, it's quite possible that CH can be adapted to this -- I haven't had a
chance to attempt it -- although it's not obvious to me if it would help with
the preprocessing time.

I have not been very interested in approximations myself, since coming up with
reasonable approximations seems rather obvious/straightforward, and yet
producing any bounds on the error seems to be quite the opposite (at least for
me)! That said, it's only a matter of time before someone does a comprehensive
empirical study on that. :-)

The HMM was actually beyond the scope of this project, which is why you don't
see any mentions of it. (The reference to check out regarding the path HMM
inference would be #23 on Springer.) Its only significance is that it was the
model of the dataset we used, but as far as we were concerned, we merely had a
GMM.

Regarding time-dependent travel times, I'm not actually sure what the best
model for that is (not just from a modeling perspective or data gathering, but
also storage constraints and ease of inference); it's something I've been
thinking of looking into, and I think others are as well. It does seem
reasonable that a Markov model might work, since the travel time on a given
edge at a subsequent time step only depends on the travel times on that edge
and/or on incoming edges at a previous time step. However, I don't immediately
see a reason for it to be Markov in solely the _time_ dimension, which (if
it's not) means that HMM algorithms such as Viterbi might not really be
meaningful here (?), since the Markov properties I can see require a space
dimension as well. It seems like a more general (but sparse & mostly-planar)
Bayesian network to me. I could be wrong about this though.

Regarding Python vs. C++/Java -- it does make a significant difference, but
not an outrageous one. I'm guessing you didn't see this part on the GitHub
page, but it actually turns out this code is "only" 3× slower than the C++
equivalent used for the paper. (Note that this comparison is after a fairly
heavy amount of optimization on both the C++ and Python codebases. Also note
that the Python optimization involved far more than using Numba, although that
was a notable part of it.) I think it makes a bigger difference in query times
rather than preprocessing times, since the latter is far easier to parallelize
(meaning you can just make it faster by throwing more hardware at it rather
than by rewriting the code). But the overall problem is so computationally
intensive that the main problem isn't the implementation language, it's coming
up with an appropriate algorithm so that you don't need an astronomical number
of CPU-days.

Thanks for the great questions!

~~~
n4r9
Thanks for the answers. I see what you're saying about the Markov properties -
certainly something to think about.

> I haven't had a chance to attempt it -- although it's not obvious to me if
> it would help with the preprocessing time.

My initial research suggested that CH has much faster query and preprocessing
times (and lower memory overhead) than arc-flags, which is why I ended up
implementing it. Plus you can even combine CH with arc-flags or ALT to get
pointlessly good speed. However, I don't know how much faster this
"arc-potentials" approach is than arc-flags.

Using "approximate" contraction hierarchies doesn't necessarily lead to
inexact results! There's a really interesting paper by the Karlsruhe guys [0]
which discusses how you can set up a contraction hierarchy where the shortcuts
only store their minimum and maximum travel times. At query-time this allows
you to quickly generate a "time-profile corridor" between points A and B,
which is a subgraph containing only those vertices which lie on a shortest
path from A to B at _some_ departure time. Then you only need to do a quick
time-dependent Dijkstra on the corridor ;]
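To illustrate just the corridor-extraction step (without the actual hierarchy; the names and graph encoding here are made up), you can keep exactly those vertices whose optimistic A-to-x and x-to-B distances fit within a pessimistic A-to-B bound, then restrict the final time-dependent search to that subgraph:

```python
# Toy sketch of corridor extraction from [min, max] travel times.
# adj[u] = {v: (t_min, t_max)}; radj is the same graph with edges reversed.
import heapq

def dijkstra(adj, src, weight_index):
    """Plain Dijkstra; weight_index 0 uses t_min, 1 uses t_max."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, {}).items():
            nd = d + w[weight_index]
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def corridor(adj, radj, a, b):
    """Vertices x that could lie on a shortest a->b path at some
    departure time: d_min(a, x) + d_min(x, b) <= d_max(a, b)."""
    d_from_a = dijkstra(adj, a, 0)    # optimistic distances from a
    d_to_b = dijkstra(radj, b, 0)     # optimistic distances to b
    bound = dijkstra(adj, a, 1).get(b, float('inf'))  # pessimistic a->b
    return {x for x in d_from_a
            if x in d_to_b and d_from_a[x] + d_to_b[x] <= bound}
```

In the real scheme from [0] the min/max bounds live on the shortcuts of the contraction hierarchy, which is what makes the corridor both small and cheap to extract.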

0\. [http://dl.acm.org/citation.cfm?id=2444020](http://dl.acm.org/citation.cfm?id=2444020)

~~~
karussell
Do you have any numbers about how much faster CH+ALT was compared to pure CH?

~~~
n4r9
Ah, good question: no, I don't. I'm basing that comment on a vague memory of
a graph showing query time vs. preprocessing time for lots of algorithms.
Looking back at the graph, it seems CH with arc-flags (CHASE) slightly
improves query time, but for ALT they only have results on non-fully
contracted networks (Core-ALT). Probably a memory issue.

Sorry if I misled you - I haven't actually tried combining CH with ALT, as CH
turned out to be more than enough on its own. My point was just that CH is
reported to synergise well with other methods, which makes it likely to be a
good investment.

~~~
karussell
Interesting. We (GraphHopper) tested CH+ALT and also saw no improvement,
likewise for CH with normal A*, although papers report that it should be
faster. The idea about non-fully contracted networks is really nice and could
be worth a try - thanks!

~~~
n4r9
Wow, that seems counter-intuitive. I suppose it must be due to the cost of
computing the heuristic, combined with how few edge relaxations CH needs.

GraphHopper is really nice work, by the way. I believe you are still working
on implementing stall-on-demand? It will be very interesting to see how much
difference that makes on long-distance queries.

~~~
karussell
I guess the overhead of computing the heuristic outweighs the savings in
visited nodes, but this is counter-intuitive for me as well; if I have more
time I will investigate it again.

> GraphHopper is really nice work by the way.

Thanks!

> I believe you are still working on implementing stall-on-demand?

I haven't dug into it yet, but others have, with less success; it would be
very interesting:
[https://github.com/graphhopper/graphhopper/issues/240#issuec...](https://github.com/graphhopper/graphhopper/issues/240#issuecomment-297787098)

------
jamesrom
The demo looks really cool. Do any of the major navigation apps utilize these
algorithms?

~~~
mehrdadn
Thank you!

I don't believe _any_ apps utilize these algorithms, and for good reason. The
main problem is that SOTA is a very computationally intensive problem. On the
example shown, a plain Dijkstra takes a fraction of a second (you can imagine
A* is faster, and over time these have been sped up by a factor of millions,
IIRC), whereas SOTA takes dozens of seconds to run. The difference increases
quickly for larger networks and time budgets.

If you check out the paper I linked to, you might get a better idea of just
how computationally intensive this process is. If I recall correctly,
classical point-to-point pathfinding algorithms have even advanced to the
stage that, with some reasonable (albeit clever) preprocessing, queries on
_very_ large networks can be done in a matter of _milliseconds_ on mobile CPU
power. Compare that to this algorithm, which takes dozens of seconds to run on
a network the size of San Francisco, or at _best_ in dozens or hundreds of
milliseconds after _practically prohibitive_ preprocessing (e.g. 30 CPU-days
and tens or hundreds of gigabytes are pretty normal) on a Core i7 CPU. It's
just not scalable yet, which is why we've been working on it as a research
problem (as have many other people).

Edit: There are lots of obstacles; the above was just one. Another one is that
even the existing model assumes the same travel time distribution throughout
the day, which is clearly unrealistic (you can imagine fixing this would blow
up the running time even more). Yet another one is the problem of getting
enough traffic data to construct useful travel time distributions, which means
only a major company with existing data (like Google) could release an app
that works with real data from day one. Barring some kind of contract
with a big company, everyone else would have to just release an app to gather
data, and it would only become useful after a long time (if people are even
willing to give away precise location data to random app creators).

------
karussell
Thanks for this!

Did you measure memory usage for e.g. San Francisco or Berlin?

Why did you choose this non-commercial license? Especially since you say 'I
don't plan on actively maintaining it'.

~~~
mehrdadn
I didn't specifically measure memory usage, however it's on the order of a few
hundred megabytes per query (600 MB on the particular example I'm looking at).

Regarding the license and maintenance: it's what seemed most suitable to me,
given the effort I'd put in and what the code actually does. But the idea was
that if it's unsuitable for you, you can certainly contact me and request
different terms, and I can try to license it differently on an individual
basis.

~~~
karussell
Is the main RAM usage from the graph, or is the algorithm itself RAM-intensive?

I'm asking because other open source projects (like ours, see my profile)
might not look at your code just because of the license. And yes, that is the
tricky
thing about open source: accept that other people make money with it and try
to do this too ;)

~~~
mehrdadn
> Is the main RAM usage from the graph, or is the algorithm itself RAM-intensive?

The graph itself is negligible, unless you want to store lots of auxiliary
data (street addresses and whatnot), but then those are already on the disk,
and you don't need them in memory for this problem. The traffic data is
somewhat bigger, e.g. a few hundred float64's per edge, so like 100 MB for |E|
= 40k edges would be normal (no pun intended) but that's (a) just something
that depends on the sizes of your distributions and not the algorithm, and (b)
something that you can discard and re-generate on the fly from the continuous-
time distribution parameters, at a speed cost.

The main memory usage is for storing the policy solution itself. For a
discrete budget T (say, 5k steps) and |V| vertices (say, 10k) you need
approximately T * |V| float64's just to store the policy solution (e.g. 200
MB). And if you want the convolutions to be done in guaranteed quasilinear
time, you can pay a penalty of T * |E|, which would be e.g. 800 MB (4 roads
per intersection). This is not strictly necessary, though; it just depends
on the problem parameters. Sometimes naive convolution is faster too.
(Actually, I just lied a bit, because for every edge you don't need a vector
of size T, but a smaller vector of size (T -
min_travel_time_to_destination_from_edge_source), so what I gave you is
somewhat of an overestimate. But that's the general ballpark.)
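For anyone who wants to reproduce the ballpark, here's a back-of-envelope sketch of the figures above (the rough halving reflects the (T - min_travel_time) truncation I just mentioned; the exact factor depends on the network):

```python
# Back-of-envelope memory estimate for the SOTA policy, using the
# illustrative numbers from the discussion above (not measured values).
T = 5_000        # discrete time-budget steps
V = 10_000       # vertices
E = 4 * V        # ~4 roads per intersection
FLOAT64 = 8      # bytes per value

# Upper bounds: a full length-T vector per vertex (policy) and per edge
# (buffers for quasilinear-time convolutions).
policy_raw = T * V * FLOAT64   # 400 MB
conv_raw = T * E * FLOAT64     # 1.6 GB

# Each edge's vector is really only length
# T - min_travel_time_to_destination_from_edge_source, so roughly halve
# the bounds to land in the ballpark quoted (~200 MB / ~800 MB).
policy_mb = policy_raw // 2 // 10**6
conv_mb = conv_raw // 2 // 10**6
print(policy_mb, "MB policy,", conv_mb, "MB convolution buffers")
```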

For computing the path, an algorithm that doesn't need to pre-compute a policy
may (or may not) use less memory, so for that problem, the policy's memory
usage can be ascribed to the algorithm. However, I don't really expect the
memory usage of a different algorithm for the _same exact problem_ to be
radically different. (You can of course make approximations, etc. but then
it's not the same problem or necessarily the same solution.)

> I'm asking as other open source projects (like ours, see my profile) might
> not look in your code just because of the license. And yes, that is the
> tricky thing about open source: accept that other people make money with it
> and try to do this too ;)

(Edit: I edited this comment since I realized I initially misunderstood your
licensing model.)

I think a few key differences between our projects (and therefore licensing
models) are:

1\. Just how much more computationally intensive this problem is compared to
standard routing. For me to adopt your model of providing a paid service while
making the code free for commercial use, I would not only have to spend a
significant amount of time writing service APIs for this (I have other things
to do than to start this business!), but I'd have to invest in _quite_ a large
cluster of servers -- 1000 CPUs _per metropolitan road network_ seems to be
the minimum for this problem to get decent query times. (Again, note that a
reasonable query on a sub-portion of San Francisco takes 15 seconds, and this
is already making the unrealistic assumption that traffic is constant
throughout the day.) Both of those just seem unrealistic for me, so I'm just
saying if someone else actually thinks such a business would be profitable and
would like to take up those tasks, they can request a commercial license.

I'm not sure anyone _should_ use this commercially, though -- see the next
point.

2\. I simply have _not designed_ or even _intended_ this project for
production usage, whether paid or free... and the license is meant to
emphasize that. It's for researchers, meaning it's meant to be short & easy to
tweak. It's not particularly well-written code, there is no clear API at all,
there are many "engineering" tricks/approximations that would need to be put
in for actual real-world usage, etc... and to be honest, I _actually don't
want_ poorly-written code to go into production use! I think that is one of
the things that is actually harming us in the modern era. So I would rather
that anyone who wants to use the algorithm just rewrites their own code in a
better way for production usage, and not even use my code. Honestly, they
would probably _have_ to understand the code well enough to be able to rewrite
it in a breeze if they want to modify it enough to make it useful.

So, basically, the commercial-license clause was the "else" clause in the
license. :-) Like I wrote there, I didn't necessarily think this clause should
ever _be_ hit. Chances are good that even if someone contacts me, they'll
change their mind and decide it's outright _easier_ in anything but the very-
short-term for them to just write their own code based on the paper (and that
this would be more profitable would just be icing on the cake). But in case
for some reason they don't think that, I didn't want to leave any doors
closed, so I included that option.

------
kevmo
I didn't realize GitHub had added a topic tags feature to repos. I hope it
ends up being useful!

~~~
mehrdadn
Yeah, I'm thinking maybe I went a little overboard with that. :-)

~~~
kevmo
To the early adopter go the spoils.

~~~
foota
The early bird gets the wannacry?

------
sandGorgon
Can you look into adding Numba support to accelerate the computation?

~~~
Xophmeister

        ### Dependencies

        - NumPy is the only hard external dependency.
        - Numba, if available, is used for compiling Python to native code (≈ 3× speedup).
        - PyGLET, if available, is used for rendering the results on the screen via OpenGL.
        - SciPy, if available, is used for slightly faster FFTs.

It's right there in the README.

~~~
sandGorgon
Hmmm... thanks for this. I looked at the code and must have missed the
decorators.

