
Better bus predictions (a lot better) - kylebarron
https://medium.com/mbta-tech/better-bus-predictions-a-lot-better-64169f1edeee
======
ajuc
In my city there is a system like this (all buses have GPS, you can look where
they are at any time
[http://www.sip.ztm.lublin.eu/Default.aspx?lang=EN](http://www.sip.ztm.lublin.eu/Default.aspx?lang=EN)
), and on bus stops there are these dynamic timetables
[http://rpw.ztm.lublin.eu/typo3temp/_processed_/csm_01_01_594...](http://rpw.ztm.lublin.eu/typo3temp/_processed_/csm_01_01_594179becd.jpg)

It's working pretty well. The key thing is point 4 in the article: the
timetables show "in X min" if they know how long it will take for the bus to
get here, or just "at XX:XX" (the scheduled arrival) if they don't know for
some reason (transmitter or bus broke, whatever).
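
A minimal sketch of that display rule, assuming a live prediction is either a timestamp or missing. The function name `arrival_label` and its parameters are made up for illustration and do not come from any real signage system.

```python
from datetime import datetime
from typing import Optional


def arrival_label(now: datetime,
                  scheduled: datetime,
                  predicted: Optional[datetime]) -> str:
    """Return the text a stop display would show for one bus (hypothetical helper)."""
    if predicted is None:
        # No live data (transmitter or bus broke): fall back to the timetable.
        return "at " + scheduled.strftime("%H:%M")
    # Live prediction available: show a countdown, never below zero.
    minutes = max(0, int((predicted - now).total_seconds() // 60))
    return f"in {minutes} min"
```

The two formats double as a confidence signal: a countdown means the bus is actually being tracked, a clock time means the sign is only repeating the schedule.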

So you can be pretty sure that if it says it's coming in 4 minutes you won't
be waiting 30 minutes.

Clearly marking how sure you are of your prediction is IMHO more important
than making very good predictions. I don't care if I have to wait 3 minutes or
6 minutes, I do care if I skip another bus waiting for a more convenient one
in 3 minutes that never arrives.

~~~
prolikewh0a
Seattle has GPS on most busses that integrates into apps like Transit. Transit
is also crowdsourced if people leave it running when they're on the bus. Some
bus stops also have screen timetables that update live information regarding
arrival time.

I know I have to leave my condo when Transit says 3 minutes and it has the
"this bus is reporting" logo.

------
kuanbutts
This is awesome! I fully believe that government procurement needs serious
reform, which this article and effort is clearly attempting to address. So,
props to David Block-Schachter (the CTO).

There are far too many large, bloated consultancies that specialize not in
delivering quality products and services, but rather in "surviving" the
government procurement process.

Props as well to the Swiftly team - I had a chance to take a peek at some of
the APIs they expose to their customers and they're quite valuable. In
particular, they roll up stop-pair segment performance on routes by time of
day, which allows someone to query for bus performance by discrete route-
schedule-segments.
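
To make the idea of a stop-pair roll-up concrete, here is a rough sketch that groups observed travel times by route, stop pair and hour of day. This is not Swiftly's API; the tuple layout and the `rollup` function are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import median


def rollup(observations):
    """Group travel times by (route, from_stop, to_stop, hour_of_day).

    observations: iterable of (route, from_stop, to_stop, hour_of_day, travel_seconds).
    The record layout is hypothetical.
    """
    groups = defaultdict(list)
    for route, from_stop, to_stop, hour, seconds in observations:
        groups[(route, from_stop, to_stop, hour)].append(seconds)
    # One summary row per route-segment-hour bucket.
    return {key: {"n": len(vals), "median_s": median(vals)}
            for key, vals in groups.items()}
```

A query for "route 1, stop A to stop B, 8 am" is then a single dictionary lookup on the returned mapping.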

Gathering this type of data is quite labor intensive and a significant
technical lift (I was once part of a project doing this with GTFS-RT data from
the NYC MTA). This type of information, and the broader ecosystem of
performance related API services they provide to their users (based off the
limited amount I have seen), can enable operators to extract highly
articulated performance statistics about their fleet, on their own.

------
edejong
Having developed similar systems in the past, I would like to advise against
arbitrary boundaries on accuracy.

We used a couple of metrics to assess the quality of the predictions. The
first one is sMAPE (Symmetric Mean Absolute Percentage Error), which tells us
quickly where and how our predictions fail. We plot this with the precision on
the y-axis and the minutes till arrival on x. Also plotted is bias, which is
important, especially since you want to be slightly biased towards being too
early. Similar axes.

Other metrics we used are MAE (Mean Absolute Error), RMSE (Root Mean Square
Error) and 2D kernel density plots.
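
For reference, a small sketch of how such metrics could be computed from paired lists of predicted and actual minutes until arrival. The sMAPE here is the common symmetric percentage form, the squared-error metric is a root-mean-square error, and the sign convention for bias is an assumption; none of this is the commenter's actual code.

```python
import math


def mae(pred, actual):
    """Mean absolute error in minutes."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(pred)


def bias(pred, actual):
    """Mean signed error; negative means the bus was predicted earlier than it came."""
    return sum(p - a for p, a in zip(pred, actual)) / len(pred)


def rmse(pred, actual):
    """Root-mean-square error in minutes."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(pred))


def smape(pred, actual):
    """Symmetric mean absolute percentage error, in the range 0..2."""
    terms = []
    for p, a in zip(pred, actual):
        denom = abs(p) + abs(a)
        # Treat a 0-vs-0 pair as a perfect prediction instead of dividing by zero.
        terms.append(0.0 if denom == 0 else 2 * abs(p - a) / denom)
    return sum(terms) / len(terms)
```

Computing these per "minutes until arrival" bucket gives the kind of curve described above, with the error on the y-axis and the horizon on the x-axis.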

In the end, for a contract, I would take into account the density of use of
these predictions. It's nice that you can predict perfectly during the middle
of the night, but if that line is not used, it is next to useless. So
something like sMAPE * passengers or something like that.
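
One possible reading of "sMAPE * passengers" is a ridership-weighted average of the per-prediction error. The sketch below assumes each row carries a passenger count; the function name and row layout are hypothetical.

```python
def weighted_smape(rows):
    """Ridership-weighted symmetric percentage error.

    rows: iterable of (predicted_min, actual_min, passengers); layout is assumed.
    """
    weighted_error = 0.0
    total_weight = 0.0
    for predicted, actual, passengers in rows:
        denom = abs(predicted) + abs(actual)
        err = 0.0 if denom == 0 else 2 * abs(predicted - actual) / denom
        weighted_error += err * passengers
        total_weight += passengers
    if total_weight == 0:
        raise ValueError("rows carry no ridership weight")
    return weighted_error / total_weight
```

With this weighting, a perfect prediction for an empty night bus barely moves the score, while a bad prediction on a packed rush-hour trip dominates it.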

Also, even though I cannot relay this one-to-one to the passengers, a
confidence interval on predictions is gold.

------
evmar
In case anyone else was wondering, it appears that MBTA here refers to a
Boston transit agency.

~~~
benrapscallion
That’s correct. It has the reputation of being one of the worst commuter
systems in existence.
[http://www.bostonherald.com/news/local_coverage/2017/10/data...](http://www.bostonherald.com/news/local_coverage/2017/10/data_mbta_service_worst_among_carriers)

------
tzs
I just skimmed the article, so apologies if this was covered.

The article suggests people will use these predictions to spend more time at
home before leaving for their bus stop. If that is the case, what happens if
the prediction says their bus will be, say, 10 minutes late, but then that bus
catches some lucky breaks with lights, heavy traffic unexpectedly clears up,
etc., and the bus makes up 5 minutes of that?

People who relied on the prediction could miss their bus.

Is there a mechanism to address this?

The only one that I can think of offhand that would cover almost all cases
would be if the predicted arrival times are also sent to the buses, and if the
buses arrive ahead of the prediction they wait until the predicted time before
leaving.

You wouldn't want to do that for all predictions, though. Maybe only
predictions made within 10 minutes of the original scheduled arrival time are
binding, or maybe only predictions made when the bus is within three stops of
the stop in question, or something like that.
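
A sketch of the "binding prediction" idea, under the assumption that a prediction is binding when the predicted time is within 10 minutes of the scheduled time. The names and the threshold are illustrative, not anything the MBTA is known to do.

```python
from datetime import datetime, timedelta

# Assumed threshold, taken from the 10-minute example in the comment.
BINDING_WINDOW = timedelta(minutes=10)


def is_binding(predicted: datetime, scheduled: datetime,
               window: timedelta = BINDING_WINDOW) -> bool:
    """A prediction binds the bus only if it stays close to the schedule."""
    return abs(predicted - scheduled) <= window


def earliest_departure(arrived: datetime, predicted: datetime,
                       scheduled: datetime) -> datetime:
    """Hold an early bus until the published prediction, if that prediction binds."""
    if is_binding(predicted, scheduled) and arrived < predicted:
        return predicted
    return arrived
```

The three-stops variant would swap `is_binding` for a check on how far away the bus was when the prediction was issued.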

~~~
why_only_15
The article addresses this explicitly: they try to err toward predicting that
the bus will arrive earlier rather than later. In the 12-30 minute bucket,
"12% of weekday predictions were up to 4 minutes early, 71% were up to 6
minutes late, and 17% were outside both of those windows".
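
For anyone wanting to compute shares like these from their own data, here is a rough sketch. It assumes each error is measured in minutes with negative values counted as "early" and non-negative values as "late"; which direction the article calls early is not stated in the quote, so that sign convention is a guess.

```python
from collections import Counter


def window_shares(errors_min, early_limit=4.0, late_limit=6.0):
    """Share of errors in the early window, the late window, and outside both.

    Sign convention (assumed): negative = early, zero or positive = late.
    """
    if not errors_min:
        raise ValueError("need at least one error")
    counts = Counter()
    for e in errors_min:
        if -early_limit <= e < 0:
            counts["early"] += 1
        elif 0 <= e <= late_limit:
            counts["late"] += 1
        else:
            counts["outside"] += 1
    total = len(errors_min)
    return {name: counts[name] / total for name in ("early", "late", "outside")}
```

An asymmetric window like this is what makes the deliberate skew visible: most of the mass sits on one side of zero.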

~~~
tzs
So they are weighting it to under-predict lateness, which will reduce the
chances someone misses the bus by arriving at the stop at the predicted time,
it seems.

------
bsder
Tell me where the bus _IS_, dammit. Why is this rolling out _after_
prediction instead of _before_?

Why are we "predicting" at all?

Why doesn't every bus have a tracker that tells you exactly where it is _at
all times_?

This is 2018. GPS with refinement isn't rocket science anymore.

Tell me where the bus is and I'll do my own prediction thanks.

~~~
katehedgpeth
I think you may be presuming that every UI that consumes this data is able to
display a map showing the bus’s location. We do make real-time bus lat/lng
available through our API and lots of clients use that data. But there are
other applications where a time prediction is still helpful. For example in
many of our stations that serve bus routes, we don’t have the infrastructure
(yet) to display a live map but we do have digital signs that tell when the
next bus is expected to arrive.

~~~
ramshorns
A map isn't necessary to show a location. The digital sign could just display
the name of the last intersection the bus was at. (I'm not convinced that's
any better than a reasonably good prediction though.)

~~~
bobthepanda
Each approach has its own tradeoffs.

- Time. Time has the advantage of brevity ("15 min" takes up a grand total of
five characters) and indisputable meaning. It's also the hardest to predict,
and people get pretty upset about incorrect predictions.

- Intersection. On a signboard, intersection names can get quite long. You
would also need familiarity with where exactly the intersection is located,
how the bus gets from there to your stop, etc. Boston is famously non-gridded,
and has no real consistent rhyme or reason to its street layout, so that would
be very confusing.

- Miles/stops away. New York's BusTime uses this. This is a pretty
indisputable metric, and if you have GPS or a working odometer it is trivial
to confirm. But this runs into the same problem as intersections; you need
context. The bus is two stops away, but how long does it take to make those
two stops? A mile passing through Times Square is very different from a mile
through the leafy, quiet suburbs of Queens.

You can pick between meaning and certainty, but it's very hard to have both.
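
The Times Square versus Queens point can be put in numbers with a tiny sketch: the same distance turns into very different waits once segment speeds are taken into account. The speeds used in the note below are made-up illustrative values, and `eta_minutes` is a hypothetical helper.

```python
def eta_minutes(segments):
    """Convert remaining distance into minutes.

    segments: list of (length_miles, speed_mph) between the bus and the stop.
    """
    total = 0.0
    for miles, mph in segments:
        if mph <= 0:
            raise ValueError("segment speed must be positive")
        total += 60.0 * miles / mph
    return total
```

One mile at an assumed 5 mph is 12 minutes, while one mile at an assumed 20 mph is 3 minutes, so "one mile away" alone says little about the wait.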

------
romed
Awesome. I hope SFMTA can learn from this. They are taking proposals to
replace Nextbus currently [1]. When I last measured Nextbus reliability, the
first prediction for a bus was weakly correlated with arrival, and the second
and third predictions were totally free of information. One of the main
problems around here seems to be buses with the route and destination sign set
to the wrong thing (e.g. the bus says it is headed to downtown, it's actually
going the other way) which apparently throws a huge wrench in Nextbus.

1: [https://www.sfchronicle.com/bayarea/article/Muni-looks-to-re...](https://www.sfchronicle.com/bayarea/article/Muni-looks-to-replace-aging-NextBus-system-with-13204933.php)

------
BugsJustFindMe
Dear MBTA CTO, why isn't this on the MBTA website instead of Medium?

~~~
vcryan
I work on software for the MBTA but don't speak for the organization.

There are pages on the main site about this project, for example:
[https://www.mbta.com/projects/better-bus-project](https://www.mbta.com/projects/better-bus-project)

I think the Medium blog was set up for posts intended for a more tech-savvy
audience than the broader audience that visits MBTA.com to check schedules,
plan trips, view alerts, etc. Maybe at some point it will be merged in.

The website was relaunched about a year ago as an Elixir / Phoenix application
and there are still quite a few features and content being added.

------
cozzyd
In theory, one should be able to use information about the stop lights to
improve the prediction.

Anyway, I really would like to see a histogram of the predicted distribution
of arrival times. Sometimes I 1-sigma need to make the bus and sometimes I
3-sigma need to make the bus...
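
A rough sketch of how a rider-side "1-sigma vs 3-sigma" margin could be derived from a history of past errors. It assumes errors are recorded as actual minus predicted arrival in minutes (negative means the bus came early) and uses an empirical quantile rather than fitting a distribution; both function names are hypothetical.

```python
import math
from statistics import NormalDist


def sigma_to_coverage(k):
    """One-sided probability of a standard normal falling below k sigma."""
    return NormalDist().cdf(k)


def safety_margin(errors_min, coverage):
    """Minutes before the predicted time to be at the stop.

    Returns the smallest observed earliness such that at least `coverage`
    of past buses were no earlier than that.
    """
    if not errors_min:
        raise ValueError("need at least one past error")
    # Only early arrivals can make you miss the bus; late ones count as 0.
    earliness = sorted(max(0.0, -e) for e in errors_min)
    idx = math.ceil(coverage * len(earliness)) - 1
    idx = min(max(idx, 0), len(earliness) - 1)
    return earliness[idx]
```

Calling `safety_margin(history, sigma_to_coverage(3))` asks for a much larger buffer than `sigma_to_coverage(1)` whenever the history has a long early tail, which is exactly the information a histogram would show.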

~~~
crote
Even better, give buses priority at stop lights when they are behind schedule,
or intentionally delay them when they are ahead of schedule. This is being
done in a few countries and it seems to work quite well. Why predict when you
can control it?
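
As a toy illustration of that control rule, the decision could look like the sketch below. The thresholds, the action names and the sign convention (positive deviation means behind schedule) are all assumptions, not taken from any real signal priority system.

```python
def signal_action(schedule_deviation_min, late_threshold=2.0, early_threshold=1.0):
    """Pick a traffic-signal action for an approaching bus.

    schedule_deviation_min: positive when the bus is behind schedule (assumed).
    """
    if schedule_deviation_min >= late_threshold:
        return "request_priority"   # late bus: ask for an early or extended green
    if schedule_deviation_min <= -early_threshold:
        return "hold"               # early bus: no priority, let it wait
    return "none"                   # close enough to schedule: leave signals alone
```

Keeping a dead band between the two thresholds avoids flipping the request on and off for buses that are roughly on time.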

------
vcryan
My apologies for being such an opportunist, BUT, we are hiring for various
engineering and product roles at the moment:
[https://jobs.lever.co/mbta](https://jobs.lever.co/mbta)

~~~
cypherpunks01
Just curious if you're finding it challenging to hire for Erlang? Have you
generally hired people with some experience, or do you typically expect new
devs to pick it up upon hiring?

~~~
katehedgpeth
I also work in the customer tech department for the MBTA. We use Elixir and
all of our web interfaces use Phoenix on top of that. We don’t require anyone
to have experience with Elixir or Erlang; most of our developers have learned
on the job. We look for solid general programming skills and a willingness to
learn. We’ve had a lot of success with that strategy so far!

