

Analysis of expected number of recurring encounters with strangers on transit - ant6n
https://medium.com/@transitapp/montreal-s-familiar-strangers-6b224f3cc9c3?=a6

======
ant6n
Disclaimer: I did the underlying analysis (but did not write the text).
Usually I do coding for Transit App, but I was given this analysis to do,
sort of for fun.

I was given a large database of (userId, timestamp, busNumber) and a scrape
of the schedules from the website, and the task was to figure out how often
strangers would keep meeting each other on transit, as a sort of global missed
encounters search.

It's basically a big exercise of combining probability distributions. It’s
impossible to infer what stop people get on because there are no vehicle ids;
the only possible way to infer anything is to look at transfers (i.e. the time
the user enters another bus, or the metro). The time is the time of the tap.

I worked out whether people make the same trip at around the same time of day
on multiple days during the observation period, and recorded each such case as
a ‘recurring trip’. I assume that a recurring trip is always made from and to
the same approximate stops. From the schedules, I know how many buses are
operating at any given time. For the rest it’s all probabilities:

a) a probability distribution for how long people spend on different buses.
This is calculated based on the transfers, i.e. when people tap into the next
bus or metro station.

b) a probability distribution of which bus a recurring trip may fall on. It's
based on the time distribution of the recurring trips and the headways
waiting passengers are expected to encounter. If there are only a few buses an
hour, and the recurring trips happen at the same time, a user will use each of
those few buses with high probability.

c) the probability distribution of a passenger being on a certain (but
unknown) bus at a certain (known) time. If it’s more than 20%, it counts as a
once-a-week encounter; if it’s more than 40%, it counts as a twice-a-week
encounter, etc.

d) assuming you take the same bus every day, the probability that another
passenger is on that bus at the same time. This depends on the number of buses
operating.
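
To make b) and d) a bit more concrete, here is a rough Python sketch. It is not
the code used for the analysis. The function names, the use of minutes after
midnight, and the simplification that each observed time is the moment the
rider reaches the stop and then boards the next scheduled departure are all
assumptions made for illustration. It also assumes the two riders choose their
buses independently.

```python
from bisect import bisect_left
from collections import Counter

def bus_distribution(arrival_minutes, departures):
    """Share of a rider's recurring trips that land on each departure.

    arrival_minutes: observed times (minutes after midnight), one per day.
    departures: sorted scheduled departure times (minutes after midnight).
    Assumption: the rider boards the first departure at or after arrival.
    """
    counts = Counter()
    for t in arrival_minutes:
        i = bisect_left(departures, t)  # index of first departure >= t
        if i < len(departures):
            counts[departures[i]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {d: c / total for d, c in counts.items()}

def same_bus_probability(mine, other):
    """Probability that two riders end up on the same departure,
    assuming their choices are independent."""
    return sum(p * other.get(d, 0.0) for d, p in mine.items())

# Hypothetical schedule: a bus every 30 minutes from 8:00.
departures = [480, 510, 540]
mine = bus_distribution([475, 478, 495, 500], departures)
other = bus_distribution([470, 476], departures)
print(mine, other, same_bus_probability(mine, other))
```

With fewer departures per hour, each rider's distribution concentrates on
fewer buses, so the overlap between two riders goes up.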

From then on it’s a matter of summing up the probabilities to get the number
of expected strangers with n=1,2,3,4 or 5 encounters per week; assuming you
take the same bus every day. There’s a little bit of fuzzing going on, but not
too much.
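
The final tally could look roughly like the Python sketch below. Again this is
an illustration, not the actual code: the function name and the input
probabilities are made up, it ignores the fuzzing, and treating a value that
sits exactly on a 20% step as reaching that step is an assumption.

```python
import math
from collections import Counter

def strangers_by_encounters(same_bus_probs, days_per_week=5):
    """Count other riders by encounters per week.

    same_bus_probs: for each other rider, the probability of sharing
    your bus on a given day. Each 20% step (with 5 travel days) counts
    as one more encounter per week.
    """
    tally = Counter()
    for p in same_bus_probs:
        # Small epsilon guards against float rounding at the step boundaries.
        n = min(days_per_week, math.floor(p * days_per_week + 1e-9))
        if n >= 1:
            tally[n] += 1
    return {n: tally[n] for n in range(1, days_per_week + 1)}

# Hypothetical daily same-bus probabilities for five other riders.
print(strangers_by_encounters([0.05, 0.25, 0.3, 0.45, 0.9]))
```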

EDIT: added how the analysis was done.

