
The Waiting Time Paradox, Or, Why Is My Bus Always Late? - dragly
http://jakevdp.github.io/blog/2018/09/13/waiting-time-paradox/
======
herodotus
Nice article. It reminds me of my year living in London, and taking the bus
every day to Imperial College from West End Lane in West Hampstead. There was a
stop on both sides of the road - one for the outbound bus, and one for the
inbound (the bus went from central London to a terminus and then returned
mostly on the same route). Now we did not use schedules - way too inaccurate
at rush hour, and the buses there were pretty frequent anyway. But we did
expect an even chance of the inbound bus arriving before an outbound one did.
My daughter and I became convinced after a while that this was not happening,
so we invented a game (which we called "The Game of Life"). When our bus
(inbound) arrived first, we added 1 to our score. We subtracted 1 for every
outbound bus that passed before ours arrived (there were often more than 1).
We realized that the result would be slightly skewed to the negative, but we
expected the outcome to be close to 0 over time. Of course it was not. Anyway
we extended the game to many statistical situations. For example, you go to
the checkout line at the supermarket, and there are N people in front of you.
When you get to the front of the line, you count the people behind you - call
that M. If M is bigger than N, you scored life points. If it is smaller, you
lost some. So you add M-N to your running score, and you get an idea of how
lucky you are in life. However, I never followed up with any real analysis, so
I enjoyed this article.

~~~
bostonpete
Your use of "of course" seems to imply that there's some statistical reason
that the probability of the next bus being inbound vs outbound wouldn't be
equal. Is there? If so, it seems like it must be a different reason than the
one in the article. What am I missing...?

~~~
herodotus
Because we might score -1, -2 or worse if 2 or 3 buses went in the other
direction before ours came, but if ours came first, we score 1. We get on the
bus and thus don’t know if another one or more arrives first on our side.

~~~
repsilat
This reminds me of a mathematical paradox that makes me doubt your conclusion:
"In this country, every couple wants to have one daughter. They keep having
children until they have a daughter, and then they stop. What gender balance
should we expect?"

Couples can have any number of sons, and every couple has exactly one
daughter. Still, the accepted mathematical solution is an equal gender ratio
for the couples' children.

~~~
Terr_
I think the "paradox" comes from how people implicitly assume "any number of
sons" is somehow distributed or weighted in a way that favors numbers of 1 or
above.

In contrast, "0 sons" is going to describe a full half of all marriages.

~~~
jon_richards
Same situation with the bus.

~~~
pwagland
Not really. In the son/daughter case, the calculations are:

expected daughters: $1$

expected sons: $\frac{1}{2}\cdot 0 + \frac{1}{4}\cdot 1 + \frac{1}{8}\cdot 2 + \frac{1}{16}\cdot 3 + \frac{1}{32}\cdot 4 + \dots$

So number of expected daughters = 1, number of expected sons = 1. In practice
since women can't have an infinite number of children, then this wouldn't be
an infinite series, so the real number of expected boys would be lower than
one, but there you go…
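pwagland's series can be checked with a quick Monte Carlo sketch. It assumes every birth is independently a daughter with probability 1/2; the optional `max_children` cap is a hypothetical stand-in for the real-world limit on family size.

```python
import random


def simulate_families(couples=100_000, max_children=None, seed=0):
    """Average sons and daughters per couple when each couple stops at its first daughter.

    Assumption: every birth is independently a daughter with probability 1/2.
    `max_children` (hypothetical) caps family size; None means no cap.
    """
    rng = random.Random(seed)
    sons = daughters = 0
    for _ in range(couples):
        children = 0
        while max_children is None or children < max_children:
            children += 1
            if rng.random() < 0.5:  # a daughter: this couple stops
                daughters += 1
                break
            sons += 1               # a son: try again if the cap allows
    return sons / couples, daughters / couples


print(simulate_families())                # both close to 1.0
print(simulate_families(max_children=2))  # both close to 0.75
```

With a cap, both averages fall below 1 but stay equal to each other, because every birth is still a fair coin flip.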

Now, for the bus case, you get +1 if your bus turns up first, and -1 for every
other bus that turns up first. Assume that it is completely random. Then:

expected + score: $\frac{1}{2}\cdot 1$

expected - score: $\frac{1}{4}\cdot(-1) + \frac{1}{8}\cdot(-2) + \frac{1}{16}\cdot(-3) + \dots$

Expected + is 0.5, expected - is -1.
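A minimal simulation of herodotus's bus game under the "completely random" assumption, modeled here (an assumption) as two independent Poisson streams with the same hypothetical rate of one bus per 10 minutes in each direction. The average score per round should land near 0.5 - 1 = -0.5.

```python
import random


def play_round(rng, rate_ours=1 / 10, rate_other=1 / 10):
    """One round: +1 if our bus arrives first, otherwise -1 for each
    other-direction bus that passes before ours."""
    # Memorylessness lets us start both clocks when we reach the stop.
    t_ours = rng.expovariate(rate_ours)
    passed = 0
    t_other = rng.expovariate(rate_other)
    while t_other < t_ours:
        passed += 1
        t_other += rng.expovariate(rate_other)
    return 1 if passed == 0 else -passed


def average_score(rounds=100_000, seed=0):
    """Mean score per round over many independent rounds."""
    rng = random.Random(seed)
    return sum(play_round(rng) for _ in range(rounds)) / rounds


print(average_score())  # close to 0.5 - 1 = -0.5
```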

~~~
repsilat
I guess it would be balanced if the rule was,

> _+1 when your bus turns up, -1 for every bus going the other way._

~~~
selestify
Isn't that the same as what the OP's rules were?

------
ikken
This reminds me of the bet in the bitcoin community [1]. If on average bitcoin
blocks are produced every 10 minutes, and you learn that 5 minutes ago someone
found a block, what is the average time you will wait for the next block? It
turns out it's 10 minutes, not 5 minutes as you would intuitively think. (it's
a memoryless process, so average expected time till block is always the same -
10 minutes - no matter how many blocks were recently found).

In other words, when you're waiting for a bitcoin transaction to be confirmed
and go to check how long ago the most recent block was produced, in order to
estimate how soon the next one will come, you're doing it wrong. Even if the
previous block was found 9 minutes ago, your average waiting time for the
next block is still 10 minutes.
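The memoryless claim is easy to check numerically. This sketch assumes block gaps are exponential with a constant 10-minute mean (exactly the idealization disputed in the replies) and averages the remaining wait over gaps that have already lasted `elapsed` minutes.

```python
import random


def mean_remaining_wait(elapsed, mean_interval=10.0, samples=100_000, seed=0):
    """Average time still to wait for a block, given none has been found for `elapsed` minutes.

    Assumption: gaps between blocks are exponential with mean `mean_interval`.
    """
    rng = random.Random(seed)
    total = 0.0
    kept = 0
    while kept < samples:
        gap = rng.expovariate(1 / mean_interval)
        if gap > elapsed:  # keep only gaps consistent with "no block yet"
            total += gap - elapsed
            kept += 1
    return total / kept


for elapsed in (0, 5, 9):
    print(elapsed, round(mean_remaining_wait(elapsed), 2))  # each close to 10
```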

[1].
[https://www.reddit.com/r/btc/comments/7rs8ko/dr_craig_s_wrig...](https://www.reddit.com/r/btc/comments/7rs8ko/dr_craig_s_wright_has_refused_to_pay_up_on_a_bet/)

~~~
ikeboy
This is actually wrong. The average expected time till next block is almost
never 10 minutes because hashpower goes on and offline all the time. It gets
adjusted every 2016 blocks based on historical block timing so that if no
changes occur then future blocks would be 10 minutes on average - but changes
always happen so this is never accurate. As such, you do learn something by
looking at prior block times.

~~~
fjsolwmv
What you say is not a rebuttal of the parent comment. Parent explicitly
assumed an average block time of 10 minutes. The most recent block time
doesn't change that.

~~~
ikeboy
Parent is talking about bitcoin, where that is false. If they are assuming the
average, then they're assuming something false.

~~~
8note
The important thing is that it doesn't matter what the parent assumed.

Whether the actual time is 10 minutes or 100 years, knowing that somebody else
solved one recently doesn't speed up your time to find one.

~~~
ikeboy
Of course it doesn't speed up your own time, since you have perfect
information about your own hashpower. But it does tell you information about
the total hashpower that's online, statistically.

I'll give an extreme example to make this clearer. Suppose 10X hashpower just
came online an hour ago. It's quite likely that ~60 blocks have been found in
the last hour, assuming the difficulty adjustment hasn't happened since.
Seeing this, one could deduce that hashpower went up by a factor of ~10 and that the
expected time till next block is roughly 1 minute instead of 10.

Now, in most cases hashpower doesn't change that drastically but it remains
true that recent block times give you more than 0 information about hashpower
and therefore about the expectation for future block times.
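ikeboy's 10x example can be sketched as follows. The assumptions are that hashpower has been a constant multiple of what the difficulty expects for the whole hour, and that blocks then arrive as a Poisson process; estimating the wait as window length divided by block count is a simple maximum-likelihood guess, not the only possible estimator.

```python
import random


def blocks_in_window(hash_multiplier, window=60.0, base_interval=10.0, rng=None):
    """Number of blocks found in the last `window` minutes.

    Assumption: hashpower has been `hash_multiplier` times what the current
    difficulty expects for the whole window (no difficulty adjustment since),
    so blocks arrive as a Poisson process at hash_multiplier / base_interval per minute.
    """
    if rng is None:
        rng = random.Random()
    rate = hash_multiplier / base_interval
    count = 0
    t = rng.expovariate(rate)
    while t < window:
        count += 1
        t += rng.expovariate(rate)
    return count


def estimated_next_wait(block_count, window=60.0):
    """Maximum-likelihood estimate of the expected wait (minutes) from a Poisson block count."""
    return window / block_count if block_count else float("inf")


rng = random.Random(0)
seen = blocks_in_window(10, rng=rng)
print(seen, round(estimated_next_wait(seen), 2))  # roughly 60 blocks, roughly 1 minute
```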

------
agumonkey
Slightly related: my ghost town had few buses, and they were sparse. I could never rely
on printed times. If I got there 10 minutes early to be safe, I could still end up
waiting 20 minutes for nothing because the bus had come 11 minutes early. Of course, half
the time when I decided to walk to the next town, where buses are many, I'd see
all my town's buses (both ways) pass me <yell-at-cloud.png>

I think it made me completely careless about time, I would just go between
stops and take the first one, go with the flow. By experience I'd know the
range it would take for me to reach big places around the area.

I had a friend who was completely foreign to this mode of thinking; she was
very diligent and fully trusting (although she mostly used trains, so a lot
less divergence).

It reminds me of kid studies about intelligence / wealth ratios. When your
environment is random, you think random. When it's predictable, you plan.

------
twtw
> a Poisson process is a memoryless process that assumes the probability of an
> arrival is entirely independent of the time since the previous arrival. In
> reality, a well-run bus system will have schedules deliberately structured
> to avoid this kind of behavior: buses don't begin their routes at random
> times throughout the day, but rather begin their routes on a schedule chosen
> to best serve the transit-riding public.

I've never really understood any example involving a Poisson process. They
always seem to involve bus arrivals or light bulbs burning out, and I can't
understand why the memoryless property would ever make any sense for these.

Even if the bus system was poorly run, why would it make sense to assume that
the expected value of time to arrival doesn't change based on how long you've
been waiting?
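To make the contrast concrete, here is a closed-form sketch comparing two idealized cases with the same 10-minute headway (both assumptions): memoryless Poisson buses, and a perfectly kept timetable with a passenger who turns up at a uniformly random moment. Only the Poisson case ignores how long you have already waited.

```python
def expected_remaining(waited, headway=10.0, model="poisson"):
    """Expected further wait (minutes) after already waiting `waited` minutes."""
    if model == "poisson":
        # Exponential gaps are memoryless: the answer never depends on `waited`.
        return headway
    if model == "timetable":
        # Arriving at a uniform moment in a fixed gap, the bus is uniformly
        # distributed over whatever remains of that gap.
        if not 0 <= waited < headway:
            raise ValueError("on a perfect timetable you cannot wait a full headway")
        return (headway - waited) / 2
    raise ValueError(f"unknown model: {model!r}")


for waited in (0, 4, 8):
    print(waited, expected_remaining(waited), expected_remaining(waited, model="timetable"))
# 0 10.0 5.0
# 4 10.0 3.0
# 8 10.0 1.0
```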

What is an actual phenomenon that is well modeled by a Poisson process?

~~~
lordnacho
> What is an actual phenomenon that is well modeled by a Poisson process?

Time to next Bitcoin block mined. It's 10 minutes, regardless of whether
you've waited 1 hour, 10 minutes, or 10 seconds.

Makes sense though, because all the failed hashes are useless, thus no memory.

~~~
QML
To add on, there was this article posted months ago on the same topic:

Why Is It Taking 20 Minutes to Mine This Bitcoin Block?

[https://news.ycombinator.com/item?id=16469382](https://news.ycombinator.com/item?id=16469382)

------
taeric
Highly recommend reading this to any folks that are just skimming the
discussions.

The simulations were worth the article on their own. The real world analysis
was a great bonus.

Anecdotally, I was expecting confirmation bias to be the main culprit.
Pleasantly surprised to see I was wrong.

------
edoo
Hah, great analysis. One factor with buses is that the schedule is likely planned to
minimize early arrivals at the risk of being late more often. Usually when a
bus is early it has to sit and wait until its departure time. A late-running
bus can be more efficient, and a bus that is held until its departure time might
never get a chance to average down the bursts of lateness.

~~~
stormbrew
I don't think this is always true. My city counts buses at inner stops as
being on time if they are between something like 5 minutes early and 1 minute late
(yes, you read that right). Timing stops, where they have to wait, are pretty
infrequent (mostly bus terminals).

I don't know how common it is but it does exist. And buses perpetually being
early means that if you're on time you wait even longer for the next one.

------
stephengillie
OneBusAway is surprisingly accurate, at least in my experience. Google Maps
has very good transit support too.

One reason buses are late is because a bus must travel a circuit. Cars provide
linear transportation, so the delay can only happen in the direction of your
travel. Since buses run a circuit, they are impacted by delays in the
direction opposite of your travel as well.

Your bus might be late because the return route has traffic or other delays.
Or maybe a drunk or drug user got in a fight with the driver and the police
were needed. Or someone in a wheelchair had a problem getting onto the lift.

~~~
QML
Why is modeling required? Can’t we just put a sensor on every bus, and just
return the empirical expected time it takes for the next bus to drive to your
station given the time of day and day of week?

~~~
wiml
That's how it works (at least the OneBusAway feed in the Seattle area). They
started in the 90s with an RFID transponder on bus stops, read by the bus as
it went past; more recently they use things like odometry or GPS to feed
information into the system.

But:

> just return the empirical expected time it takes for the next bus

There is a world of complexity in "the empirical expected time", there...
expected according to what models?

Anecdotally, I think it's especially hard to model because any given delay is
probably attributable to one or a few specific incidents. This isn't a
situation where everything averages out and we can use a nice tractable AWGN
model; we're down in the muck and the shot-noise.

------
gwern
The memorylessness of the Poisson process makes the statistical aspect a bit
trivial. But here's an interesting variant: how should you update your beliefs
while waiting if there is a certain probability that the bus won't come at
all? "The Ups and the Downs of the Hope Function in a Fruitless Search", Falk
et al 1994:
[https://www.gwern.net/docs/statistics/bayes/1994-falk](https://www.gwern.net/docs/statistics/bayes/1994-falk)
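A small sketch of that kind of updating, using a simplified model in the spirit of the question (an assumption, not necessarily the paper's setup): a prior probability `p` that the bus runs at all and, if it does, an arrival time uniform over a `window`-minute interval. Bayes' rule then gives the probability that it is still coming after `waited` minutes without it.

```python
def hope(p, waited, window):
    """P(the bus still comes | it has not come in the first `waited` minutes).

    Assumptions: P(bus runs) = p, and a running bus arrives uniformly in [0, window].
    """
    if not 0 <= waited <= window:
        raise ValueError("waited must lie within the window")
    still_possible = p * (1 - waited / window)  # runs, and arrives later than `waited`
    return still_possible / (still_possible + (1 - p))


for waited in (0, 15, 27, 29, 30):
    print(waited, round(hope(0.9, waited, 30), 3))
```

In this toy model the hope declines slowly at first and then collapses toward zero near the end of the window.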

------
jobigoud
I've encountered the inspection paradox in debates about factory farming, with
people talking past each other.

If you take the average farm, chances are that it's doing humane farming. But
if you take the average animal, it has an overwhelming chance of being in an
industrial farm.
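The farm-versus-animal gap comes down to which unit you average over. A minimal sketch with made-up numbers (purely hypothetical, for illustration): 90 small farms and 10 very large ones.

```python
def industrial_shares(farm_sizes, threshold):
    """Share of farms, and share of animals, on farms with at least `threshold` animals."""
    big = [n for n in farm_sizes if n >= threshold]
    return len(big) / len(farm_sizes), sum(big) / sum(farm_sizes)


# Hypothetical numbers for illustration only: 90 small farms, 10 huge ones.
farms = [50] * 90 + [10_000] * 10
by_farm, by_animal = industrial_shares(farms, threshold=1_000)
print(f"{by_farm:.0%} of farms, {by_animal:.0%} of animals")  # 10% of farms, 96% of animals
```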

~~~
hmmmmmh
Just like if you pick an average human being she probably is poor and
black/indian. But average GDP per capita is pretty high worldwide.

------
stornetn
Reminds me of a similar article that looked at the same kind of question: the
wait times for NYC subways conditional on how long you've been waiting
([https://erikbern.com/2016/04/04/nyc-subway-math.html](https://erikbern.com/2016/04/04/nyc-subway-math.html)). I think
it's a pretty safe bet that people who like this post will like this article
as well.

------
ChrisFoster
It strikes me that even with a perfectly regular starting schedule, buses
might clump together in time because the schedule is probably dynamically
unstable. To explain, picking up passengers from a stop costs time and a long
time between buses implies a high probability that passengers will be waiting
at a given stop. This adds further to the delay and shortens the time to the
next bus in the schedule.

I'm sure drivers try to actively manage this, but if they didn't I suspect the
system would naturally evolve toward pairs of buses leapfrogging each other on
long routes.
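The instability argument can be sketched with a deliberately crude toy model (all assumptions, not a calibrated transit model): dwell time at a stop is proportional to the time since the previous bus left, both buses drive between stops at the same speed, and the bus ahead keeps exactly to its scheduled headway. Under those assumptions any error in the gap grows by a factor of (1 + c) at every stop.

```python
def follower_headways(h0, scheduled=10.0, c=0.1, stops=30):
    """Gap (minutes) between a bus and the one ahead of it, stop by stop.

    Toy model (assumptions): dwell time at a stop is c times the minutes since
    the previous bus left, because that many minutes of passengers have piled up;
    both buses drive between stops at the same speed; and the bus ahead keeps
    exactly to the scheduled headway, so it always dwells c * scheduled.
    """
    gaps = [h0]
    h = h0
    for _ in range(stops):
        # Follower dwells c*h, leader dwells c*scheduled: the error grows by (1 + c).
        h = h + c * (h - scheduled)
        h = max(h, 0.0)  # a gap of 0 means the follower has caught up: bunched
        gaps.append(h)
    return gaps


for start in (10.0, 9.0, 11.0):
    print(start, round(follower_headways(start)[-1], 1))
```

A perfectly timed start stays at 10 minutes, but a one-minute error either way does not: starting 1 minute close, the buses are bunched (gap 0) within 30 stops, and starting 1 minute far, the gap grows to about 27 minutes.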

~~~
jobigoud
I think another confusing factor about that specific example is that buses
shouldn't ever start _before_ their schedule. Otherwise you run the risk of a
bunch of people missing their bus even though they showed up on time. I think
buses, trains and planes can only be late.

For example, this is an article about a Japanese train company issuing a public
apology for departing 20 seconds early: [https://www.bbc.com/news/world-asia-42009839](https://www.bbc.com/news/world-asia-42009839)

------
MaxBarraclough
This Wikipedia article seems relevant:
[https://en.wikipedia.org/wiki/Residual_time](https://en.wikipedia.org/wiki/Residual_time)

(From reddit -
[https://www.reddit.com/r/programming/comments/9s4j58/the_wai...](https://www.reddit.com/r/programming/comments/9s4j58/the_waiting_time_paradox_or_why_is_my_bus_always/e8ntcng/)
)

------
varlock
Can't believe no one has yet mentioned the PASTA theorem - Poisson Arrivals
See Time Averages
([https://en.wikipedia.org/wiki/Arrival_theorem#Theorem_for_ar...](https://en.wikipedia.org/wiki/Arrival_theorem#Theorem_for_arrivals_governed_by_a_Poisson_process)).
It is one of the theorems I remember the most from my Queuing Theory classes
at the university!
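A minimal numerical illustration of PASTA, assuming a single-server first-come-first-served queue with Poisson arrivals and exponential service times (an M/M/1 queue) and hypothetical rates. The fraction of arriving customers who find the server busy should match the fraction of time the server is busy, both close to arrival_rate / service_rate.

```python
import random


def pasta_check(arrival_rate=0.7, service_rate=1.0, customers=200_000, seed=1):
    """Single-server FIFO queue with Poisson arrivals and exponential service (M/M/1).

    Returns (fraction of arrivals that find the server busy,
             fraction of time the server is busy).
    """
    rng = random.Random(seed)
    t = 0.0               # current arrival time
    last_departure = 0.0  # when the server finishes everyone who has arrived so far
    found_busy = 0
    busy_time = 0.0
    for _ in range(customers):
        t += rng.expovariate(arrival_rate)
        if t < last_departure:  # someone is still being served
            found_busy += 1
        service = rng.expovariate(service_rate)
        busy_time += service
        last_departure = max(t, last_departure) + service
    return found_busy / customers, busy_time / last_departure


print(pasta_check())  # both close to 0.7
```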

------
nakedrobot2
In Prague, the trams all run on time - within 2 minutes or less of the posted
time. So I think this article is incorrect for this particular context.

~~~
albertgoeswoof
Trams don’t have traffic in the way that buses do, and there are only minor
differences in the drivers that could cause anomalies (i.e., you can't steer a
tram the wrong way), so they’re much easier to keep on time.

------
PascLeRasc
Is the inspection paradox what would happen if you surveyed everyone on how
many siblings they had, and every sibling double-counted N-1 times (where N is
the number of siblings in their family), inflating the resulting "average
number of siblings", or is that something different?

~~~
combatentropy
This is my first exposure to it, but yes, I think so. My paraphrase is: The
Inspection Paradox is when you ask someone "in the mix" about the mix. You're
only going to get an accurate estimation of the mix by standing outside of the
mix.

So yes, if you want an accurate count of siblings, you would consult some
spreadsheet that just lists how many children each family had. If you go and
start asking the families themselves (those "in the mix") then your results
will be skewed.
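The skew from asking the families themselves can be computed directly. A minimal sketch on hypothetical data: listing families counts each family once, while surveying every child counts a family of n children n times, each answering n - 1.

```python
def sibling_averages(children_per_family):
    """Average siblings per child, computed two ways.

    Assumes every family listed has at least one child.
    """
    # From a list of families: each family contributes (n - 1) once.
    per_family = sum(n - 1 for n in children_per_family) / len(children_per_family)
    # From asking every child: a family of n children answers n times, n - 1 each time.
    per_child = sum(n * (n - 1) for n in children_per_family) / sum(children_per_family)
    return per_family, per_child


# Hypothetical data: five families with 1, 1, 2, 3 and 5 children.
print(sibling_averages([1, 1, 2, 3, 5]))  # 1.4 vs. about 2.33
```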

I thought the article that this article linked to was also very good, "The
Inspection Paradox Is Everywhere," by Allen Downey,
[http://allendowney.blogspot.com/2015/08/the-inspection-parad...](http://allendowney.blogspot.com/2015/08/the-inspection-paradox-is-everywhere.html)

------
akane
On a related note, arrival time predictions can be biased early to prevent
people from missing buses, which also increases the perception of lateness.

[https://nextbus.cubic.com/FAQs](https://nextbus.cubic.com/FAQs)

------
amai
Is there an "evil" distribution which maximises the waiting time? Or is the
Poisson distribution already the theoretical "evil" maximum that a public
transport provider can achieve?

~~~
anotheryou
Send all buses at once
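For a passenger who shows up at a uniformly random moment, a standard renewal-theory result gives the mean wait as E[T^2] / (2 E[T]), where T is the gap between buses. For Poisson buses E[T^2] = 2 E[T]^2, so the wait equals the mean gap; clumping buses pushes it higher without bound, so Poisson is not the maximum. A sketch over a repeating cycle of hypothetical headways:

```python
def mean_wait(gaps):
    """Mean wait (minutes) for a passenger arriving at a uniformly random time,
    for buses that repeat the given cycle of headways: sum(g^2) / (2 * sum(g))."""
    total = sum(gaps)
    if total <= 0:
        raise ValueError("need at least one positive headway")
    return sum(g * g for g in gaps) / (2 * total)


print(mean_wait([10, 10, 10]))        # evenly spaced: 5.0
print(mean_wait([0, 20]))             # buses in pairs: 10.0, same as Poisson with a 10-minute mean
print(mean_wait([0, 0, 30]))          # buses in threes: 15.0
print(mean_wait([0] * 143 + [1440]))  # all 144 of a day's buses at once: 720.0
```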

~~~
kirkules
If something like that is an option, just don't ever send any.

~~~
anotheryou
Well but anything else would equally be bound to the extreme of the rule. "has
to come within a 10 minute timeframe" = send one at the beginning, the next
one at the end, so always 2 come together.

------
kuu
A bit off-topic: How can you integrate a Jupyter notebook in a blog post like
this one? It looks really nice!

Nice article, btw, interesting topic!

~~~
anotheryou
If I'm not mistaken, there is an HTML export function that bakes it into
static HTML.

~~~
kuu
That's true! :) Thanks!

------
ezoe
I hate the Poisson distribution because it goes completely against naive
instincts about how randomness behaves.

------
mayankkaizen
Nice article. Since I just started learning Stats, I wish I could find more of
such notebooks.

Any recommendations?

------
nyc111
Wouldn't it be easier to just time his actual waits as he waited for
the bus every day?

~~~
jchw
That would probably make it seem, though, that the buses actually don't arrive
(on average) every 10 minutes, since you'd oversample the buses that take
longer than 10 minutes.
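The oversampling is easy to see in simulation: long gaps cover more of the clock, so a rider arriving at a random moment lands in them more often. This sketch assumes Poisson buses with a 10-minute mean gap and riders arriving uniformly at random; the gap a rider lands in averages about twice the mean gap (E[T^2] / E[T]), and since the rider's wait is on average half of that gap, the mean wait comes out at the full 10 minutes.

```python
import bisect
import random


def gap_seen_by_riders(mean_gap=10.0, buses=200_000, riders=100_000, seed=0):
    """Compare the average gap between buses with the average gap a rider lands in.

    Assumption: Poisson buses (exponential gaps with mean `mean_gap`) and riders
    who arrive at uniformly random times.
    """
    rng = random.Random(seed)
    arrivals, t = [], 0.0
    for _ in range(buses):
        t += rng.expovariate(1 / mean_gap)
        arrivals.append(t)
    total = 0.0
    for _ in range(riders):
        r = rng.random() * arrivals[-1]
        i = min(bisect.bisect_right(arrivals, r), len(arrivals) - 1)  # next bus after r
        previous = arrivals[i - 1] if i > 0 else 0.0
        total += arrivals[i] - previous
    return arrivals[-1] / buses, total / riders


print(gap_seen_by_riders())  # roughly (10, 20)
```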

~~~
mmt
I'd argue that it's not _over_ sampling at all, but, rather, that the measure
of "average bus arrival time" is what's invalid or misleading.

After all, the point of the bus arrivals isn't in service of the bus (or
driver) but of the passengers. Observed average wait time at each bus stop is
a better measure. The even better measure would be average wait time weighted
by number of passengers [1].

[1] which is tougher to measure empirically, or even model, than just average
wait time for that one person, since it requires counting passengers boarding,
not just bus arrival times.

~~~
hammock
That's a nice idea but ignores all the people sitting in their offices or
homes, choosing to go or not go out of their places down to the bus stop.

Better to consider each bus stop as an asset to invest in, the more valuable
it is, the more people you can serve.

~~~
hammock
@mmt to clarify, you seem to be treating bus stops independently of alternative
means of transportation. Measuring the average wait time of people at the bus
stop is not enough: there are people who chose to ride a bike today instead of
waiting at the bus stop, because of what happened to them yesterday at the bus
stop.

~~~
albedoa
Wow that is...quite the edit lol. I'm all for small corrections and addendums,
but completely changing the meaning of your comment from an attempted callout
is something else.

~~~
hammock
Glad to have impressed you. My edit was intended to bring the comment in line
with HN guidelines.

~~~
albedoa
Ah, I thought it was because you realized that your accusation was wildly
inappropriate and unfounded.

------
usgroup
This is a straightforward consequence of modelling an arrival process as a
Poisson distribution with a constant rate of arrival lambda...

Go from arrival to cumulative arrivals to time of arrival to recurrence of
arrival (next arrival). All are Poisson processes, including the recurrence
process, which has a fixed expected value.

------
torgian
I’m glad I live in East Asia. Buses and trains are almost never late.

------
graycat
It's all much easier than that:

It's just the Poisson process, e.g., with a nice chapter in E. Cinlar,
_Introduction to Stochastic Processes_.

Buses come as _arrivals_. So bus arrivals are a _stochastic arrival process_,
where _stochastic_ just means varying _randomly_ over time. Really, the word
_randomly_ doesn't demand anything: it includes deterministic arrivals, that is,
arrivals known exactly in advance, but also admits any degree of unpredictability.

Well, in short, if we have a stochastic arrival process with _stationary,
independent increments_, then the arrival process is a _Poisson_ process and
there is a number, usually denoted by lambda, such that the times between
arrivals are independent, identically distributed random variables with an
exponential distribution whose parameter is the arrival rate, lambda.
_Stationary_ means that the probability distribution of the times between
arrivals does not change over time. _Independent increments_ means that the
time from one arrival to the next is independent of all the past _history_ of
arrivals.

The exponential distribution has the property, easy to verify with simple
calculus, that the conditional expectation of the remaining time until the
arrival, given that the arrival time is already greater than some number, is
the same as the expected arrival time.

So, net, if bus arrivals form a Poisson process, then the time until the next
bus arrives is the same after waiting five minutes as not having waited at
all.

Cinlar's treatment is nice because it is _qualitative_, that is, it has
assumptions that can often be confirmed or believed just intuitively. And we
might not believe that bus arrivals meet the assumptions.

This subject can continue with, say, _hazard curves_ for equipment failures
and a lot more about Poisson processes.

E.g., the sum of two independent Poisson processes, say, Red buses and Blue
buses, assuming that they are Poisson processes, is also a Poisson process
with arrival rate the sum of the Red and Blue arrival rates. If we _randomly_
throw away some arrivals, then what is left is also a Poisson process with
arrival rate adjusted in the obvious way.
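A quick numerical check of the Red/Blue claims, as a sketch with hypothetical rates: merge two independent Poisson streams, then keep each merged arrival independently with probability 1/2. Mean gaps near 6 and 12 minutes with a coefficient of variation near 1 are consistent with (though do not by themselves prove) Poisson behavior.

```python
import random
import statistics


def poisson_arrivals(rate, horizon, rng):
    """Arrival times of a Poisson process with the given rate on [0, horizon)."""
    times, t = [], rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def gap_summary(times):
    """Mean gap and coefficient of variation (about 1 for exponential gaps)."""
    gaps = [b - a for a, b in zip(times, times[1:])]
    mean = statistics.fmean(gaps)
    return mean, statistics.stdev(gaps) / mean


rng = random.Random(0)
horizon = 1_000_000.0
red = poisson_arrivals(1 / 10, horizon, rng)   # hypothetical: a Red bus every 10 minutes on average
blue = poisson_arrivals(1 / 15, horizon, rng)  # hypothetical: a Blue bus every 15 minutes on average
merged = sorted(red + blue)                    # rate 1/10 + 1/15 = 1/6
kept = [t for t in merged if rng.random() < 0.5]  # random thinning: rate 1/12

print(gap_summary(merged))  # mean near 6, CV near 1
print(gap_summary(kept))    # mean near 12, CV near 1
```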

In Feller's volume II is the renewal theorem that the sum of independent
arrival processes, Poisson or not, with mild assumptions, converges to a
Poisson process as the number of processes summed grows. So, if the users of a
sufficiently busy Web site act independently with mild assumptions, then the
Web site will see arrivals accurately as a Poisson process.

The vanilla Poisson process is Geiger counter clicks.

There is much more to the pure and applied math and applications of Poisson
processes.

------
nyc111
> When waiting for a bus that comes on average every 10 minutes, your average
> waiting time will be 10 minutes.

This is very ambiguous. Unless he gives a time frame the numbers do not make
sense. Average in a week? Average in a year? This is not how it works in real
life.

And I cannot accept his premise. My experience tells me that, in New York,
when I used to take a bus to work, sometimes the bus was coming as I was
walking to the stop; sometimes I would wait a long time. Sometimes not very
long. There was no observable bias.

~~~
pieguy
In statistics, "average" often means "expected value". No time frame is
specified (although you could consider it an infinite time frame). With a
small sample size your actual average might not be 10 minutes, but as your
sample size grows, it will tend toward 10 minutes.

~~~
twtw
> it will tend toward 10 minutes

If you are talking about spherical-cow style Poisson buses, yeah (that's what
the author means by "reasonable assumptions"). But as the author concludes, bus
arrival times are not well modeled by a poisson process.

