
NYC subway math - ikeboy
http://erikbern.com/2016/04/04/nyc-subway-math.html
======
Judson
One other cause of delay (which is, anecdotally, present on the L line) is
overcrowding. This is, unfortunately, not shown in the MTA's data feed, though
it could maybe be estimated by adding in the turnstile data?

The morning commute hours (and the hours after a 'delay' status has been
cleared) are usually marked by very crowded platforms and trains. Even though
the trains are 'on-time' and showing up every 3-5 minutes, there is no room to
squeeze on, and you may need to let a few trains pass before getting on one
successfully.

A 15-minute train 'delay' during the morning commute could cause an hour or
more of backed-up platforms and people unable to squeeze on. I think this kind
of delay is where a lot of the displeasure with L line service is centered.
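A toy queueing sketch shows how a 15-minute gap can turn into roughly an hour of crowding. The model and every number in it are assumptions for illustration, not MTA data: riders arrive at a constant 100 per minute, a train comes every 4 minutes and holds at most 500 riders, so each train has only 100 riders of spare capacity to chip away at any backlog.

```python
def simulate_platform(arrivals_per_min, headway_min, capacity, delay_min, horizon_min):
    """Return the platform queue length at the end of each minute.

    Toy model (all parameters hypothetical): riders arrive at a constant
    rate, no trains run for the first `delay_min` minutes, then a train
    arrives every `headway_min` minutes and boards at most `capacity` riders.
    """
    queue = 0
    history = []
    for t in range(horizon_min):
        queue += arrivals_per_min
        service_running = t >= delay_min
        if service_running and (t - delay_min) % headway_min == 0:
            # The train either fills up or empties the platform.
            queue -= min(queue, capacity)
        history.append(queue)
    return history


def minute_backlog_clears(history, delay_min):
    """First minute after service resumes at which the platform is empty again."""
    for t in range(delay_min, len(history)):
        if history[t] == 0:
            return t
    return None  # backlog never cleared within the simulated horizon


# Made-up numbers: 100 riders/min, a 500-rider train every 4 min, 15-minute delay.
history = simulate_platform(100, 4, 500, 15, 120)
print(minute_backlog_clears(history, 15))  # 59
```

With these made-up numbers the platform is not empty again until minute 59, about 44 minutes after trains start running again; riders at the back of the queue have to let several trains pass in the meantime.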

~~~
AndrewUnmuted
You perfectly described the issues that exist with the L train. It makes me
wonder just why so many people decided to move to a region of Brooklyn with
such historically poor subway service, when central Brooklyn, served by the
2,3,4,5 lines, is serving its rapidly gentrifying population with relative
ease.

~~~
eigenvalue
I live right there in Williamsburg near the Bedford Avenue stop, and the
subway issues are generally not a big deal. I can get to Grand Central within
20 to 25 minutes. Importantly, Williamsburg is at a very handy latitude; I can
get to the East Village, Union Square, or the West Village in 10 to 15
minutes, and that is more often than not where I want to go. Then there are a
ton of great restaurants and bars in the neighborhood, and everyone is pretty
young and hip. It's actually under-priced in my view relative to the city
(assuming the L doesn't stop running, in which case I will immediately return
to Manhattan).

~~~
sytse
I'm sorry to report that you might have to return
[http://gothamist.com/2016/01/13/l_train_tunnel_closure_years...](http://gothamist.com/2016/01/13/l_train_tunnel_closure_years.php)

------
jrcii
Where's Ben Wellington when you need him?
[http://iquantny.tumblr.com/](http://iquantny.tumblr.com/)

~~~
whatever_dude
Very cool website!

~~~
iquantny
Thank you!

------
inverba
Unfortunately, this is flawed in a very fundamental way. The NYC subways have
different schedules for the morning rush, afternoon, afternoon rush, evening,
and night. There should be at least three different distributions in the
delay/wait data. Without teasing apart these distributions, I'm not convinced
that anything meaningful can be said.
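Teasing the distributions apart can start as simply as bucketing each wait sample by schedule period and summarizing each bucket on its own. A minimal sketch, assuming samples come as (hour of day, wait in minutes) pairs; the period boundaries below are hypothetical, since real timetables differ by line and by weekday vs. weekend.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical hour boundaries for the periods named above.
PERIODS = [
    (0, 6, "night"),
    (6, 10, "morning rush"),
    (10, 16, "afternoon"),
    (16, 19, "afternoon rush"),
    (19, 22, "evening"),
    (22, 24, "night"),
]


def period_for_hour(hour):
    """Map an hour of day (0-23) to a schedule period label."""
    for start, end, label in PERIODS:
        if start <= hour < end:
            return label
    raise ValueError(f"hour out of range: {hour}")


def summarize_by_period(samples):
    """Summarize wait times separately per period instead of pooling them.

    samples: iterable of (hour_of_day, wait_minutes) pairs.
    """
    groups = defaultdict(list)
    for hour, wait in samples:
        groups[period_for_hour(hour)].append(wait)
    return {
        label: {"n": len(waits), "mean": mean(waits), "median": median(waits)}
        for label, waits in groups.items()
    }


samples = [(8, 3.0), (8, 5.0), (2, 15.0), (2, 25.0)]  # made-up waits
print(summarize_by_period(samples))
```

In this made-up example the pooled mean is 12 minutes, which describes neither the 4-minute rush-hour average nor the 20-minute overnight one.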

~~~
maxaf
The author did limit his "sunk cost" analysis to 7am-7pm for this exact
reason. I think that part at least is quite sound, and is probably the most
enlightening takeaway.
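One way to read that "sunk cost" analysis: condition on how long you have already waited, then average the additional wait over the samples that waited at least that long, using only the 7am-7pm window. A minimal sketch, assuming wait samples as (hour of day, total wait in minutes) pairs; `expected_remaining_wait` is a hypothetical helper illustrating the idea, not the author's code.

```python
from statistics import mean


def expected_remaining_wait(samples, already_waited, start_hour=7, end_hour=19):
    """Mean additional wait for riders who have already waited `already_waited` minutes.

    samples: iterable of (hour_of_day, total_wait_minutes). Only samples with
    start_hour <= hour < end_hour are used, mirroring the 7am-7pm restriction.
    Returns None if no daytime sample waited that long.
    """
    remaining = [
        wait - already_waited
        for hour, wait in samples
        if start_hour <= hour < end_hour and wait > already_waited
    ]
    return mean(remaining) if remaining else None


observed = [(8, 2), (8, 4), (8, 30), (23, 40)]  # made-up; the 11pm sample is excluded
print(expected_remaining_wait(observed, 0))  # 12
print(expected_remaining_wait(observed, 5))  # 25
```

With a heavy tail like this, the expected remaining wait grows the longer you have already waited, which is the counterintuitive result behind giving up on a train.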

------
startling
> My fascination for the subway takes autistic proportions [...]

What a weird way to phrase that.

~~~
kough
Autism often involves strong obsessions, so it's not wrong. Why does this
matter? Fantastic article on the whole.

~~~
liquidise
Sure, but obsessions are hardly the characterizing trait of Autism, nor are
they even included in literary definitions. I agree the article is wonderful,
but I came to these comments specifically to see if that phrase had given
anyone else pause.

~~~
sotojuan
No idea if the author intended it this way, but on some forums and imageboards
it's normal to refer to being obsessive about something as "being autistic
about it". It's similar to how most people call themselves "OCD".

------
bogomipz
Nice work Erik! You should send this to straphangers.org:
[http://www.straphangers.org/](http://www.straphangers.org/)

No surprise with the L train, as it's the only one that is automated (despite
having a conductor on board). They refuse to let it run unmanned, which raises
the question: why did they spend hundreds of millions of dollars to automate
the line? There are plenty of places with unmanned subway lines. There's one
in Tokyo and, I believe, in Barcelona and Copenhagen.

I guess it's no surprise that their API is a mess, as that about sums up the
culture at the MTA from what I can tell.

The MTA is spending hundreds of millions of dollars to put arrival clocks in
all stations. The problem with this is that I want to know how long it will be
until the next train arrives before I pay a fare. I have to pay a fare to get
onto the platform, only to find out the next train won't arrive for another 20
minutes. At that point I have needlessly paid a fare, and I generally walk
back upstairs and take a taxi.

When they were questioned about this poor decision, they responded that they
couldn't put train arrival times outside the entrance because of terrorism.
That makes zero sense.

~~~
CPLX
Do you live in NYC but don't buy an unlimited card? I feel like that's an
unusual use case.

~~~
bogomipz
Because everyone that lives in New York should be buying an unlimited card
regardless?

How about retired people or people on a fixed income that are not work
commuters?

How about someone that works in their neighborhood but still needs to use mass
transit occasionally?

How about students who largely stay on campus but still need to use mass
transit, albeit less frequently than a commuting professional?

How about someone who bikes as their primary means of transit but still needs
to use mass transit infrequently?

It's not really unusual at all, is it?

So if you can't afford or don't need to purchase an unlimited card, that's
your problem?

------
loisaidasam
Some thoughts on optimizing for wait time:

[https://github.com/erikbern/mta/issues/2](https://github.com/erikbern/mta/issues/2)

------
mwsherman
Awesome article.

The countdown clocks are generally accurate, but in certain situations they
are surprisingly off.

I would love to see someone use this data to improve arrival predictions – a
regression, say, based on features such as time of day (a proxy for crowding),
weather, holidays, nearby events (concerts & sports), maintenance or signal
problems...
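A sketch of what such a regression could look like, predicting actual minutes until arrival from the clock's own estimate plus a couple of the features listed. The features, the training rows, and the helper names `fit_ols` and `predict` are all hypothetical; the data is made up and generated exactly as 0.5 + clock + 2*rush + rain, so the fit should recover those coefficients. It is deliberately dependency-free; in practice a library such as NumPy or scikit-learn would be the usual choice.

```python
def fit_ols(rows, targets):
    """Ordinary least squares: solve (X^T X) b = X^T y by Gauss-Jordan elimination.

    rows: list of feature lists; an intercept column of 1s is prepended here.
    Returns [intercept, coef_1, ..., coef_k].
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(X[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("features are collinear; drop one")
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def predict(coefs, features):
    """Predicted minutes until arrival for one feature vector."""
    return coefs[0] + sum(c * f for c, f in zip(coefs[1:], features))


# Hypothetical rows: [clock_estimate_min, is_rush_hour, is_raining]
rows = [[2, 0, 0], [5, 0, 0], [3, 1, 0], [4, 0, 1], [6, 1, 1], [1, 1, 0]]
actual = [2.5, 5.5, 5.5, 5.5, 9.5, 3.5]  # made-up observed minutes
coefs = fit_ols(rows, actual)
print([round(c, 3) for c in coefs])
```

Holidays, nearby events, or signal problems would enter the same way, as extra 0/1 columns.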

I assume the clocks use something as simple as static numbers representing the
time between adjacent stations or segments. They presumably don’t account for
deviations caused by crowding or other unexpected events.
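Here is what the guessed-at static approach would amount to. This only illustrates the commenter's assumption; nothing in the thread says how the MTA's clocks actually compute arrivals. `countdown`, the run times, and the fixed dwell time are all hypothetical.

```python
def countdown(segment_minutes, train_stop, station, dwell_minutes=0.5):
    """Minutes until a train now at stop index `train_stop` reaches `station`.

    segment_minutes[i] is a fixed run time from stop i to stop i + 1, and every
    intermediate stop adds a fixed dwell. Nothing here reacts to crowding, so a
    long dwell at a packed platform never shows up in the estimate.
    """
    if station < train_stop:
        raise ValueError("the train has already passed this station")
    run_time = sum(segment_minutes[train_stop:station])
    intermediate_stops = max(station - train_stop - 1, 0)
    return run_time + dwell_minutes * intermediate_stops


segments = [2, 3, 1.5, 2]  # made-up run times between five consecutive stops
print(countdown(segments, 0, 3))  # 6.5 min running + 2 dwells of 0.5 = 7.5
```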

------
rgejman
Isn't the last graph showing a Poisson distribution, which is exactly what you
expect for situations like "waiting for a train that comes on a schedule"?

~~~
ilzmastr
Yes, one way the Poisson process is defined is as a counting process with
inter-arrival times that are i.i.d. exponentials.
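A quick simulation of that definition: build arrivals from i.i.d. exponential gaps and check that the count in a fixed window has mean and variance both close to rate × window length, as a Poisson count should. A sketch using Python's `random.expovariate`; the rate of one train per 4 minutes and the 60-minute window are made-up example values.

```python
import random
from statistics import mean, pvariance


def poisson_arrivals(rate, horizon, rng):
    """Arrival times on [0, horizon) built from i.i.d. exponential gaps with mean 1/rate."""
    times = []
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


rng = random.Random(42)
counts = [len(poisson_arrivals(0.25, 60, rng)) for _ in range(5000)]
print(mean(counts), pvariance(counts))  # both should be close to 0.25 * 60 = 15
```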

The Poisson process is the first (continuous-time) point process discussed in
many textbooks because its counting increments are identically distributed,
since the rate parameter is constant. For modeling waiting times for subway
trains, you could argue that a better model has a rate parameter that depends
on time, since the inter-arrival times are shorter during rush hour than when
the subway is closed.

You can read about this in Bertsekas, and more thoroughly in Parzen or Cox's
books on stochastic processes.
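A time-dependent rate can be simulated by thinning (the Lewis-Shedler method), a standard technique for non-homogeneous Poisson processes. It is not something taken from the article. The rate function below is hypothetical: a train every 4 minutes from 7-10am and 5-8pm, every 10 minutes otherwise.

```python
import random
from statistics import mean


def rate_per_min(t):
    """Hypothetical train rate (trains per minute) at minute t of the day."""
    hour = (t / 60) % 24
    if 7 <= hour < 10 or 17 <= hour < 20:
        return 1 / 4
    return 1 / 10


def thinning(rate_fn, rate_max, horizon, rng):
    """Simulate a non-homogeneous Poisson process on [0, horizon) by thinning.

    Candidates come from a homogeneous process at rate_max; each candidate at
    time t is kept with probability rate_fn(t) / rate_max.
    """
    times = []
    t = 0.0
    while True:
        t += rng.expovariate(rate_max)
        if t >= horizon:
            return times
        rate = rate_fn(t)
        if rate > rate_max:
            raise ValueError("rate_max must bound rate_fn everywhere")
        if rng.random() < rate / rate_max:
            times.append(t)


rng = random.Random(7)
days = [thinning(rate_per_min, 0.25, 24 * 60, rng) for _ in range(300)]
rush = mean(sum(7 * 60 <= t < 10 * 60 for t in day) for day in days)
midday = mean(sum(11 * 60 <= t < 14 * 60 for t in day) for day in days)
print(rush, midday)  # expected values: 180/4 = 45 and 180/10 = 18
```

Both windows are three hours long, but the rush-hour window collects roughly two and a half times as many trains.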

------
vdnkh
Nice! Really liked the sunk cost part. However, sometimes leaving isn't really
a good idea: it takes another N minutes to get to another station, which could
leave you M minutes farther from your destination than your current train
would, and that station's train arrives O minutes after you get there and then
takes P minutes to your stop. For each station, would it be possible to
instead calculate the point where (M + N + O + P) < max_wait_time?
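The comparison in the question can be written down directly. A minimal sketch, assuming you already have an expected remaining wait at your current station and a ride time for your current train; instead of a fixed max_wait_time it compares N + O + P + M against that remaining wait plus the current ride, which is a substitution on my part. `Alternative` and `best_option` are hypothetical names, and N, M, O, P keep the meanings from the comment.

```python
from dataclasses import dataclass


@dataclass
class Alternative:
    """A hypothetical alternative route; all times in minutes."""
    name: str
    n_walk: float   # N: getting to the other station
    o_wait: float   # O: wait for the other train once you get there
    p_ride: float   # P: ride on the other train to its stop
    m_extra: float  # M: how much farther that stop leaves you from your destination

    def total(self):
        return self.n_walk + self.o_wait + self.p_ride + self.m_extra


def best_option(expected_remaining_wait, current_ride, alternatives):
    """Return (choice, minutes): 'stay' unless some alternative is strictly faster."""
    best_name, best_time = "stay", expected_remaining_wait + current_ride
    for alt in alternatives:
        if alt.total() < best_time:
            best_name, best_time = alt.name, alt.total()
    return best_name, best_time


other = Alternative("other station", n_walk=5, o_wait=3, p_ride=18, m_extra=4)
print(best_option(12, 20, [other]))  # ('other station', 30)
print(best_option(3, 20, [other]))   # ('stay', 23)
```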

~~~
maxaf
Totally off topic, but: I grew up next to the VDNKh metro station in Moscow &
therefore appreciate your user name.

~~~
vdnkh
Thanks! I got it from reading the book Metro 2033 (which is really
fantastic). I too have a fascination with subways.

~~~
effie
The storytelling is really atmospheric. Even more when I read it on a subway
train.

------
mathattack
Wow - thank you for sharing! It's a fun data source for those of us interested
in the topic. My son learned numbers and letters from subway stations.

------
mrcactu5
Does this guy live in New York?? The 4 train does not stop at Van Cortlandt
Park-242 St. That would be the 1 train.

I should know since I ride it every day.

[http://erikbern.com/assets/4_trips.png](http://erikbern.com/assets/4_trips.png)

------
pjpw
1\. Either the stations are mislabeled or that is data from the 1 train, not
the 4 train, in that first visualization.

2\. The waiting time by line is flawed because the other lines are redundant
to each other (corresponding colors) but the L is a single line so waiting
times should be shorter.
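The point about redundant lines can be quantified. For a rider who will take whichever of two lines comes first, what matters is the wait on the merged arrival sequence. A minimal sketch with made-up, perfectly regular schedules; `mean_wait` is a hypothetical helper using the standard result that a rider arriving at a random time waits sum(h**2) / (2 * sum(h)) on average over the headways h.

```python
def mean_wait(arrival_times):
    """Average wait for a rider who shows up at a uniformly random moment
    between the first and last arrival: sum(h**2) / (2 * sum(h)) over headways h."""
    times = sorted(arrival_times)
    headways = [later - earlier for earlier, later in zip(times, times[1:])]
    return sum(h * h for h in headways) / (2 * sum(headways))


line_a = list(range(0, 64, 8))          # hypothetical: every 8 min, minutes 0..56
line_b = list(range(4, 68, 8))          # same frequency, offset by 4 minutes
line_b_bunched = list(range(1, 65, 8))  # same frequency, trailing by 1 minute

print(mean_wait(line_a))                   # 4.0
print(mean_wait(line_a + line_b))          # 2.0
print(mean_wait(line_a + line_b_bunched))  # about 3.08
```

Two evenly interleaved lines halve the average wait, but the benefit shrinks when the trains bunch together, so a per-line average understates what riders on shared track actually experience.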

~~~
burntwater
Yes, it's the 1 train.

~~~
erikbern
let me fix

------
cloudjacker
Interesting. They should install this calculator on the tablet kiosks in the
stations, since you wouldn't be able to download it in stations without data
service.

------
nunez
Damn. This is useful. I used to just set a timer whenever I waited for a train
and left after it expired, but this is really useful data to have too.

Good work!

