NYC subway math (erikbern.com)
193 points by ikeboy on Apr 7, 2016 | 68 comments

One other cause of delay (which is, anecdotally, present on the L line) is overcrowding. This is, unfortunately, not shown on the MTA's datafeed, though could maybe be estimated by adding the turnstile data?

The morning commute hours (and the hours after a 'delay' status has been cleared) are usually marked by very crowded platforms and trains. Even though the trains are 'on-time' and showing up every 3-5 minutes, there is no room to squeeze on, and you may need to let a few trains pass before getting on one successfully.

A 15 minute train 'delay' during the morning commute could cause an hour+ of backed up platforms and people unable to squeeze on. I think this kind of delay is where a lot of the displeasure with L line service is centered.

You perfectly described the issues that exist with the L train. It makes me wonder just why so many people decided to move to a region of Brooklyn with such historically poor subway service, when central Brooklyn, served by the 2, 3, 4, and 5 lines, is serving its rapidly gentrifying population with relative ease.

The L's reputation is undeserved; it's just that some people make more noise than others.

I'm extremely happy with the location — it's arguably the most well-positioned area of Brooklyn. Almost everything of interest is 20 minutes or less away: Union Square, Village, SoHo, Midtown, and also transportation hubs like Penn Station and Grand Central. Greenpoint and Long Island City are both a super quick ride, and while south Brooklyn takes longer to get to, I can get to Fort Greene in 17 minutes, and Park Slope in 30. (The most egregious lacuna is to Brooklyn Heights and the surrounding area, but the ferry service is very enjoyable, and the bus is not bad either.)

The main overcrowding occurs during rush hour on the Bedford Ave. and Lorimer St. stops, the former of which is the only stop in a very dense area of recent urban (over)development. Lorimer isn't as bad, and it gets progressively much better the farther out you are.

Developers milking the region after the ~2008 zoning changes, availability of space for new construction, and people willing to move there because of price (originally), proximity to the city, and (more recently) restaurants, parks, yoga studios, etc.

Of course that was not sustainable long term, but most people didn't think about that, or didn't care.

I live right there in Williamsburg near the Bedford Avenue stop, and the subway issues are generally not a big deal. I can get to Grand Central within 20 to 25 minutes. Importantly, Williamsburg is at a very handy latitude; I can get to the East Village, Union Square, or the West Village in 10 to 15 minutes, and that is more often than not where I want to go. Then there are a ton of great restaurants and bars in the neighborhood, and everyone is pretty young and hip. It's actually under-priced in my view relative to the city (assuming the L doesn't stop running, in which case I will immediately return to Manhattan).

I'm sorry to report that you might have to return http://gothamist.com/2016/01/13/l_train_tunnel_closure_years...

People move to specific neighborhoods for many, many reasons other than available train lines.

Are you from NYC? Because this reads like someone who isn't aware of how important Manhattan access is to somebody making enough money to live on the developed sections of the L line. Bicycle commuting options do exist on the Williamsburg Bridge, but the bridge likely owes its newly congested bicycle lanes to the poor L train service.

I'm finding that more and more people who are in the right socioeconomic bracket to live in Williamsburg and similar areas aren't going to Manhattan for work. They're working from home or in Brooklyn-based startups.

I work with some of these people, but live in upper Manhattan. I spend a LOT of time commuting to Brooklyn.

Unfortunately all the Brooklyn startups seem to be clustered around DUMBO, which isn't easy to access from Williamsburg on a daily basis (without several subway transfers or traveling through Manhattan).

It doesn't read that way to me at all. Let's face it, people move to Williamsburg because it's a fun and trendy place to be. Never mind that an increasing number of workplaces are also located in the area, too.

Well perhaps.

Access to an Internet connection to email mom and dad in the Midwest and ask for more money for rent is also a common preoccupation along the L train corridor set.

Given the L's more modern signaling system, they should be able to squeeze more trains on the tracks than they do during the peak rush hour times. I wonder whether they rely on some kind of data that is lagged by a long time period to make those kinds of decisions. I'd challenge anyone who contributes to those decisions to ride to Manhattan at 9am and from Manhattan at 6pm for a few weeks.

The lack of tail tracks at 8th avenue and the current electrical system are the current limitations. They could do more, but they can't turn the trains around fast enough or give enough power to more densely pack trains.

One tough part of the load handling is that if the L has increased capacity, this could, in some cases, just make congestion worse in many other areas because many people need to make at least 1 transfer for their trips.

The tough problem isn't always capacity either. Small disruptions have cascading consequences. This is where I believe a lot of the limitations in service come from, since there is rarely a day w/o a single problem. As many others have pointed out in the past, this can largely be traced to the topological properties of the subway lines. The MTA's insistence on growing just a few large bottlenecks rather than dispersing them is telling of the problems we'll be living with for decades or centuries longer.

At some point, I think the improved bus service initiatives are our only real hope of directly addressing the transportation problems in NYC. It might seem ludicrous in Manhattan but it certainly has its place and in the other boroughs, it can be a huge improvement when line selection is limited.

> largely be traced to the topological properties of the subway lines

See 'A Subway Named Mobius' [1][2] for some of the problems this might cause in the future!

[1] http://www.iblist.com/book12352.htm

[2] http://www.rioranchomathcamp.com/Topology/SubwayNamedMobius....

I think the L fills a unique enough niche in the subway network that making it run even more will just lead to induced demand. There really ought to be another line between Brooklyn and Manhattan that services parts of the L's route.

Induced demand is a good thing for subways!

Induced demand is a good thing for anything. It means people are richer.

Yes, for subways that means fewer cars too. That is an extra (and very big) benefit, but don't think it's bad in other contexts.

There is! JMZ

In Brooklyn, yes, sort of. But it doesn't have parity in Manhattan. The J doesn't go uptown and the M is west side only. And the Z? Does that even run anymore?

Z runs during rush hours, I believe. I only see it before 9 am and after 5 pm weekdays.

Also, you can transfer from the M to a whole bunch of other train lines depending on where you are going. It's not the best, but it isn't terrible either.

No room to squeeze on? I think you have not seen Japanese rush hour trains :D

Yes I know, not a useful response. I'd prefer not to have to squeeze

Where's Ben Wellington when you need him? http://iquantny.tumblr.com/

Right here!

Very cool website!

Thank you!

Unfortunately this is flawed in a very fundamental way. The NYC subways have different schedules for the morning rush, afternoon, afternoon rush, evening, and night. There should be at least three different distributions in the delay/wait data. Without teasing apart these distributions, I'm not convinced that anything meaningful can be said.
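For what it's worth, teasing the periods apart is not much code. Here is a minimal sketch, assuming you already have (hour of day, wait in minutes) pairs from the feed. The period boundaries in `PERIODS` and the sample numbers are made up for illustration; the thread doesn't give the real MTA schedule changes.

```python
from collections import defaultdict
from statistics import median

# Hypothetical period boundaries (hour of day, 24h clock). These are
# assumptions, not the actual MTA schedule periods.
PERIODS = [
    (0, 6, "night"),
    (6, 10, "morning rush"),
    (10, 16, "afternoon"),
    (16, 20, "afternoon rush"),
    (20, 24, "evening"),
]

def period_for(hour):
    """Map an hour of day (0-23) to the name of its schedule period."""
    for start, end, name in PERIODS:
        if start <= hour < end:
            return name
    raise ValueError(f"hour out of range: {hour}")

def median_wait_by_period(samples):
    """samples: iterable of (hour_of_day, wait_minutes) pairs."""
    buckets = defaultdict(list)
    for hour, wait in samples:
        buckets[period_for(hour)].append(wait)
    return {name: median(waits) for name, waits in buckets.items()}

# Made-up observations, only to show the call shape.
samples = [(8, 3.0), (8, 5.0), (13, 6.0), (13, 8.0), (23, 12.0), (2, 20.0)]
by_period = median_wait_by_period(samples)
```

With real data you would compare the full per-period distributions rather than just the medians, but the bucketing step is the same.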

The author did limit his "sunk cost" analysis to 7am-7pm due to this exact reason. I think that part at least is quite sound, and is probably the most enlightening takeaway.

It also doesn't take into account the distance between stops. The L train has a short route compared to other lines.

> My fascination for the subway takes autistic proportions [...]

What a weird way to phrase that.

Autism often involves strong obsessions, so it's not wrong. Why does this matter? Fantastic article on the whole.

Sure, but obsessions are hardly the characterizing trait of Autism, nor are they even included in literary definitions. I agree the article is wonderful, but I came to these comments specifically to see if that phrase had given anyone else pause.

No idea if the author intended it this way, but on some forums and imageboards it's normal to refer to being obsessive about something as "being autistic about it". It's similar to how most people call themselves "OCD".

It made sense, but felt distasteful to me.

Yeah... Having HFA, I don't really think this is okay.

Nice work Erik! You should send this to straphangers.org: http://www.straphangers.org/

No surprise with the L train, as it's the only one that is automated (despite having a conductor on board). They refuse to let it run unmanned, which raises the question: why did they spend hundreds of millions of dollars to automate the line? There are plenty of places with unmanned subway lines. There's one in Tokyo and, I believe, Barcelona and Copenhagen.

I guess it's no surprise that their API would be a mess, as that about sums up the culture at the MTA from what I can tell.

The MTA is spending hundreds of millions of dollars to put arrival clocks in all stations. The problem is that I want to know how long it is before the next train arrives before I pay for a fare. I have to pay for a fare in order to go onto the platform to find out the next train won't arrive for another 20 minutes. At this point I have needlessly paid for a fare and generally walk back upstairs and take a taxi.

When they were questioned about this poor decision they responded it was because of terrorism that they couldn't put train arrival times outside the entrance. That makes zero sense.

> The problem is that I want to know how long it is before the next train arrives before I pay for a fare.

The times displayed on those clocks are available here: http://apps.mta.info/traintime/ or via the official app http://web.mta.info/apps/subwaytimeapp.html

What station(s) are you using that you can't see the time clocks before entering the turnstiles? I can't think of any, although admittedly I'm not looking that often, as I rarely take a taxi for personal use.

Any and all of them that have two levels of stairways: the stairs from the street to the turnstile/kiosk and the second set of stairs from the turnstile/kiosk to the platform. The clocks are only visible once you are standing on the platform.

Unless you are on the rare above ground station this is the general layout.

I live on the L a bit out (post Montrose), and there's no way to see the schedule prior to swiping your MetroCard at my station

3rd ave in Manhattan for one.

> The problem is that I want to know how long it is before the next train arrives before I pay for a fare. I have to pay for a fare in order to go onto the platform to find out the next train won't arrive for another 20 minutes.

Not to sidestep your question, but do you live in NYC and not use the unlimited pass? I'd assumed everyone did.

My colleagues who live close to work do not own an unlimited card. You've also got people who commute mainly by bike or scooter and only use the subway due to weather conditions or when going out after work/on weekends.

They had planned to run it unmanned, and did run it unmanned for a small period of time. People felt unsafe, so they re-manned it.

Yeah? Please provide a citation? Was there an MTA survey that went out to the ridership of the L train? Did ridership experience a decline because it was an automated line? I'm pretty sure that none of that happened.

Sorry, it was _just_ the removal of the conductor (the one in the middle of the train who opens and closes the doors). That was enough to scare people. They did it in 2005 and tried again in 2009.




The plan was to take out even the operator, but unions kept the train operator in. http://www.nytimes.com/learning/teachers/featured_articles/2...

Do you live in NYC but don't buy an unlimited card? I feel like that's an unusual use case.

Because everyone that lives in New York should be buying an unlimited card regardless?

How about retired people or people on a fixed income that are not work commuters?

How about someone that works in their neighborhood but still needs to use mass transit occasionally?

How about students that largely don't leave their campus tether but still need to use mass transit albeit less frequently than a commuting professional?

How about someone who bikes as their primary means of transit but still needs to use mass transit infrequently?

It's not really unusual at all, is it?

So if you can't afford or don't need to purchase an unlimited subway card, that's your problem?

It's cheaper not to if you take fewer than 14 rides a week, which is not unusual at all in places like the East Village.

EasyPay Xpress MetroCard is what I used living in NYC for years; I still use it when I go visit actually. I never stop at a kiosk, it always has a balance from the attached credit card. I used full fare card, but you can also do unlimited monthly. Skipping the kiosk is great.

I live in NYC and barely use the subway at all.

Some thoughts on optimizing for wait time:


Awesome article.

The countdown clocks are generally accurate, but in certain situations they are surprisingly off.

I would love to see someone use this data to improve arrival predictions – a regression, say, based on features such as time of day (a proxy for crowding), weather, holidays, nearby events (concerts & sports), maintenance or signal problems...

I assume the clocks use something as simple as static numbers representing the time between adjacent stations or segments. They presumably don’t account for deviations from crowding or other unexpected events.

Isn't the last graph showing a Poisson distribution, which is exactly what you expect for situations like "waiting for a train that comes on a schedule"?

Yes, one way the Poisson process is defined is as a counting process whose interarrival times are iid exponentials.

The Poisson process is the first (continuous-time) point process discussed in many textbooks because its counting increments are identically distributed, since the rate parameter is constant. For modeling waiting times for subway trains, a better model lets the rate parameter depend on time, since the interarrival times are shorter during rush hour than when the subway is closed.

You can read about this in Bertsekas, and more thoroughly in Parzen or Cox's books on stochastic processes.
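To make the two models above concrete, here is a minimal sketch (not the article's method). The first function builds arrivals by summing iid exponential gaps; the second gets a time-dependent rate by thinning, i.e. generating candidates at the peak rate and keeping each one with probability rate(t) / peak rate. The function `rush_hour_rate` and its numbers are made up for illustration.

```python
import random

def homogeneous_arrivals(rate, horizon, rng):
    """Arrival times on [0, horizon) with iid Exponential(rate) gaps."""
    times = []
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times

def thinned_arrivals(rate_fn, max_rate, horizon, rng):
    """Time-varying rate via thinning: keep a candidate at time t with
    probability rate_fn(t) / max_rate. Requires rate_fn(t) <= max_rate."""
    candidates = homogeneous_arrivals(max_rate, horizon, rng)
    return [t for t in candidates if rng.random() < rate_fn(t) / max_rate]

def rush_hour_rate(t_minutes):
    # Hypothetical rates in trains per minute: one every 4 minutes for
    # the first two hours, one every 10 minutes afterwards.
    return 0.25 if t_minutes < 120 else 0.10

rng = random.Random(0)
arrivals = thinned_arrivals(rush_hour_rate, max_rate=0.25, horizon=480, rng=rng)
```

Taking differences of consecutive entries in `arrivals` gives the simulated gaps between trains, which you could compare against the histogram in the article.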

Nice! Really liked the sunk cost part. However, sometimes leaving isn't really a good idea because it's another N minutes to another station, which could take you M minutes farther from your destination than your current train, which arrives O minutes after you get there, which takes P minutes to your stop. For each station, would it be possible to instead calculate the point where (M + N + O + P) < max_wait_time?
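A rough sketch of that check, taking the inequality above literally. The `Alternative` class, the station names, and every number here are hypothetical; real values for N, M, O, and P would have to come from walking distances and the arrival feed.

```python
from dataclasses import dataclass

@dataclass
class Alternative:
    name: str
    walk_n: float    # N: minutes to walk to the other station
    detour_m: float  # M: extra minutes that station adds to the trip
    wait_o: float    # O: minutes until its next train once you arrive
    ride_p: float    # P: minutes on that train to your stop

    def total(self):
        return self.detour_m + self.walk_n + self.wait_o + self.ride_p

def worthwhile(alternatives, max_wait_time):
    """Names of alternatives where (M + N + O + P) < max_wait_time."""
    return [a.name for a in alternatives if a.total() < max_wait_time]

# Made-up options, only to show the call shape.
options = [
    Alternative("Station A", walk_n=8, detour_m=3, wait_o=4, ride_p=10),
    Alternative("Station B", walk_n=15, detour_m=6, wait_o=5, ride_p=12),
]
print(worthwhile(options, max_wait_time=30))
```

Here Station A totals 25 minutes and Station B totals 38, so only Station A passes with a 30 minute limit.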

I thought the implication was that you would get a cab, not that you'd walk to another subway station.

Totally off topic, but: I grew up next to the VDNKh metro station in Moscow & therefore appreciate your user name.

Thanks! I got it from reading the Metro: 2033 book (which is really fantastic). I too have a fascination with subways

The storytelling is really atmospheric. Even more when I read it on a subway train.

Wow - thank you for sharing! It's a fun data source for those of us interested in the topic. My son learned numbers and letters from subway stations.

Does this guy live in New York?? The 4 train does not stop at Van Cortlandt Park-242 St. That would be the 1 train.

I should know since I ride it every day.


1. Either the stations are mislabeled, or the data in that first visualization is from the 1 train, not the 4 train.

2. The waiting time by line is flawed because the other lines are redundant to each other (corresponding colors) but the L is a single line so waiting times should be shorter.

Yes, it's the 1 train.

let me fix

Interesting, they should install this calculator on the tablet kiosks in the stations

since you wouldn't be able to download this calculator in stations w/o data server

Damn. This is useful. I used to just set a timer whenever I waited for a train and left after it expired, but this is really useful data to have too.

Good work!

