
Forecasting at Uber with RNNs - paladin314159
https://eng.uber.com/neural-networks/
======
eggie5
I wish the diagrams were bigger, they are hard to read and a bit blurry.

One of the interesting points, often overlooked in ML, is model
deployment. They mention TensorFlow, which has a model export feature you can
use as long as your client can run the TensorFlow runtime. But they don't
seem to be using that, because they said they just exported the weights and
are serving them from Go, which would seem to imply they did some type of
agnostic export of the raw weight values. The nice part of the TF export
feature is that it can be used to recreate your architecture on the client.
But they did mention Keras too, which lets you export your architecture in a
more agnostic way, since it can work on many platforms, such as Apple's new
CoreML, which can run Keras models.
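As a toy illustration of that "agnostic export of raw weight values" idea (not Uber's actual pipeline — the serving side there is Go; everything here is a hypothetical sketch in Python): dump a layer's weights to JSON with no framework-specific graph info, then reimplement the forward pass by hand on the other side, since no architecture was exported.

```python
import json

# Hypothetical trained parameters for one dense layer: y = relu(W @ x + b)
weights = {"W": [[0.5, -0.2], [0.1, 0.8]], "b": [0.05, -0.1]}

# "Export": serialize only the raw weight values.
blob = json.dumps(weights)

# "Import" on the serving side: parse the JSON and re-implement the
# forward pass manually, because the architecture was not exported.
params = json.loads(blob)

def dense_relu(x, W, b):
    """Manual forward pass: ReLU(W @ x + b)."""
    out = []
    for row, bias in zip(W, b):
        z = sum(w * xi for w, xi in zip(row, x)) + bias
        out.append(max(0.0, z))
    return out

y = dense_relu([1.0, 2.0], params["W"], params["b"])
```

The contrast with Keras's `model.to_json()` is that the latter also captures the architecture, so the client doesn't have to hand-code the forward pass.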

~~~
agibsonccc
Warning: I'm a vendor. Take everything I say with a grain of salt. I will try
to sell you something.

One biased perspective I have here: infra is often a different team from data
science. The data scientists don't always do the deploying, and beyond "some
sort of serving thing" they might not necessarily know what's being deployed.
This isn't true at every organization, and there are exceptions, but it is
typically true of most companies we sell to. There are usually ML platform
teams that do the "real" deployment (especially at sizable scale).

Another characteristic of production is that it's "boring". "Production" is a
mix of databases to track model accuracy over time, possibly microservices
depending on how deployment is "done", established ways of giving feedback
when a model is wrong, experiment tracking, and model maintenance, among
other things.

A lot of these things are typically very specific to the company's
infrastructure.
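As a toy sketch of the "databases to track model accuracy over time" piece (my assumption about what such a setup looks like, not anything from the post): log every prediction alongside the eventual ground truth, then query error per model version the way a dashboard or alerting job would.

```python
import sqlite3
from datetime import datetime, timezone

# In-memory stand-in for a (hypothetical) model-audit database.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE predictions (
        model_version TEXT,
        ts TEXT,
        predicted REAL,
        actual REAL
    )
""")

def log_prediction(model_version, predicted, actual):
    """Record each prediction alongside the eventual ground truth."""
    conn.execute(
        "INSERT INTO predictions VALUES (?, ?, ?, ?)",
        (model_version, datetime.now(timezone.utc).isoformat(), predicted, actual),
    )

# Simulated traffic for two model versions.
for p, a in [(10.0, 12.0), (11.0, 11.5)]:
    log_prediction("v1", p, a)
for p, a in [(12.0, 12.1), (11.8, 11.6)]:
    log_prediction("v2", p, a)

# Mean absolute error per version -- the kind of query that tracks
# model accuracy over time.
rows = conn.execute("""
    SELECT model_version, AVG(ABS(predicted - actual))
    FROM predictions GROUP BY model_version
""").fetchall()
mae = dict(rows)
```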

The "fun" and "shareable" part that people (especially ML people) gravitate
toward is usually "what neural net did they use?"

The other thing to think about here: "production" isn't just "TF
Serving/CoreML and you're done". There are typically security concerns,
different data sources, etc. that are often involved as well, and that might
be specific to a company's infrastructure. There also might be different
deployment mechanisms for each potential model deployment, e.g. mobile vs.
cloud.

Grain-of-salt sales pitch here: we usually see the "deployment" side of
things, where there's a completely different set of best practices that
happens to overlap with data scientists' experiments. This includes latency
timing, persisting data pipelines as JSON, GPU resource management, Kerberos
auth for accessing data, managing databases and an associated schema for
auditing a model in production (including data governance), connecting to an
actual app/dashboard like the ELK stack, etc.
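"Persisting data pipelines as JSON" might look roughly like this toy sketch (my assumption about the general pattern, not the commenter's actual tooling): fit preprocessing statistics on training data, serialize them, and replay the exact same transform at serving time.

```python
import json

# Fit a trivial preprocessing "pipeline" (just standardization) on training data.
train = [10.0, 12.0, 14.0, 16.0, 18.0]
mean = sum(train) / len(train)
var = sum((x - mean) ** 2 for x in train) / len(train)
std = var ** 0.5

# Persist the fitted pipeline as JSON so serving replays the exact transform.
pipeline_json = json.dumps(
    {"steps": [{"op": "standardize", "mean": mean, "std": std}]}
)

def apply_pipeline(spec_json, x):
    """Replay a serialized pipeline on a new value at serving time."""
    for step in json.loads(spec_json)["steps"]:
        if step["op"] == "standardize":
            x = (x - step["mean"]) / step["std"]
    return x

scaled = apply_pipeline(pipeline_json, 14.0)  # training mean maps to 0.0
```

The point of the JSON is auditability: the serving transform is data, not code, so it can be versioned and diffed alongside the model it belongs to.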

TLDR: The deployment model would be its own blog post.

~~~
krona
The Google paper _Machine Learning: The High Interest Credit Card of Technical
Debt_ [1] offers a semi-rigorous introduction to the topic of _real-world_ ML
model engineering/deployment considerations and best practice. (If anyone else
knows of similar work I'd be grateful to hear about it.)

[1]
[https://research.google.com/pubs/pub43146.html](https://research.google.com/pubs/pub43146.html)

~~~
agibsonccc
This is actually a great reference! Thanks for the link.

------
siliconc0w
I wonder how much they could enlist others to solve this by creating something
like an 'Uber Auction House' to basically buy and sell the right to reap
Uber's cut for a ride. They could clean up on exchange fees while everyone
solves this problem for them.

~~~
noway421
This is interesting, could it potentially reduce surge pricing?

One thing I thought: it would be really convenient if Uber could amortize
their surge pricing over the month/year, so as not to hit customers with
unexpected rates, and essentially offer a flat, predictable fee over the
whole period. The problem with that is you can't really plan future demand to
calculate how much you need to save/dip into. Could an auction house help
hedge the bets?

~~~
bastawhiz
It's unlikely to affect surge pricing. You can think of surge as a tool to
move drivers to where the riders are. Even if you knew there would be heavy
demand, you need to incentivize drivers to actually go there. Why would I, as
a driver, go 20 minutes out of my way to be in Katy Perry concert traffic when
I can keep picking up passengers at a reasonable clip on the other side of
town?

Surge is only really solved once autonomous vehicles can be preemptively
positioned near demand.

~~~
theCricketer
The author might be suggesting that you could have surge-priced compensation
for the drivers to incentivize them to move toward the demand, but also
amortize that cost for the consumer.

~~~
bastawhiz
That's an interesting problem because (if I'm understanding you correctly)
forecasting would need to be done at an individual level. A consumer getting
rides primarily during off-hours should (imo rightly) pay less than a rider
booking rides primarily from busy locations. Amortizing that cost fairly while
also making a safe profit is a tricky balance to strike, I'd think.

------
ozankabak
I don't understand whether they use windowing as a fixed computational step
that is active at both training and scoring time, or whether they use sliding
windows only to chop up the training data.

Also, I wonder if they checked how a feed-forward NN that operates on the
contents of a sliding window (e.g. as in the first approach above) compares
with their RNN results. I am curious about this, as it would give us a hint
whether the RNN's internal state encodes something that is not a simple
transformation of the window contents. If this turns out to be the case, I'd
then be interested in figuring out what the internal state "means"; i.e.
whether there is anything there that we humans can recognize.

[edited to increase clarity]
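For reference, the sliding-window chopping being asked about usually looks something like this toy sketch (an assumption about the standard pattern, not Uber's actual code): the same windowing builds (input, output) training pairs and, at scoring time, forms the fixed-size input from the most recent observations.

```python
def make_windows(series, input_len, output_len):
    """Chop a series into (input window, output window) training pairs."""
    pairs = []
    for i in range(len(series) - input_len - output_len + 1):
        x = series[i : i + input_len]
        y = series[i + input_len : i + input_len + output_len]
        pairs.append((x, y))
    return pairs

series = [1, 2, 3, 4, 5, 6, 7]
pairs = make_windows(series, input_len=3, output_len=2)
# -> ([1, 2, 3], [4, 5]), ([2, 3, 4], [5, 6]), ([3, 4, 5], [6, 7])

# At scoring time the same windowing yields the model's input:
# the last `input_len` observations.
scoring_input = series[-3:]  # [5, 6, 7]
```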

~~~
zebrafish
I wasn't very sure what the sliding window part was about either. I think
they were just saying that they trained on a sliding window, using the
"output window" as part of their loss function.

A feed-forward NN wouldn't do much, because it doesn't hold a state variable,
which you need to capture temporal context in time series data. There are
probably some pieces of the state you'd be able to interpret, but the
majority of it would mean nothing to us.

------
marksomnian
Whenever I see a post or announcement by a major company that they're using
"machine learning", I'm reminded of what CGP Grey said: it seems like
nowadays machine learning is something you add to your product just so you
can seem hip by saying it has machine learning, and not for a legitimate
technical reason.

There are undoubtedly things that machine learning is right for, however to me
it seems like it's become a buzzword more than anything else.

~~~
ThrustVectoring
Resume-driven development contributes some momentum here, too.

~~~
tryitnow
Resume-driven development - never heard that term before. Now I have an
excellent descriptor for what I see all too often.

Maybe O'Reilly should release a book on RDD?

~~~
ThrustVectoring
Resume-driven development is an organizational pattern, and like all patterns,
it exists for better reasons than you'd first think.

The lifetime compensation of developers isn't just tied to how much salary
they have at the moment. Getting into a dead-end and not developing your
skills will definitely set you back over the long term. Like, there's a reason
people will pay you a premium for working with COBOL. So there's a very
rational pressure to get better total compensation out of a role by choosing
resume-developing tools and technologies.

On the other end, organizations mostly just care about getting the task done.
They've got the choice of doing it in a boring way and paying lots of money
for developers who don't want to grow their resumes, or indulging the
developers' fancies and getting it done more cheaply.

tl;dr - resume driven development is a way for companies to pay for projects
with "you'll get experience".

------
afro88
Interesting stuff, but all they've managed to do so far is find models that
fit historical data better. Would be interested to read a follow up a year
later to see how their models actually performed.

------
sjbp
I wonder how they are quantifying uncertainty around their predictions.
Having a point estimate without some notion of a confidence interval seems
much less useful. Is there a natural way to do this with LSTMs?

Also, some actual benchmarking would be great. Say, against Facebook's Prophet
(which also deals with covariates and holiday effects).
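One common answer (my suggestion, not anything from the article) is to train an ensemble, or to use MC dropout, and read an interval off the spread of the per-member forecasts. A toy sketch of the ensemble version, with made-up forecast values:

```python
import statistics

# Hypothetical point forecasts for the same horizon from an ensemble of
# independently trained models (or MC-dropout forward passes).
ensemble_forecasts = [102.0, 98.5, 101.2, 99.8, 100.5, 97.9, 103.1, 100.0]

point_estimate = statistics.mean(ensemble_forecasts)

# Crude ~90% interval from the empirical spread of the ensemble.
cuts = statistics.quantiles(ensemble_forecasts, n=20)
lo = cuts[0]   # ~5th percentile
hi = cuts[-1]  # ~95th percentile
```

With only a handful of members the interval is rough; in practice you'd want a larger ensemble (or many dropout passes) for the tails to be meaningful.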

