
Using machine learning to estimate lost demand in a fulfillment chain - lil-scamp
https://tech.instacart.com/modeling-the-unseen-6a51c9a02430?gi=2b1c2c2c8646
======
joker3
It's censored data. Statisticians have been working with that for at least
half a century. All they've done is reinvent the wheel, and I have no
confidence that they've made something that's as good as what we already have.

Anybody who's interested in this should read up on the Kaplan-Meier estimator
([https://en.wikipedia.org/wiki/Kaplan%E2%80%93Meier_estimator](https://en.wikipedia.org/wiki/Kaplan%E2%80%93Meier_estimator)).
If you want to see it used for a problem that's very similar to demand
estimation, see Censored Exploration and the Dark Pool Problem
([https://www.cis.upenn.edu/~mkearns/papers/darkpools-final.pdf](https://www.cis.upenn.edu/~mkearns/papers/darkpools-final.pdf)).
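
For anyone who hasn't met it, here is a rough sketch of the Kaplan-Meier idea
applied to demand. The framing is an assumption of mine, not something taken
from the article or the paper: each day's sales equal min(demand, stock), so a
day that sold out is a right-censored observation of demand. The function name
`kaplan_meier` and the numbers in `TOY_DAYS` are made up for illustration.

```python
from collections import Counter


def kaplan_meier(observations):
    """Estimate S(t) = P(demand > t) from possibly censored observations.

    observations: list of (units_sold, sold_out) pairs. sold_out=True means
    demand was at least units_sold (right-censored); False means demand was
    observed exactly.
    Returns a list of (t, S(t)) at each distinct uncensored value.
    """
    exact = Counter(v for v, sold_out in observations if not sold_out)
    total = Counter(v for v, _ in observations)
    at_risk = len(observations)
    survival = 1.0
    curve = []
    for value in sorted(total):
        d = exact.get(value, 0)
        if d:
            # Product-limit step: fraction of the risk set whose demand
            # was observed to stop exactly at this value.
            survival *= 1 - d / at_risk
            curve.append((value, survival))
        # Exact and censored observations at this value both leave the risk set.
        at_risk -= total[value]
    return curve


# Hypothetical data: (units sold, did we run out of stock that day?)
TOY_DAYS = [(3, False), (5, True), (5, False), (8, True), (9, False)]

for units, prob in kaplan_meier(TOY_DAYS):
    print(f"P(demand > {units}) is roughly {prob:.2f}")
```

The point is that sold-out days are not thrown away and not treated as exact
demand either: they stay in the risk set until their censoring value, which
keeps the estimate from being biased downward.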

~~~
dmix
Plenty of software has reinvented something that was done prior. Even if the
outcome hasn’t improved significantly, how the outcome was achieved still has
plenty of value. We’ve seen this repeatedly in software: finding better, more
efficient processes to achieve largely the same technical outcome.

Throwing the machine learning label into the mix always pushes the expected
results to a higher standard.

~~~
zwaps
In contrast, demand estimation is a statistical, or rather econometric, problem
that targets exactly the areas that ML has yet to explore: causal analysis,
censoring of the DGP (data-generating process) and, related to this but
distinct in the literature, identification and endogeneity.

The authors of this article do not show any ML; it's all mainline stats.

ML is used, even in these areas, for the things it does best. So it is a
misnomer to separate the two nowadays.

But I disagree that one would expect ML to do better in this area.

Look at websites doing A/B testing, which is certainly not ML but experimental
stats.

~~~
dmix
I should note that I meant people tend to expect more from ML outputs when
they hear the words ML/AI, even though that may not be the case. It could
simply be a process optimization that makes the original output easier to
achieve for people new to it, or gives experts a shorter path.

------
bkberry352
Reminds me somewhat of that AirBnB paper [1] around demand estimation for
optimal price setting. In both cases the regression target was unobservable.
In the Instacart case: the counterfactual demand if all delivery options had
been available. In the AirBnB case: the counterfactual demand if a house had
been listed at a different price. In both cases, it appears the solution was
to build bind models and score those to estimate what the demand would have
been under the counterfactual case.

[1] [https://www.kdd.org/kdd2018/accepted-papers/view/customized-regression-model-for-airbnb-dynamic-pricing](https://www.kdd.org/kdd2018/accepted-papers/view/customized-regression-model-for-airbnb-dynamic-pricing)
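
To make the "fit on what you observed, then score the counterfactual" pattern
concrete, here is a deliberately naive sketch. It is not the method from
either paper. I am assuming a hypothetical log of (visitors, fraction of
delivery options available, orders) and a straight-line relationship between
availability and conversion rate. The names `fit_line`,
`estimate_lost_demand` and the rows in `TOY_ROWS` are invented for
illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def estimate_lost_demand(rows):
    """rows: list of (visitors, availability in [0, 1], observed orders).

    Fits conversion rate against availability, then scores every row as if
    availability had been 1.0 and returns counterfactual minus observed orders.
    """
    availability = [a for _, a, _ in rows]
    conversion = [orders / visitors for visitors, _, orders in rows]
    intercept, slope = fit_line(availability, conversion)
    full_rate = intercept + slope * 1.0  # predicted conversion at full availability
    return [visitors * full_rate - orders for visitors, _, orders in rows]


# Hypothetical observations, not real data.
TOY_ROWS = [
    (100, 1.00, 20),
    (100, 0.50, 15),
    (200, 1.00, 40),
    (200, 0.50, 30),
    (160, 0.75, 28),
]

lost = estimate_lost_demand(TOY_ROWS)
print("estimated lost orders per row:", [round(x, 2) for x in lost])
```

This only works to the extent that availability varies for reasons unrelated
to demand. If slots run out precisely because demand is high, the fitted slope
is biased, which is the endogeneity issue raised elsewhere in this thread.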

------
PaulHoule
It reminds me of the time that I went to the store to buy numbers for my
mailbox and they had a "4" and a "2" but they didn't have a "7" so I didn't
buy any of them.

~~~
sova
Oddly profound

------
Plough_Jogger
I worked for an 'on-demand' startup facing this problem and we simply ran a
limited experiment which showed users full availability. We tried to back-fill
any appointments for which we didn't have real supply, but otherwise sent out
an 'oops' email and rescheduled.

While there was clearly a negative experience for the customers who went
through the experiment and had appointments rescheduled, we did get a very
good measure of true demand and were able to optimize some of our supply
dynamics.

------
madrox
I'd be more engaged in what Instacart has to say on ML if they managed to get
my delivery correct and within the delivery window at least once. Is this
meta-lost-demand?

