
Amundsen – Lyft’s data discovery and metadata engine - ryan_lane
https://eng.lyft.com/amundsen-lyfts-data-discovery-metadata-engine-62d27254fbb9
======
ryanworl
This was probably in development before FoundationDB was open sourced, but
this is exactly the type of system where FoundationDB shines. A developer from
Apple gave a talk at the FoundationDB Summit describing Apple’s metadata store
for machine learning, which people at Lyft interested in this may like.

[https://youtu.be/16uU_Aaxp9Y](https://youtu.be/16uU_Aaxp9Y)

------
hallman76
Would someone mind explaining how ML models are used in the real world? In
this case Lyft has users, trips, points of interest (destinations). What would
they create a model for? How would it improve driver/rider experience?

I've gone through a few TensorFlow tutorials, but still can't grok what the
appropriate use-case is.

Edit: I really appreciate the responses @tedsanders and @theossuary - thank
you so much! Not sure what the etiquette is other than upvoting you!

~~~
tedsanders
I received an offer from Lyft last year to do data science. Here are some
potential use cases of machine learning for a taxi company:

-Predicting wait times

-Predicting the best route

-Predicting where to send drivers before requests come in

-Predicting demand so that you can surge price predictively

-Predicting what assets/requests to prefetch to the phone and when

-Predicting when drivers will churn and what tactics will reduce churn

-Predicting when passengers will churn and what tactics will reduce churn (e.g., special deal for 25% off next 10 rides)

-Predicting which passengers are more or less price sensitive and then price discriminating accordingly

-Predicting what it will cost various vehicles to reroute to pick up an additional Lyft Pool passenger on the way to their destination

-Predicting the driver's position some time ahead (e.g., you don't want to send a request for them to get off the freeway right as they are passing their exit)

-Predicting car location based on fusion of GPS, accelerometers, priors from past driving data (e.g., if GPS says you're 10 m right of the freeway traveling at 60 MPH, you're actually probably on the freeway)

-Predicting which Facebook users will be most likely to click ads

-Predicting who and when to send out email marketing

-Predicting all sorts of server load balancing type stuff on the back end (proactively scaling capacity for known busy times, etc.)
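
To make the GPS fusion item above concrete, here is a minimal sketch of the reasoning as a two-hypothesis Bayes update: the car is either on the freeway or on a parallel side road. Every number here is an assumption made up for illustration (GPS noise of 10 m, a side road 20 m from the freeway, typical speeds of 65 and 30 MPH, a 50/50 prior), and the function name is hypothetical. This is not how Lyft does it, just the shape of the idea.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def posterior_on_freeway(
    gps_offset_m,
    speed_mph,
    gps_std_m=10.0,               # assumed GPS noise
    side_road_offset_m=20.0,      # assumed distance of side road from freeway
    prior_freeway=0.5,            # assumed prior from past driving data
    freeway_speed=(65.0, 10.0),   # assumed (mean, std) of freeway speeds
    side_road_speed=(30.0, 10.0), # assumed (mean, std) of side road speeds
):
    """Probability the car is on the freeway, given GPS offset and speed."""
    # Likelihood of the observations if the car is really on the freeway.
    like_freeway = gaussian_pdf(gps_offset_m, 0.0, gps_std_m) * gaussian_pdf(
        speed_mph, *freeway_speed
    )
    # Likelihood of the same observations if the car is on the side road.
    like_side = gaussian_pdf(
        gps_offset_m, side_road_offset_m, gps_std_m
    ) * gaussian_pdf(speed_mph, *side_road_speed)

    numerator = prior_freeway * like_freeway
    return numerator / (numerator + (1.0 - prior_freeway) * like_side)


if __name__ == "__main__":
    # GPS reads 10 m to the right of the freeway, speed is 60 MPH.
    print(posterior_on_freeway(gps_offset_m=10.0, speed_mph=60.0))
```

With these made-up numbers the GPS reading alone is ambiguous (10 m is halfway between the two roads), so the speed decides it: at 60 MPH the posterior lands well above 0.9 for the freeway, and at 30 MPH it flips the other way.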

~~~
jayperi
Predicting the rider's age and gender, the cost of the route, and en-route
timings, and letting riders shop online for groceries and beverages in the cab
and pick them up on their return home.

------
photoft
We gave a presentation at Strata SF 2019 last week; details can be found at
[https://conferences.oreilly.com/strata/strata-ca/public/sche...](https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/72505).
The slides can be found at
[https://www.slideshare.net/taofung/strata-sf-amundsen-presen...](https://www.slideshare.net/taofung/strata-sf-amundsen-presentation).

~~~
crorella
How do you deal with discoverability of map keys and struct fields? That's
something we just added after such datatypes became more prominent in our DWH.

------
pella
[https://github.com/lyft/amundsenfrontendlibrary](https://github.com/lyft/amundsenfrontendlibrary)

~~~
photoft
There are four repos (three for microservices, one for the data ingestion library):
[https://github.com/lyft/amundsenfrontendlibrary](https://github.com/lyft/amundsenfrontendlibrary)
[https://github.com/lyft/amundsensearchlibrary](https://github.com/lyft/amundsensearchlibrary)
[https://github.com/lyft/amundsenmetadatalibrary](https://github.com/lyft/amundsenmetadatalibrary)
[https://github.com/lyft/amundsendatabuilder](https://github.com/lyft/amundsendatabuilder)

~~~
kenhwang
Nothing quite highlights the fragmentation of Python more than 3 different
Python versions being called out across 4 repos.

------
huac
congrats to the team! reminds me of Airbnb's data discovery tool:
[https://medium.com/airbnb-engineering/scaling-knowledge-acce...](https://medium.com/airbnb-engineering/scaling-knowledge-access-and-retrieval-at-airbnb-665b6ba21e95)

------
iblaine
Metacat is worth checking out as well.

