Show HN: Neuropod – Uber ATG's open source deep learning inference engine (github.com)
80 points by vpanyam 24 days ago | 25 comments

Hey everyone! I lead the development of Neuropod. Happy to answer any questions.

There's also a blog post that has more detail: https://eng.uber.com/introducing-neuropod/

Super excited to open-source it!

TensorRT offers significant advantages wrt inference and it takes ONNX files. Best I can tell this does not have a TensorRT backend (https://github.com/uber/neuropod/search?q=nvinfer.h&unscoped...). Why not?

Adding backends for TensorRT, ONNX, JAX, etc. is on our TODO list (and we'd love to see PRs to add support for these and others)!

We actually do use TensorRT with several of our models, but our approach is generally to do all TRT related processing before the Neuropod export step. For example, we might do something like

    TF model -> TF-TRT optimization -> Neuropod export

    PyTorch model
    -> (convert subset of model to a TRT engine)
    -> PyTorch model + custom op to run TRT engine
    -> TorchScript model + custom op to run TRT engine
    -> Neuropod export

Since Neuropod wraps the underlying model (including custom ops), this approach works well for us.

I wrote our internal lightweight version of Neuropod at another SDC startup where we did use TensorRT. Our ML researchers worked in PyTorch and, more often than not, the PyTorch -> ONNX -> TensorRT conversion did not work. We ended up needing to replicate the network architecture using the TensorRT library and manually convert the weights from PyTorch. Then we'd use the TensorRT serialization to compile the models so they could be run in C++. I imagine that they may have tried this in Neuropod and seen the same conversion problems. TensorRT was a big investment to get running smoothly, but it did shave 20% or so off our inference latency.

It's gotten better in TensorRT 7. I'm using it quite successfully. It does have a lot of corner cases though, that much is true, and the documentation is really poor, which, coupled with it being mostly closed source, severely limits adoption.

That said, I'm getting ridiculously good performance with it, even without using the TensorCores.

What will the continued support for this project be, given that Uber has shuttered their AI Labs?

Neuropod was created at Uber ATG (not AI Labs) and powers hundreds of models across the company (ATG and the core business). It's been used in production for over a year and we're continuing to actively work on it.

The blog post I linked above goes into more detail, but here's a relevant quote about usage within Uber:

> Neuropod has been instrumental in quickly deploying new models at Uber since its internal release in early 2019. Over the last year, we have deployed hundreds of Neuropod models across Uber ATG, Uber AI, and the core Uber business. These include models for demand forecasting, estimated time of arrival (ETA) prediction for rides, menu transcription for Uber Eats, and object detection models for self-driving vehicles.

I'm a former Uber AI Labs member - they're different orgs. Begs the question though, who at Uber will use this now?

(EDIT: there are many projects at Uber using Neuropods)

Also it's great.

From the looks of it, it is from Uber ATG, not AI Labs. I believe those are two different orgs. Someone from Uber can clarify.

Any possible support for TensorRT?

How does this compare to ONNX (https://github.com/onnx/onnx) in terms of feature completeness and performance, and what made you develop your own runtime?

This is a good question. I want to write a more detailed post about this in the future, but here are a few points for now:

- Neuropod is an abstraction layer so it can do useful things on top of just running models locally. For example, we can transparently proxy model execution to remote machines. This can be super useful for running large scale jobs with compute intensive models. Including GPUs in all our cluster machines doesn’t make sense from a resource efficiency perspective so instead, if we proxy model execution to a smaller cluster of GPU-enabled servers, we can get higher GPU utilization while using fewer GPUs. The "Model serving" section of the blog post ([1]) goes into more detail on this. We can also do interesting things with model isolation (see the "Out-of-process execution" section of the post).

- ONNX converts models while Neuropod wraps them. We use TensorFlow, TorchScript, etc. under the hood to run a model. This is important because we have several models that use custom ops, TensorRT, etc. We can use the same custom ops that we use at training time during inference. One of the goals of Neuropod is to make experimentation, deployment, and iteration easier so not having to do additional "conversion" work is useful.

- When we started building Neuropod, ONNX could only do trace-based conversions of PyTorch models. We've generally had lots of trouble with correctness of trace-based conversions for non-trivial models (even with TorchScript). Removing intermediate conversion steps (and their corresponding verification steps) can save a lot of time and make the experimentation process more efficient.

- Being able to define a "problem" interface was important to us (e.g. "this is the interface of a model that does 2d object detection"). This lets us have multiple implementations that we can easily swap out because we concretely defined an interface. This capability is useful for comparing models across frameworks without doing a lot of work. The blog post ([1]) talks about this in more detail.
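To make the last point concrete, here is a minimal sketch of a "problem" interface with two swappable implementations. It is plain Python and does not use Neuropod's actual API: `TensorSpec`, `Problem`, `WrappedModel`, and the two backends are hypothetical names made up for illustration, and the "problem" (element-wise addition) stands in for something like 2d object detection.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Tensors = Dict[str, List[float]]


@dataclass(frozen=True)
class TensorSpec:
    """Hypothetical spec for one named tensor (name + element type)."""
    name: str
    dtype: type


@dataclass(frozen=True)
class Problem:
    """Hypothetical 'problem' definition: the interface every model must satisfy."""
    inputs: Tuple[TensorSpec, ...]
    outputs: Tuple[TensorSpec, ...]


def validate(specs: Tuple[TensorSpec, ...], tensors: Tensors, kind: str) -> None:
    """Check that tensor names and element types match the spec."""
    expected = {spec.name for spec in specs}
    if set(tensors) != expected:
        raise ValueError(f"{kind} names {sorted(tensors)} != spec {sorted(expected)}")
    for spec in specs:
        if not all(isinstance(v, spec.dtype) for v in tensors[spec.name]):
            raise TypeError(f"{kind} '{spec.name}' must contain {spec.dtype.__name__}")


class WrappedModel:
    """Wraps any callable and enforces the problem interface on both sides."""

    def __init__(self, problem: Problem, run: Callable[[Tensors], Tensors]):
        self.problem = problem
        self.run = run

    def infer(self, inputs: Tensors) -> Tensors:
        validate(self.problem.inputs, inputs, "input")
        outputs = self.run(inputs)
        validate(self.problem.outputs, outputs, "output")
        return outputs


# The shared interface (assumed for this example): two float vectors in, one out.
ADDITION = Problem(
    inputs=(TensorSpec("x", float), TensorSpec("y", float)),
    outputs=(TensorSpec("out", float),),
)


# Two stand-ins for "the same model implemented in different frameworks".
def backend_a(inputs: Tensors) -> Tensors:
    return {"out": [a + b for a, b in zip(inputs["x"], inputs["y"])]}


def backend_b(inputs: Tensors) -> Tensors:
    return {"out": list(map(operator.add, inputs["x"], inputs["y"]))}


models = {
    "framework_a": WrappedModel(ADDITION, backend_a),
    "framework_b": WrappedModel(ADDITION, backend_b),
}

# Calling code depends only on the problem interface, so backends can be swapped freely.
inputs = {"x": [1.0, 2.0], "y": [3.0, 4.0]}
results = {name: model.infer(inputs) for name, model in models.items()}
print(results)
```

Because the caller only ever sees `infer` plus the spec, comparing two implementations is just a loop over `models`, and a model that returns the wrong names or types fails at the boundary rather than somewhere downstream.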

The blog post ([1]) goes into a lot more detail about our motivations and use cases so it's worth a read.

[1] https://eng.uber.com/introducing-neuropod/
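On the first point above (proxying execution to a smaller pool of GPU servers), a rough back-of-the-envelope sketch may help. All the numbers here are made up for illustration and are not from Uber or the blog post, and the function `pooled_gpus` is hypothetical. It also ignores queuing, network overhead, and bursty load, so treat it as the intuition only.

```python
def pooled_gpus(num_workers: int, busy_pct: int, target_util_pct: int) -> int:
    """GPUs needed if all workers share a pool run at the target utilization.

    busy_pct is the percentage of time each worker actually needs a GPU.
    """
    total_demand = num_workers * busy_pct  # in "percent of one GPU"
    return -(-total_demand // target_util_pct)  # integer ceiling division


# Assumed numbers: 100 worker machines, each needing a GPU 5% of the time.
NUM_WORKERS = 100
BUSY_PCT = 5
TARGET_UTIL_PCT = 50  # how busy we are willing to keep each pooled GPU

dedicated = NUM_WORKERS  # one GPU per machine, each about 5% utilized
pooled = pooled_gpus(NUM_WORKERS, BUSY_PCT, TARGET_UTIL_PCT)

print(f"dedicated: {dedicated} GPUs at ~{BUSY_PCT}% utilization")
print(f"pooled:    {pooled} GPUs at ~{TARGET_UTIL_PCT}% utilization")
```

Under these assumptions the same work fits on a tenth of the GPUs, which is the "higher utilization with fewer GPUs" trade described above.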

Yes, this seems very similar to the ONNX Runtime https://github.com/microsoft/onnxruntime. I'm not sure why they needed to reinvent the wheel here.

How does inference performance compare to the native serving solutions provided by frameworks, like TF Serving?

Found it interesting that most of the commits are under 1 contributor (OP). Are you the most active contributor or was this an artifact of open-sourcing it? Just wondering if you get hit by a bus tomorrow, what would we do? :)

Thanks for this, btw!

This looks great, thanks for open sourcing it.

Have you had a chance to try running your models on bare-metal devices such as the ARM Cortex-M4?

Is there a list of ops that are supported or, crucially, unsupported?

Are there any examples of demand forecasting? Thanks.

How does it differ from Pyro?

Neuropods can wrap Pyro models.

Consider using ONNX instead.

It always strikes me as uncannily brave to see a post like this. So many statements associated with one username:

- I no longer work at $company, and their stuff sucks

- ergo, they fired me, or I left on bad terms

- I clearly didn't get on well with my coworkers, as I'm happy to shit on their work from across the pond

- ergo, I have some deep attitude problem I'm likely to bring to my next placement

This is not a good thread for a disgruntled grudge post.

That is fair. It does not contribute to curiosity, discovery, or good conversation. I will remove the negative bits.
