Show HN: Batteries Included AI Deployment (github.com/mockapapella)
2 points by Mockapapella 11 months ago | 2 comments
Hi HN, I've spent the better part of the last year deploying AI inference systems for both personal and work projects. I've noticed a ton of "gotchas" and footguns throughout the process that make doing it right a pain, so I built "Batteries Included AI Deployment". Not a creative name, but it does what it says on the tin.

In short, it's a template for deploying AI inference APIs with FastAPI.

In long, it uses Docker to encapsulate (almost) the entire development and deployment process. The repo includes:

1. A way to download and cache models straight from Hugging Face (items 1 and 2 are sketched together after this list)

2. A way to expose those cached models via a FastAPI server endpoint

3. A Docker configuration that exposes a `debugpy` port so that you can debug your application within a container (see the `debugpy` sketch after this list)

4. A way to run tests

5. A way to debug tests (using `debugpy` as mentioned above)

6. A way to run pre-commits on staged files

7. A way to manually run pre-commits on all code in your repository

8. CI steps via GitHub Actions

9. Full observability with a Grafana dashboard

10. Metrics via Prometheus (see the instrumentation sketch after this list)

11. Tracing via Tempo

12. Logs via Loki

13. GPU monitoring via DCGM

14. CD via GitHub Actions and a `post-receive` hook on the server (see the hook sketch after this list)

15. Alerts that email you when something goes wrong in production
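
To give a flavor of items 1 and 2, here's a minimal sketch of the download-and-serve pattern. It assumes a `transformers` pipeline and a placeholder `/predict` route; the template's actual module layout may differ:

    # Sketch of items 1 and 2: cache a model from Hugging Face, then serve
    # it behind a FastAPI endpoint. Names here are illustrative.
    from fastapi import FastAPI
    from pydantic import BaseModel
    from transformers import pipeline

    app = FastAPI()

    # transformers caches weights under HF_HOME on first download, so
    # mounting that path as a Docker volume avoids re-downloading the
    # model every time the container restarts.
    classifier = pipeline("sentiment-analysis")

    class InferenceRequest(BaseModel):
        text: str

    @app.post("/predict")
    def predict(request: InferenceRequest):
        # Runs the cached model; returns e.g. [{"label": "POSITIVE", "score": 0.99}]
        return classifier(request.text)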
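
For item 3, the in-container side of the debugger hookup amounts to a couple of `debugpy` calls; 5678 is just the conventional port, not necessarily what the template publishes:

    import debugpy

    # Bind to 0.0.0.0 so the port is reachable from outside the container;
    # Docker then publishes it to the host (e.g. `-p 5678:5678`).
    debugpy.listen(("0.0.0.0", 5678))
    debugpy.wait_for_client()  # optional: block until an IDE attaches

From the host, you'd point your editor's remote-attach configuration at localhost:5678.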
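
For item 10, instrumenting the API with `prometheus_client` looks roughly like this; the metric names are made up for illustration:

    from fastapi import FastAPI
    from prometheus_client import Counter, Histogram, make_asgi_app

    app = FastAPI()
    app.mount("/metrics", make_asgi_app())  # the path Prometheus scrapes

    REQUESTS = Counter("inference_requests_total", "Total inference requests")
    LATENCY = Histogram("inference_latency_seconds", "Inference latency in seconds")

    @app.post("/predict")
    def predict():
        REQUESTS.inc()
        with LATENCY.time():
            result = {"label": "stub"}  # run the actual model here
        return result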
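
And for item 14, a `post-receive` hook boils down to checking out the pushed code on the server and restarting the stack. This is a hypothetical sketch (the paths and branch name are assumptions), written in Python for consistency, though a real hook could just as easily be a few lines of shell:

    #!/usr/bin/env python3
    import subprocess

    WORK_TREE = "/srv/app"    # assumption: where the deployed checkout lives
    GIT_DIR = "/srv/app.git"  # assumption: the bare repo that receives pushes

    # Force-check-out the pushed branch into the work tree...
    subprocess.run(
        ["git", f"--work-tree={WORK_TREE}", f"--git-dir={GIT_DIR}",
         "checkout", "-f", "main"],
        check=True,
    )
    # ...then rebuild and restart the services.
    subprocess.run(["docker", "compose", "up", "-d", "--build"],
                   cwd=WORK_TREE, check=True)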

I say "almost" because you still need a way to attach to the debugger port from outside the docker container and there's some one-time configurations that need to be set up manually, but not anything beyond that.

I'd love to hear any feedback you might have :)

The DevOps around LLM/AI is a disaster right now.


Agreed. I suspect it's due to all the different verticals coalescing, though I can't say for sure, since some of the problems are consistent across different workflows as well.

You've got the data scientists tinkering away with the actual models, with half of their stuff in Jupyter notebooks. You've got the platform engineers sticking these models behind an API, and DevOps making sure any model/code updates get propagated to prod in a safe and hassle-free manner. Then you've got Infra, who need to make sure you actually have the specialized hardware to deploy these models on.
