
Show HN: Cortex – Open-source alternative to SageMaker for model serving - calebkaiser
https://github.com/cortexlabs/cortex/tree/v0.15.1
======
Lemaxoxo
Hey there, great work. I used Cortex as an inspiration when designing
chantilly: [https://github.com/creme-ml/chantilly](https://github.com/creme-ml/chantilly),
which is a much less ambitious solution tailored towards online
machine learning models. Keep up the good work.

------
itsderek23
This certainly looks like a cleaner way to deploy an ML model than SageMaker.
Couple of questions:

* Is this really for more intensive model inference applications that need a cluster? It feels like for a lot of my models, a cluster is overkill.

* A lot of the ML deployment tools (Cortex, SageMaker, etc.) don't seem to rely on first pushing changes to version control and then deploying from there. Is there any reason for this? I can't come up with a reason why this shouldn't be the default. For example, this is how Heroku works for web apps (and this is a web app at the end of the day).

~~~
calebkaiser
You're 100% right that Cortex is designed for the production use-case. A lot
of our users are running Cortex for "small" production use cases, since the
Cortex cluster can include just a single EC2 instance for model serving
(autoscaling allows deployed APIs to scale down to 1 replica). For ML use-
cases that don't need an API (a lot of data analysis work, for example),
Cortex is probably overkill.

As for your second question, we definitely want to integrate tightly with
version control systems. Since right now we are 100% open source and don't
offer a managed service, we don't have a place to run the webhook listeners.
That said, most of our users version control their code/configuration (we do
that with our examples as well:
[https://github.com/cortexlabs/cortex/examples](https://github.com/cortexlabs/cortex/examples)),
and it should be straightforward to integrate Cortex into an existing CI/CD
workflow; the Cortex CLI just needs to be installed, and then running `cortex
deploy` with the updated code/configuration will trigger a rolling update.
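For instance, a CI job for this could look something like the sketch below (assuming GitHub Actions; the CLI install step is an assumption, so use whichever install method your Cortex version documents):

```yaml
# Sketch of a CI workflow that redeploys on every push to master.
name: deploy
on:
  push:
    branches: [master]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Install Cortex CLI
        run: pip install cortex  # assumption; install per your Cortex version's docs
      - name: Rolling update
        run: cortex deploy       # picks up the updated code/configuration
```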

If you're referring to version control for the actual model files, Cortex is
un-opinionated as to where those are hosted, so long as they can be accessed by
your Predictor (what we call the Python file that initializes your model and
serves predictions). If you're interested in implementing version control with
your models, I'd recommend checking out DVC.
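For concreteness, a minimal Predictor might look like the sketch below. The real interface in your Cortex version may differ slightly, and the "model" here is a stand-in threshold rule so the example is self-contained; in practice `__init__` would load real weights from wherever you host them:

```python
# Sketch of a Cortex Predictor: a class that initializes the model once at
# startup and serves predictions per request.
class PythonPredictor:
    def __init__(self, config):
        # config comes from the deployment's yaml configuration;
        # here we just read a decision threshold (hypothetical field)
        self.threshold = config.get("threshold", 0.5)

    def predict(self, payload):
        # payload is the parsed JSON request body
        score = float(payload["score"])
        return {"label": "positive" if score >= self.threshold else "negative"}
```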

~~~
cgoltsev
Is it possible to partner with you to offer a managed service for Cortex? We
are looking at your solution to offer our clients for deployment.

------
bobosha
Has anyone used Cortex in production?

- Could you share your experiences?

- Why would one choose this over Docker, for instance?

~~~
calebkaiser
I'm sure others will comment, but in the meantime, some people have written up
their experiences using Cortex in production. I'd point you to AI Dungeon:
[https://medium.com/@aidungeon/how-we-scaled-ai-
dungeon-2-to-...](https://medium.com/@aidungeon/how-we-scaled-ai-dungeon-2-to-
support-over-1-000-000-users-d207d5623de9)

We also have a pretty active Gitter channel:
[https://gitter.im/cortexlabs/cortex](https://gitter.im/cortexlabs/cortex)

As for your second question, Cortex uses Docker to containerize models. The
rest of Cortex's features (deploying models as microservices, orchestrating an
inference cluster, autoscaling, prediction monitoring, etc.) are outside
Docker's scope.

~~~
bobosha
>orchestrating an inference cluster, autoscaling, prediction monitoring,

Does this approach preclude the need for queuing (a la RabbitMQ) and/or a load
balancer?

~~~
calebkaiser
Yep! Cortex deploys load balancers on AWS and manages queueing.

~~~
bobosha
This is super-exciting! I didn't know it could be this easy!

How do you handle API authentication? Is there a module that interfaces with
AWS API gateway? or external API authentication?

~~~
calebkaiser
Right now, users handle API auth by putting AWS API Gateway in front of Cortex,
but incorporating API Gateway into Cortex to automate this is on our short-term
roadmap.

------
wikibob
The name Cortex is in use for the scalable Prometheus storage backend:
[https://github.com/cortexproject/cortex](https://github.com/cortexproject/cortex)

~~~
ignoramous
... and for a lot of other things:
[https://en.wikipedia.org/wiki/Cortex](https://en.wikipedia.org/wiki/Cortex)

------
dodata
One of the things that has deterred me from SageMaker is how expensive it can
be for a side project. Real-time endpoints start at $40-$50 per month, which
would be a bit too much for a low-budget project on the side. I love the idea
of using an open-source alternative, but I noticed that all of the systems
combined for Cortex would be a bit more expensive. Do you have any tips on how
to keep a model deployed cheaply for a side project using Cortex? I'd be fine
with a little bit of latency on the first request, similar to how Heroku's
free dynos work.

~~~
calebkaiser
In general, Cortex will be significantly cheaper because you're only paying
AWS for EC2 (the bulk of the bill) and the other AWS services used (a much
smaller portion of the bill). With SageMaker, you're paying the EC2 bill plus
a ~40% premium.
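To make the comparison concrete, here's a back-of-the-envelope calculation. The hourly rate is illustrative (an assumption, not a quote); the ~40% premium and the ~90% spot discount mentioned below are the figures being compared:

```python
# Rough monthly serving cost: SageMaker vs. Cortex (on-demand and spot).
hourly_on_demand = 0.126      # assumed EC2 on-demand $/hr for illustration
hours_per_month = 24 * 30

sagemaker = hourly_on_demand * 1.40 * hours_per_month          # EC2 + ~40% premium
cortex_on_demand = hourly_on_demand * hours_per_month          # just the EC2 bill
cortex_spot = hourly_on_demand * (1 - 0.90) * hours_per_month  # ~90% spot discount

print(f"SageMaker:          ${sagemaker:.2f}/mo")
print(f"Cortex (on-demand): ${cortex_on_demand:.2f}/mo")
print(f"Cortex (spot):      ${cortex_spot:.2f}/mo")
```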

To keep the AWS bill as low as possible, Cortex supports inference on spot
instances, which are unused instances that AWS sells at a steep (as in 90%)
discount. The drawback is that AWS can reclaim the instance when needed, but
with ML inference failover isn't as big of a deal, since you typically don't
need to preserve state.

If you use spot instances, choose the cheapest instance type possible, and set
your autoscaler's minimum replicas to 1 (so it won't keep extra replicas
idling), you should be able to deploy the model pretty cheaply.
Significantly cheaper than with SageMaker, at the very least.

There's some more info here: [https://www.cortex.dev/cluster-management/spot-instances](https://www.cortex.dev/cluster-management/spot-instances)
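As a sketch, the relevant configuration might look something like this (field names are approximate and may differ between Cortex versions, and the API name is hypothetical; check the docs linked above for the exact schema):

```yaml
# cluster.yaml -- keep the cluster small and use spot instances
instance_type: m5.large   # pick the cheapest type that fits your model
min_instances: 1
max_instances: 2
spot: true                # request spot instances at a steep discount

# cortex.yaml -- keep idle replicas to a minimum
- name: my-api            # hypothetical API name
  predictor:
    type: python
    path: predictor.py
  autoscaling:
    min_replicas: 1
```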

------
cameronfraser
Why would I use this over deploying the model to a lambda function aside from
lack of GPU? (not trying to be confrontational, genuinely don't know) Won't
lambda functions scale as needed? How does this compare cost wise?

~~~
calebkaiser
Great question. We actually experimented with Lambda before ever building
Cortex. We ran into several issues, the three easiest to list are:

1. Size limits. Lambda caps deployment packages at 250 MB uncompressed and
memory at 3,008 MB. That's not nearly big enough for a lot of models,
particularly bigger deep learning models.

2. As you mentioned, GPU inference isn't supported on Lambda, and for many
models, GPUs are necessary for serving with acceptable latency.

3. Lambda instances can only serve one request at a time. With how slow ML
inference can be (especially if you need to call another API or perform some
IO), it's easy to lock up Lambda instances for full seconds just to serve
one prediction.

The TL;DR is that while Lambda works for some use-cases, in general it lacks
the flexibility and customizability needed for most inference use-cases.

------
_____smurf_____
How does this compare to Kubeflow?

~~~
calebkaiser
The simplest way to put it is that Kubeflow (whose team we have a ton of
respect for) is a tool for helping devops engineers build their ML deployment
platform on Kubernetes, whereas Cortex is an ML deployment platform. Kubeflow
plugs into an existing k8s cluster, whereas Cortex abstracts k8s away (and
automates AWS-layer devops too).

With Cortex, we wanted to build something so that developers can take a
trained model—regardless of whether it was trained by their DS team or is a
pre-trained model—and deploy it as a production API without needing to
understand k8s. Because Cortex manages the k8s cluster, we can do the legwork
for features like spot instances, request-based cluster autoscaling, GPU
support, etc., and expose them as simple yaml configuration.

