llom2600's comments

llom2600 · on July 4, 2025

I've been involved w/ cloud infrastructure for the past few years (mostly around optimizing cold start for large models, distributed systems, etc.). Haven't gone too deep into building applications with LLMs. Over the past week I tried building with BAML, MCP servers, etc. Thought I'd share what I learned in this repo - would love some thoughts on the architecture.

llom2600 · on March 4, 2022

We have a data integrations feature that allows you to connect external data sources (e.g. S3) with your sandbox.

You can basically combine this with "Scheduled Training", and it'll retrain your model on a schedule, pulling in the new data each time. This is V1 and does not yet handle more complex, event-based retraining. That's something we know is crucial for tons of use cases, and we're planning on adding it in the coming months.
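
As a rough illustration of the "pull in the new data on a schedule" idea, here is a minimal sketch, not our actual implementation. It assumes the data source can be listed as dicts with a `key` and a timezone-aware `last_modified` (similar to what an S3 listing returns); the function names and the dict shape are hypothetical.

```python
from datetime import datetime, timezone


def new_objects(objects, last_run):
    """Return the keys of objects modified after the previous training run.

    `objects` is an assumed listing format: dicts with "key" and
    "last_modified" (a timezone-aware datetime).
    """
    return [o["key"] for o in objects if o["last_modified"] > last_run]


def should_retrain(objects, last_run):
    """A scheduled run only needs to retrain when new data has arrived."""
    return bool(new_objects(objects, last_run))


if __name__ == "__main__":
    last_run = datetime(2022, 3, 1, tzinfo=timezone.utc)
    listing = [
        {"key": "old.csv", "last_modified": datetime(2022, 2, 28, tzinfo=timezone.utc)},
        {"key": "new.csv", "last_modified": datetime(2022, 3, 2, tzinfo=timezone.utc)},
    ]
    print(new_objects(listing, last_run))
```

An event-based version would replace the timestamp comparison with a trigger from the data source itself, which is the part V1 doesn't cover.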

Happy to chat and get your feedback on what kind of event sources you might want to trigger the model to retrain on new data.

llom2600 · on March 4, 2022

Really interesting - could definitely support the model itself. Might require a bit of conversation to figure out how to port the tool they use to interact with the model over.

llom2600 · on March 3, 2022

The second SDK issue w/ the regex is resolved. Just bumped latest version to 0.1.70. If you upgrade you should be good to go. Still haven't been able to reproduce your issue with pip.

5cotts · on March 4, 2022

Can confirm it works now! Thanks.

llom2600 · on March 3, 2022

Thanks for the heads up - yep, looks like a bug in our SDK. Should have a new version out shortly that handles it.

Weird that you weren't able to install the slai SDK via pip. We just released a new version of the SDK this morning; unsure if that's related. I'll look into that this afternoon.

Thanks for trying it out!

llom2600 · on March 3, 2022

We have been planning an "eject" feature that would let you develop in our app, but then export your artifact as a docker image you can spin up in your own cluster (or whatever). This is a necessity for customers who require on-premise deployments.

However, this is probably a few months out since we're currently focused on startups/developers that don't have that requirement.

llom2600 · on March 3, 2022

Much of the complexity in the sandbox is in ensuring that the development environment behaves as it would in production - but also that it loads as fast as possible. So we have to do a bit of stuff behind the scenes involving dynamically linking libraries, provisioning Kubernetes resources, etc. But generally that's about right.

We haven't open-sourced any of it yet, but there are definitely a few components of our system that we'll open source once we feel they're stable enough.

kamikazeturtles · on March 3, 2022

I'm sorry, I don't have any experience with Kubernetes.

What benefit would Kubernetes bring to this architecture? You can create and destroy docker containers using the API.

What do you guys use Kubernetes for?

llom2600 · on March 3, 2022

Kubernetes gives you a ton of extra tools that allow us to manage the lifecycle of our sandboxes, deployed models, asynchronous training jobs, etc.

Internally, each pod is just running a docker image. You could probably throw something together with docker/the docker API - but in our case we needed a bit more control.
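
For reference, the "throw something together with docker" approach could look roughly like the sketch below. It shells out to the standard `docker run` and `docker rm` commands; the helper names and the idea of one named container per sandbox are assumptions for illustration, not how our system works.

```python
import subprocess


def run_cmd(name, image):
    """Build the command that starts a detached, named container."""
    return ["docker", "run", "--detach", "--name", name, image]


def remove_cmd(name):
    """Build the command that force-removes a container (stops it if running)."""
    return ["docker", "rm", "--force", name]


def start_sandbox(name, image):
    # Hypothetical helper: raises CalledProcessError if docker exits non-zero.
    subprocess.run(run_cmd(name, image), check=True)


def destroy_sandbox(name):
    # Hypothetical helper: tears the container down again.
    subprocess.run(remove_cmd(name), check=True)
```

This covers create and destroy, but things like rescheduling after a node failure, resource limits across machines, and running training jobs to completion are what you'd have to build yourself on top of it.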

kamikazeturtles · on March 3, 2022

Hey Eli!

Thanks for the helpful responses!

I sent you an email regarding a possible internship opportunity. Are you guys open to interns?

llom2600 · on March 3, 2022

We tried to keep our testing workflow as flexible as possible. There are a couple of use cases that we wanted to allow:

- User is working with a pre-trained model that already went through extensive testing during training. In this case our test utilities are useful as e2e tests. Once you integrate the model into your handler, you can specify a bunch of test cases to be sure your API is going to behave as expected (like a unit test).

- User wants to train the model on our platform - they can add error metrics directly in their training script and prevent the model from being saved if any error metric exceeds a certain threshold. They can then additionally use the test.py script to run tests against the model + handler.
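
To make the second case concrete, here is a minimal sketch of gating a save on error metrics inside a training script. The threshold values, metric names, and the JSON "model" are all placeholders for illustration, and `save_if_passing` is a hypothetical helper, not a function from our SDK.

```python
import json
from pathlib import Path

# Hypothetical thresholds: metric name -> maximum allowed value.
MAX_ERROR = {"mae": 0.5, "rmse": 1.0}


def failing_metrics(metrics, thresholds):
    """Return names of metrics that are missing or exceed their threshold."""
    return sorted(
        name
        for name, limit in thresholds.items()
        if name not in metrics or metrics[name] > limit
    )


def save_if_passing(model_state, metrics, path, thresholds=MAX_ERROR):
    """Write the model to `path` only when every metric is within its limit."""
    failed = failing_metrics(metrics, thresholds)
    if failed:
        # Refuse to save, so a bad model never replaces a good one.
        raise ValueError(f"model not saved; metrics over threshold: {failed}")
    out = Path(path)
    out.write_text(json.dumps(model_state))
    return out
```

Treating a missing metric as a failure is a deliberate choice in this sketch: a training script that forgets to compute a metric shouldn't silently pass the gate.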

llom2600 · on March 3, 2022

Hey - thanks!

Yeah we've been thinking about this quite a bit. We've explored a couple of options here - I think our first pass is going to be a way to synchronize an external git repository with a sandbox. Would love to hear your thoughts here on what kind of workflow might make the most sense.

I think long term we'll also add VSCode integration through an extension, but that might be a few months out.

llom2600 · on March 3, 2022

Hi - Luke (CTO) here. Thanks! Yeah, we're using WASM for some things in the editor, like syntax highlighting. Planning on moving most of the network logic into WASM shortly as well.

ayanb · on March 3, 2022

great - I sent you a note over on LI. Hoping we can jam a bit on the network logic and tooling available to fast track things.