GitHub Copilot is the most useful tool I've found in a long time, and having that in Jupyter Notebooks is just awesome. I've been missing that for quite some time. Great work guys!
You can also open Jupyter notebook files in VS Code, which would be another way to get AI autocomplete. I’m not enough of a Jupyter user to know whether it would make sense to use VS Code all the time.
Yeah, this is definitely a good way to get AI code completion (inline or otherwise) in Jupyter notebooks. In fact, I know some data folks who've been using Jupyter from day one who switched to VS Code simply because their company buys a Copilot license for everyone, and they really missed it in their Jupyter workflow.
Agree. We actually tried getting GitHub Copilot to work with Jupyter, but GitHub doesn't have an official API. We took some time to reverse engineer an implementation from the neovim Copilot extension [1] and from Zed [2], but found it too flaky and too much trouble in the end.
Meanwhile, we found a better speed/quality tradeoff with Codestral (it has a fill-in-the-middle variant, unlike a general LLM), so we decided to go with Codestral. This was inspired by continue.dev using Codestral for tab completion :)
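For anyone curious what "fill-in-the-middle" looks like in practice, here's a minimal sketch of building an FIM request. The field names follow my understanding of Mistral's FIM completion endpoint (`POST /v1/fim/completions`); treat the exact shape as an assumption, not a spec:

```python
import json

def build_fim_request(prefix: str, suffix: str, model: str = "codestral-latest") -> dict:
    """Build a fill-in-the-middle request body.

    Field names are assumed from Mistral's FIM completions endpoint;
    check their API reference before relying on this.
    """
    return {
        "model": model,
        "prompt": prefix,   # code before the cursor
        "suffix": suffix,   # code after the cursor
        "max_tokens": 64,
    }

# The model sees BOTH sides of the cursor, which is why an FIM-tuned
# model like Codestral beats a prefix-only chat LLM at tab completion.
body = build_fim_request(
    prefix="def mean(xs):\n    return ",
    suffix="\n\nprint(mean([1, 2, 3]))",
)
print(json.dumps(body, indent=2))

# To actually call it, POST the body with an API key, e.g.:
# requests.post("https://api.mistral.ai/v1/fim/completions",
#               headers={"Authorization": f"Bearer {API_KEY}"}, json=body)
```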
At the moment, we manually verify operators and are currently onboarding some tier-4 operators. Down the line, we'll have a two-tier system where you can choose whether you want a verified machine or not. From the operator's perspective, everything runs inside Docker, configured with security best practices.
I've always understood that containers are not proper sandboxes and shouldn't be used to contain untrusted code, no matter what best practices are applied. Has this changed in recent years? Do you have documentation for what sorts of best practices you're using and why they are sufficient for executing untrusted code?
As far as I know, you're correct. Running the container as a non-root user gets you some meaningful hardening, but I'd still run it in a VM if feasible.
Having done a little bit of work in this area [1], I think you should publicly document exactly what those best practices are. Are the workloads running in a networkless container? Do you limit IO? Do you limit disk usage? Answering these in detail would help you gain customer trust on both sides.
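To make the questions above concrete, here's a sketch of the kind of `docker run` invocation such a policy might produce. The flags are real Docker options, but the specific limits are illustrative guesses, not anyone's documented policy (and note containers still share the host kernel, which is why people reach for gVisor/Firecracker/VMs for genuinely untrusted code):

```python
def hardened_run_cmd(image: str, workdir: str) -> list[str]:
    """Assemble a hypothetical hardened `docker run` command line.

    Every flag below is a real Docker option; the values chosen
    (memory, pids, tmpfs size, uid) are illustrative assumptions.
    """
    return [
        "docker", "run", "--rm",
        "--network", "none",            # no network access at all
        "--read-only",                  # read-only root filesystem
        "--cap-drop", "ALL",            # drop all Linux capabilities
        "--security-opt", "no-new-privileges",
        "--pids-limit", "256",          # bound process count (fork bombs)
        "--memory", "512m",             # bound RAM
        "--cpus", "1.0",                # bound CPU
        "--user", "1000:1000",          # run as non-root
        "--tmpfs", "/tmp:rw,size=64m",  # bounded scratch space
        "-v", f"{workdir}:/work:ro",    # job inputs mounted read-only
        image,
    ]

cmd = hardened_run_cmd("python:3.12-slim", "/srv/job-123")
print(" ".join(cmd))
```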
However, I think PyTorch does it the same way (?); at least they say something like this in their docs:
"This function accumulates gradients in the leaves - you might need to zero .grad attributes or set them to None before calling it." - https://docs.pytorch.org/docs/stable/generated/torch.autogra...
The Rust burn crate does it better: it stores the backprop'd gradients in a separate container and returns it: https://github.com/tracel-ai/burn/blob/af381ee18566fc27f5c98...
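The two behaviors are easy to see side by side. `backward()` accumulates into the leaf's `.grad`, while `torch.autograd.grad` returns fresh gradients without mutating anything, which is closer in spirit to burn's separate gradients container:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

# backward() accumulates into .grad on the leaves
(x ** 2).sum().backward()
print(x.grad)   # tensor([2., 4.])
(x ** 2).sum().backward()
print(x.grad)   # tensor([4., 8.])  -- accumulated, not overwritten

# torch.autograd.grad returns the gradients instead of mutating .grad,
# similar in spirit to burn returning a separate gradients container
x.grad = None
(g,) = torch.autograd.grad((x ** 2).sum(), x)
print(g)        # tensor([2., 4.])
print(x.grad)   # None -- .grad left untouched
```

This is also why training loops call `optimizer.zero_grad()` (or set grads to `None`) every step: the accumulation is deliberate, to support things like gradient accumulation over micro-batches.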