
Torus: A Toolkit for Docker-First Data Science - augustflanagan
https://medium.com/manifold-ai/torus-a-toolkit-for-docker-first-data-science-bddcb4c97b52
======
saamm
This is interesting! It sounds like this v1 gets your local environment up and
running in a Docker container. I maintain something similar for analysts on my
team, and we've seen success in terms of decreasing time spent on environment
setup.

As another interesting use of Docker in the data space, I'm excited about
Pachyderm [0] (though I haven't had the chance to use it in production). In
particular, the data provenance story seems compelling.

0:
[https://github.com/pachyderm/pachyderm](https://github.com/pachyderm/pachyderm)

~~~
jdoliner
Thanks for the plug, saamm. I'm one of the creators of Pachyderm. I think Torus
and Pachyderm would work very nicely together. You could go straight from
developing code in the image Torus provides to deploying it on Pachyderm as a
production pipeline that runs on new data as it comes in with just a few
commands. Similarly, their Dockerized data science cookie-cutter could work
nicely as a Pachyderm service. This would work similarly to using the service
on your laptop, except that you could easily deploy it on a cloud provider and
schedule it with GPUs, and it would get updated with new data as it comes in.

Very exciting to see more people applying containers to data science.

~~~
sdeymanifold
Yes to containers! We are trying to make it as seamless as possible to be
Docker first in all things. And not reinvent the devops wheel. It just needs
to be adapted for the needs of data scientists. Pachyderm is really cool. I
will have to check it out. We've recently moved to Airflow for all our
pipeline management... how does Pachyderm fit in that ecosystem?

~~~
jdoliner
Pachyderm's pipeline system covers much of the same functionality as Airflow's
so there's generally not much reason to use both.

------
zimbatm
Not to be confused with this other company at
[https://manifold.io/](https://manifold.io/) (io, not ai) which deprecated
their [https://www.torus.sh/](https://www.torus.sh/) project :)

~~~
plara
[https://manifold.co](https://manifold.co) ;-)

~~~
sdeymanifold
Yes, we found out about that super confusing thing later. I guess if you name your
company Manifold you will name your projects after specific types of
manifolds. We have Torus, next is Mobius, then ... Klein Bottle?

------
robohamburger
I think a more interesting direction would be for JupyterLab to ship an
Electron app that understands how to spin up and talk to containerized
kernels.

I made a hacky version for work that proxies to a k8s pod, but first-class
support would be cool.

~~~
po84
[https://github.com/jupyter-incubator/enterprise_gateway](https://github.com/jupyter-incubator/enterprise_gateway)
to launch kernels on a cluster and
[https://github.com/jupyter-incubator/nb2kg](https://github.com/jupyter-incubator/nb2kg)
to make a notebook server aware of them might be of interest (sans Electron
app shell).

~~~
robohamburger
Thanks for sharing this! I didn't realize this project existed.

I have lots of questions now, like why this isn't using the ZeroMQ-based
protocol, so I guess I will need to spend some time with it.

It does look like it closely overlaps with what I was describing. I didn't
realize overriding/extending the http api was even a thing that could be done
so I just used zeromq for my own purposes :)

------
mr_toad
Is cookiecutter running inside the Docker container or on the host? The
instructions imply setting up Python, virtualenv, pip and cookiecutter all on
the local machine...

------
stmw
Isn't this what Pachyderm was supposed to do?

