
Syberia – Make R a production-ready language for deployable machine learning - michaelsbradley
http://syberia.io/
======
dataewan
We've tried using docker to put R applications into production. I'm no docker
expert, but I found it pretty easy to do.

A couple of pointers:

- If you have a dependency on a library like blast that takes a long time to
compile, you can make a base docker image that already has that library
installed. That makes iteration quicker, as you'll only need to build that
base image once.

- If you put a web interface on the image using shiny, then it is
straightforward to deploy it for your users to interact with.

~~~
robertk
We usually dockerize outputs of our Syberia projects as well. We have several
dozen internal packages consumed by the root projects. With many contributors
working on constituent packages daily, we've found frequent changes to
packages can slow down a docker-only workflow. So far, using a base docker
image with lockbox catching us up to the most recent daily and hourly changes
has been working well.

------
zitterbewegung
This looks like it solves a big pain point in R. I hope that more tools like
this crop up. R has a nice set of libraries, but it is lacking in data
engineering at this point.

------
gaius
What does this do that R on Azure Machine Learning doesn't? Not snark, genuine
question.

~~~
robertk
One of the lead developers on the Azure suite wrote a blog post that might
explain some of the differences:
[http://blog.revolutionanalytics.com/2017/06/syberia.html](http://blog.revolutionanalytics.com/2017/06/syberia.html).
A rough analogy is that Rails is to AWS/Heroku as Syberia is to Azure. You
can replace the underlying components in your project with calls to Azure
services, but a large developer team may prefer to work in a unified codebase
over a set of UI tools.

------
zebrafish
How is this different from the caret package? Using if(interactive()) {} as
the main function and including the extremely well documented caret package
seems to accomplish much of the same thing that Syberia does unless I'm
missing something.

~~~
robertk
Author here. Philosophically, Syberia can be thought of as an extension of
caret with hopefully clearer abstractions in large projects. In particular,
all of the packages supported by caret can and will eventually be parametrized
into the modeling engine.

This is the 0.6 release in which we make the scaffold available. Over time, we
will fill in the pieces that are currently provided by other tools like caret
or Bernd Bischl's mlr.

~~~
scottlocklin
You know, the real problem with R for production work is dealing with munged
package dependencies. Writing a makefile with hand-curated package deps is
something I really wish I never had to do again.

~~~
claytonjy
Looks like robertk wrote `lockbox` specifically to solve this.

Dependency management still sucks in R. Too many options, each a little
different: Packrat, checkpoint, now lockbox. I have friends using Docker
specifically to encapsulate their R package dependencies, too.

~~~
kirillseva
Disclosure: I work with robertk at Avant, where we've developed syberia and
lockbox specifically to have reproducible model builds and to turn R model
objects into API servers in a deterministic way. Lockbox is closely modelled
after ruby's bundler, and syberia is like rails for data science. If you've
ever written a ruby-on-rails or a django app, you'll feel at home using
lockbox. I encourage you to give it a try, and let us know in github issues if
you are having any trouble using it.

~~~
claytonjy
I'd love to hear more about how it compares to other workflows. Is it worth
the learning curve? There's a lot going on here, and a lot of overlap with
more popular and widely used tools in the R world.

My small team is R-first and a big fan of the tidyverse, and we're exploring
using Docker in tandem with tools like plumber (R package for building simple
APIs around R code) and pachyderm (language-independent, containerized data
pipelining with straightforward cloud integration) for different projects.
Does Syberia fit in nicely with these tools, or aim to replace them with its
own set of conventions and philosophies?

