
A Future for R: A Comprehensive Overview - michaelsbradley
https://cran.r-project.org/web/packages/future/vignettes/future-1-overview.html
======
jonchang
So I've actually tried to use the futures package. While it's very clean for
certain types of tasks, there are a few problems that I think are inherent to
the way R deals with its parallel packages (which the futures package is built
on top of).

Futures is great for tasks where you have some kind of task workflow like:

    
    
        # slow task 1 --------,
        #                      ----> task 3
        # slow task 2 --------'
    
        do_stuff <- function(input1, input2) {
          result1 <- slow_task1(input1)
          result2 <- slow_task2(input2)
          task3(result1, result2)
        }
    

Because then you can just do something like:

    
    
        library(future)
        plan(multicore)
        do_stuff <- function(input1, input2) {
          result1 %<-% slow_task1(input1)
          result2 %<-% slow_task2(input2)
          task3(result1, result2)
        }
    

And boom, you can have two tasks running in parallel and everything "just
works." It's _extremely_ nice to use thanks to R's promises capability.

Where it falls down is when you try to load up a bunch of futures at once.
I'm not clear on the implementation details, but from what I can tell every
parallel task is assigned a "port" on your system, and if there is a port
conflict (or the OS doesn't "release" (?) the port quickly enough) the tasks
simply die with an inscrutable error.

I've found that it's necessary to 1. ensure that only one "set" of parallel
tasks is running at a time, and 2. create a central "port registry" and
manually assign ports randomly within non-overlapping ranges for parallel
tasks. It's straightforward but frustrating to do.
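
Roughly, workaround 2 can look like this (only a sketch; the port range and
worker count here are arbitrary, and creating the PSOCK cluster yourself is
just one way to pin the port):

    
    
        library(future)
        library(parallel)
    
        # reserve a port from a range this "set" of tasks owns (range is made up)
        my_port <- sample(30000:30999, 1)
    
        cl <- makePSOCKcluster(4, port = my_port)  # 4 workers on a known port
        plan(cluster, workers = cl)                # futures now run on that cluster
    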

Finally (and I don't know if the futures package has been updated since I
tried it out last year), it doesn't work on Windows, which is a problem for
many R users.

~~~
claytonjy
As for not working on Windows, have you tried `plan(multisession)` instead of
`plan(multicore)`? The latter will never work on Windows due to its lack of
forkability, as mentioned in this vignette.
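
For reference, a minimal sketch (the toy expression is just a placeholder):

    
    
        library(future)
        plan(multisession)            # background R sessions; no forking required
    
        x %<-% { Sys.sleep(1); 42 }   # evaluated in a background session
        x                             # blocks until resolved, then returns 42
    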

That ports thing is interesting; could that bug be specific to the `multicore`
plan?

~~~
jonchang
It's been over a year since I dug into this problem, but I believe the ports
issue is inherent to all R parallel code, at least on Linux. Peeking at the
future package's code, multisession is built on the "cluster" type of
parallelism, which means it is vulnerable to this issue.

I haven't encountered the problem on Mac but that's because my laptop is
generally unable to create enough parallel jobs to saturate the number of
available ports.

------
michaelsbradley
Also see this slide deck, by the author:

http://www.aroma-project.org/share/presentations/BengtssonH_20160628-useR2016/BengtssonH_20160628-A_Future_for_R,useR2016.html

------
rcthompson
I actually hacked together an almost functional system similar to this a few
years ago. It used the same primitives, delayedAssign and
parallel::mcparallel, to implement parallel-evaluated lazy promises. It was
nearly useful, but I couldn't get it to work when passing one promised value
to another lazy expression, presumably because only the process that forked a
subprocess can read the value it returns, so the second forked process can't
evaluate the promise. It looks like this package solves that problem by
forcing evaluation of any implicit futures before passing them to another
future. I'm definitely interested in trying this out.
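
For the curious, the pattern I'm describing looked roughly like this (just a
sketch; the sleep stands in for a slow computation):

    
    
        library(parallel)   # mcparallel()/mccollect() rely on fork, so not on Windows
    
        job <- mcparallel({ Sys.sleep(2); 42 })       # fork and return immediately
        delayedAssign("result", mccollect(job)[[1]])  # promise that collects lazily
        # ...do other work...
        result                                        # first use blocks, then yields 42
    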

------
haddr
This is definitely a great feature!

Now, with this in hand, can we have, for instance, a multithreaded (or
otherwise parallel) web server or even a REST API for R?

Speaking from a practical perspective, the biggest problem with wide adoption
of R is integration. Sometimes you just want a single module in R and the rest
of the system in some other technology. I know there are ways to do it, but
not without quite a lot of technical debt. On the other hand, having native
microservice-like integration could probably help.

I know we are close, but not sure how close.

~~~
HenrikB
Have a look at the fiery package
([https://cran.r-project.org/package=fiery](https://cran.r-project.org/package=fiery)),
which utilizes futures and that Thomas Pedersen just released:

"A very flexible framework for building server side logic in R. The framework
is unopinionated when it comes to how HTTP requests and WebSocket messages
are handled and supports all levels of app complexity; from serving static
content to full-blown dynamic web-apps. Fiery does not hold your hand as much
as e.g. the shiny package does, but instead sets you free to create your web
app the way you want."

Bob Rudis wrote an interesting blog post about it the other day
(https://rud.is/b/2016/07/05/a-simple-prediction-web-service-using-the-new-firery-package/).
As you can see there, Thomas has some interesting extensions planned (e.g.
routr).

~~~
haddr
Thanks! I'm starting to like this package!

Update: it can still only serve one request at a time, but I'm looking forward
to some progress there...

------
nosound_warmup
This feels a little like using a hammer to cut down a tree. You could do it,
but there really are better tools for that particular job.

~~~
rcthompson
I think there are perfect use cases for this style of code. For instance, I
sometimes find myself writing code something like this:

    
    
        some_value <- compute_some_value(dataset1)
        other_value <- compute_other_value(dataset2)
        final_answer <- some_value + other_value
    

where compute_some_value() and compute_other_value() are independent and both
take a long time to run, so they would benefit from running in parallel.
However, actually running them in parallel is tricky, because most parallel
interfaces in R are modelled after lapply, running a single function over
multiple elements of a list, and this doesn't fit that mold. You could
parallelize it manually using primitives such as parallel::mcparallel and
delayedAssign, but you don't get error handling/propagation, and your code
gets cluttered with the implementation details of your parallelization
strategy. And if you do parallelize it and then someone else calls _your_ code
in parallel, you now have too many parallel processes and risk running out of
memory and ending up in swapping hell.

The bottom line is that code such as the above generally just doesn't get
parallelized, because the only way of doing so (as far as I know) requires
pointing several guns at your foot. So this package looks very interesting and
useful to me, and I also think it provides a good set of primitives with which
to implement yet another "multi-backend parallel lapply" package with
advantages over the others, such as doing its best to ensure consistent
behavior across the different "backends".
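
Concretely, my snippet above would apparently become something like this (a
sketch only; the compute_* functions and datasets are placeholders):

    
    
        library(future)
        plan(multisession)
    
        some_value  %<-% compute_some_value(dataset1)    # starts in the background
        other_value %<-% compute_other_value(dataset2)   # starts in the background
        final_answer <- some_value + other_value         # blocks until both resolve
    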

(Edit: Also see jonchang's comment along similar lines.)

~~~
baldeagle
Thank you for explaining this; I was trying to see how this would be useful.
Could this be used to do parallel data loading (like reading from a CSV and a
database at the same time)?

~~~
huac
Yes, you could draw from multiple data sources at the same time:

    
    
      library(future)
      library(dplyr)     # for %>% and left_join()
      plan(multisession)
    
      csv_data %<-% read.csv("myfile.csv")
      tsv_data %<-% read.delim("otherfile.tsv")   # base R has no read.tsv()
    
      csv_data %>% left_join(tsv_data, by = "id_user")

------
apathy
See also https://mitpress.mit.edu/sicp/full-text/sicp/book/node70.html for fun
background reading. Or if you have an actual CS degree, review from freshman
year.

~~~
protomyth
> Or if you have an actual CS degree, review from freshman year.

You do realize that actual CS degrees happened before 1985?

~~~
apathy
Before 2007, you mean?

Because that's when SICP was deprecated for MIT 6.001. Happily, it's still
available:
[http://web.mit.edu/alexmv/6.037/](http://web.mit.edu/alexmv/6.037/)

~~~
protomyth
I was going for the original date of publication but I see your point.

------
insulanian
Can someone explain in a few sentences (or point me to a place where I can get
the answer) what the benefit of using R is over a functional language like
Haskell, OCaml or F#?

Which use cases does R shine in over those languages?

Is the popularity driven by R the language or by its libraries?

~~~
mziel
While the answer is obviously libraries and data types (the data frame is a
very powerful abstraction, later replicated in Python, Julia, and Scala/Spark),
R definitely IS a functional language:

\- closures and lambdas

\- higher-order functions

\- a lot of R processing is map/fold-like (even though it's lapply instead of
map); see the small example below
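
A minimal illustration:

    
    
        p <- 3
        lapply(1:4, function(x) x^p)   # map: the anonymous function closes over p
                                       # returns list(1, 8, 27, 64)
    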

~~~
michaelsbradley
There are also the `Map`, `Filter`, and `Reduce` functions provided by the
base library, i.e. in addition to the `*apply` family of functions.
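
For example, with nothing beyond base R:

    
    
        Map(`+`, 1:3, 4:6)                       # list(5, 7, 9)
        Filter(function(x) x %% 2 == 0, 1:10)    # 2 4 6 8 10
        Reduce(`+`, 1:10)                        # 55
    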

------
johnmyleswhite
I'd like to see the title changed to something like "Futures for R" since it's
so tongue-in-cheek that it's effectively clickbait.

~~~
projectramo
That might sound like trading commodity futures.

