Programming R at native speed using Haskell (tweag.io)
145 points by psibi on Sept 9, 2015 | 27 comments



I love R and I love programming in it. I have always thought I needed to learn Julia for a few reasons, including speed, but the R community keeps coming up with great answers to the very questions that draw me to Julia. I don't see how this would speed up my typical R code enough to be worth learning Haskell. What am I missing? And yes, I read the whole article.

I have found Haskell very difficult to use because of Cabal and its package management. The reason I went to Haskell was to teach myself functional programming. After struggling through two books, I still felt like I hated Haskell because of how hard it was to make the environment just work. I work in three different locations, and getting all 5 computers to work with Haskell was a serious pain. This led me to see what else is out there. It led me to https://www.coursera.org/course/proglang, which introduced me to Racket. I loved it and felt that Racket was the perfect fit for me.


You should give Haskell another shot. Since you last tried, this happened: https://github.com/commercialhaskell/stack. In fact, using stack is the recommended installation method for HaskellR, rather than using the cabal-install command.


I've long had an unrealized dream of doing something similar with R and Racket as the HaskellR people have done. I've thought that it would be cool to:

1. Implement R, or something very akin to R, as a Racket language. I'd call it "Arket" because it is far more Googleable than R or Racket ;-) Functions would be callable from either Arket or Racket.

2. Ensure that all my important R packages work with Arket, either by porting or by virtue of similarity between Arket and R.

3. Use Racket for everything.

I think this would be great until I realize I would need to do a lot of work to create this.


Best part is:

> I'd call it "Arket" because it is far more Googleable than R or Racket


> I have found Haskell very difficult to use because of Cabal and its package management. The reason I went to Haskell was to teach myself functional programming. After struggling through two books, I still felt like I hated Haskell because of how hard it was to make the environment just work. I work in three different locations, and getting all 5 computers to work with Haskell was a serious pain.

Package management is certainly the worst thing about Haskell at the moment. Just yesterday I had to fight with cabal, and sit through hours of compilation, to build my application with profiling enabled :(

There's recently been a surge of activity in this area though, which seems to have been driven by FPComplete and consulting with industry:

- First there's "Stackage", which is basically a curated version of the package repository Hackage. Anyone can upload a new package to Hackage, and package authors can make breaking changes whenever they like. Stackage takes consistent snapshots of Hackage, where the package versions are known to work together.

- Next there's "Stack", which is an alternative UI for Cabal (i.e. it replaces the command-line tools, but uses the same libraries and infrastructure). Nice features of Stack are that it can fetch and install GHC (so "bootstrapping" is much easier), and that it doesn't use a global package database (cabal has this feature with "sandboxes", but they're opt-in).

- Personally, I use Nix for managing Haskell packages. It's a bit like stack or cabal sandboxes taken to the extreme, although its Windows support is still pretty experimental. It will fetch pre-built packages from a cache, rather than building stuff locally (as long as you're using the defaults, at least).

Take a look at https://www.fpcomplete.com/blog/2015/06/stack-0-1-release for more info.

> This led me to see what else is out there. It led me to https://www.coursera.org/course/proglang, which introduced me to Racket. I loved it and felt that Racket was the perfect fit for me.

Racket (and Scheme in general) is also an excellent functional programming language. Having dynamic types and macros makes it a very different beast to Haskell, so it's definitely worth learning both :)


> Having dynamic types and macros makes it a very different beast to Haskell,

Can you speak to how the type systems in Typed Racket and Haskell differ?

I ask because I too have tried scaling Mount Haskell and noped out not long after the base camp. If I continue with Typed Racket, am I getting some or most of what I'd get by learning Haskell? Or is Haskell simply a beast I must slay to become whole?


Sorry, I've not used Typed Racket so I can't comment. I've only played around a little with Racket itself, although I've used various Schemes in anger (and more Emacs Lisp than I'd like!)

One cultural issue, rather than a technical one, is that the whole of Haskell is strongly typed, lazy and pure; there's no getting away from it, so every idiom, library and API must take these properties into account. I imagine Typed Racket's ability to mix and match features will result in more "conservative" code (i.e. choosing the type system to match a known approach, rather than inventing a new approach to fit the type system).


That's a good answer. I'm still a relative novice in Racket, so I don't yet have a good intuition for where Racket is asserting its own cultural norms. Haskell definitely communicates its concepts of purity early on, so I can see why the community would continue to push itself in these directions.


Looks fun. However, what is REALLY slow in R isn't the interpreter or the garbage collector, but some of the object re-allocation patterns you accidentally trigger. Here is my note making some comparisons and recommendations on writing fast code in R: http://www.win-vector.com/blog/2015/07/efficient-accumulatio...
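To make the re-allocation point concrete, here is a minimal sketch. It is written in Python rather than R, so it is only an analogy for the pattern, not a reproduction of anything in the linked note. The function names are made up for illustration, and the copy counts are a simple model of the work done, not measurements.

```python
def grow_by_copy(n):
    """Rebuild the whole sequence on every step.

    Loosely analogous to growing a vector with `v <- c(v, i)` in a loop.
    Returns the result and the number of element copies in this model.
    """
    result = ()
    copies = 0
    for i in range(n):
        # Concatenation allocates a new tuple holding every old element
        # plus the new one, so step i costs len(result) + 1 copies.
        copies += len(result) + 1
        result = result + (i,)
    return list(result), copies


def fill_preallocated(n):
    """Allocate once up front, then write each slot exactly once."""
    result = [None] * n
    writes = 0
    for i in range(n):
        result[i] = i
        writes += 1
    return result, writes


if __name__ == "__main__":
    grown, copies = grow_by_copy(1000)
    filled, writes = fill_preallocated(1000)
    # Same answer either way; only the amount of copying differs.
    print(grown == filled, copies, writes)
```

In this model the first version does work proportional to n squared, while the second does work proportional to n, which is why the accidental version hurts more as the data grows.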


Well, Haskell is not the only option.

From .NET there is http://rdotnet.codeplex.com/, and there is also a type provider for F#: http://bluemountaincapital.github.io/FSharpRProvider/

There is also a C++ binding available: http://www.rcpp.org/


Or just run it with FastR (https://bitbucket.org/allr/fastr). Here it is in action (running with JS and C in the same REPL): https://dl.dropboxusercontent.com/u/292832/useR_multilang_de...


How does garbage collection work? Does this version improve on the R I know from a year ago, which had a reference counting system with only three values: 0, 1, and 2+?


https://tweag.github.io/HaskellR/docs/managing-memory.html

https://tweag.github.io/HaskellR/docs/memory-allocation.html

Both these links have more information on this.

As far as I can tell (and most of this is way over my head), the native R garbage collection is used.


Yes - the R GC manages R objects allocated by the R interpreter, and conversely the Haskell GC manages Haskell objects allocated by the Haskell runtime.


You're confusing reference counting (which is used primarily for R's copy-on-modify strategy) with its generational garbage collector.


In that case, does HaskellR interfere with copy-on-modify in R? For example, does it increment the reference count of R objects and then try to decrement it (which R doesn't, or didn't, really support), leaving those objects appearing to be multiply referenced and therefore forcing unnecessary copies, as we see in plain R?
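For readers who have not met this mechanism, here is a toy model of the concern, using only what is described in this thread: a reference mark that saturates at "2 or more" and is never decremented, combined with copy-on-modify. It is a sketch in Python with made-up names (`Value`, `add_reference`, `set_element`). It is not R's actual internals, and it says nothing about what HaskellR really does.

```python
class Value:
    """A toy vector with a saturating reference mark: 0, 1, or 2 ("2 or more")."""

    def __init__(self, data):
        self.data = list(data)
        self.named = 0

    def add_reference(self):
        # Assumption of this model: the mark saturates at 2 and there is
        # no matching operation to decrement it again.
        self.named = min(self.named + 1, 2)


def set_element(value, index, item):
    """Copy-on-modify: write in place only if the value is not marked shared.

    Returns the value holding the update and whether a copy was made.
    """
    if value.named >= 2:
        fresh = Value(value.data)  # full copy of the underlying data
        fresh.named = 1
        fresh.data[index] = item
        return fresh, True
    value.data[index] = item
    return value, False


if __name__ == "__main__":
    x = Value([1, 2, 3])
    x.add_reference()                 # bound to one name
    print(set_element(x, 0, 10)[1])   # modified in place, no copy

    x.add_reference()                 # a second holder briefly takes a reference
    # Even if that holder is gone by now, the mark still reads "2 or more",
    # so the next update pays for a copy.
    print(set_element(x, 1, 20)[1])
```

The point of the sketch is only that, under these assumptions, a temporary extra reference is indistinguishable from a permanent one, so every later modification copies.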


Does anyone have good links to documentation on quasiquotation in Haskell? Last time I looked I struggled to find a good summary.


I've been teaching myself Template Haskell recently, and found https://ocharles.org.uk/blog/guest-posts/2014-12-22-template... to be a really understandable explanation. It specifically doesn't cover quasiquotation, but it has many links to things which do.


This is a fairly nice quasiquotation tutorial: https://www.fpcomplete.com/user/marcin/quasiquotation-101


While this really has nothing to do with programming R, this is a neat way to use Haskell for some statistics/data science tasks whilst making use of all the work that has gone into R over the years.

I personally don't see this as a way of using R, rather a way of using Haskell.


I know almost nothing about Haskell (I tried to learn it some years ago but never got past the basics).

Yet from what I'm seeing, we are only calling R functions from Haskell. I'm probably missing something in the explanation, but how does this speed up R?


Often what is slow in an R program is not the core analysis function you want to call, but everything you have to do before and after you call that function, i.e. turning your raw data into a data structure containing the relevant data to feed into the function, then taking the output of the function and munging it into whatever output format you actually need. This should help speed up those bits quite a lot.


I see, thanks. In my own code I suspect that it's the for loops I can't vectorise that make R take so long in some operations, so this is where this project would help.

Must give it a try then.


In R it's very easy to write a function in C++ or Fortran and then call it. Plus there are plenty of existing functions that you can call instead of using for loops in R (which really aren't idiomatic).


You can use Python, call out to R libraries with rpy2, and speed up loops with Numba to faster than Julia and around native speed.


I don't know about speed, but with this package you can have Haskell types at your disposal.

If you use them properly, you would not be able to compare p values of experiments directly (a common mistake).
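One way to read this, sketched below, is that each p-value carries a tag for the experiment it came from, and comparing values with different tags is rejected. The sketch is in Python, so the check happens at run time; in Haskell the same idea can be enforced at compile time, for example with a phantom type parameter. The `PValue` class and its `experiment` field are hypothetical names for illustration, not part of HaskellR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PValue:
    """A p-value tagged with the (hypothetical) experiment it belongs to."""

    value: float
    experiment: str

    def __post_init__(self):
        if not 0.0 <= self.value <= 1.0:
            raise ValueError("a p-value must lie in [0, 1]")

    def __lt__(self, other):
        if not isinstance(other, PValue):
            return NotImplemented
        if self.experiment != other.experiment:
            # Refuse the comparison instead of silently returning a bool.
            raise TypeError("p-values from different experiments are not comparable")
        return self.value < other.value


if __name__ == "__main__":
    a = PValue(0.01, "trial-A")
    b = PValue(0.04, "trial-A")
    c = PValue(0.03, "trial-B")
    print(a < b)  # same experiment: allowed
    try:
        a < c     # different experiments: rejected
    except TypeError as err:
        print(err)
```

The design choice is to make the mistaken comparison impossible to write by accident, rather than relying on the analyst to remember the rule.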


I'm an astrophysicist, we don't really trust p-values. But it could be handy in some other areas.



