Coloring in R's Blind Spot (zeileis.org)
46 points by Amorymeltzer on May 11, 2023 | 22 comments



The article is quite useful, although the examples near the end are all sequential, not diverging. Maybe the blogger got busy with something else. Anyway, the code below demonstrates the two recommended diverging palettes.

    n <- 100
    m <- matrix(seq(-1, 1, length.out=n), nrow=1)
    par(mfrow=c(1, 2))
    for (p in c("Purple-Green", "Blue-Red 3")) {
        image(m, col=hcl.colors(n+1, palette=p))
        mtext(p)
    }

In case it's of interest, I am putting two citations below. The first discusses colour schemes suitable for plotting various oceanographic quantities (or anything, really), and the second deals with an R package that provides these colour schemes.

1. Thyng, Kristen, Chad Greene, Robert Hetland, Heather Zimmerle, and Steven DiMarco. “True Colors of Oceanography: Guidelines for Effective and Accurate Colormap Selection.” Oceanography 29, no. 3 (September 1, 2016): 9–13. https://doi.org/10.5670/oceanog.2016.66.

2. Thyng, Kristen M. “The Importance of Colormaps.” Computing in Science & Engineering 22, no. 5 (September 2020): 96–102. https://doi.org/10.1109/MCSE.2020.3006946.
   Thyng, Kristen, Clark Richards, and Ivan Krylov. “Cmocean: Beautiful Colour Maps for Oceanography,” May 6, 2019. https://CRAN.R-project.org/package=cmocean.


Thyng's splendid gradients are available in several formats at cpt-city: http://soliton.vm.bytemark.co.uk/pub/cpt-city/cmocean/index....


R gets a lot of hate, not least because it's quite annoying to parallelize at times, but I think it's a lovely free language with an enormous ecosystem and a lot of highly mathematically literate users, oft based at universities, and oft beholden unto no corporate interests. Arguably that's also R's greatest weakness.

This is quite a nice modernisation of colour palettes in base R, which otherwise does feel a bit like it's from the 1990s. Colour palettes are particularly important for (not) biasing data representations -- a great paper about it is https://www.nature.com/articles/s41467-020-19160-7.


R is hard to parallelise compared to Python?

    lapply(list, DoSomething)

To parallelise across 16 cores, rewrite as

    library(parallel)  # mclapply() lives in the base 'parallel' package
    mclapply(list, DoSomething, mc.cores = 16)

What’s the equivalent in Python?


Probably something like:

    import multiprocessing

    with multiprocessing.Pool(core_count) as p:
        p.map(do_something, list)


The parallel code structure looks very different from the standard for loops in python.

So there is a lot of rewriting to get things to work in parallel in Python compared to R.

In R you just replace lapply with mclapply.
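
For a concrete side-by-side, here is a minimal Python sketch of the rewrite being discussed. `do_something`, `items`, and the worker count of 4 are placeholders and not anything from the thread; the point is only how much the loop changes shape when it moves to `multiprocessing.Pool`.

```python
import multiprocessing


def do_something(x):
    # Placeholder workload (an assumption for illustration).
    return x * x


def run_serial(items):
    # The usual for-loop form: explicit output list, explicit append.
    results = []
    for item in items:
        results.append(do_something(item))
    return results


def run_parallel(items, workers=4):
    # Pool.map preserves input order, so results line up with the serial version.
    with multiprocessing.Pool(workers) as pool:
        return pool.map(do_something, items)


if __name__ == "__main__":
    data = list(range(10))
    assert run_serial(data) == run_parallel(data)
```

Whether that counts as "a lot of rewriting" probably depends on how the serial code was written in the first place: a loop has to be restructured, while code already built around `map()` changes by one call.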


But, many R tools are already vectorised, so your shift from lapply() to mclapply() is about as fair a comparison as claiming it's "just" a shift from python's builtin map() to pool.map(). Anybody can play this game, and it's not helpful. I've been using+teaching R now for nearly seven years and the number of times I've used lapply can be counted on one hand.

> just

https://justsimply.dev/


There is also this in the futureverse if you like for loop style code more:

  library(doFuture)
  plan(multisession)

  y <- foreach(x = 1:4, y = 1:10) %dofuture% {
    z <- x + y
    slow_sqrt(z)
  }

https://dofuture.futureverse.org/


I use sapply all the time to transform data. It tends to be less code (no counter, no output initialisation) and easier to follow if that style is familiar.


I am curious now, what do you use instead of lapply (or other *apply variants)?


> just

This isn't documentation or a guide or helping someone.

It's a friendly competition between languages so it gets to use the perspective of someone that's familiar with things.


Serialisation/deserialisation code will bite you unless you are very careful.
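
One common way this shows up on the Python side, as a rough sketch: `multiprocessing` sends functions and arguments to worker processes with `pickle`, and some objects, lambdas being the usual example, cannot be pickled. The helper name `can_pickle` is made up for illustration.

```python
import pickle


def can_pickle(obj):
    # Return True if obj survives pickle.dumps, False otherwise.
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    print(can_pickle(abs))               # built-in function, pickled by reference
    print(can_pickle([1, 2, 3]))         # plain data
    print(can_pickle(lambda x: x * x))   # anonymous function, no importable name
```

Checking arguments this way before handing them to a pool is only a partial safeguard; large objects that do pickle can still be slow to ship between processes.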


so, only twice as much code!


My gripes with R are that the language itself is a mess and I found it hard to reason about code. The ecosystem is what keeps people using it.


I agree and disagree.

R itself isn't a mess. Typical R code tends to be.

In theory, this reflects one of R's strengths, which is its ability for flexible metaprogramming. In practice, this becomes a weakness because people abuse it to introduce all sorts of inconsistent notation that makes zero sense without reading the documentation of the package even to understand 'syntax'.

From a language point of view, R is not too bad. The ecosystem is a mostly positive thing, but at the same time, it's the variety in the ecosystem that's creating this chaos around offered packages and R code in the wild.


> R itself isn't a mess. Typical R code tends to be.

The same can be said about Perl - it is not hard to write readable code but there is code around which was written without regard to readability/maintainability or any consistency at all. Languages which are trying to force a single style (e.g. indentation as a part of the syntax like in Python or a configurable format tool like 'go fmt') and have as little TMTOWTDI as possible are favored nowadays.


R is king for analysis of tabular data e.g. data exploration, model fitting, and visualizations where all the code can be contained in one .R file.

R is not as good as other languages for production-level work though, or for analyzing non-tabular data (e.g. streaming data, or data in unstructured formats).

I also find R packages (writing, installing, documenting) to be a better experience than the Python packaging system, i.e., very little rewriting is necessary to convert a bunch of R code into an R package.


Huh. Those are my gripes about python, and it would also die without the ecosystem.

Except that python also lies about it.

"You don't need curly braces, indents are enough"

"what do you do when your code is more than 80 chars long?"

"Oh, that's simple. Because Python is simple. You wrap it in a curly brace."

If you have a blank line between two parts of a function, why does this fail when you paste it into the interpreter? Solutions? Oh, that's simple. Just put four spaces on the blank line. Unless your linter doesn't like that, because it means you have a line with trailing spaces.

And I'm not even touching advanced craziness, such as package management and versions.

Thanks for listening to me complain. I'm going through a difficult transition right now.
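
For what it's worth, the wrapping in that exchange is normally done with parentheses rather than curly braces: any of `()`, `[]` or `{}` lets an expression span several lines, but braces also turn the result into a set. A small sketch, with made-up function names:

```python
def total_in_parens():
    # Parentheses give implicit line continuation; the value is unchanged.
    return (
        1
        + 2
        + 3
    )


def total_in_braces():
    # Curly braces around the same expression build a one-element set instead.
    return {
        1
        + 2
        + 3
    }


if __name__ == "__main__":
    print(total_in_parens())
    print(total_in_braces())
```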


I'm definitely not defending Python. I think that language is a mess as well. There is no law of conservation of programming language derps, unfortunately.


I think people are annoyed about the idea of having to know R, Python and Julia. So we stick to our guns with Python even if Julia is possibly better. Heck, if I'm ever learning another language, it's Janet then Rust. Or Type/Javascript if life ever throws me thataway. To paraphrase: stop trying to make Julia happen.

Also the Julia community is lovely, but R folks are so smug about tiny things like (the admittedly very fine for its applications) ggplot2. Also: have you heard SAS or Stata users telling you they can run neural networks using their built-in matrix languages? Yes we can! Yes we can!


I'd rather stick to my R guns than my python pea-shooters, thank you very much.


Many years later, it's still one of the best languages I’ve ever used for large scale data science applications.



