Hacker News new | past | comments | ask | show | jobs | submit login

  > mm <- matrix(rnorm(1000000), 1000, 1000)
  > system.time(eigen(mm))
     user  system elapsed 
     5.26    0.00    5.25 

   IPy [1] >>> xx = np.random.rand(1000000).reshape(1000, 1000)

   IPy [2] >>> %timeit(np.linalg.eig(xx))
  1 loops, best of 3: 1.28 s per loop
But where R really stinks is memory access:

  > system.time(for(x in 1:1000) for(y in 1:1000) mm[x, y] <- 1)
     user  system elapsed 
     1.09    0.00    1.11 

   IPy [7] >>> def do():
          ...:     for x in range(1000):
          ...:         for y in range(1000):
          ...:             xx[x, y] = 1
   IPy [10] >>> %timeit do()
  10 loops, best of 3: 134 ms per loop
Growing lists in R is even worse with all the append nonsense. Exponential time slower.

That's why you never ever grow lists with R. do.call('rbind',...) or even better data.table::rbindlist(). You can't blame R for being slow if you don't know how to write fast R code.

obviously I use do.call all day long because R is my primary weapon, but even if I say so myself, a happy R user, Python with Numpy is faster. I would invite you to show me a single instance where R is faster at bog-standard memory access, than Numpy. My example demonstrates exactly this. Can there be anything simpler than a matrix access? And if that's (8x) slower, everything built on this fundamental building block of all computing (accessing memory) will be slow too. It's R's primary weakness and everybody knows it. Let me make it abundantly clear:

  > xx <- rep(0, 100000000)
  > system.time(xx[] <- 1)
     user  system elapsed 
    4.890   1.080   5.977 

  In [1]: import numpy as np
  In [2]: xx = np.zeros(100000000)                                               
  In [3]: %timeit xx[:] = 1
  1 loops, best of 3: 535 ms per loop
If the very basics, namely changing stuff in memory, is so much slower, then the entire edifice built on it will be slower too, no matter how much you mess around with do.call. And to address the issue of (slow, but quickly expandable) Python lists, recall that all of data science in Python is built on Numpy so the above comparisons are fair.

Applications are open for YC Winter 2024

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact