Hacker News

To me the problem with R isn't performance, which I've never had trouble with myself, but rather the complicated and confusing semantics of its data types.

R's aggregate data types are: vector, matrix, array, data frame, and list. The semantics of these types, and the relationships between them, are extremely confusing. I wish I had gathered examples so I could be more specific, but I have basically concluded that I will never get familiar enough with them to do any better than guessing until it works. And I've written somewhat in-depth analyses in R.




I may be wrong, but I think that:

list => basically a hash, or an array that can hold objects of mixed types

vector, matrix, array => are all the same thing: what most programming languages call arrays, and they can hold only one type. The only difference among the three is the number of dimensions (vector: 1, matrix: 2, array: 3+).

A data frame, I will concede, is a little more complex, and I still have some trouble with it. But I basically think of it as a table, where each column holds one variable (say, temperature) and each row holds a different measurement. So, for example:

columns => temperature, humidity, hours of light, peak UV
rows => Day1, Day2, Day3, Day4, ...
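For what it's worth, `dim()` and `names()` make these distinctions visible in an R session. Below is a small sketch; the variable names and the weather numbers are made-up placeholders, not real data. Note that in R each column of a data frame is one variable and each row is one observation.

```r
# A list can hold objects of different types, looked up by name (hash-like)
rec <- list(name = "station A", readings = c(21.5, 23.0), ok = TRUE)
rec$name           # "station A"

# Vectors, matrices and arrays hold a single type; they differ in dim()
v <- 1:12
m <- matrix(1:12, nrow = 3)          # 3 x 4
a <- array(1:12, dim = c(2, 3, 2))   # 2 x 3 x 2
dim(v)   # NULL: a plain vector has no dim attribute
dim(m)   # 3 4
dim(a)   # 2 3 2

# A data frame is a table: one column per variable, one row per observation
# (hypothetical values, for illustration only)
weather <- data.frame(
  day         = c("Day1", "Day2", "Day3"),
  temperature = c(21.5, 23.0, 19.8),
  humidity    = c(0.40, 0.35, 0.55)
)
nrow(weather)    # 3 (one row per day)
names(weather)   # "day" "temperature" "humidity"
```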

Hope that helps.


Lists are hashes on acid. The default return value of indexing into a list with a name that isn't present is NULL. That's a weird semantic for a list, though maybe not so much for a hash. The weird part is that if you explicitly assign NULL to an element of the list, it deletes the element. That's particularly weird because the list knows all the elements in it, so it's not as if it can't tell the difference between a value that has been set to NULL and a value that has never been set. See http://gist.github.com/578110 for a little transcript.
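A minimal sketch of the behavior described above (not the contents of the gist; `l`, `a`, `b` and `zzz` are just placeholder names). Assigning `list(NULL)` through single brackets is the usual workaround for actually storing a NULL:

```r
l <- list(a = 1, b = 2)

# Looking up a name that was never set gives NULL rather than an error
is.null(l$zzz)        # TRUE
is.null(l[["zzz"]])   # TRUE

# Assigning NULL with $<- deletes the element instead of storing NULL
l$a <- NULL
names(l)              # "b"

# To store an actual NULL, assign a list containing NULL with single brackets
l["a"] <- list(NULL)
names(l)              # "b" "a"
is.null(l$a)          # TRUE, but this time the element exists
length(l)             # 2
```

Checking `"a" %in% names(l)` is then how you tell a stored NULL apart from a missing element.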


Vectors are not the same as 1d matrices/arrays, because a vector has no direction (i.e., it's neither a row vector nor a column vector, but will work as either, depending on the circumstances).

  > all.equal(1:10, matrix(1:10, ncol = 1))
  [1] "Attributes: < target is NULL, current is list >"
  [2] "target is numeric, current is matrix"   
  > all.equal(matrix(1:10, ncol = 1), array(1:10, c(10, 1)))
  [1] TRUE
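Another way to see the missing direction is matrix multiplication: `%*%` promotes a plain vector to whichever of row or column makes the product conformable. A small sketch (the names are placeholders):

```r
v <- 1:3
m <- matrix(1:9, nrow = 3)

# A plain vector carries no dim attribute, so it has no orientation
is.null(dim(v))       # TRUE

# %*% treats it as a column or a row, whichever conforms
m %*% v               # v used as a 3 x 1 column; result is 3 x 1
v %*% m               # v used as a 1 x 3 row; result is 1 x 3

# A one-column matrix does have a direction, and matches the equivalent 2d array
col <- matrix(v, ncol = 1)
dim(col)                                   # 3 1
identical(col, array(v, dim = c(3, 1)))    # TRUE
```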


The semantics are fairly straightforward once you figure out the two main differences: whether a structure is homogeneous or heterogeneous, and how many dimensions it has.

Vectors, matrices and arrays are atomic/homogeneous objects, and differ only in their dimensionality. Vectors are 1d, matrices 2d, and arrays are 3d or higher. Calling a 2d homogeneous structure a matrix is just a convention: a matrix is identical to a 2d array in every important way.
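A short sketch of both points, homogeneity via coercion and dimensionality via the `dim` attribute (placeholder names; the `class()` output noted in the comment assumes R 4.0.0 or later, where matrices also carry the class "array"):

```r
# Atomic structures hold one type: mixing types coerces everything to the
# most general one (logical < integer < double < complex < character)
x <- c(1L, 2.5, TRUE)
typeof(x)        # "double"
y <- c(1, "a")
typeof(y)        # "character"

# Giving a vector a dim attribute is all it takes to make it a matrix
z <- 1:6
dim(z) <- c(2, 3)
is.matrix(z)     # TRUE
is.array(z)      # TRUE as well: a matrix is just a 2d array
class(z)         # "matrix" "array" (R >= 4.0.0)

# With three dimensions the same data is an array but no longer a matrix
dim(z) <- c(1, 2, 3)
is.matrix(z)     # FALSE
is.array(z)      # TRUE
```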

Lists and data frames are heterogeneous/recursive. Lists are 1d, and data frames are (essentially) 2d (each column is homogeneous, but different columns can have different types).
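A sketch of the data frame side (placeholder column names; `stringsAsFactors = FALSE` is passed explicitly so the character column behaves the same on R versions before 4.0.0):

```r
df <- data.frame(id = 1:2, label = c("a", "b"), flag = c(TRUE, FALSE),
                 stringsAsFactors = FALSE)

# Under the hood a data frame is a list of equal-length column vectors
is.list(df)          # TRUE
length(df)           # 3: the number of columns
sapply(df, typeof)   # id "integer", label "character", flag "logical"

# Each column is homogeneous; a single row mixes types, so it stays a data frame
row1 <- df[1, ]
class(row1)          # "data.frame"

# Forcing the table into a homogeneous 2d matrix coerces every column
mat <- as.matrix(df)
typeof(mat)          # "character"
```

The `as.matrix()` call shows the trade-off: squeezing heterogeneous columns into one atomic 2d structure means coercing them all to a common type.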


I would suggest checking out Octave, a Matlab-compatible language. It doesn't have R's rich libraries of statistical procedures, but it is far easier to use for building your own matrix-based algorithms.



