
No, you are wrong. R is terrible, especially so for non-professional programmers, and it is an absolute disaster for the applications where it routinely gets used, namely statistics for scientific applications. The reason is its strong tendency to fail silently (and, with RStudio, to frequently keep going even when it does fail). As a result, people get garbage results without realizing it, and if they're unlucky, these results are similar enough to real results that they get put somewhere important. Source: I'm a CS grad working with biologists; I've corrected errors in the R code of PhD'd statisticians, in "serious" contexts.

Scientific applications require things to fail hard and often, to aggressively fail whenever anything is potentially behaving incorrectly. R does the exact opposite of that in several different, pernicious ways. IMHO, Python is more dangerous than a scientific computing language should be, but at least it will stop when it hits an error. R has undoubtedly cost humanity millions of dollars in wasted research costs and caused untold confusion, from otherwise perfectly-performed studies reporting corrupted statistical results. The world would be a noticeably better place without it.
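To make "fails silently" concrete, here's the kind of base-R behavior I mean (toy values, nothing from a real analysis):

  x <- c(1, 2, 3, 4, 5, 6)
  y <- c(10, 20)
  x + y    # silently recycles y: 11 22 13 24 15 26, no warning since 6 is a multiple of 2
  x[10]    # out-of-bounds index returns NA rather than erroring
  d <- data.frame(a = 1:3)
  d$b      # misspelled / missing column is NULL, silently

All three keep the script running and hand you plausible-looking output downstream.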

I simply cannot articulate my opinion about R without sounding grossly hyperbolic. I'm sad that HN, a place which is typically enlightened in the ways of the programming arts, is so confused about what this article is on about. If we tolerate such blatantly hostile design in something as important as the language of scientific statistics, where do we expect to end up?




Have you ever worked with other major statistical packages? Have you ever caught people doing data munging in Excel? R fails far less silently than the credible alternatives. Source: I've been around the academic block and seen many types of horrors.

It's unfortunate that you've gotten to such a _terrible_ feeling about R without realizing that many of the 'silent' failures are easily configured away (some examples: https://github.com/hadley/strict). That R isn't noisy by default about the things CS majors think it should be is, BTW, entirely appropriate. Many of what one might call 'silent' failure modes in R exist for the express purpose of making exploratory data analysis easier... and that was one of the original purposes of R.
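To be concrete, a couple of base-R knobs in the spirit of that package (just a sketch; `df` and its column are stand-ins for whatever you're checking):

  options(warn = 2)                       # promote every warning to a hard error
  stopifnot(nrow(df) > 0, !anyNA(df$x))   # make assumptions explicit and fatal
  # remotes::install_github("hadley/strict"); library(strict)   # the linked package, for stricter defaults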


That's too bad; I wish R could be better and easier for these people, but I don't think it warrants your hyperbole. I can point to many of my own anecdotes where R has saved millions of dollars by empowering analysts to conduct data exploration and modeling that would have been vastly more complex undertakings using any other tool. They seem to handle the silent failures just fine (usually by double-checking their results before presenting them). Poor rigor and coding practices in academia are practically a meme at this point. You really want to lay all of that at the feet of R? Suggest a code review step for their publishing process or use a different tool. R is certainly not perfect, but the idea that "The world would be a noticeably better place without it" is silly.


R is fundamentally flawed. It tries to merge two highly conflicting goals: a productive analytics environment and a programming language.

To do the first really well means automating away many of the issues that would crop up in the second, allowing R to 'just work'. Because of that, nothing beats R for getting to an answer as fast as possible (not even Python), at the cost of making it more difficult to productionise a solution in pure R.

Given its huge popularity and free nature, the benefits clearly outweigh the costs by a large factor.


Not sure I buy this. R is a language with a parser and an interpreter. The parser spits out an AST and the interpreter evaluates nodes according to rules. This is the same in every other sane language. The AST is pretty much the same in every language. There is no reason R's parser and language can't be replaced with something sane.

And it really is insane and horrible.


And there is the mistake: production. Most people I know who use R don't care a whit about production. They run an analysis to test hypotheses.


I agree that R shouldn't be used in production, but R is great for prototyping different analytical models before porting them over to Python or another language.


Same here, and I think that's exactly how it's meant to be used.

Even so, if you want to use R as the production system, you shouldn't ship the jumbled spaghetti code an iterative analysis involves, if only for your own sanity's sake. A rewrite is always required, at which point: hello, Python.


While I kind of want to agree with you, I just don't see a better alternative. Do you really want biochemists to have to deal with the horrors of C compilation? In production code I'm very glad my makefile tells clang to fail on absolutely anything, but is that the best we can do? Other commenters have pointed out ways to avoid dangerous things like integer division, but if you think R is hostile, then please offer a tenable alternative. The only ones I can think of are Python and Matlab, and both are even worse for the intended use.

Yes, R is not my preferred language for anything heavy-duty, but I would guess ~95% of R usage is on datasets small enough to open in Excel, and that is where the language truly shines (aside from being fairly friendly to non-programmers).

So yes, there are some problems with R, but what are your proposed improvements? Because if I have to analyze a .csv quickly, I'm going for R most of the time.
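For context, the kind of five-minute job I mean (file and column names made up):

  d <- read.csv("results.csv")
  summary(d)                                            # quick sanity check of every column
  aggregate(yield ~ treatment, data = d, FUN = mean)    # group means in one line
  boxplot(yield ~ treatment, data = d)                  # and a plot to eyeball them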


Python 3: Pandas and Seaborn?

I have very quick flows for data processing: load data, reshape to long form, add metadata as categories, plot many things with Seaborn one-liners. I use JupyterLab and treat it like a full lab notebook, including headers, introduction, conclusion, discussion. Works very well for me.


> I just don't see a better alternative

Julia?


The biggest Julia cheerleaders gave up years ago. That doesn't mean the language is dead, but it's not a great sign for a niche language.


Julia is by far my favorite language; I used it all throughout grad school for my research. The problem is that it doesn't have enough of a network effect in industry. I've begrudgingly switched to Python for my day-to-day work.


Actually, what is the current status of Julia? Can anyone in the know share a bit on its prospects as they stand?


They're in the run-up to their first stable release at the moment (I think they're aiming for August, but could be wrong about that). I can't speak for popularity, but development is certainly going strong.


The biggest Julia cheerleaders are using Julia.


Ugh, I'm a working mathematician and just reading Julia's documentation makes me dizzy... I do not think it fits this use case.



Sure, thanks. I am always put off Julia because of the documentation.


That's ironic, because I find Julia's documentation to be the second-clearest documentation I've seen (after Elixir's). Notation-wise, Julia comes the most comfortably close to mathematics (APL is closer, but it's a write-only language). I'm not a working mathematician, though I did graduate with a rather theory-based math degree.


The documentation is fine but IMO written more for developers. We do need more mathematically oriented introductions which introduce the right packages for working mathematicians. I am a working mathematician myself and find Julia to be the perfect language for it, because its abstraction is on actions instead of on data representations, which fits things like functional analysis extremely well.

Things that are OO-based like C++ and Python are pretty bad at representing math because they put forward an idea of the actual representation (the object) as what matters, instead of the actions it performs (the function overloads). This may be good for some disciplines, but in a mathematical algorithm I really don't care what kind of matrix you gave me for `A`; I just want you to do the efficient `Ax=b` solve and have the action of the solver choose the appropriate method to abstract away the data. In Python you'd have to tell it to use the SciPy banded matrix solver; in Julia your generic ODE solver will automatically use the banded matrix solver when it's a banded matrix. This then allows for a composability where the user overloads the primitive operations on their type, and your generic algorithm works on any data representation. This matches the workflow of math, where an algorithm is proven on L2 functions, not on functions represented with column-wise indexing and ...


I've written a custom GF256 data type and used Julia's built-in matrix solves (note: this required monkey-patching in Julia <~ 0.6, because there were one and zero literals in the built-in solver) to do Reed-Solomon erasure coding... It's glorious.


You're right and wrong: R is a disaster when you want to write programs as you would in a real programming language. R is an excellent choice for what it is used for most of the time by people whose education/training isn't related to programming: interactive analysis of data and (maybe) writing prototypes.


My brief encounter with R led me to the exact same conclusion - that the R culture does not value correctness. That is not a characteristic I value in a development culture.

Case in point, the bug I raised about TZ handling (which is also an example of silent failure):

https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16412


Silent failure and continuing to run on errors are common in interpreted languages. SAS has similar issues, and most RDBMSs will continue to process queries after failures. It's something you need to explicitly guard against.


Are you sure about "most RDBMSs"? With the exception of SQLite and older versions of MySQL, all the databases that I've used are strict: they fail the query immediately on error, generally prevent silently dropping or truncating data, and so on.

I'm one of the original authors of Presto, a distributed SQL engine for analytics on big data. From the beginning, we've been careful to follow the SQL standard and do everything possible to either return the correct answer or fail the query. For example, an addition or sum aggregation on an integer will fail on overflow rather than silently wrapping.

Returning an incorrect answer or silently corrupting your data is the worst thing a database can do.


What I mean is that if you run in batch mode, they'll fail a query and happily run the next. Generally, depending on the client, you need to handle begin/commit/rollback blocks yourself. This is pretty common in scripting languages, unlike, for example, Java, where an unhandled error will terminate the process.


You haven't been working with scientists very long, have you? I'm guessing you're also only a very recent CS grad. You're criticizing a language based on the behavior of certain people who use the language, rather than criticizing the language itself.

For many years before you mounted your high horse, scientists were writing equally shitty code in Perl. When they've moved on from R, they'll write shitty code in some other language.


Completely disagree



