
R is cursed beyond reason, but traditional software engineers are sleeping on it, IMO. It's very easy for quantitative people who are not software developers to get something done quickly. The downside is exactly what you described: most projects are not just the model; they eventually tend to incorporate generic data wrangling, UI/web code, etc., and a general-purpose language tends to work better overall.

I have a similar anecdote: I was brought in on a project where a group of terrorists implemented a solution for a TSP-like problem directly in R. We eventually replaced that thing with OR-Tools.




+1. I am a software engineer but I double majored in statistics and wrote a lot of R in undergrad. The library ecosystem is incredible. Essentially any technique in statistics has a well-documented R package that is one library() call away.

I keep wondering if I should learn the Python data science ecosystem at some point but it just seems like a waste of time. One of my personal projects is written in Python but calls into R for statistics/plotting.

The language itself, however, is incredibly cursed.
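The comment doesn't say how that project calls into R. The rpy2 package is one common bridge; a dependency-free alternative is to shell out to `Rscript`. Here is a minimal sketch of the latter, assuming R is installed and `Rscript` is on PATH. `rscript_command` and `r_mean` are hypothetical names:

```python
from __future__ import annotations

import shutil
import subprocess


def rscript_command(expr: str) -> list[str]:
    # Build an argv list for Rscript. "--vanilla" skips site/user profiles,
    # so the result does not depend on the caller's R startup files.
    return ["Rscript", "--vanilla", "-e", expr]


def r_mean(values: list[float]) -> float | None:
    """Compute a mean in R and hand it back to Python.

    Returns None when no Rscript executable is found on PATH.
    """
    if shutil.which("Rscript") is None:
        return None
    # Inline the data as an R vector literal (finite numbers only). This is
    # fine for a handful of values; larger data is usually exchanged via files.
    vector = "c(" + ", ".join(repr(float(v)) for v in values) + ")"
    # cat() prints about 7 significant digits by default.
    expr = f"cat(mean({vector}))"
    result = subprocess.run(
        rscript_command(expr), capture_output=True, text=True, check=True
    )
    return float(result.stdout)
```

Spawning a process per call is slow, which is one reason in-process bridges like rpy2 exist.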


The same thing can be said about Python. Python itself is not such a great language, especially in terms of performance. However, they managed to get every single package in the world of analytics and ML added to the Python ecosystem, so it is impossible to stop using it.


There's definitely a lot of similarities.

I think of MATLAB, Mathematica, R, and Python together as "practitioner's languages". These are languages that are designed from the core to be highly productive for a specific kind of technical worker (in the sense of developer velocity).

MATLAB for engineering. Mathematica for mathematics. R for statistics. Python for software engineering.

You could also say "Python for ML", of course, and that would be true, but Python is also used for general purpose programming much more than the other three. I think that "Python for software engineering" is more correct.

I think that each of the languages is shaped to the way that its users think about the problems that they want to solve with it.

MATLAB is shaped around linear algebra. Mathematica is a term-rewriting system. R has lots of magic around data and scopes to make the surface syntax for stats nice. Python is shaped like a traditional OOP language but with a pseudocode-like syntax and hooks so that libraries can act magically.

This is kinda half-baked, I'm trying to express this for the first time. But essentially I think that Python is what you get when you have real programmers (^TM) try to create the programming equivalent of something like MATLAB, Mathematica, and R.

And so of course ML, which is a field dominated by real programmers, adopted Python in order to create their ecosystem.
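To make the "hooks so that libraries can act magically" point concrete: Python's data model lets a class define special methods such as `__add__`, `__matmul__` and `__getitem__`, and NumPy relies on these to make `a @ b` or `a[1:]` work on its arrays. A toy sketch; the `Vec` class is hypothetical and only illustrates the mechanism:

```python
from __future__ import annotations


class Vec:
    """A toy vector type (hypothetical, for illustration only)."""

    def __init__(self, *xs: float) -> None:
        self.xs = list(xs)

    def __len__(self) -> int:
        return len(self.xs)

    def __getitem__(self, i):
        # Both indexing (a[0]) and slicing (a[1:]) go through this hook.
        if isinstance(i, slice):
            return Vec(*self.xs[i])
        return self.xs[i]

    def __add__(self, other: Vec) -> Vec:
        # Elementwise addition: a + b
        return Vec(*(x + y for x, y in zip(self.xs, other.xs)))

    def __mul__(self, k: float) -> Vec:
        # Scalar scaling: a * 2
        return Vec(*(x * k for x in self.xs))

    __rmul__ = __mul__  # lets 2 * a work as well

    def __matmul__(self, other: Vec) -> float:
        # Dot product via the @ operator (PEP 465), the same hook NumPy uses.
        return sum(x * y for x, y in zip(self.xs, other.xs))

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Vec) and self.xs == other.xs

    def __repr__(self) -> str:
        return f"Vec({', '.join(map(repr, self.xs))})"


a, b = Vec(1, 2, 3), Vec(4, 5, 6)
print(a + b, a @ b, 2 * a, a[1:])  # Vec(5, 7, 9) 32 Vec(2, 4, 6) Vec(2, 3)
```

None of this syntax is special-cased for numerics in the language itself; the library opts in by defining the methods.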


Generally agree, though Mathematica is for physics; I rarely saw it used by mathematicians, especially compared with MATLAB, which was closer to being their programming language of choice. MATLAB and Julia were both more common.

> I think that Python is what you get when you have real programmers (^TM) try to create the programming equivalent of something like MATLAB, Mathematica, and R.

I think this is a better description of Julia tbh, at least relative to MATLAB.


I believe the advantage of Python for the domain of ML is just how easy it is to take tested C and Fortran code and add it as a bona-fide software engineering package. In other words, it is a language that allows standard scientific software to be consumed according to the latest software development trends.
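A minimal sketch of what consuming tested C code can look like from the Python side, using the standard-library `ctypes` module to call `cos()` from the system C math library. This assumes a POSIX system where libm can be located. Real scientific packages more often use compiled extension modules, Cython, or NumPy's f2py for Fortran:

```python
import ctypes
import ctypes.util

# Locate the C math library (e.g. "libm.so.6" on Linux). If it cannot be
# found, fall back to the symbols already loaded into the running process
# via CDLL(None), which on POSIX systems usually include libm.
_libm = ctypes.CDLL(ctypes.util.find_library("m") or None)

# Declare the C signature double cos(double) so ctypes converts correctly.
_libm.cos.argtypes = [ctypes.c_double]
_libm.cos.restype = ctypes.c_double


def c_cos(x: float) -> float:
    """Call the C library's cos() directly from Python."""
    return _libm.cos(x)
```

Declaring `argtypes` and `restype` matters: without them ctypes assumes `int` arguments and return values, and the result would be garbage.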


I guess I should also mention Inform7 for interactive fiction as another example. That is an extremely weird language. It's shaped to the minds of fiction authors, who are very, very different from software engineers.


> R is cursed beyond reason

Why would you say that?


One concrete example: R has 5 distinct, actively maintained class systems, at least 3 of which are somewhat commonly used for new projects. I.e., there are 3+ reasonable ways to declare a class, and the class will have different semantics for object access, method calling and dispatch, etc. depending on which one you choose.

Another: R can’t losslessly represent JSON because 1 and [1] are identical. That’s a float (well, float vector) literal by the way, the corresponding int literal is 1L, though ints are very prone to being silently converted to float anyway.
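To see why the 1 vs [1] collapse breaks round-tripping, here is a small Python simulation. It assumes R's rule that every atomic value is a vector and that plain numbers are doubles; the helper names are hypothetical. R's jsonlite package faces the same choice and exposes it as the `auto_unbox` argument of `toJSON()`:

```python
import json


# Hypothetical helpers mimicking R's rule (an assumption for illustration)
# that every atomic value is a vector, so a "scalar" is a length-1 vector.
def to_r(value):
    """Decode a JSON number or array of numbers into an R-like vector."""
    if isinstance(value, list):
        return [float(v) for v in value]  # plain R numerics are doubles
    return [float(value)]  # a bare number becomes a length-1 vector


def from_r(vec):
    """Encode an R-like vector as JSON, unboxing length-1 vectors."""
    return json.dumps(vec[0] if len(vec) == 1 else vec)


def round_trip(text):
    return from_r(to_r(json.loads(text)))


for text in ["1", "[1]", "[1, 2]"]:
    print(text, "->", round_trip(text))
# 1 -> 1.0
# [1] -> 1.0
# [1, 2] -> [1.0, 2.0]
```

Whichever convention an encoder picks, `1` and `[1]` become the same R value, so at least one of them cannot come back unchanged; the `1` to `1.0` drift mirrors the int/float point above.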


The R development community is (was?) consciously focused on a single operator managing their own stuff on their own workstation.

Considerations like repeatable procedures, reliable package hierarchies, etc. were clearly and more or less politely Not Interesting. I spent several years with one of my tasks being an attempt to get the R package universe into Gentoo, and later into RPM packages.

I wouldn't say the R devel community was rude about it, but the systems-administration view of how to maintain the language was just not on their radar.

At the time I was trying to provide a reliable taxonomy of packages to a set of research machines at a good-sized university. Eventually, I gave up on any solution that involved system package managers, or repeatability. :)

So if you're a researcher driving your own train, R is freakin' FANTASTIC. If you're the SA attempting to let that researcher's department neighbors do the same thing on their workstations, anticipate fun.


I think this SA perspective is outdated, or perhaps was never adequately investigated. Packages like renv solve these issues, and they work great.


As someone who uses Python, R, Go, Rust, Fortran, and Java...

I would never write a full stack application in R. Terrible maintainability.


I would have said this a year ago but R is a language for statisticians and not software engineers. It took me forever to understand this.

Statistical Rethinking by McElreath is really what finally got me to see the value of R.

You can find the Python versions of the class, and they are certainly not better.

A full application in R really makes absolutely no sense.


I've wrapped R for Python before. That was okay: a bit stilted, but I could still take it to production if absolutely necessary.

You're 100% right that R is great for data scientists (my background), both for frontier-level academic implementations and for toy/simple models. It's generally a poor runtime for computation and suffers from many of the same issues as Python with data quality and typing. Python is better for battle-hardened stuff and certainly has better debugging tools.

R _can_ be done well, but the juice isn't worth the squeeze typically.


As a maniac who has written a full-stack application in R: I agree. It is easy for the average R user to get something out the door quickly, but maintenance will quickly show you how brittle everything is.


Hello fellow maniac!

Glad to hear the agreement. Little small projects are so fun. Stadium-sized pools full of spaghetti less so!


I'm in this picture, and I'm begging my collaborators to move to Julia or Python.


In addition to what my neighbor said, there are multiple standards for everything, even core language features like objects, which makes reasoning about code difficult. dplyr and friends, which make the aforementioned data analysis easier, require interacting with one of the least consistent metaprogramming systems I've ever dealt with.



