I have some questions that are not answered by the homepage. 1) How does this wo...

fn-mote · 2024-09-19T23:26:33 1726788393

> As much as I like static types, I feel like R is maybe the language where I need or want them the _least_.

I really disagree with this.

I think one of the reasons there is a whole Tidyverse ecosystem is that the behavior of (some) R code is unintuitive in a way that adding typing would absolutely improve.

It seems like you're deeply familiar with the R ecosystem, but as a user what I want is a safe subset of R that I can use.

> How often do you really run into a situation where you pass a character vector to a function that requires a numeric vector and it crashes your program?

In R the more likely situation is that you pass in the wrong typed thing and it silently continues with very unexpected values being passed, causing trouble or errors much later in the program. Which is very much a problem that typing helps with.
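A minimal sketch of the kind of silent coercion this could refer to. The variable names are made up for illustration and the scenario (a number that arrived as text) is an assumption, not something from a real program:

```r
# Assumed scenario: a value was read from a file as text, not as a number.
threshold <- 9
value <- "10"

# No error: 9 is coerced to "9" and the comparison is done on strings,
# so "10" sorts before "9".
is_below <- value < threshold

# c() coerces silently too: one string turns the whole vector into character.
mixed <- c(1, 2, "3")
```

Here `is_below` ends up `TRUE` and `mixed` is a character vector, and nothing warns about either.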

nerdponx · 2024-09-20T02:16:39 1726798599

> In R the more likely situation is that you pass in the wrong typed thing and it silently continues with very unexpected values being passed, causing trouble or errors much later in the program. Which is very much a problem that typing helps with.

Can you name one practical example of this happening as a result of passing in a vector of the wrong class()/mode()? Not a data frame with the wrong column types, but an actual standalone vector. Can you name an example in the Tidyverse ecosystem that specifically improves on the type-safety ("class/mode-safety") of the standard library? I can't, but maybe that's just because it's been too long since I did anything serious with the language.

I can definitely think of complicated interfaces where you can silently get strange results by passing in the wrong thing. sweep() and apply() are obvious examples, where you can accidentally swap the argument order and silently get a nonsensical result. But that's a matter of array shape, not of type. Try passing an argument of the wrong type (again, where "type" in this case means the class or mode of the vector) to sweep() or apply(), and watch what happens: you get an error message informing you that you passed a value of the wrong type. At worst, you get an obtuse error message informing you that you passed a value of the wrong type, but bubbled up from some internal code. But you get an error all the same.
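A small sketch of that distinction, using apply() on a made-up matrix (the data and variable names are only for illustration):

```r
m <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns

# Shape mistake: both calls run without complaint, they just answer
# different questions.
row_sums <- apply(m, 1, sum) # MARGIN = 1: one result per row
col_sums <- apply(m, 2, sum) # MARGIN = 2: one result per column

# Type mistake: sum() rejects character input, so this surfaces as an error.
chars <- matrix(letters[1:6], nrow = 2)
type_result <- tryCatch(
  apply(chars, 1, sum),
  error = function(e) "error"
)
```

Picking the wrong MARGIN gives a result of a different length with no warning, while the character matrix fails immediately.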

R is actually very strongly typed, and abstracts over some details that would otherwise cut into that type-strength. For example, R doesn't have the NumPy problem of exposing different physical storage sizes for integers and floats! It just has abstract numeric arrays, backed by whatever the hell storage type the R language implementers decided to back them with, with no opportunity for the user to accidentally mix things up and lose precision, overflow, or crash on contact with some pre-compiled Numba function.

I maintain that a much, much more pertinent problem is that array shape is not part of the type system, and moreover that a lot of R code is (by design) highly polymorphic with respect to array shape, precisely because there is no such thing as a "scalar" number or string but we still want to let people use numbers and strings in scalar-like fashion.
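To make the shape point concrete, a short sketch (values chosen arbitrarily):

```r
x <- 5
scalar_length <- length(x)   # a "scalar" is just a numeric vector of length 1

# Recycling: the shorter vector is silently repeated to match the longer one.
# No warning is raised here because 4 is a multiple of 2.
recycled <- c(1, 2, 3, 4) + c(10, 20)
```

`recycled` comes out as `11 22 13 24`, which is convenient when intended and a quiet bug when the second vector was supposed to have four elements.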

NULL I think falls into this category as well. NULL in R is a bit like nil in Lua or undefined in JavaScript, in that it has a kind of dual function as a "value that is not any other value" and a "non-value that cannot be inserted into a collection, instead deleting whatever was previously there". But when was the last time someone got a NULL and a numeric vector mixed up, and wasn't able to figure out what happened? Is all the complexity of a static compiler really necessary to catch that relatively rare mistake?
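A quick sketch of that dual behavior (the list contents are invented for illustration):

```r
cfg <- list(a = 1, b = 2)

# Assigning NULL to a list element removes it instead of storing a "null".
cfg$a <- NULL

# NULL also vanishes when combined into a vector.
combined <- c(1, NULL, 2)

# As a standalone value it is still something you can test for.
nothing <- NULL
```

After this, `cfg` only has the element `b`, `combined` has length 2, and `is.null(nothing)` is `TRUE`.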

Maybe the one exception here is the factor class. But there's no mention of factors here, and (as with array shape), validating factor levels is probably more important than validating that the thing is a factor in the first place, as opposed to character.

The NA checking proposed is another story. Now that would be useful, but so would checking things like min/max ranges, the presence of certain columns in a data frame, etc. For example, Python has its data frame input validation framework Pandera that offers at least some of these guarantees at the type level.
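For comparison, these checks can already be done at runtime in plain base R. A rough sketch, where `validate_scores`, the column names, and the 0 to 100 range are all hypothetical choices for the example:

```r
# Hypothetical validator: checks columns, type, NAs, and value range.
validate_scores <- function(df) {
  stopifnot(
    is.data.frame(df),
    all(c("id", "score") %in% names(df)),  # required columns present
    is.numeric(df$score),                  # right mode
    !anyNA(df$score),                      # no missing values
    all(df$score >= 0 & df$score <= 100)   # min/max range
  )
  invisible(df)
}

ok  <- data.frame(id = 1:3, score = c(90, 75.5, 100))
bad <- data.frame(id = 1:3, score = c(90, NA, 100))

validate_scores(ok)  # passes silently
bad_result <- tryCatch(validate_scores(bad), error = function(e) "rejected")
```

This only fails when the function is actually called with bad data, which is exactly the gap a static checker would be trying to close.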

As for classes, I noticed that they implement what looks like a nice concise syntax for creating S3 class objects with structure(). That's great, but you could have just written a helper library to do that.
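Something like the following is what I mean by a helper. `new_record` and the `person` class are hypothetical names, and this is only a sketch of the idea, not how the project does it:

```r
# Hypothetical helper: wrap a list of fields in an S3 class via structure().
new_record <- function(fields, class) {
  stopifnot(is.list(fields), is.character(class))
  structure(fields, class = class)
}

# An S3 print method, dispatched on the class attribute set above.
print.person <- function(x, ...) {
  cat("<person>", x$name, "\n")
  invisible(x)
}

p <- new_record(list(name = "Ada", age = 36), "person")
```

Fields are still reachable with `$`, and `inherits(p, "person")` is `TRUE`.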

Anyway, here's a project where someone designed a whole language and wrote a compiler for it, and I'm just one cantankerous former R user doubting whether that project is ever going to be useful. If this is just a hobby project to scratch someone's itch: ignore me. But if this is intended to be a serious thing for serious use in production, then I'd encourage the creators to reconsider how they portray their value proposition, and to maybe reconsider whether the goal of their project aligns with the needs and desires of actual R users in industry, of whom there are still many, but definitely not as many as there used to be.

karencarits · 2024-09-20T11:18:59 1726831139

You might sometimes end up with a factor that has numeric labels, where I think you can get a surprise or two. E.g., the label is 2 but the underlying integer code is 1.
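A small sketch of that surprise, with made-up values:

```r
f <- factor(c(2, 3, 2))   # levels are "2" and "3"

# as.integer() returns the internal level codes, not the labels.
codes <- as.integer(f)

# Going through character recovers the numbers that were originally stored.
values <- as.numeric(as.character(f))
```

`codes` is `1 2 1` while `values` is `2 3 2`, so code that calls as.integer() or as.numeric() directly on the factor keeps running with the wrong numbers.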

rscho · 2024-09-20T11:32:06 1726831926

Static types seem like a bad idea for most R use cases. Contracts, on the other hand, would be absolutely stellar. À la SQLite.

nerdponx · 2024-09-20T13:51:31 1726840291

Is there not a decent contracts framework for R yet?

rscho · 2024-09-20T17:04:23 1726851863

There is. But that's still a far cry from a fully integrated language feature.