
I have some questions that are not answered by the homepage.

1) How does this work with function parameters that are intended to be captured unevaluated with substitute()? Do you type the input as "any" and document separately that the parameter is kept "unevaluated" as a symbol/name or call?

2) How does this work with existing untyped R code? Does it at least include types for the standard library (or some subset thereof)?

3) Is there any type inference, or does it require explicit type annotation everywhere?

4) How do you propose to handle NA (which can appear "within" any typed vector)? Does the compiler support refinement types? If not, how does checking for and preventing nullability work, when checking for NA values requires a runtime check?

5) How do data frames work? Are they typed like structs?

6) Which object systems does it support, if any? S3, S4, Reference Classes, or the 3rd-party R6?

As much as I like static types, I feel like R is maybe the language where I need or want them the _least_. How often do you really run into a situation where you pass a character vector to a function that requires a numeric vector and it crashes your program?

99% of the time what you really want is known-valid data frames for data processing, and statically-sized arrays for math stuff.






> As much as I like static types, I feel like R is maybe the language where I need or want them the _least_.

I really disagree with this.

I think one of the main reasons there is a whole Tidyverse ecosystem is that the behavior of (some) R code is unintuitive in a way that adding typing would absolutely improve.

It seems like you're deeply familiar with the R ecosystem, but as a user what I want is a safe subset of R that I can use.

> How often do you really run into a situation where you pass a character vector to a function that requires a numeric vector and it crashes your program?

In R the more likely situation is that you pass in the wrong typed thing and it silently continues with very unexpected values being passed, causing trouble or errors much later in the program. Which is very much a problem that typing helps with.


> In R the more likely situation is that you pass in the wrong typed thing and it silently continues with very unexpected values being passed, causing trouble or errors much later in the program. Which is very much a problem that typing helps with.

Can you name one practical example of this happening as a result of passing in a vector of the wrong class()/mode()? Not a data frame with the wrong column types, but an actual standalone vector. Can you name an example in the Tidyverse ecosystem that specifically improves on the type-safety ("class/mode-safety") of the standard library? I can't, but maybe that's just because it's been too long since I did anything serious with the language.

I can definitely think of complicated interfaces where you can silently get strange results by passing in the wrong thing. sweep() and apply() are obvious examples, where you can accidentally swap the argument order and silently get a nonsensical result. But that's a matter of array shape, not of type. Try passing an argument of the wrong type (again, where "type" in this case means the class or mode of the vector) to sweep() or apply(), and watch what happens: you get an error message informing you that you passed a value of the wrong type. At worst, you get an obscure error message informing you that you passed a value of the wrong type, but bubbled up from some internal code. But you get an error all the same.
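To make the shape-versus-type distinction concrete, here is a minimal base R sketch. The 2x2 matrix and the STATS values are made up for illustration; a square matrix is used so that both margins accept a length-2 STATS vector.

```r
# Hypothetical 2x2 matrix; the values are for illustration only.
m <- matrix(1:4, nrow = 2)

# Same call, different MARGIN: both run without an error.
by_row <- sweep(m, 1, c(1, 10))  # subtract 1 from row 1, 10 from row 2
by_col <- sweep(m, 2, c(1, 10))  # subtract 1 from column 1, 10 from column 2
shapes_agree <- identical(dim(by_row), dim(by_col))
values_agree <- identical(by_row, by_col)

# A matrix of the wrong mode is rejected rather than silently accepted.
wrong_mode <- tryCatch(
  sweep(matrix(letters[1:4], nrow = 2), 1, c(1, 10)),
  error = function(e) e
)
```

Here shapes_agree is TRUE while values_agree is FALSE, and wrong_mode holds an error condition raised by the subtraction inside sweep().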

R is actually very strongly-typed, and abstracts over some details that would otherwise cut into that type-strength. For example, R doesn't have the Numpy problem of exposing different physical storage sizes for integers and floats! It just has abstract numeric arrays, backed by whatever the hell storage type the R language implementers decided to back them with, with no opportunity for the user to accidentally mix things up and lose precision, overflow, or crash on contact with some pre-compiled Numba function.

I maintain that a much, much more pertinent problem is that array shape is not part of the type system, and moreover that a lot of R code is (by design) highly polymorphic with respect to array shape, precisely because there is no such thing as a "scalar" number or string but we still want to let people use numbers and strings in scalar-like fashion.

NULL I think falls into this category as well. NULL in R is a bit like nil in Lua or undefined in Javascript, in that it has a kind of dual function as a "value that is not any other value" and a "non-value that cannot be inserted into a collection, instead deleting whatever was previously there". But when is the last time someone got a NULL and a numeric vector mixed up, and wasn't able to figure out what happened? Is all the complexity of a static compiler really necessary to catch that relatively rare mistake?
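For anyone who hasn't run into the dual role of NULL, a small base R sketch follows. The variable names are made up for illustration.

```r
# Role 1: NULL as "a value that is not any other value".
nothing <- NULL
is_null <- is.null(nothing)
null_length <- length(nothing)

# Role 2: NULL as a non-value. Assigning it deletes the list element...
cfg <- list(a = 1, b = 2)
cfg$a <- NULL
remaining <- names(cfg)

# ...and c() drops it instead of storing it.
combined <- c(1, NULL, 2)

# Keeping a NULL inside a list takes the explicit list(NULL) form.
cfg["b"] <- list(NULL)
kept <- names(cfg)
```

After this runs, cfg still has one element named "b" whose value is NULL, and combined has length 2.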

Maybe the one exception here is the factor class. But there's no mention of factors here, and (as with array shape), validating factor levels is probably more important than validating that the thing is a factor in the first place, as opposed to character.
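A rough base R sketch of what I mean, with made-up survey data: the class check passes, the level check is what finds the typo, and the last three lines show the usual factor surprise with as.numeric().

```r
# Hypothetical survey responses with a typo in one level.
expected_levels <- c("low", "medium", "high")
responses <- factor(c("low", "hgih", "medium"))

# The class check passes even though the data is wrong.
class_ok <- is.factor(responses)

# Checking the levels is what actually catches the problem.
unexpected <- setdiff(levels(responses), expected_levels)

# The factor/character mix-up itself: as.numeric() on a factor returns
# the integer level codes, not the numbers the labels spell out.
f <- factor(c("10", "20", "5"))
codes <- as.numeric(f)
values <- as.numeric(as.character(f))
```

Here unexpected contains the misspelled level, and codes differs from values even though both calls succeed without any warning.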

The NA checking proposed is another story. Now that would be useful, but so would checking things like min/max ranges, the presence of certain columns in a data frame, etc. For example Python has its data frame input validation framework Pandera that offers at least some of these guarantees at the type level.
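Absent a type system, these checks end up as runtime validation. A minimal sketch in base R is below; validate_input() and its arguments are hypothetical names of my own, not part of any package or of the project under discussion.

```r
# validate_input() is a hypothetical helper, not part of any package.
validate_input <- function(df, required, ranges = list()) {
  # Check that every required column is present.
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Check that required columns contain no NA values.
  for (col in required) {
    if (anyNA(df[[col]])) stop("NA values in column: ", col)
  }
  # Check min/max bounds, given as c(min, max) per column name.
  for (col in names(ranges)) {
    lim <- ranges[[col]]
    if (any(df[[col]] < lim[1] | df[[col]] > lim[2], na.rm = TRUE)) {
      stop("out-of-range values in column: ", col)
    }
  }
  invisible(df)
}

good <- data.frame(age = c(31, 45), score = c(0.2, 0.9))
bad  <- data.frame(age = c(31, NA), score = c(0.2, 0.9))

validate_input(good, required = c("age", "score"), ranges = list(score = c(0, 1)))
result <- tryCatch(
  validate_input(bad, required = c("age", "score")),
  error = function(e) conditionMessage(e)
)
```

The point is only that all of this runs at call time: nothing here is known before the data frame actually shows up.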

As for classes, I noticed that they implement what looks like a nice concise syntax for creating S3 class objects with structure(). That's great, but you could have just written a helper library to do that.

Anyway, here's a project where someone designed a whole language and wrote a compiler for it, and I'm just one cantankerous former R user doubting whether that project is ever going to be useful. If this is just a hobby project to scratch someone's itch: ignore me. But if this is intended to be a serious thing for serious use in production, then I'd encourage the creators to reconsider how they portray their value proposition, and to maybe reconsider whether the goal of their project aligns with the needs and desires of actual R users in industry, of whom there are still many, but definitely not as many as there used to be.



