Totally unrelated since I can't access the website, but one thing that I'd like to see in a type system used for data analysis is the ability to see the dimensions of the data structures involved. For instance, to be able to tell at compile time whether a matrix multiplication is going to crash due to a dimension mismatch.
So far, in Python at least, the best I can do is say that something is a float array. I can't specify the number of dimensions (for example 2D for a matrix) or, better, that its shape is (n, p) where n and p are both type variables.
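For what it's worth, Rust's const generics can express the (n, p) idea when the sizes are known at compile time. A minimal sketch, where `Matrix` and `matmul` are hypothetical names made up for illustration and not any crate's API:

```rust
// Hypothetical `Matrix` type for illustration; not taken from any crate.
// R (rows) and C (columns) are part of the type.
#[derive(Debug, Clone, PartialEq)]
struct Matrix<const R: usize, const C: usize> {
    data: [[f64; C]; R],
}

impl<const R: usize, const C: usize> Matrix<R, C> {
    fn new(data: [[f64; C]; R]) -> Self {
        Matrix { data }
    }

    // (R x C) * (C x K) -> (R x K). The inner dimension C must be the
    // same on both sides, otherwise the call does not type-check.
    fn matmul<const K: usize>(&self, rhs: &Matrix<C, K>) -> Matrix<R, K> {
        let mut out = [[0.0; K]; R];
        for i in 0..R {
            for j in 0..K {
                for k in 0..C {
                    out[i][j] += self.data[i][k] * rhs.data[k][j];
                }
            }
        }
        Matrix { data: out }
    }
}
```

With this, multiplying a 2x3 by a 3x2 compiles, while multiplying a 2x3 by another 2x3 is rejected by the compiler. The catch is that the sizes have to be compile-time constants, which is often not the case for data loaded from a file.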
I love Rust for systems dev, and I truly can't wait to read the article, because I can't imagine ever wanting to use it for exploratory data analysis, one-off stuff, or notebooks.
Why deal with the borrow checker and compilation just to plot something in a notebook? Python is kind of ugly IMO, but it seems much quicker for prototyping.
The main reason is that you're already proficient with Rust and enjoy using it.
I don't think there are many borrow checker issues in data analysis code. You `.clone()` everywhere, and it's still going to be fast.
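To illustrate the `.clone()` point, here is a minimal sketch. Both helper functions are hypothetical and only there to show the pattern:

```rust
// Hypothetical helpers for illustration only.

// Takes ownership of its input so it can scale the values in place.
fn normalize(mut xs: Vec<f64>) -> Vec<f64> {
    let max = xs.iter().copied().fold(f64::MIN, f64::max);
    for x in xs.iter_mut() {
        *x /= max;
    }
    xs
}

// Borrows the raw data, clones it for the step that wants ownership,
// and keeps using the original afterwards.
fn scaled_and_total(raw: &Vec<f64>) -> (Vec<f64>, f64) {
    let scaled = normalize(raw.clone()); // the clone avoids any ownership juggling
    let total: f64 = raw.iter().sum();
    (scaled, total)
}
```

The clone costs one extra allocation and copy, which is usually negligible for one-off analysis code.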
Some benefits:
* static typing niceness: language server, typo catching, some types preventing human error here and there etc.
* expressing some ideas in a much more robust way
* best in class tooling like package manager, rustdoc
* hitting some unexpected not strictly data-analysis requirement is never a blocker
* you can reuse your code, compile to WASM and embed in a page
* performance of auxiliary code is never an issue
* unlike Python, you can send it to another person and expect it to work just like it did for you
* if it turns out that the "one-off" has to become more complex and serious, you don't have to throw away everything and start rewriting it "in a more serious way"
In my opinion, we need a statically typed contender to R, Python and Julia.
The use case is quite simple. Most data-oriented applications where you need linear algebra, probability theory, statistics do contain a significant amount of data pre-processing and business logic once extended to enter production. Here static typing is advantageous. Keeping the whole codebase in the same language is a significant advantage.
Furthermore, some statically typed languages do have typing features that are advantageous for data analysis. Think F# type providers or type systems where e.g. the shape of arrays is guaranteed to match at compile time.
The question is whether Rust is a good language for this, or we need something more like OCaml or F#. All these languages pop up in job offers for quantitative analysis, insurance risk modeling, etc. So there seems to be a demand.
Are you sure it's static typing that you want? I find people often conflate having a powerful type system with static typing. Julia has a very powerful and expressive dynamic type system and I really appreciate it.
Static typing is significantly more restrictive, which I think is harmful for the interactive workflows that are so prevalent in exploratory, data-oriented applications.
> Furthermore, some statically typed languages do have typing features that are advantageous for data analysis. Think F# type providers or type systems where e.g. the shape of arrays is guaranteed to match at compile time.
For instance, Julia can do this with its type system too. There's nothing here requiring static typing. See StaticArrays.jl[1], which has an SArray type (static size, static contents), an MArray type (static size, mutable contents) and a SizedArray type (a statically sized wrapper around a regular array rather than a Tuple, making it more appropriate for large arrays).
> we need a statically typed contender to R, Python and Julia.
I'm not sure we really need a "contender". We need a complement. I personally use D. I love the language, I can compile my functions into a static library, and I can have R load them the same as it loads a library of C functions. It's worked well for me for years. Should work for plenty of languages, whether that's Rust, Nim, or whatever.
There are at least two reasons I've moved away from the idea of a replacement language:
- It's pretty easy to do if you already have experience with a compiled language. The benefit of a replacement language is small.
- You can move to the new language slowly, to the degree that you want, without giving up or rewriting any already working code. Others can easily use your code without having to move to that language.
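The same approach should carry over to Rust. A minimal sketch of a function exported with the C ABI, assuming the crate is built with `crate-type = ["cdylib"]` (or `"staticlib"`) in Cargo.toml; the function name `mean_f64` is hypothetical, and the glue needed on the R side is not shown:

```rust
use std::slice;

/// Mean of `len` doubles starting at `ptr`; NaN for a null pointer or empty input.
///
/// # Safety
/// `ptr` must point to at least `len` readable, initialized `f64` values.
// On toolchains older than Rust 1.82 the attribute is spelled `#[no_mangle]`.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn mean_f64(ptr: *const f64, len: usize) -> f64 {
    if ptr.is_null() || len == 0 {
        return f64::NAN;
    }
    // SAFETY: the caller guarantees `ptr` is valid for `len` elements.
    let xs = unsafe { slice::from_raw_parts(ptr, len) };
    xs.iter().sum::<f64>() / len as f64
}
```

Anything that can call a C function can call this, so existing working code stays where it is and only the parts you choose move over.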
Nim has lots of cool things and probably could, but it really needs to choose one domain to focus on (and data analysis could be it, even though it sounds like a tough challenge given how much love Julia is getting as the new cool kid in that space). Today it looks like it's chasing too many goals at once, with limited success.
I wonder if this type signature may be too intimidating for newcomers:
Result<(), Box<dyn std::error::Error + 'static>>
It's probably a huge effort to target examples at both newbies and advanced users. I don't know what a good solution is, or whether it's a real problem at all.
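For newcomers, it may help to see what that signature buys you: any error type can be returned through it with `?`. A minimal sketch using only the standard library, where `mean_of` and `run` are hypothetical helpers:

```rust
use std::error::Error;

// Hypothetical helper: parse comma-separated numbers and return their mean.
fn mean_of(csv_line: &str) -> Result<f64, Box<dyn Error + 'static>> {
    if csv_line.trim().is_empty() {
        // A plain string converts into a boxed error.
        return Err("empty line".into());
    }
    let mut total = 0.0;
    let mut count = 0.0;
    for field in csv_line.split(',') {
        // A ParseFloatError is boxed automatically by `?`.
        total += field.trim().parse::<f64>()?;
        count += 1.0;
    }
    Ok(total / count)
}

// The same signature as in the article's examples: no value on success,
// any error type on failure.
fn run() -> Result<(), Box<dyn Error + 'static>> {
    let mean = mean_of("1.5, 2.5, 5.0")?;
    println!("mean = {mean}");
    Ok(())
}
```

`main` can use the same return type, which is why it shows up in so many examples.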
Rust is a language where you need to sit down and read the book. It's not for casual exploration in the same way that JS/Py are.
People in the Rust ecosystem will tell you to either unwrap the error or use https://docs.rs/anyhow/1.0.34/anyhow/ to completely ignore it. The great thing is that it's simple to ctrl-f and turn your one-off code into production code with powerful error handling.
It's a language where it's worthwhile to sit down and read the book, but it aspires to be fairly accessible too via zero-cost abstractions and integrated tooling. I think it's been moderately successful in this goal all things considered, but there's always more progress to be made.
Thank you for writing this! I've been meaning to get a Jupyter notebook set up with Rust.
By the way, I tried to sign up for the newsletter – since you also offered a 10% discount there, and was going to buy the book using that – but it returns with a 404 when processing via https://newsletter.datacrayon.com/subscription/form.
Thanks for offering the coupon to the HN community :) And of course thank you for putting this together for all of us to read and learn from. I took advantage and ordered this book, along with a few others on the site.
I've wanted to add Data Analysis along with ML to my utility belt. With your books and Andrew Ng's class coming up this Monday, I have a fun Winter of learning ahead of me!
I run my blog behind Cloudflare purely so that if I get a surprise burst of traffic from something like Hacker News I can survive the spike. Worth considering!
edit: looks like nothing I do will bring the WordPress instance back to life :) in the meantime: https://datacrayon.com/posts/programming/rust-notebooks/pref...