Show HN: I Wrote a Book on Data Analysis with Rust Notebooks (datacrayon.com)
105 points by DataCrayon on Dec 5, 2020 | 28 comments



I will have to invest in better hosting!

edit: looks like nothing I do will bring the WordPress instance back to life :) in the meantime: https://datacrayon.com/posts/programming/rust-notebooks/pref...


Totally unrelated since I can't access the website, but one thing that I'd like to see in a type system used for data analysis is the ability to see the dimensions of the data structures involved. For instance, to be able to tell at compile time whether a matrix multiplication is going to crash due to a dimension mismatch.

So far, in Python at least, the best I can do is say that something is a float array, without specifying the number of dimensions (for example 2D for a matrix) or, better, specifying that its shape is (n, p), where n and p are both type variables.


I believe that catching that at compile-time in Rust would have to wait for RFC 2000 to be implemented: https://rust-lang.github.io/rfcs/2000-const-generics.html

This would allow generics that depend on const values (i.e. matrix dimensions in this context). At present generics can only depend on types.
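As a rough illustration of what that could look like, here is a minimal sketch that assumes a compiler with const generics available (they were stabilized in Rust 1.51, after this comment was written). The `Matrix` type and its `matmul` method are hypothetical names made up for this example, not taken from any crate.

```rust
// Hypothetical fixed-size matrix; `Matrix`, `new`, and `matmul` are
// illustrative names, not an existing library API.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Matrix<const R: usize, const C: usize> {
    data: [[f64; C]; R],
}

impl<const R: usize, const C: usize> Matrix<R, C> {
    fn new(data: [[f64; C]; R]) -> Self {
        Matrix { data }
    }

    // The right-hand side must have exactly `C` rows; the result is R x K.
    // A mismatch in the inner dimension is a type error, not a runtime panic.
    fn matmul<const K: usize>(&self, rhs: &Matrix<C, K>) -> Matrix<R, K> {
        let mut out = [[0.0; K]; R];
        for i in 0..R {
            for j in 0..K {
                for k in 0..C {
                    out[i][j] += self.data[i][k] * rhs.data[k][j];
                }
            }
        }
        Matrix { data: out }
    }
}

fn main() {
    let a = Matrix::new([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]); // 2 x 3
    let b = Matrix::new([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]); // 3 x 2
    let c = a.matmul(&b); // 2 x 2
    println!("{:?}", c);
    // `a.matmul(&a)` would be rejected by the compiler, because a 2 x 3
    // matrix cannot be multiplied by another 2 x 3 matrix.
}
```

The shapes live in the type, so `n` and `p` from the comment above become the const parameters `R` and `C`.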


I love Rust for systems dev, and I truly can't wait to read the article, because I can't imagine ever wanting to use it for exploratory data analysis, one-off stuff, or notebooks.

Why deal with the borrow checker and compilation just to plot something in a notebook? Python is kind of ugly IMO, but it seems much quicker for prototyping.


The main reason is that you're already proficient with Rust and enjoy using it.

I don't think there are many borrow checker issues in data analysis code. You `.clone()` everywhere, and it's still going to fly.

Some benefits:

* static typing niceness: language server, typo catching, some types preventing human error here and there, etc.
* expressing some ideas in a much more robust way
* best-in-class tooling like the package manager and rustdoc
* hitting some unexpected, not strictly data-analysis requirement is never a blocker
* you can reuse your code, compile it to WASM, and embed it in a page
* performance of auxiliary code is never an issue
* unlike Python, you can send it to another person and expect it to work just like it did for you
* if it turns out that the "one-off" has to become more complex and serious, you don't have to throw away everything and start rewriting it "in a more serious way"


In my opinion, we need a statically typed contender to R, Python and Julia.

The use case is quite simple. Most data-oriented applications that need linear algebra, probability theory, or statistics end up containing a significant amount of data pre-processing and business logic once they are extended for production. Here static typing is advantageous. Keeping the whole codebase in the same language is a significant advantage.

Furthermore, some statically typed languages have typing features that are advantageous for data analysis. Think F# type providers, or type systems where e.g. the shape of arrays is guaranteed to match at compile time.

The question is whether Rust is a good language for this, or we need something more like OCaml or F#. All these languages pop up in job offers for quantitative analysis, insurance risk modeling, etc. So there seems to be a demand.


Are you sure it's static typing that you want? I find people often conflate having a powerful type system with static typing. Julia has a very powerful and expressive dynamic type system and I really appreciate it.

Static typing is significantly more restrictive, which I think is harmful for the interactive workflows which are so prevalent in exploratory data oriented applications.

> Furthermore, some statically typed languages do have typing features that are advantageous for data analysis. Think F# type providers or type systems where e.g. the shape of arrays is guaranteed to match at compile time.

For instance, Julia can do this with its type system too. Nothing here requires static typing. See StaticArrays.jl[1], which has an SArray type (static size, static contents), an MArray type (static size, mutable contents), and a SizedArray type (a statically sized wrapper around a regular array rather than a Tuple, which makes it more appropriate for large arrays).

[1] https://github.com/JuliaArrays/StaticArrays.jl


> we need a statically typed contender to R, Python and Julia.

I'm not sure we really need a "contender". We need a complement. I personally use D. I love the language, I can compile my functions into a static library, and I can have R load them the same as it loads a library of C functions. It's worked well for me for years. Should work for plenty of languages, whether that's Rust, Nim, or whatever.

There are at least two reasons I've moved away from the idea of a replacement language:

- It's pretty easy to do if you already have experience with a compiled language. The benefit of a replacement language is small.

- You can move to the new language slowly, to the degree that you want, without giving up or rewriting any already working code. Others can easily use your code without having to move to that language.


> In my opinion, we need a statically typed contender to R, Python and Julia.

Nim is getting there.


Nim has lots of cool things and could probably get there, but it really needs to choose one domain to focus on (and data analysis could be it, even though that sounds like a tough challenge given how much love Julia is getting as the new cool kid in that space). Today it looks like it's chasing too many goals at once, with limited success.


Yup! I hope it gets more love from the scientific community. https://forum.nim-lang.org/t/5242

Also worth mentioning is F#: https://github.com/SciSharp


The core team has plans to add optional static typing to Julia. It probably won't be there for a year or two, though.


Yes, I wanted to try Rust for this year's Advent of Code and I gave up before I figured out how to read the input file.


If you give it a try again, https://doc.rust-lang.org/std/fs/fn.read_to_string.html is probably what you want.
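A minimal sketch of reading a puzzle input with that function, for anyone stuck at the same step. The file name `input.txt`, the helper `sum_lines`, and the one-integer-per-line format are assumptions made up for this example.

```rust
use std::fs;

// Hypothetical helper: treats the input as one integer per line and sums
// the lines that parse, silently skipping blank or malformed ones.
fn sum_lines(input: &str) -> i64 {
    input
        .lines()
        .filter_map(|line| line.trim().parse::<i64>().ok())
        .sum()
}

fn main() {
    // "input.txt" is an assumed file name; it is read relative to the
    // current working directory.
    match fs::read_to_string("input.txt") {
        Ok(input) => println!("sum = {}", sum_lines(&input)),
        Err(e) => eprintln!("could not read input.txt: {}", e),
    }
}
```

`read_to_string` loads the whole file into a `String`, which is usually fine for inputs of this size; the explicit `match` keeps the error case visible without any extra machinery.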


I wonder if this type signature may be too intimidating for newcomers:

Result<(), Box<dyn std::error::Error + 'static>>

It's probably a huge effort to target examples at both newbies and advanced users. I don't know what a good solution would be, or whether it's a real problem at all.


Yes, normally I would hide that. Not sure why it wasn't here. I'll make a ticket :) Thanks!


Rust is a language where you need to sit down and read the book. It's not for casual exploration in the same way that JS/Py are.

People in the Rust ecosystem will tell you to either unwrap the error or use https://docs.rs/anyhow/1.0.34/anyhow/ to completely ignore it. The great thing is that it's simple to ctrl-f and turn your one-off code into production code with powerful error handling.
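To make the two styles concrete, here is a small sketch using only the standard library (anyhow is left out so the example has no dependencies). The function names `parse_quick` and `parse_checked` are hypothetical; the `main` signature is a shorter form of the one quoted earlier in the thread.

```rust
use std::error::Error;
use std::num::ParseIntError;

// One-off style: `unwrap` panics if the input is not a valid integer.
fn parse_quick(s: &str) -> i32 {
    s.trim().parse().unwrap()
}

// Same logic, but the error is returned to the caller through `?`.
fn parse_checked(s: &str) -> Result<i32, ParseIntError> {
    let n = s.trim().parse::<i32>()?;
    Ok(n)
}

// `Box<dyn Error>` accepts any error type, so `?` works in `main` even
// when different calls fail with different error types.
fn main() -> Result<(), Box<dyn Error>> {
    println!("{}", parse_quick("42"));
    println!("{}", parse_checked(" 7 ")?);
    match parse_checked("not a number") {
        Ok(n) => println!("{}", n),
        Err(e) => println!("error: {}", e),
    }
    Ok(())
}
```

Searching for `unwrap` later and swapping each call for `?` is roughly the ctrl-f migration described above.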


It's a language where it's worthwhile to sit down and read the book, but it aspires to be fairly accessible too via zero-cost abstractions and integrated tooling. I think it's been moderately successful in this goal all things considered, but there's always more progress to be made.


Thank you for writing this! I've been meaning to get a Jupyter notebook set up with Rust.

By the way, I tried to sign up for the newsletter – since you also offered a 10% discount there, and was going to buy the book using that – but it returns with a 404 when processing via https://newsletter.datacrayon.com/subscription/form.


I'm glad you're interested! It links through to 2 videos for setting up the environment if you need it too.

Sorry about the newsletter - I did not expect anything near this kind of traffic, and it's hit everything :)

I've made a temporary 15% discount coupon "hnhug10" for anyone who wants/needs one!

In the meantime I need to start thinking about more resilient hosting...


Thanks for offering the coupon to the HN community :) And of course thank you for putting this together for all of us to read and learn from. I took advantage and ordered this book, along with a few others on the site.

I've wanted to add Data Analysis along with ML to my utility belt. With your books and Andrew Ng's class coming up this Monday, I have a fun Winter of learning ahead of me!


"This site can’t be reached." Hug of death?


Definitely hugged to death - I wasn't prepared for that amount of traffic!


I run my blog behind Cloudflare purely so that if I get a surprise burst of traffic from something like Hacker News I can survive the spike. Worth considering!


same


Are "Rust Notebooks" Jupyter notebooks running a Rust kernel, or have I missed something here?


I was curious about this as well, and found an article from OP about how to set one up: https://datacrayon.com/posts/programming/rust-notebooks/setu...

It's a Jupyter Notebook running Rust via an extension.


Please, release a mobi or epub version :)



