Python vs. Rust for Neural Networks (ngoldbaum.github.io)
290 points by rencire 35 days ago | 142 comments

Nobody is writing NNs in Python; they are just describing them.

For NNs, or DL in general, correctness doesn't hinge much on code-quality guarantees like the ownership rules Rust people love to talk about. It is more about numeric stability, under/overflow, and the like. The choice of programming language offers limited help there.

I don't think Rust has a killer app to offer the ML/DL community as of now; the focus is vastly different.

I've had a few Rust lovers come and mention this project to me recently. None of them had any data science or ML experience. None of them knew that Python is just used to define the high-level architecture.

At the same time, comparatively tedious languages like Rust will never attract data science practitioners. They don't care about the kind of safety it brings, and they don't care about improving performance in a component that's idle 99% of the time.

The bulk of the load in a DL workflow is CUDA code and sits on the GPU. Even intermediate libraries like cuBLAS would see marginal-to-no benefit from being reimplemented in Rust.

This is a cool project, but it has no chance to displace or even complement Python in the data science space.

I am a data scientist and I care. The days when you could stop at a proof of concept or a PowerPoint presentation are long behind us. Now we have to take models into production, which means we inherit exactly the same problems software engineering has always had.

Iff Rust helps us take it into production, we will use it.

But it’s a lot of ground to cover to reach Python’s libraries, so I’m not holding my breath.

That said, Python’s performance is poor even when just shuffling data to NumPy.

I must be missing something. Modern data science workloads involve fanning out data and code across dozens to hundreds of nodes.

The bottlenecks, in order, are: inter-node comms, GPU compute, on-disk shuffling, serialisation, pipeline starvation, and finally the runtime.

Why worry about optimising the very top of the perf pyramid, where it will make the least difference? Why worry that you spent 1 ms pushing data to NumPy when that data just spent 2500 ms on the wire? And why are you even pushing from the Python runtime to NumPy instead of using Arrow?

Not everyone operates at that scale, and not every data science workload is DNN-based.

I agree with your general point; however, the role I'd hope for with Rust is not optimizing the top level, but replacing the mountains of C++ with something safer and equally performant.

But the title of this post is Python vs. Rust, not C++ vs. Rust. Maybe BLAS could be made safer, but I don't think that's what's happening here.

A big push for NNs is to get them running in real time on local GPUs so we can make AI cars and other AI tech a reality. 2500 ms could be life or death in many scenarios.

Every small thing counts when you have big data, which is exactly why you need performance everywhere. If Rust can help with that, I don’t mind switching my team to it.

The problems usually arise when you do novel feature engineering, not during the actual model training.

But I was a C++ dev before, checking the assembly for performance optimization, so I guess I have more wiggle room to see when things are not up to snuff. If I got a cent for every “you are not better than the compiler writers, you can’t improve this”...

Especially from the Java folks. They simply don’t want to learn shit, which would be fine if they just weren’t so quick with the lies/excuses when proven wrong.

This is just not true. The Python runtime is not the bottleneck. DL frameworks are DSLs written on top of piles of highly optimized C++ code that executes as independently from the Python runtime as possible. Optimizing the Python, or swapping it out for some other language, is not going to buy you anything except a ton of work. We can argue about using Rust to implement the lower-level ops instead of C++. That might be sensible, though not from a performance perspective.

In a "serving environment" where latency actually matters there are already a plethora of solutions for running models directly from a C++ binary, no python needed.

This is a solved problem and people trying to re-invent the wheel with "optimized" implementations are going to be disappointed when they realize their solution doesn't improve anything.

Yes. Let’s say you want certain features of a voice sample. You need to do that feature engineering every time before you send it to the model. Doesn’t it make sense to do it in C++ or Rust? This is currently already done. So if you already are starting to do parts of the feature engineering in Rust why not continue?

Yeah it’s not reasonable right now because Python has the best ecosystem. But that will not always be the case!

I can’t exactly tell what you mean but I think you’re confusing two levels of abstraction here. C++ (or rust) and python already work in harmony to make training efficient.

1. In TensorFlow and similar frameworks, the Python runtime is used to compose highly optimized operations into a trainable graph.

2. C++ is used to implement those highly optimized ops.

If you have some novel feature engineering and need better throughput than a pure-Python op can give you, you implement the most general viable C++ (or Rust) op and then use that wrapped op in Python.
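The two levels can be sketched in miniature; every name below is invented for illustration. Python composes the graph once, and each op's kernel is where compiled C++/CUDA (or Rust) would live in a real framework; plain Python stands in here.

```python
# Minimal sketch of the "compose in Python, execute in kernels" pattern.

class Op:
    """A graph node that applies a kernel to the results of its inputs."""
    def __init__(self, kernel, *inputs):
        self.kernel, self.inputs = kernel, inputs

    def run(self, feeds):
        # Inputs are either other Ops or string placeholders looked up in feeds.
        args = [arg.run(feeds) if isinstance(arg, Op) else feeds[arg]
                for arg in self.inputs]
        return self.kernel(*args)

# Stand-in kernels: in a real framework these are compiled, vectorized ops.
add = lambda a, b: [x + y for x, y in zip(a, b)]
scale = lambda a, s: [x * s for x in a]

graph = Op(scale, Op(add, "x", "y"), "s")  # composed once, in Python
print(graph.run({"x": [1, 2], "y": [3, 4], "s": 2.0}))  # → [8.0, 12.0]
```

The Python layer only decides *what* runs and in what order; all the arithmetic happens inside the kernels, which is why swapping the wrapper language buys little.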

This is how large companies scale machine learning in general, though this applies to all ops not just feature engineering specific ones.

There is no way Instagram is using a pure-Python image processing lib to prep images for their porn detection models. That would cost too much money and take way too much time. Instead, they almost certainly wrap some C++ in some Python and move on to more important things.

I know. That’s how we do it too. But don’t you see any benefit in doing it all in Rust instead of Python wrappers + C++? Especially for handling large data like voice, iff there were a good ecosystem and toolbox in place?

Maybe but then we’re no longer making an argument about performance, which is what I was responding to in your initial claim about “everything counts” and numpy shuffle being slow. That’s a straw man argument that has zero bearing on actual engineering decisions.

EDIT: clarification in first sentence

> The Python runtime is not the bottleneck.

This smells like an overgeneralization. Often things that aren’t a bottleneck in the context of the problems you’ve faced might at least be an unacceptable cost in the context of the 16.6 ms budget someone else is working within.

In what circumstance would one measure an end-to-end time budget during training? What would that metric tell you? You don't care about latency; you care about throughput, which can be scaled nearly completely independently of the "wrapper language", for lack of a better term, which in this case is Python.

It seems some commenters on this thread have not really thought through the lifecycle of a learned model and the tradeoffs existing frameworks exploit to make things fast _and_ easy to use. In training we care about throughput. That's great, because we can use a high-level DSL to construct a graph that trains in a highly concurrent execution mode or on dedicated hardware. Using the high-level DSL is what allows us to abstract away those details and still get good training throughput. Tradeoffs still bleed out of the abstraction (batch size, network size, architecture, etc. affect how efficient certain hardware will be), but that is inevitable when you're moving from CPU to GPU to ASIC.

When you are done training and want to use the model in a low-latency environment, you use a C/C++ binary to serve it. Latency matters there, so exploit the fact that you're no longer defining a model (no need for a fancy DSL) and just serve it from a very simple but highly optimized API.

Looks good. I’ve tried Numba and that was extremely limited.

On my current project we can’t use GPUs in production, so we can only use them for development. Not my call; that’s operations. They have a Kubernetes cluster and a take-it-or-leave-it attitude.

We did end up using C++ for some things and Python for most. I’d feel comfortable with C++ or Rust alone if there were a great ecosystem for DS, though.

I see a graph on... a logarithmic scale? With no units? I don't know what that benchmark means.

Good lord, hopefully latency isn't 2.5 seconds!

Latency in training literally does not matter. You care about throughput. In serving, where latency matters, most DL frameworks allow you to serve the model from a highly optimized C++ binary, no python needed.

The poster you are replying to is 100% correct.

The quote is 'data spent 2500 ms on the wire'. That's not latency. For a nice 10 GbE connection, that's optimistically 3 GB or so of data. Do you have 3 GB of training data? Then it will spend 2500 ms on the wire being distributed to all of your nodes as part of startup.

I can’t even. How could you ever get 2500 ms in transit? That’s like circling the globe ten times.

Maybe a bunch of SSL cert exchanges through some very low bandwidth connections? ;)

Still, it's more likely a figure used for exaggeration, for effect.

You are a data scientist who seems to lack an understanding of how deep models are productionized.

I do so not infrequently, and I don't see how Rust bindings would help me at all.

or maybe, just maybe, you know nothing of optimizations, so you just go: impossible!

The optimization you are describing is premature: you don't need Rust to productionize your models, and in most cases you don't even need to be coding in a low-level language at all.

I don’t think I’ll change your mind.

You clearly have more experience than I do, being able to solve everything in Python instead of C++ and/or CUDA.

You win.

I mean, it'd be easier to change my mind if you had a single reason behind anything you've claimed.

I would love to understand why Rust would be more effective for productionizing ML models than the existing infrastructure written in Python/C++/CUDA.

I'm by no means a specialized data scientist, but I've done some (very surface-level) text crunching with both the TF/Numpy stack and Rust.

To me, the nice thing about switching to Rust for that kind of stuff was that it dramatically raised the bar of what I could do before reaching for those hyper-optimized descriptive libraries.

Want to calculate the levenshtein distances of the cross product of 100k strings? Sure, just load it into a Vec, find a levenshtein library on crates.io, and it'll probably be fast enough.

Could it be done in Python? Sure, but with Rust I didn't have to think about how to do it in a viable amount of time. Does that mean that Rust is going to take over the DS world? Probably not in the short term, Rust currently can't compete with Python's DS ecosystem.
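For scale, a hypothetical pure-Python baseline (not the crates.io library mentioned above) might look like the sketch below. Each call does O(len(a)·len(b)) interpreted work, and over the cross product of 100k strings (~10^10 pairs) that interpreter overhead is exactly what a compiled implementation amortizes away.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, two rows at a time."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # → 3
```

The inner loop is pure interpreter dispatch; in Rust the same loop compiles down to a handful of instructions per cell.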

But if I'm doing something that I don't know will fit into an existing Python mold (that I know about) then I'll strongly consider using it.

> But if I'm doing something that I don't know will fit into an existing Python mold (that I know about) then I'll strongly consider using it.

The thing is, for anything performance intensive and scientific, you're almost guaranteed to find a Python binding.

It has all these bindings because scientists are almost always writing either Python, C++, or Fortran (with a smattering of R or Octave on the side).

Want to do linear algebra? NumPy is always there, otherwise you can go lower level: https://docs.scipy.org/doc/scipy/reference/linalg.blas.html

Old-school optimization? https://www.roguewave.com/sites/rw/files/attachments/PyIMSLS...

Computer vision? https://pypi.org/project/opencv-python/

In fact, I'd basically argue against using most language-native implementations of algorithms where performance is at stake, because most implementations don't have all the algorithmic optimizations.

And they will cover a small part of what is found in research papers.

Python has the best ecosystem, but Rust was made by a competent team so we will root for it.

CUDA is an important part of the story.

I think the industry is moving toward an 'MLIR' solution (yes, there is a Google project called exactly that, but I am referring to the general idea here), where the network is defined and trained in one place, then the weights are exported and delegated to an optimized runtime for execution.

If this trend continues, there will be very little reason to replace Python as the glue layer. Instead the flow will be: everything is trained in Python -> exported to a shared format -> executed in an optimized runtime.

Rust's opportunity could be to replace C++ in this flow. But bear in mind that this is also a competitive business, where the computation is being pushed further down into hardware implementations, like the TPUv1 and T4 chips.

Personally I hope for alternatives because I don't find Python that nice to work with compared to other languages. It's not awful, but I really miss having type and nullability errors detected by the compiler and reported to me with meaningful error messages.

Also there are a few weird things with the Python workflow, like dealing with Python2 vs Python3, and generally having to run everything in the context of a virtual environment, and the fact that it doesn't seem to really have a single "right way" to deal with dependency management. It's a bit like the story with imports in Javascript: those problems were never really solved, and the popular solutions are just sort of bolted on to the language.

It's amazing how many great tools there are available in Python, but it does sometimes seem like it's an under-powered tool which has been hacked to serve bigger problems than it was intended for.

> like dealing with Python2 vs Python3, and generally having to run everything in the context of a virtual environment, and the fact that it doesn't seem to really have a single "right way" to deal with dependency management.

Not to mention, this problem seems to be getting worse, not better. People are moving off of Python 2.7, which was kind of the de-facto LTS release of Python... leaving (currently) no LTS version of Python and no clear path for the community to establish a new LTS version that the community will still support. There are so many releases with so many breaking changes in Python 3 within the last few years that there is seemingly no consensus and no way to compromise.

> It's amazing how many great tools there are available in Python, but it does sometimes seem like it's an under-powered tool which has been hacked to serve bigger problems than it was intended for.

This is becoming more and more clear with every release of python IMO. The language is evolving too quickly for it to be used on large developments, but it’s still being used for that.

We have an entire test framework and libraries for high-performance embedded-systems-level testing, written entirely in Python. The volume of high-speed operations (timing tests, measurements, etc.) is obviously beyond what the language and libraries were intended for, yet the company keeps pushing ahead with this stuff. To mitigate the issue, we are developing more high-speed embedded systems to offload the work from the test framework and report measurements back to it. I think it's quickly becoming extremely expensive and will only get more so: the framework is extremely "pythonic" to the point of being unreadable, using a number of hacks to break through the Python interpreter. Jobs much better allocated to readable C++ libraries are being implemented in unreadable Python spaghetti code with no clear architecture, just quick-and-dirty, whatever-it-takes Python.

I love Python, but I think it's a quick-and-dirty language for a reason. What Python does well cannot be beaten by other languages (for example, prototyping), but I think it is often misused: people can get something up and running quickly and cleanly in Python, but it eventually has diminishing (and even negative) returns.

Can you name breaking changes between Python 3.x versions?

Python 3.7 made async a reserved word, which broke the widely used celery package up until their most recent release.

In 3.7, raising StopIteration inside a generator raises a RuntimeError instead of ending the iteration (PEP 479), which broke several functions at my workplace.

3.8 has several further backwards incompatible changes incoming: https://docs.python.org/3.8/whatsnew/3.8.html#changes-in-pyt...

There are alternatives -- Julia and R both have bindings for Keras, which gets you Tensorflow, CNTK and Theano. (I think there are bindings for Tensorflow directly from R and Julia as well.) Once you have a trained model, it doesn't really matter from the production standpoint how you trained it.

Python should and will be replaced, but not at all for any of the reasons mentioned in this thread.

A good ML language is going to need smart and static typing. I am so tired of having to run a whole network just to figure out that there's a dimension mismatch because I forgot to take a transpose somewhere - there is essentially no reason that tensor shapes can't just be inferred and these errors caught pre-runtime.

Do you have an example of a tensor library that keeps track of shapes and detects mismatches at compile time? I had the impression that, even in static languages, having tensors with the exact shape as a type parameter would stress the compiler, forcing it to compile many versions of every function for every possible size combination; and the output of a function could well have a non-deterministic shape, or multiple possible shapes (for example, branching on runtime information). So they compromise and make only the dimensionality a parameter, which would not catch your example either until the runtime bounds check.

If you explain a little more of what you mean, I might be able to respond more effectively.

> I had the impression that even in static languages having tensors with the exact shape as a parameter would stress the compiler, forcing it to compile many versions of every function for every possible size combination, and the output of a function could very well have a non deterministic or multiple possible shapes (for example branching on runtime information).

I was a bit lazy in my original comment - you're right. What I really think should be implemented (and is already starting to be, in PyTorch and a library named NamedTensor, albeit non-statically) is essentially having "typed axes."

For instance, if I had a sequence of locations in time, I could describe the tensor as:

(3 : DistanceAxis, 32 : TimeAxis, 32 : BatchAxis).

Sure, the number of dimensions could vary and you're right that, if so, the approach implied by my first comment would have a combinatorial explosion. But if I'm contracting a TimeAxis with a BatchAxis accidentally, that can be pretty easily caught before I even have to run the code. But in normal pytorch, such a contraction would succeed - and it would succeed silently.
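As a toy illustration of the idea in plain Python (the class and axis names here are invented, not the NamedTensor API): each dimension carries a label, and contraction checks labels rather than silently matching sizes.

```python
class NamedMat:
    """A 2-D tensor whose two axes carry labels, e.g. ("batch", "time")."""
    def __init__(self, data, axes):
        self.data, self.axes = data, axes

    def contract(self, other):
        # Contract our column axis against the other's row axis by label,
        # not just by size, so accidental mismatches fail loudly.
        if self.axes[1] != other.axes[0]:
            raise TypeError(f"axis mismatch: {self.axes[1]} vs {other.axes[0]}")
        n, k, m = len(self.data), len(other.data), len(other.data[0])
        out = [[sum(self.data[i][x] * other.data[x][j] for x in range(k))
                for j in range(m)] for i in range(n)]
        return NamedMat(out, (self.axes[0], other.axes[1]))

w = NamedMat([[1.0, 0.0], [0.0, 1.0]], ("batch", "time"))
v = NamedMat([[2.0], [3.0]], ("time", "feature"))
print(w.contract(v).data)  # → [[2.0], [3.0]], axes ("batch", "feature")
# w.contract(w) raises TypeError: contracting "time" against "batch"
```

A static checker could do the same label matching at compile time; here it only happens at runtime, but the wrong contraction still fails instead of succeeding silently.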

You understood it correctly. Named dimensions are certainly a good idea even in dynamic languages, as a form of documentation and for runtime errors that actually make sense (instead of things like "expected shape (24, 128, 32) but got (32, 128, 24)"). I hope it catches on.

But combined with static checking, it could be very very powerful. Definitely agree re: the power even in dynamic languages, I use namedtensor for almost all my personal development now (less so in work because of interop issues)

There's a research language that supports compile-time checking of array dimensions: Futhark [1]. It's an interesting language that compiles to CUDA or OpenCL. However, it's probably not ready for production (I'm not sure there are even good linear algebra implementations yet). It does feature interesting optimizations to account for the possible size ranges of arrays (the development blog is very instructive in that respect).

[1] https://futhark-lang.org/

Thanks, I'll look into it.

Julia, for example, does have an array library that does compile-time shape inference (StaticArrays), but it cannot scale to large arrays (over ~100 elements), exactly because it gets too hard for the compiler to keep track. I'm definitely curious about possible solutions.

Normal static typing won't help you there. You would need some sort of dependent typing. For example, look at Idris.

Rust might not be it.

But AOT/JIT-compiled languages that can naturally talk to the GPGPU without second-language syndrome, like Julia, Swift, Java, and .NET, will certainly be more attractive to data science practitioners.

I can already envision those life science guys that migrate to VB.NET when they have outgrown their Excel/VBA code, to start playing with ML.NET.

Python already gets JIT compiled to CUDA[1] and there's an entire funded ecosystem built around python+gpgpu called RAPIDS[2] which is the future of the ML space by most indicators.

I don't see any other language even making a dent in the Python ecosystem without some kind of new killer feature that can't be quickly replicated in Python.

[1] https://numba.pydata.org [2] https://rapids.ai

You beat me to making this point.

I don't see anything in the article's python code that the numba's jit decorator cannot handle. When numba works (it's rapidly improving), it's seriously impressive.

For this particular case, you should be able to get really good performance without sacrificing readability.
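A rough sketch of what that looks like (the try/except fallback is only there so the snippet still runs where numba isn't installed): numba's `@njit` decorator compiles a loop-heavy scalar function to machine code on first call, with no change to the Python source.

```python
# Hedged sketch: with numba installed, @njit compiles the loop below to
# machine code on first call; without it, the identity fallback keeps it
# running as ordinary (slow) interpreted Python.
try:
    from numba import njit
except ImportError:
    njit = lambda f: f  # no numba available: plain Python fallback

@njit
def leibniz_pi(terms):
    # Loop-heavy scalar kernel: pi via 4 * sum((-1)^k / (2k + 1)).
    s = 0.0
    for k in range(terms):
        s += (-1.0) ** k / (2 * k + 1)
    return 4.0 * s

print(leibniz_pi(200_000))  # converges slowly toward pi
```

Tight numeric loops like this are numba's sweet spot; code that leans on arbitrary Python objects is where it gets limited, as the grandparent found.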

Also Jax - https://github.com/google/jax

Swift, with static type checking and compiler analysis, might be that language-and-feature combo.

Thank you for the pointer to rapids.ai; that looks extremely interesting :)

Your last paragraph is my nightmare

> comparatively tedious languages like Rust will never attract data science practitioners.

Well, fast.ai is using swift now.

... I think it's fair to say 'never say never'.

You're probably right that Rust isn't really the sweet spot for this stuff, but it's also the case that Python has some downsides that are pretty severe and well acknowledged.

Just because fast.ai has some investment in Swift does not mean that S4TF has attracted mind share. The bulk of fast.ai is still taught on Pytorch.

The parent comment literally said data science practitioners don't care about speed or safety because the GPU is where all the real work happens; that's false, I've provided an example of it being false from a respected party. What do you want me to say?

eh, I give up. Believe whatever you want to believe.

I'm not saying that no data science practitioners care about speed and safety, and it's easy for me to come up with cases where they should/do care.

My point is that even fast.ai still views S4TF as somewhat niche, and that data science practitioners as a whole still don't care.

I am a practitioner, I care, but everyone on this thread seems to be distracted by the red herring of speed and efficiency.

None of these languages are going to be speedier on the GPU. But Python has serious design problems that manifest themselves when developing a large project.

You can only go so far with dynamic typing.

I wouldn’t call Swift a tedious language.

With type inference, immutability, etc., Swift is far from tedious.



It’s not quite as nice as Python but it’s an enjoyable language.

> They don't care about the kind of safety it brings, they don't care about improving performance in a component that's idle 99% of the time.


> Because Swift for TensorFlow is the first serious effort I’ve seen to incorporate differentiable programming deep in to the heart of a widely used language that is designed from the ground up for performance.

> But Python is not designed to be fast, and it is not designed to be safe. Instead, it is designed to be easy, and flexible. To work around the performance problems of using “pure Python” code, we instead have to use libraries written in other languages (generally C and C++), like numpy, PyTorch, and TensorFlow, which provide Python wrappers. To work around the problem of a lack of type safety, recent versions of Python have added type annotations that optionally allow the programmer to specify the types used in a program. However, Python’s type system is not capable of expressing many types and type relationships, does not do any automated typing, and can not reliably check all types at compile time. Therefore, using types in Python requires a lot of extra code, but falls far short of the level of type safety that other languages can provide.

I.e., my point: the OP's claim is categorically false.

...not that swift is tedious. I <3 swift.

...But fast.ai is a Python library, partially written in Swift.

Validating the original point that nothing will replace python for DL applications any time soon but middleware will continue to be implemented in c++/rust/swift/whatever you fancy.

S4TF isn't the first and certainly not the last end to end non-python DL stack. It might be worth highlighting as an example if it ever reaches mindshare above the noise floor amongst those stacks.

> Our hope is that we’ll be able to use Swift to write every layer of the deep learning stack, from the highest level network abstractions all the way down to the lowest level RNN cell implementation. There would be many benefits to doing this...

Well, the tldr: you’re wrong.

The more approachable reading: python isn’t going anywhere, but people are looking at other things for more than just low level implementations with a python wrapper.

...it’s early days yet, who knows where things will go... but maybe do a bit more reading and have an open mind?

There’s more to life than python.

Python is easy to get started with, but once a project grows to any meaningful size, I would rather have a compiler which is giving me more correctness guarantees than Python is capable of.

IMO Swift strikes the best balance between strictness and productivity of any language I've worked with.

> Python is easy to get started with, but once a project grows to any meaningful size, I would rather have a compiler

boy do I have a continent full of python developers to introduce you to... “python - by any means necessary” is their motto.

But in all seriousness, there are a lot of people (where I work) who started with python and have been daily driving it for so long that any other language is too tedious to get started with. Even if it means writing unreadable, hacky python, python still wins out for them.

I suspect there are a lot of similar people in data science.

Rust has those too, GGGP still considered it tedious.

I think the tedious part would be the borrow checker. Also, Rust doesn't have top-level type inference (by design).

Yeah there is some tedium in general with Rust syntax. Semicolons, for example, feel old fashioned, and there's a lot of verbosity in things like unwrapping. It's a fine language and I like working with it, but there's a lot of details involved in Rust development which don't make sense for data scientists to worry about.

This is why I think Swift is a much better choice than Rust to replace, or at least complement, Python. It has a modern, powerful type system and all the quality-of-life advantages that come with it, but it manages this with a lot more usability than Rust.

A well written swift framework almost becomes a DSL for the problem domain, which is a great property for a data science tool to have.

Swift is not fun on non-Mac platforms. It feels a lot like Google's Dart in terms of how it was positioned and advocated.

Modern Rust isn't difficult to use. This is becoming a really tired meme from detractors. The compiler is incredibly helpful, non-lexical lifetimes are a thing, and unless you're doing a lot of sharing and parallelism, you can avoid many borrow checker problems until you learn RAII.

It's a fair criticism of Swift that it is not well supported across platforms. The story is getting better, but it is still miles behind Rust in this regard. I am a huge fan of Swift as a language, but I am very critical of the tooling.

> Modern Rust isn't difficult to use. This is becoming a really tired meme from detractors.

I beg to differ. I've been programming professionally for over a decade, and I have shipped projects in a variety of languages, and I can safely say that Rust has a steeper learning curve and requires more cognitive overhead to use than many other languages. I find it relatively nice to work with rust in spite of this because the tooling is so great, but it's undeniable that Rust has made tradeoffs which sacrifice ease of use in favor of safety and performance.

The tooling for Swift for TensorFlow is nowhere near satisfactory. I can't even seem to get it on my computer without installing Xcode (and that's on an Apple computer).

I should be able to pull a docker image and have s4tf immediately at my fingertips

Seems the Fast.ai forums have got you covered here. https://forums.fast.ai/t/swift-for-tensorflow-using-gpu-with... describes how to do it, while not completely 'at my fingertips' it's pretty close. and Googles repo at https://github.com/google/swift-jupyter has a clean setup, too.

I've come across that first link before, but remember I had some issues.

The two problems are (a) I don't have a GPU locally, and (b) I would rather have a compiler than just a REPL/Jupyter.

Not local, but you can start using S4TF immediately via Google Colab.


If I’m going to be developing, I need a compiler, not a REPL or a notebook. I haven’t managed to find that.

The Dockerfile in the swift-jupyter repo is a superset of what you need. You could remove the lines dealing with jupyter and you'd be left with a Docker container with the s4tf compiler.


Thank you - I will give that a try!

The "Rust is difficult" criticism isn't getting tired, but it must be getting tiresome for the Rust community because they don't have a good answer to it.

The top level comment from an early adopter in the last Rust release thread on HN was complaining about the tedious complexity.

I think this project has immediate merits when it comes to productionizing small, sparse networks for running on the CPU.

I do research and prototyping in Python, but I have to deploy on mobile devices. I was going to roll my own implementation, but now that this exists, it's something I'm going to look into.

Load the model from TorchScript? In C++?

I don't see why Rust is what makes this possible.

Ideally we'd have native Rust loading of tensorflow, pytorch, etc. models. I've hand-coded a few, but it was pretty monotonous.

You're right, and... I'm working professionally using DL for computer vision, for robotics. We spend the majority of our time writing the business logic around the learned parts. Getting all of that code bug-free and fast is way harder in Python than I expect it would be in Rust. Rust's ndarray ecosystem is immature, but so is Python's static type checking ecosystem (no stable mypy stubs for NumPy ndarrays), as is Julia's AOT compilation story.

As somebody who programs in both Python and Rust (and likes both languages) I think Rust's place would be parts of the code that have to be fast, and that you want to get right.

Calling Python code from Rust or Rust from Python is totally doable, and there is in my view no reason why you shouldn't use both in the use cases that suit them.

And the speed part is serious. Someone once asked for the fastest tokenizer in any given language, and my naive implementation came in second place in his benchmark, right after a stripped-down and optimized C variant.

So using Rust for speed critical modules and interfacing them from easy to use Python libraries isn't exactly irrational.
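As an illustration of the mechanics only (libc's `strlen` stands in here for a Rust library, since a Rust function exported with `#[no_mangle] pub extern "C"` in a `cdylib` is loaded the same way), the Python side of such an interface can be as small as:

```python
# Load a C-ABI shared library and call into it via ctypes. A Rust cdylib
# exposing extern "C" functions would be loaded exactly like this.
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c") or None)
libc.strlen.argtypes = [ctypes.c_char_p]   # declare the C signature
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"tokenize me"))  # → 11
```

For anything beyond a few functions, PyO3 with maturin is the more ergonomic route: it builds a native Python extension module instead of a raw C ABI, so no manual signature declarations are needed.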

> Calling Python code from Rust or Rust from Python is totally doable, and there is in my view no reason why you shouldn't use both in the use cases that suit them.

What would be the best route to do so in your experience?

Calling C (or C++) code from Python is not only totally doable, it's what all the popular libraries do. Furthermore, most ML stuff runs on the GPU, which also runs C-like code.

Hence, Rust offers no performance benefit that isn't already there. It really only offers safety and modern language features, at the cost of being tedious to use.

A bit snarky isn’t it? I explained why it isn’t irrational to use Rust — nowhere did I say you have to use Rust or that Rust is the only solution to that problem (which it obviously isn’t). Assuming the person opposite doesn’t know things is an extra unpleasant move on your part and I am not sure why you feel attacked by any of what I wrote.

For the record, I also use C, and I know that expert-level C can beat naive Rust at any time (expert-level Rust should allow you to do literally the same thing you do in C, just wrapped in an "I can handle this" unsafe block).

Your response tells more about yourself than it does about the subject at hand. Rust isn’t stealing your cookies and nor am I.

Sounds like you’re reading too much into what was essentially a two-point comment... there was no snarkiness in that comment at all, just a statement that Rust is not really offering much benefit in this circumstance. Yeesh... projection

Sorry for that, it was entirely unnecessary — as you rightly noticed I read something into the comment that wasn’t there.

Not a data scientist, but isn’t Julia better positioned to challenge Python for ML workflows than Rust?

Definitely, since Julia has the same approach as Python of allowing quick and dirty solutions for data analysis/modelling, but with a much larger scope even when you don't have complete library support. In a few hours you can make a functional pytorch clone (and just using special GPU arrays you get it running on GPUs) with similar performance [1], and within a day (given a very good understanding of the language) a method that compiles the gradient directly from unmodified Julia code [2]. Plus native matlab-like goodness such as multi-dimensional arrays, so you don't have a separate library for fast operations and you can just use normal loops or whatever you want.

But while Julia targets Python's niche of fast and concise (without compromising speed or power), it does not target the slower-but-more-correct niche (though there is a culture of testing, which is quite important for math-oriented problems, since the type system will not catch the most subtle and troublesome ones). There is space for one language for exploratory/research work that can be quickly deployed in a fast iterative cycle, and another for the next Spark/Flink or for critical production areas that need the extra effort (like self-driving cars), which could be Rust (or Scala, or Haskell, or Swift, or staying with C++/Fortran).

[1] https://github.com/MikeInnes/diff-zoo

[2] http://blog.rogerluo.me/2019/07/27/yassad/

Rust isn't challenging anything for ML workflows, let's be serious.

Rust people will talk about ownership because when you work with the language you learn to appreciate what it implies. Rust code for ML can be fairly effortlessly compiled to any target, including wasm; not so for Python running over C-compiled libs. Rust has the potential to run NNs over a bunch of phones and laptops all collaborating on the same thing together. Huge potential, especially around dapps that need a little ML in their lives.

Numerical stability sounds like floating point artifacts are a problem, could a specification/verification language with more exact arithmetic semantics help?

It's not about floating point artifacts. Especially as full IEEE floats are completely deterministic and have exact semantics.

It’s something more fundamental. Similar issues happen elsewhere in numerical computation. See https://en.m.wikipedia.org/wiki/Numerical_stability

I’d phrase it as, “It’s not _just_ about floating point artefacts.” Rounding is a source of error inherent to storing arbitrary precision numbers in finite space, floating point numbers are just a very common way to do that. Numerical stability is basically about information loss, typically when adding a very small number to a very large number causing information about the small number to be rounded out.

It happens with ints too, but it is so much more obvious than with floats that no one would be surprised.

Repeating i += 0.1 a million times will produce the same value as i += 0.2 a million times if i is an int, since the rounding will make both effectively i += 0.
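To make the absorption effect described above concrete, here is a small Python sketch; the compensated (Kahan) summation at the end shows one standard mitigation. The numbers are illustrative, not from the article.

```python
# Absorption: below the rounding granularity ("ulp") of a large float,
# adding a small one changes nothing at all.
big = 1e16
print(big + 1.0 == big)  # True: the 1.0 vanishes entirely

# Accumulated rounding: a million additions of 0.1 drift away from 100000,
# because 0.1 itself is not exactly representable in binary.
total = 0.0
for _ in range(1_000_000):
    total += 0.1
print(total)  # ~100000.000001, close but not exact

# Kahan (compensated) summation tracks the lost low-order bits.
kahan, comp = 0.0, 0.0
for _ in range(1_000_000):
    y = 0.1 - comp
    t = kahan + y
    comp = (t - kahan) - y  # the bits that didn't fit into t
    kahan = t
print(kahan)  # recovers 100000.0 to within rounding of 0.1 itself
```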

There are plenty of issues beyond that. For example, many people aren’t gonna use a library that doesn’t have a good AD (automatic differentiation) engine.

Neural network libraries (Tensorflow, Pytorch) have a C++ backend and a Python interface. Which is great - you get a performant compiled language as the backend and a flexible user-friendly language as the interface.

Rust vs Python is a weird question because in reality no one writes their own neural network with numpy, and no one expects Rust to act like an interpreted language suitable for data science workflows. It would be more apt to compare Rust and C++.

Yeah, this is the real point.

Python is an interface to C, C++, and FORTRAN for a lot of stuff. There are even crossover libs for running R.

This is like comparing apples and steaks.

"Fortran", since 1990 (FORTRAN refers to F77 :-)

Or maybe Rust can be compared to Julia, if Julia is used for custom machine learning rather than just calling into precompiled CUDA kernels?

Also, does Rust have a GPU/CUDA backend yet?

This seems to be comparing a hand implemented neural network in python (and numpy) and one in rust. Even in this simple case, the author discovers that in the python case, most of the time is spent in non-python linear algebra libraries.

Most of the major deep learning frameworks for python (tensorflow, keras, torch, mxnet, etc) will not normally be spending the majority of their time in python. Typically, the strategy is to use python to declare the overall structure of the net, and where to load data from, and then the actual heavy lifting will be done in optimized libraries written probably in C++ (or fortran, I seem to recall BLAS used fortran).

I think BLAS is a spec rather than one library. So your version may or may not be fortran. I do think the original "reference implementation" was written in fortran, which is sometimes called "The BLAS library" but I think most BLAS you see in the wild are not that.

Yes. BLAS was originally specified in Fortran, but many BLAS implementations (like cuBLAS, the CUDA/Nvidia version for GPUs) don't use Fortran at all.

> I think BLAS is a spec rather than one library.

Yep; poor wording on my part. Thanks. :)

This is the strength of python, though, and one you cannot ignore. Python is old, and has fast C ops for everything you may want to accomplish. There's no shame in that method in benchmarks.

> Is rust suitable for data science workflows?

> Right now I have to say that the answer is “not yet”. I’ll definitely reach for rust in the future when I need to write optimized low-level code with minimal dependencies. However using it as a full replacement for python or C++ will require a more stabilized and well-developed ecosystem of packages.

I'm not sure rust is really aiming to be something used for data science workflows. I'm not sure the community will be putting much effort into making this a reality.

Fine, but it seems reasonable for someone to check. It's much easier to make a decision about whether a programming language is what you want if you've got evidence from someone who has tried something similar to you, as opposed to just hearing people say "Language X is awesome because of unimaginably low-level (from my point of view) feature Y".

Rust seems great, to be honest, just not universally so. Nothing wrong with defining the boundaries.

There are people working on language features that will get Rust closer to parity with C++ for numerical computing, most prominently "const generics", which will make it more ergonomic to write numeric libraries (see C++'s Eigen) that use static array sizes. This will ultimately be important for how aggressively the compiler can optimize the code, via eliminating bounds checks, etc.

> I'm not sure rust is really aiming to be something used for data science workflows

I’m not totally convinced rust knows what it’s aiming to be, other than replacing c++...

Rust seems more suitable for implementing the next OpenBLAS.

While Julia's single language mantra is great, as long as things like Python exist, there will be a need for C/C++/Rust.

Yea, this whole discussion feels weird to me. Different use cases. I love Rust (and dislike Py lol), but from everything I hear, a highly dynamic frontend (like Py) has few downsides for authors of ML/etc. All of the hot paths are already in other languages because Python is so slow.

The only downside i've seen is sometimes the programmer will want more safety. In such a scenario Rust for the "frontend" would be very useful.

So we have two concerns, frontend and backend. For the backend, Rust would be perfectly acceptable, but I'm not sure it is fixing a safety issue/etc. in the other (C/etc.) languages; aka, perhaps little value in the backend. For the frontend it only has value in some areas.

Regardless, i love Rust and would totally welcome any tooling to keep me in Rust. However i'm not an ML person hah.

A dynamic language can be super frustrating to develop with because you have to keep tensor dimensions memorized in your head or in comments

How is a statically typed language going to help with tensor dimensions?

Named/typed axes checking - ie. I shouldn't be able to contract along two dimensions of different types (like batch dimension contracted with time dimension)
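A minimal Python sketch of the named-axes idea the comment above describes; the `NamedTensor` wrapper here is hypothetical and purely illustrative, not a real library:

```python
import numpy as np

class NamedTensor:
    """Toy wrapper: a numpy array plus a name per axis (illustrative only)."""
    def __init__(self, data, axes):
        assert data.ndim == len(axes)
        self.data, self.axes = data, tuple(axes)

    def contract(self, other, axis):
        # Refuse to contract unless *both* operands carry the named axis,
        # so e.g. a batch axis can never be contracted against a time axis.
        if axis not in self.axes or axis not in other.axes:
            raise TypeError(f"cannot contract over {axis!r}: "
                            f"{self.axes} vs {other.axes}")
        i, j = self.axes.index(axis), other.axes.index(axis)
        out = np.tensordot(self.data, other.data, axes=(i, j))
        out_axes = [a for a in self.axes if a != axis] + \
                   [a for a in other.axes if a != axis]
        return NamedTensor(out, out_axes)

x = NamedTensor(np.ones((4, 10)), ["batch", "time"])
w = NamedTensor(np.ones((10, 3)), ["time", "feature"])
y = x.contract(w, "time")   # fine: both operands share a "time" axis
print(y.axes)               # ('batch', 'feature')

try:
    x.contract(w, "batch")  # w has no "batch" axis
except TypeError as e:
    print("rejected:", e)
```

A static type system can enforce the same discipline at compile time instead of at runtime, which is the point being made.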

Novel feature engineering is becoming a must. Do you reach for Python or C++? Do you want multiple languages if one was enough?

"The first step here was to figure out how to load the data. That ended up being fiddly enough that I decided to break that off into its own post." this is exactly why we have R and pandas!! Julia is doing an increasingly excellent job of it as well.

It's going to be tough to beat Numpy, Cython and Numba on speed with Rust.

And this kind of workload really isn't the problem Rust wants to solve.

vs. C++14? Indeed, most DL is in fact C++. PyTorch's recent C++ API is a must. As professionals in this industry, my colleagues and I have switched to full C++. I'd be interested in the advantages of Rust vs. C++ rather than vs. Python (which, truly, in terms of performance is C in the background).

Nice to see; a Julia implementation should be fun to do, but the point of the article is true: when you work with linear algebra, most of the time is spent in BLAS.

I’m a newbie in the NN topic and am surprised to hear that no one uses Numpy in actual NN implementations, although it’s written in C and highly optimized. Why is that?

And, how about Gonum (Go equivalent) ?

Finally, I’m currently going through the deeplearning.ai program. I got one week left, and will experiment with building some apps. Which technical stack should I choose ?

Most software (including Python and Numpy and Go and pretty much every Rust program) runs on your computer's CPU. The CPU is good at running programs with a lot of different instructions and if-statements and loops and stuff.

But for neural networks, people often prefer to use special hardware like graphics cards, since graphics cards are really good at doing relatively simple math on many pieces of data at once. So they create special libraries like TensorFlow that can send commands to the graphics card instead of doing the math on the CPU. (And they don't use Numpy because even though it's highly optimized, it's highly optimized for CPUs, and graphics cards are a lot faster than CPUs at running neural networks.)

There are three fundamental gaps between Numpy and NN libraries (other commenters have jointly mentioned 2 of the 3):

1. Numpy doesn't run on GPU.

2. Numpy isn't high level enough for NN building.

3. Numpy doesn't have auto differentiation.

Other options solve some of these - Autograd solves 3, Jax solves 1 and 3, etc.

But if you want all 3 then you want to use Pytorch.
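To make gap 3 concrete: with plain numpy, every gradient has to be derived on paper and coded by hand, which is exactly the drudgery autodiff removes. A minimal sketch (a single linear layer with an MSE loss, with the hand-derived gradient checked against a finite difference; all names and shapes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))   # batch of 8 inputs
W = rng.normal(size=(3, 2))   # weights of one linear layer
y = rng.normal(size=(8, 2))   # targets

def loss(W):
    return 0.5 * np.mean((X @ W - y) ** 2)

# The gradient must be derived by hand: dL/dW = X^T (XW - y) / y.size.
# An autodiff engine (PyTorch, Jax, Autograd) produces this for you.
grad = X.T @ (X @ W - y) / y.size

# Finite-difference check of one entry, to confirm the hand derivation.
eps = 1e-6
Wp = W.copy()
Wp[0, 0] += eps
numeric = (loss(Wp) - loss(W)) / eps
print(abs(numeric - grad[0, 0]) < 1e-4)  # True
```

One layer is manageable; hand-deriving backprop through a stack of LSTMs is not, which is why the framework-level autodiff matters.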

The main reason numpy isn't used in NN implementations is that it does not, natively speaking, have GPU support. Tensor structures in PyTorch and TensorFlow have the most solid backend support for GPUs (TPUs) and have a good amount of numpy's ndarray capabilities. There is recent work to put numpy on the same footing for deep learning. Check https://github.com/google/jax

Numpy is at too low a level for applied NN implementations.

If I'm not doing research on new methods but want to build a model for a particular problem using well-known best practices, then all the custom code that my app needs and what I need to write is about the transformation and representation and structure of my particular dataset and task; but things like, for example, optimized backpropagation for a stack of bidirectional LSTM layers are not custom for my app, they're generic - why would I need or want to reimplement them except as a learning exercise?

That'd be like reinventing a bicycle, for generic things like that I'd want to call a library where that code is well-tested and well-optimized (including for GPU usage) by someone else, and that library isn't numpy. Numpy works at the granularity of matrix multiplication ops, but applied ML works at the granularity of whole layers such as self-attention or LSTM or CNN; which perhaps are not that complex conceptually, but do require some attention to implement properly in an optimized way; you can implement them in numpy but you probably shouldn't (unless as a learning exercise).

In the end, hypothetical future good Rust libraries for linear algebra will just be turned into Python extensions and embraced into the mainstream.

They'll also, equally well, serve the needs of Rust programs which need serious number crunching (presumably a small niche); there is no "versus" in the comparison.

I think it's quite impressive actually that someone can pick up Rust and manage to out-perform Numpy in their first project. BLAS implementations are decades-long exercises in optimization.

In my own experience, Rust has been excellent for the more boring side of data science - churning through TBs of input data.

> I think it's quite impressive actually that someone can pick up Rust and manage to out-perform Numpy in their first project.

They didn't. At the end of the article they discuss this.

"In fact it’s worse than that. One of the exercises in the book is to rewrite the Python code to use vectorized matrix multiplication. In this approach the backpropagation for all of the samples in each mini-batch happens in a single set of vectorized matrix multiplication operations. This requires the ability to matrix multiplication between 3D and 2D arrays. Since each matrix multiplication operation happens using a larger amount of data than the non-vectorized case, OpenBLAS is able to more efficiently occupy CPU caches and registers, ultimately better using the available CPU resources on my laptop. The rewritten Python version ends up faster than the Rust version, again by a factor of two or so."

The original python code was written to be easily understood; not performant. Shifting more of the work to the libraries improved the performance.

That is a different algorithm, though, right? They didn’t rewrite the rust code to use matrix multiplication, so it’s no longer a direct comparison, and is instead comparing unoptimized rust with a less efficient algorithm to super optimized C++ with a more efficient algorithm (being called from python).

In which case, IMO, it’s fairly surprising that the rust implementation is only 2x slower.
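For a rough sketch of what the vectorized rewrite quoted above looks like (hypothetical shapes, using numpy's batched `@` operator rather than the article's actual code):

```python
import numpy as np

rng = np.random.default_rng(1)
batch, n_in, n_out = 64, 784, 30
X = rng.normal(size=(batch, n_in, 1))  # one column vector per sample
W = rng.normal(size=(n_out, n_in))     # shared weight matrix

# Per-sample loop, as in the book's readable version:
looped = np.stack([W @ X[i] for i in range(batch)])

# Single vectorized call: `@` broadcasts W across the leading batch axis,
# handing BLAS one big batched multiply instead of 64 small ones.
vectorized = W @ X                     # shape (batch, n_out, 1)

print(np.allclose(looped, vectorized))  # True
```

The larger operands are what let OpenBLAS keep caches and registers occupied, which is the article's explanation for the 2x swing.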

This is a common phenomenon (as anyone who has tried to rewrite the standard library buffer cache can tell you). The reason is that these algorithms are often optimized to perform decently on the very worst cases, which means they'll be slightly slower overall.

Ah, that's a thought that I hadn't considered. Avoiding pathological worst case behavior does sound like a reasonable thing for a well-used library to do.

This approach I think is missing the point. You will write highly optimized libraries in Rust, and then use those in Python.

This is why Python has eaten the world. Not because it's the best at any one thing, except bringing all those things together, at which it is unparalleled and unlikely to be surpassed anytime soon.

numpy, scipy, pandas, tensorflow: all of those have very little actual Python code; it's C++ and even Fortran here and there.

This whole Python vs Bla thing is just silly nonsense. I know Python and some Bla, and so should you. Tonight someone will release SuperFantasticNewThing implemented in Bla, tomorrow someone else will wrap that in Python, and tomorrow night the rest of us will use PySuperFantasticNewThing, and that's exactly how it should be.

TensorFlow now has Swift support, and then there is XLA.

Thanks to ML.NET, I can enjoy .NET JIT/AOT compilers performance, while binding to the same Tensorflow C++ libraries that are wrapped for Python.

Swift and Kotlin enjoy similar bindings on their respective mobile platforms.

Then there is Julia.

Python might have gotten there first, but its kingdom is already getting slowly eroded.

For God's sake, there are some folks starting to implement BLAS in Julia, so I have high hopes for the language

I think you're being unnecessarily dismissive of discussions surrounding python's fitness, and I find your remark "that's exactly how it should be" confusing.

Python is an ok interface language, in that it's script-like, dynamically typed and simple to comprehend. It's popular, which makes on-boarding efficient due to the sheer volume of tutorials online. And, it has built up a large ecosystem, because of the last two points.

That said, it's naive to suppose that python is the currently-ideal or future-ideal interface language. It's just ok, plus it's popular.

Yes exactly: Python is dominant at the moment, because it has a head start, but it's not because Python is uniquely suited to the problem by any means.

This is the same logic one might have used to conclude that the LAMP stack would be dominant forever in web development if you took a snapshot of the world in 2004.

I could see how Rust could be a good option for writing the prediction server. In my use case, DL models are trained offline. It does not matter if training is faster by 30 minutes (except maybe when I'm R&D-ing).

On the other hand, you typically have to reach for something like C++ for a low-latency, high-throughput environment. It can be a bear to write a server; anything more ergonomic would be welcome.

They already have a language to blow python out of the water in ML and it's called Julia.


It's possible to transpile, but the tech is not mature yet.

I don't think Rust is competing with Python in a similar space. The biggest win for Rust would be for someone in the community to rewrite Cython in Rust and get rid of the GIL. It's a win-win for everyone.

Given the work Chris Lattner is doing with Swift, it has a better chance of becoming a contender in this space than Rust.

Rust isn't going to be used until it has better CUDA support.

tldr: The bottleneck is in the linear algebra libraries. Unoptimized Rust is 2x faster than unoptimized Python. Optimized Python is 2x faster than the unoptimized Rust because its matrix multiplications were rewritten to be vectorized (the Rust version wasn't).

My takeaway: stick to a GPU where everything is more parallel.

tldr "Right now I have to say that the answer is “not yet”. I’ll definitely reach for rust in the future when I need to write optimized low-level code with minimal dependencies. However using it as a full replacement for python or C++ will require a more stabilized and well-developed ecosystem of packages."
