Rust for the small things? but what about Python? (dataengineeringcentral.substack.com)
37 points by dancrystalbeach 25 days ago | 87 comments



> I’ve come to realize now that the demise of Python has been greatly exaggerated

Has it? Because this is literally the first I've heard anyone claim (or claim that others have claimed) Python is on a downward trajectory. If anything it's become the de facto standard language for anyone doing anything, other than low-level hardware programming; from data science glue code to web applications to one-off scripts to backend pipelines to command line tools, it seems like "Python" is the default answer these days.


Python's biggest risk right now is stability. Seems like things keep breaking between "minor" releases. You used to never have to worry about which minor version you were using, but within the past couple of years it has become very important.


IMO Python's biggest risk is tooling. Installing and maintaining a Python toolchain, virtual environments, installing 3rd party deps, etc. is pretty painful compared to languages/runtimes like Node.js, Rust, C#, PHP, etc. (which admittedly are some of the better languages in this regard).

It's not the biggest of deals (and I doubt it will significantly hurt Python adoption for niches like data science which have complex installation requirements anyway). But it definitely keeps me from using Python for "quick scripts" which it should otherwise be a good choice for. And it's frankly embarrassing that a supposedly modern language is barely doing better than C in this regard (ok, it's quite a bit better than C, but still a long way behind what I would expect).

Having said that, stability is also a big deal, so maybe you're right. Perhaps both are a big deal.


While I think your concern is a bit overblown, I would concur that tooling is a Python gap. Not because there is any big deficiency in what they can do to manage a project, but because there are just too many options. There is no blessed workflow.

The Python leadership has refused to take a stance on picking a winner. The power vacuum has led to multiple competing tools, which all do things a little differently.

I have taught Python to a few people and the initial ramp up is embarrassing. I cannot point to a guide on Python.org that says This Is The Way(TM). Instead, I have to give an opinionated workflow full of caveats on how to set up an environment, because there is no “correct” way to get started.

I really do not care who wins. I have had to adopt and transition many packages over the years. Just pick something.


Do you have any examples of breaking changes in minor releases? I use Python every day but haven't come across any, but then I'm not doing data analysis with it so there's probably a huge chunk of Python that I never even think about.


The deprecation policy allows backwards-incompatible changes once they have been documented and have printed a deprecation warning for 2 releases. There are only minor releases in the current era of Python development (Python 3.10, 3.11, etc., with Python 4.0 only on the far horizon).

https://peps.python.org/pep-0387/#making-incompatible-change...
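
To check whether a script leans on something scheduled for removal, the deprecation warnings can be surfaced before upgrading. A minimal sketch, assuming a made-up `old_api` function standing in for any deprecated call:

```python
import warnings


def old_api() -> int:
    """Hypothetical function that is deprecated but still works."""
    warnings.warn("old_api() is deprecated", DeprecationWarning, stacklevel=2)
    return 42


def uses_deprecated(func) -> bool:
    """Return True if calling func() emits a DeprecationWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is filtered out
        func()
    return any(issubclass(w.category, DeprecationWarning) for w in caught)


print(uses_deprecated(old_api))
```

Running an existing script with `python -W error::DeprecationWarning script.py` takes the stricter route and turns every such warning into an exception.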


You’ve never had to use two packages that required incompatible Python versions?


Not that I can remember. But that's the fault of the packages rather than Python isn't it? I've definitely never run into packages needing different minor versions like 3.12.x and 3.12.y


Typescript on node is low key amazing on the backend. It’s got a rusty problem though - build times suck on large code bases.


I just wish runtimes could take advantage of TypeScript, and use type hints for performance improvements. Deno and Bun currently do not, despite running TypeScript "natively".


The problem with TypeScript is that it’s purely “advisory.” The actual shape of the objects at runtime might not match the declared types in the system, and once type-checking is complete, it’s a free-for-all.

That being said, you can use AssemblyScript, which offers a “type hints that optimize” approach. Unlike TypeScript, AssemblyScript is compiled to WebAssembly and leverages type information.

Under the hood, V8 performs its own shape analysis on objects to optimize performance. It’s quite effective and can handle a lot of optimization scenarios, though it would be interesting if V8 could use TypeScript’s type information to pre-seed the optimizer with known object shapes (it does not currently).
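
For what it's worth, Python's type hints are advisory in the same sense: the interpreter stores them but does not check them at runtime. A small sketch (the `area` function is made up for illustration):

```python
def area(width: int, height: int) -> int:
    """Annotated as int -> int, but nothing enforces that at runtime."""
    return width * height


# A static type checker would flag this call; the interpreter just runs it.
result = area("ab", 3)
print(result)                # string repetition, not arithmetic
print(area.__annotations__)  # the hints are only metadata on the function
```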


Indeed, that would be interesting. Another one of the downsides of everything running on V8. I just feel it gives people the "wrong" impression, even if it helps (massively) with DX to run TS natively.


Do you mean transpilation time, or typechecking time?


The thing that's actually useful, so type checking. esbuild strips the type information without verification plenty fast, but at that point why even bother.


Build time ought to mean compilation time/“transpilation time” since you presumably need the build output.


> to web applications

Like Flask and other backend stuff, or has someone figured out how to turn Python into a WASM and we're all doomed?


To me, a clear benefit of Rust over Python (and I love Python) is that once you write something in Rust, you have an executable that you can give to someone who doesn’t know how to program.

Python is great for something I will use myself, but not so great for when I want someone else to use my code.


I don't agree with this.

If I give somebody a binary, I have to compile this binary for their operating system and architecture. If they don't know how to program they'll be confused as to why they cannot share this program with others.

If I share somebody a python script, they can still execute it as a program as long as a shebang is set. They can use this script on any operating system and can open it with a text editor to view its inner workings.

This makes scripting languages like Python much more accessible for single-file scripts.


> If I share somebody a python script, they can still execute it as a program as long as a shebang is set.

As long as they have the right version of Python and all of your program's dependencies installed, which they most likely won't.

> They can use this script on any operating system and can open it with a text editor to view its inner workings.

Someone who doesn't know how to program most likely isn't going to be using multiple operating systems and isn't going to want to open the script in a text editor to view its inner workings.

I agree with the parent comment that distributing binaries is way more seamless when your users are not very technical.


And holy shit, if I have to install another venv to run a special-snowflake-3 script I will literally lose it. I literally have a folder of all the venvs I need to run one-off shit people pass me. I get not wanting to compile things, but that's not a problem with the modern build systems Rust and Go have; it's beyond trivial.


I have recently started to use uv, and you can just do "uvx tool" and it executes the tool (https://docs.astral.sh/uv/guides/tools/). Sure, the tool needs to do something to be able to be executed like that, but it seems pretty easy. And there's also a way to embed the dependencies with a comment at the top of a python script: https://docs.astral.sh/uv/guides/scripts/#running-a-script-w...

The venv created for this is ephemeral (it can be persistent if you want), so you don't need to remember to clean things up afterwards. Also, uv is really fast at creating the venv and installing whatever is needed. Coming from plain pip and venvs (and the pain of setting up different Python interpreter versions for projects), and Poetry just after that, I am pretty happy with the improvements.
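
For reference, the comment-embedded dependency block follows the inline script metadata format (PEP 723). A minimal sketch: the file name `hello.py` and the empty dependency list are placeholders, and third-party requirements would go inside `dependencies`:

```python
# /// script
# requires-python = ">=3.9"
# dependencies = []
# ///
"""hello.py: a single-file script that carries its own requirements."""
import sys


def describe_runtime() -> str:
    """Report which interpreter ended up running the script."""
    version = sys.version_info
    return f"Python {version.major}.{version.minor}"


print(describe_runtime())
```

With that header in place, `uv run hello.py` should pick a matching interpreter and install the listed dependencies into a throwaway environment before running the file.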


Yeah, really... Python tries to hide some of the insanity behind venvs, conda, etc. but it often ends up some dependency nightmare that's barely holding together. Everyone that tells me "oh Python is so great, it just works" is either only using it for the absolute most basic tasks or kidding themselves. I'd often get a blank stare when I respond to that with "then why am I helping you fix this virtual environment with gigs of cruft for a 'simple task' that has suddenly and inexplicably stopped working for you?"

The equivalent things in Go, Rust, etc are a breath of fresh air for sure


> If I share somebody a python script, they can still execute it as a program as long as a shebang is set.

I'm not familiar with Python, but I presume it's similar to Ruby, where, beyond trivial scripts, this is false: as soon as libraries are required, you need to find a way to handle them, which in some cases may include handling multiple envs (I write and distribute such scripts).

Compiling for multiple environments in a controlled environment is by far much more convenient than debugging scripts failing on end users' systems.


IMO it's far worse than Ruby. Since Python is meant to be a glue language, scripts can have every type of esoteric dependency, including binaries, os libs, java programs, etc...


Ruby isn't any different


>If I share somebody a python script, they can still execute it as a program as long as a shebang is set.

Since when has Windows come with Python installed and worked with a shebang?

macOS doesn't come with Python installed, and clicking the script won't run it.

So anyone "confused as to why they cannot share this program with others" isn't going to be able to use your Python script.


This would work in a language with better and more modern packaging (like Julia), but giving people Python files and expecting them to run them with the correct versions might just be a bigger ask than asking them to compile C code.


Unlikely, you need to install virtualenv.

They may not have the same Python version, and they may have different versions of the libraries installed.

It's not simple to run a Python script, especially one that uses libraries (and let's be honest, how often is your code not using those popular libraries?).


> they can still execute it as a program as long as a shebang is set

Not if you use any libraries.


This is what pyinstaller is for. It's not quite as good as a binary executable, although are you really sending ad hoc binaries to people? Seems like a nightmare for version control etc. The pyinstaller manual is much easier than learning rust!


PyInstaller does have --onefile but that's kind of cheating since it creates and extracts the files to a temp folder.
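
That temp-folder extraction matters once the script ships data files alongside the code. PyInstaller sets `sys.frozen` and `sys._MEIPASS` inside a bundle, so a script can ask where its files ended up. A rough sketch, where `resource_dir` is a made-up helper name:

```python
import sys
from pathlib import Path


def resource_dir(source_dir: Path) -> Path:
    """Return the folder that bundled data files live in.

    Inside a PyInstaller bundle, sys.frozen is set and sys._MEIPASS points
    at the bundle folder (a temp folder for --onefile builds). Outside a
    bundle, fall back to the directory the caller passes in.
    """
    if getattr(sys, "frozen", False) and hasattr(sys, "_MEIPASS"):
        return Path(sys._MEIPASS)
    return source_dir


print(resource_dir(Path.cwd()))
```

In a real script the fallback would typically be `Path(__file__).parent`.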


It’s not cheating at all. It’s an inelegant solution, but it works.


There are a host of Python-to-executable tools that either compile the code to C or include the interpreter and run the code dynamically.


The Rust implementation is on the heavy compiled end. The Python implementations are on the more interpreted end.

Considering how widespread Python is it seems like solving the one-executable problem for Python should be easier than solving Rust’s problem (heavy compilation). Like with a wrapper or something.


Python is so “widespread”, that I have seen (non-technical) users with 4-5 different versions of python on their machine. And default/system python in path is 2.7 :dead:



You can totally make binaries with Python, there are even many, many ways to do so. None really come out of the box, and some assembly is usually required, but if you want to do it, you can.

That someone is too lazy to do so is not Python's problem, that's their problem.


That's what I use Go for.

It's much simpler to write scripts in Go.

Here's an example: https://github.com/zerocorebeta/bashlike



The main criterion for me would be the frequency of execution. Say I have a model that will be retrained every quarter via CI/CD or an Airflow DAG. Does it make a huge difference for me to have parsing done in 410.71 ms in Rust vs 740 ms in Python (a contrived example, of course)? Probably not. Would 400.71 ms vs 5 min, for example, still make a difference? I don't think so either.

It would be a different matter entirely if that piece of code were executed more frequently and also took a lot of time, as I could then save both computing resources and money.
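
Putting rough numbers on that, using the timings quoted above and assuming four runs a year for the quarterly job (the once-a-minute schedule in the last line is a hypothetical contrast, not something from the article):

```python
def yearly_saving_seconds(slow_ms: float, fast_ms: float, runs_per_year: int) -> float:
    """Total wall-clock time saved per year by the faster implementation."""
    return (slow_ms - fast_ms) * runs_per_year / 1000.0


# Quarterly retraining: 4 runs per year.
print(round(yearly_saving_seconds(740, 410.71, 4), 2))         # 1.32 seconds
print(round(yearly_saving_seconds(5 * 60 * 1000, 400.71, 4)))  # 1198 seconds

# Hypothetical contrast: the same parse step triggered once a minute, in hours.
print(round(yearly_saving_seconds(740, 410.71, 60 * 24 * 365) / 3600, 1))  # 48.1
```

Even the 5-minute case costs only about 20 minutes a year at a quarterly cadence, while the small per-run gap adds up to roughly two days once the job runs every minute.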


At Facebook, Python was discouraged because many common tools took seconds to load, even when called with only --help. :/


Curious whether Python was really the problem there. Especially if I look at the problems of other "fancy" software from Facebook.


> 400.71 ms vs 5 min

That spread in performance is not realistic though. Of course rust is that much faster, I have no doubt that you could 100x the performance of equivalent python, the problem is that it's an imaginary scenario. Nobody is writing a tight loop in python, you're delegating to pandas/polars, SQL, external calls or what have you. Nobody cares that the code that initializes a call to a 10-minute training run of Xgboost takes 0.001 seconds instead of 0.1 seconds. Nobody cares if you fearlessly optimize the 7 lines of code to connect to a database, whose runtime is dwarfed by a SQL query over the wire.

I don't want to come off as negative towards rust, I do think it's a great language, I'm just really perplexed when people try to use it in contexts that are far removed from its sweet spot.


I agree with you overall, but there's a huge caveat: you need to make sure whoever is writing the code in Python has performance in mind. The problem I encounter a lot is that many data scientists write unoptimized code that takes longer to run than it should (e.g. using for loops instead of NumPy vectorization). You could write inefficient code in Rust just as easily, but even then the runtime difference would be the same.


Man, I get that Python is easy to write, but maintaining deployed Python code is one of the worst experiences in modern software development.

Less Code != less buggy or more stable code, it just means more implicit code. I contend you spend way more time after release debugging runtime issues or patching random edge cases that are just completely eliminated in typed languages, or deploy/env issues that are eliminated in languages that produce a single binary.

Developer efficiency should include your support time after writing the code.


I'd say that writing small stuff in Rust has two major advantages over Python:

1. Dependency management is a godsend compared to Python. With Rust I'm confident that I'll be able to pull the code on a new machine and just do `cargo build` and it will work. I'd like to use a lot of curses to describe Python here.

2. Python works well if you can fit everything in your head. But 5 years from now it's scary to make even small changes in a Python codebase. With Rust you'll get much more support from the compiler, whether it be refactoring, squashing bugs, or adding features.

So in the long run I prefer Rust.


Good things about Python:

- Very good, fast language for PoC, even for low level programming projects such as compilers;

- Very easy to set up in a new VM - no weird bash scripts, no complex package download, no need to change a hidden configuration file according to a 10 year old reply of a 20 year old SO post;

- Very easy to run - again no need to touch anything, just python something;

- Very good integration in VSCode;

- Virtual env is a blessing for multiple PoCs and is very easy to spin up even for people like me who don't work in terminals very often;

As someone who just wants to write some code without understanding a million tools, this is a blessing.


I've found pathlib (in Python's standard library) very convenient for tasks like these. I think the entire second example could just be:

    lines = Path("in.txt").read_text().splitlines()
    trimmed = "\n".join(lines[1: -1])
    Path("out.txt").write_text(trimmed)
Granted, the example in the article has advantages (like not loading the full file at once) if you want something more permanent, as opposed to a quick script run once or occasionally.


Your script requires over 3x the entire file in memory (text, lines, "\n".join(lines)), whereas the post's Rust and Python versions go line by line and so support arbitrarily large files.


If you are dealing with many-gigabyte files or a performance-critical pipeline then yeah, you'd want to do it line-by-line. My example is just a quick and dirty script, or REPL input, without any of the boilerplate - which I think is an area Python excels in.
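
For the large-file case, a line-by-line version does not need much more code. A sketch of one way to do it in a single pass, holding back one line at a time (the function name `drop_first_and_last` is made up here, and this is not the article's implementation):

```python
from typing import Iterable, Iterator


def drop_first_and_last(lines: Iterable[str]) -> Iterator[str]:
    """Yield every line except the first and the last, in one pass.

    Only one line is held back at a time, so memory use stays constant
    regardless of input size.
    """
    iterator = iter(lines)
    next(iterator, None)             # skip the first line, if any
    previous = next(iterator, None)  # candidate that may turn out to be last
    for current in iterator:
        yield previous               # previous is not the last line; emit it
        previous = current
    # whatever is left in previous is the last line and is never yielded


print(list(drop_first_and_last(["a\n", "b\n", "c\n", "d\n"])))
```

Since an open file is itself an iterable of lines, it can be passed straight in and the result handed to `writelines` on the output file.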


Codebases tend to grow over time. I am not a fan of Python for more complex projects. Some of the patterns I have seen lead to codebases that are hard to follow.


* Python requires less code

* Do speed and safety matter in every application? (Probably not.)

* Developer efficiency matters

———-

Why can’t we have everything in a safe and fast language?


I think we can have everything, up to the limit of garbage collection.

Rust and C++ are tricky (in very different ways) due to the lack of GC.

However, there are many languages that are as productive as Python (maybe more so, if you factor in types and functional programming) yet execute 10x faster.


As productive, yes. I would say Perl is a very productive language.

Would they be just as readable, though? Especially for newcomers that are not yet that familiar with programming?


Because those are conflicting requirements. Let's take an example: you want to multiply two n-by-n matrices, A and B. Easy enough.

Wait, is either A or B Hermitian? If so we can do better. Is either of them symmetric? Is either of them triangular? Where do you want to allocate them? Do you want to overwrite either of them, or do you want a new matrix? If so, how are you going to allocate that memory? Would you like a fused multiply-add with your order?

Python's answer is "shut up nerd, just A@B". Necessarily you are going to lose some control over the execution.

Different languages take different tradeoffs on the "simple specification vs. detailed control over execution" scale.
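
To make the "simple specification" end of that scale concrete: `@` is just a call to `__matmul__`, and the implementation behind it makes every one of those decisions for you. A toy sketch (the `Matrix` class is hypothetical and is not how NumPy does it):

```python
class Matrix:
    """Tiny dense matrix; a toy stand-in for a real array library."""

    def __init__(self, rows: list[list[float]]) -> None:
        self.rows = rows

    def __matmul__(self, other: "Matrix") -> "Matrix":
        # Naive product: no symmetry or triangularity shortcuts, no in-place
        # reuse of either operand, always a freshly allocated result.
        if len(self.rows[0]) != len(other.rows):
            raise ValueError("shape mismatch")
        columns = list(zip(*other.rows))
        return Matrix(
            [[sum(a * b for a, b in zip(row, col)) for col in columns]
             for row in self.rows]
        )


A = Matrix([[1, 2], [3, 4]])
B = Matrix([[0, 1], [1, 0]])
print((A @ B).rows)  # [[2, 1], [4, 3]]
```

The caller writes `A @ B` and has no say in allocation or algorithm choice, which is exactly the control being traded away.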


In the last example, the call to OpenOptions can be replaced with the simpler File::create() [1], which opens the file in write mode, creates it if it doesn't exist and truncates it if it does.

The weird iteration can be replaced with Itertools' with_position method [2], filtering on Position::Middle. The Python code counts the total lines by reading the file once and then iterates over the file again using the count. This would be possible in the Rust approach too and would look mostly the same.

[1]: https://doc.rust-lang.org/stable/std/fs/struct.File.html#met...

[2]: https://docs.rs/itertools/0.11.0/itertools/trait.Itertools.h...

As with any endeavor, knowing your tools helps most tasks. This is what the example looks like with full error handling and a fairly succinct yet fast approach.

    use std::{
        fs::File,
        io::{BufRead, BufReader, BufWriter, Write},
        time::Instant
    };

    use color_eyre::{eyre::WrapErr, Result};
    use itertools::{Itertools, Position};

    fn main() -> Result<()> {
        color_eyre::install()?;
        let start = Instant::now();

        let path = "foo/bar/baz.txt";
        let tempfile = format!("{path}.tmp");

        let input = File::open(path).wrap_err(format!("Failed to open file: {path}"))?;
        let output = File::create(&tempfile).wrap_err(format!("Failed to create output file: {tempfile}"))?;

        let reader = BufReader::new(input);
        let mut writer = BufWriter::new(output);
        for (_, line) in reader
            .lines()
            .with_position()
            .filter(|(position, _)| *position == Position::Middle)
        {
            let line = line.wrap_err("Failed to read line")?;
            writeln!(writer, "{line}").wrap_err("Failed to write line")?;
        }

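        // BufWriter flushes on drop but swallows any error there, so flush explicitly.
        writer.flush().wrap_err("Failed to flush output")?;

        println!("Elapsed: {:?}", start.elapsed());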
        Ok(())
    }
As a Rust dev, I find that marginally easier to grok than the Python code, probably due mainly to familiarity. But I can see why the Python code would be easier for a Python dev.

My rust specific dev experience is ~18 months. I presume the OP's experience in Python is probably equal or more than this.


Article compares Rust to Python claiming that whilst Python is indeed slower, it is more productive for developers.

But there are many languages in the world, and there are some that are as productive as Python, yet execute much faster.


> there are some that are as productive as Python, yet execute much faster

I'd like some recommendations for this from the community. As productive as Python but faster execution is the kind of sweet spot I want to explore and learn. Any specific examples or recommendations for such programming languages?


It depends on what you're doing, but for web stuff Go is the popular middle-ground choice that's not quite as fast as Rust but is much faster than Python or Ruby. Go is garbage collected, unlike Rust, so you get the productivity benefits of not having to think too hard about memory management.


Julia's whole thing is python but not slow. Crystal is the same but for Ruby.

Zig and Nim try to feel like scripting languages while still being fast.

Personally I like clojure for such things.


Julia 100%. Multiple dispatch and a proper macro system are a superpower. It also doesn't abstract over data in the same way: if you want a byte, you can have a UInt8. But it does abstract over data: a function can take a Number, and that is a very expressive supertype in Julia.

The type system uses a lattice, which can be a bit unfamiliar, but once you get it under your belt, it's the jam. I highly recommend getting to know it.


Nim [0] IMO has syntax that's better than Python's, while having the performance of C. See "Nim for Python Programmers" [1] for a detailed comparison.

Small example of Nim code (from relevant article [2]):

  var gc = 0
  var total = 0

  for line in lines("orthocoronavirinae.fasta"):
    if line[0] == '>': # ignore comment lines
      continue
    for letter in line:
      if letter == 'C' or letter == 'G':
        gc += 1
      total += 1 # count every base, not only C and G

  echo(gc / total)

  [0] - https://nim-lang.org
  [1] - https://github.com/nim-lang/Nim/wiki/Nim-for-Python-Programmers
  [2] - https://benjamindlee.com/posts/2021/why-i-use-nim-instead-of-python-for-data-processing/


OCaml? It has an advanced type system but you almost never have to annotate types in your code. So it feels like writing in a dynamic language, like Python.


Clojure (either native JVM or Babashka)

Once you've got a repl up and running the JVM launch time isn't an issue and repl driven development lends itself well to data munging/exploration

Clojure's data centricity (functional, everything's a sequence, strong data manipulation std lib) really makes it a dream for these quick and dirty data transformation/exploration jobs

If you want to get fancy, https://github.com/nextjournal/clerk (notebook interface) and https://github.com/techascent/tech.ml.dataset (data frame library) will get you there


Ultimately it depends on the task, but I'm way more productive writing D than Python (assuming it's not just gluing libraries together, in which case Python obviously wins) because of the metaprogramming, and because the basics are done well enough that I can actually trust the results without constantly staring at stdout second-guessing the semantics.

Python isn't a particularly good force multiplier purely as a language these days. It was very wise to not go full OOP, so we are spared from the truly horrible, but python is basically C with lists and new syntax as most people use it.


If you can stand some archaism, Go is this language for the web stuff. It has nice things too, like efficient concurrency and easy deployment.


If you like functools, itertools and list comprehensions, you will probably like F#.

It is significantly faster than Python but has a similar vibe.


Lua?

That being said, performance isn't really an aspect of the language but of the compiler and runtime. PyPy, for example, has a JIT compiler.


Scala, once you get a feel for it.


JavaScript is the obvious pick with those criteria.


For data engineering -- R and Julia.


FWIW, you can also make Python execute much faster than Python in many cases, by using PyPy instead of CPython. I've had great success using it to optimize my naively-written scripts. (But in cases where that's still not enough, I'll usually switch to Rust or another AOT-compiled language instead of contorting the script into unreadability.)


If you look at productivity you should also consider size of the ecosystem of libraries.


This. I love Rust and I also love python, although for different reasons. But in python you can always find that one library or codebase that you need or that will inspire you to get there. In Rust it still sometimes happens that when you go off-road, you may be on your own for some functionalities.


There's also something about python, that it was/is extremely fast to learn and pick up.


python is basically "if pseudocode was a language".


New programmers don't know pseudo-code either; I'm not sure why being like pseudo-code would make it easier to learn.

I think Python is easier to learn than many languages - definitely its competitors at the time Perl and C++ - but I don't see why it is any easier than BASIC, Java, Clojure, Scheme, F#, JavaScript...


I also can't explain why it's easier than Java, but it is. My theory is the syntax is just insanely clean, making it quick to read. There are also a lot of approachable scripts for it, which make it easy to get momentum. JavaScript has too many "gotchas" to make it quick to learn. Sure, BASIC is quick to learn (I learned the basics when I was kindergartenish), but it's not something you would have used in the past 20+ years, and it wouldn't be suited to modern applications.


Agreed, I wouldn't say Python is particularly easy to learn.

"Easy to learn" is difficult to define anyway. Is chess easy to learn? Any child can learn the rules of chess in a few minutes, but it doesn't make them good at chess.

Assembly is like the rules of chess (assuming RISC at least). Super simple, but you still need to learn how to do anything useful with it.

The Python book on my shelf is about 4 times as thick as K&R. So by that (admittedly rather silly) metric, Python is significantly harder to learn.

One might say Python being more expressive makes it easier, but nothing is more expressive than languages like Perl and awk yet they aren't considered "easy".

We could also consider the principle of least astonishment. Python has some surprises: semantic whitespace being the obvious one, and others like a trailing comma creating a tuple. It used to be that C-type languages were "hard" due to what happens if you miss a semicolon etc., but modern IDEs have pretty much made that irrelevant. Python's surprises are still there.
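
The trailing-comma surprise in a few lines, for anyone who hasn't been bitten by it yet (variable names are arbitrary):

```python
value = 1,         # trailing comma: this is the tuple (1,), not the int 1
print(type(value).__name__)        # tuple
print(value == (1,))               # True

not_a_tuple = (1)  # parentheses alone do not make a tuple
print(type(not_a_tuple).__name__)  # int
```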

So yeah I think Python sits somewhere with Scheme as being easy enough to get to the point of being able to do useful computing. But there's still tons to learn that any programmer has to learn regardless of language.


Not the point of the article, but the Python version reads the file twice. Once to get the total line count and again to write out the chosen lines. The Rust version proceeds with a single read of the source file.


Yep... and it took the algorithm that will look more complicated for the Rust implementation. The premise was "Rust is fast; Python is easy" and that had to be proven :-(

I'm sure a Rust version with counting lines won't have any performance advantage... and is easy to write and quite nice to read.


To me the main benefit of Python is the standard library. You can do a lot with it without any additional packages. Things get tricky when you have a lot of external packages in rust or Python.


I just wish I could provide a binary of my python code and call it a day


aws s3 ls s3://bucket/prefix/ --recursive | wc -l

sed 1d


> I find it’s not overly verbose or hard for even one of those lowly Python coders to follow what’s happening

Unnecessarily rude.


Considering the article ultimately roots for Python, I think this is meant more as a sarcastic jab at Rust coders' views on python coders, than it's meant to be a serious comment about the average python coder.


Yeah, I think you're right - I don't think the joke lands particularly well though.



