Ruff: A fast Python linter, written in Rust

jwilk · on Feb 14, 2023

Discussed in 2022: https://news.ycombinator.com/item?id=32666035 (46 comments)

charliermarsh · on Feb 14, 2023

Didn't expect to see this here but I'll take this opportunity to say thank you to all the contributors and to anyone that's given Ruff a try.

I'm around if anyone has questions.

phodge · on Feb 14, 2023

First of all, this sounds like an amazing project. My workplace runs isort+flake8 in pre-commit hooks, so making these even 10x faster would be a huge quality of life improvement for us.

Personally, I'm interested to hear from you what the specific reasons are that you can't achieve this kind of performance with CPython.

Usually the major factors are: A) python's generalised data structures (int, list, etc); B) extra overhead of common operations like reading a variable value, calling a function or iterating over a collection; C) no real multithreading (i.e. the GIL); D) lack of control over memory management.

I'd love to know if there's anything else that makes Python that much slower.

charliermarsh · on Feb 14, 2023

It's mostly the reasons you've hit on but I'll try to add some color to them based on my experience with Ruff.

1. The "fearless concurrency" that you get with Rust is a big one though. Ruff has a really simple parallelism model right now (each file is a separate task), but even that goes a long way. I always found Python's multi-processing to be really challenging -- hard to get right, but also, the performance characteristics are often confusing and unintuitive to me.

2. Ruff performs very few allocations, and Rust gives us the level of control to make that possible. (I'd like to perform even fewer...) We tokenize each file once, run some checks over that stream, parse it into an AST, run some checks over that AST, and with a few exceptions, the only allocations outside of that process are for the Violation structs themselves.

3. Related to the above (and this would be possible with CPython too), by shipping an integrated tool, we can consolidate a lot of work that would otherwise be duplicated in a more traditional setup. If you're using a bunch of disparate tools, and they all need a tokenized representation, or they all need the AST, then they're all going to repeat that work. With Ruff, we tokenize and parse once, and share that representation across the linter.

4. Again possible with CPython, but in Ruff, we take a lot of care to only do the "necessary" work on a given invocation. So if you have the isort rules enabled, we'll do the work necessary to sort your imports; but if you don't, we skip that step entirely. It sounds obvious, but we try to extend this "all the way down": so if you have a subset of the pycodestyle rules enabled, we'll avoid running any of the expensive regexes that would be required to power the ignored rules.
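The four points above can be sketched in plain Python. This is only an illustration of the shape of the idea, not Ruff's implementation: the rule codes `X001` and `X002`, and the functions `lint_source` and `lint_many`, are made up for this example. It uses threads to show "one task per file"; under CPython's GIL that does not give real parallelism, which is part of the point being made.

```python
from __future__ import annotations

import ast
import io
import tokenize
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rule codes for this sketch; they are not Ruff's real rule names.
TOKEN_RULES = {"X001"}  # rules that read the token stream
AST_RULES = {"X002"}    # rules that read the AST


def lint_source(source: str, enabled: set[str]) -> list[tuple[str, int]]:
    """Lint one file's text, doing only the work the enabled rules need."""
    violations: list[tuple[str, int]] = []

    # Tokenize only if at least one enabled rule consumes tokens.
    if enabled & TOKEN_RULES:
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            # X001: flag comments that contain "TODO".
            if tok.type == tokenize.COMMENT and "TODO" in tok.string:
                violations.append(("X001", tok.start[0]))

    # Parse once; every AST-based rule shares the same tree.
    if enabled & AST_RULES:
        tree = ast.parse(source)
        for node in ast.walk(tree):
            # X002: flag bare `except:` clauses.
            if isinstance(node, ast.ExceptHandler) and node.type is None:
                violations.append(("X002", node.lineno))

    # Report in line order, then by rule code.
    return sorted(violations, key=lambda v: (v[1], v[0]))


def lint_many(sources: dict[str, str], enabled: set[str]) -> dict[str, list[tuple[str, int]]]:
    """Treat each file as an independent task, as described in point 1."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda src: lint_source(src, enabled), sources.values())
        return dict(zip(sources.keys(), results))
```

With no rules enabled, `lint_source` neither tokenizes nor parses, which is the "all the way down" behaviour from point 4 in miniature.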

drcongo · on Feb 15, 2023

I'd like to thank everyone involved in making it. I've been using it for a couple of months and absolutely love it.

twalla · on Feb 14, 2023

I was working with my team recently to build a standardized python project boilerplate and came across Ruff - the one thing I really liked, apart from the speed, was that I could finally have my linting config in pyproject.toml, which the flake8 project seems to have some kind of aversion to.

claytonjy · on Feb 14, 2023

This is how I judge python tools these days. I will not use most tools that offer no ability to configure with pyproject (except pre-commit), and will look for alternatives if you can use pyproject but the docs seem to discourage it.

Biggest missing piece of the workflow now is a modern replacement for Tox/Nox; only one I see is Hatch, but that replaces a lot more, for better or worse.

bobuk · on Feb 14, 2023

I'm a big fan, but be careful: Ruff doesn't support current versions of Python (>=3.11) yet. Instead of direct Python parsing, Ruff uses RustPython, which still lacks support for the match statement and other modern 3.10+ stuff. If you're OK with that, go ahead and switch to Ruff; you will never go back.

charliermarsh · on Feb 14, 2023

That's right -- we don't support match statements, and we don't support except* (which is part of 3.11 IIRC), but I _think_ we support everything else. And of course, I intend to support all of those language features.

If interested: CPython moved to a new parser, a PEG parser, in Python 3.9, and part of the motivation was to support language features like pattern matching, which introduced ambiguities in the grammar that the existing parser couldn't handle. For the same reason, they've been non-trivial to implement in the RustPython parser -- they either require clever techniques, or the parser needs to be written as a PEG parser. I am hoping to do the former, but I need to find time to prioritize it.

parlortricks · on Feb 14, 2023

Oh, that explains why it is hung up on "match item['blah']:" then. I was scratching my head trying to work out what I had done wrong, since the code worked fine.

Love the tool

gen220 · on Feb 14, 2023

On their page, they claim "Python 3.11 compatibility" – perhaps they've added it since you formed your impression, or is this "compatibility" word misleading?

mkesper · on Feb 14, 2023

The Readme mentions 3.11 compatibility so that's astonishing.

the_mitsuhiko · on Feb 14, 2023

It supports 3.11 but not all syntax constructs of 3.11.

dagurp · on Feb 14, 2023

Can they claim to support it then?

the_mitsuhiko · on Feb 14, 2023

I don't know. But that's not unique to ruff. Plenty of Python linting tools had issues with new Python syntax in the past. Rust's rustfmt can still not format some of the most recent syntax additions and rust-analyzer has some limitations with new stuff as well.

elcapitan · on Feb 14, 2023

The benchmarks are impressive, and maybe this is a stupid question, but who lints an entire codebase constantly? In my daily use, I just lint on save, and those are smaller files where I never noticed a difference between different linters.

edflsafoiewq · on Feb 14, 2023

Aren't linters global? If you change the number of args to a function, you want the linter to tell you all call sites with the wrong number of args right?
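For concreteness, here is a rough sketch of the kind of check being asked about, limited to a single module and using only the standard `ast` module. The function `check_call_arity` is invented for this example and is not how flake8, Ruff, or mypy work; a genuinely global version would also have to resolve imports across files, which is where it gets hard.

```python
from __future__ import annotations

import ast


def check_call_arity(source: str) -> list[tuple[int, str]]:
    """Report calls that pass too many positional args to a module-level function."""
    tree = ast.parse(source)

    # First pass: record how many positional parameters each function accepts.
    # Functions that take *args are skipped because they have no upper limit.
    max_args: dict[str, int] = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.args.vararg is None:
            max_args[node.name] = len(node.args.posonlyargs) + len(node.args.args)

    # Second pass: compare every direct call by name against that record.
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            limit = max_args.get(node.func.id)
            # Calls using *unpacking cannot be counted statically, so skip them.
            has_star = any(isinstance(arg, ast.Starred) for arg in node.args)
            if limit is not None and not has_star and len(node.args) > limit:
                problems.append((node.lineno, node.func.id))
    return sorted(problems)
```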

elcapitan · on Feb 14, 2023

That's something I would expect from more advanced static analysis like mypy, but not flake8? Maybe just different use cases under the same category name.

edflsafoiewq · on Feb 14, 2023

Well a linter can do as much or as little as you want. Certainly having global checks is better than having only local checks and if you can check the whole codebase in under half a second, why not?

elcapitan · on Feb 14, 2023

Is it doing that though? My understanding from their FAQ and comparisons is that they're intending to replace flake8 and the like, not mypy. And if they're implementing a subset, then the de facto standard for the larger per-commit checks is still the one you'll have to run.

edflsafoiewq · on Feb 14, 2023

Quick test shows ruff doesn't seem to.

claytonjy · on Feb 14, 2023

Same, but in CI I generally lint with

    pre-commit run --all-files

because it can be much trickier or more error-prone to only lint the correct diff when you're dealing with MRs. I definitely notice the speed hit there, though it usually pales in comparison to running pytest.

Honestly mypy is the slowest one now, which ruff doesn't intend to replace.

abalaji · on Feb 14, 2023

Folks who work on teams with frequent diffs, where the linter goes in CI and sometimes catches bugs: see flake8-bugbear et al.

kdeldycke · on Feb 16, 2023

If the main argument for Ruff is speed, its real advantage is the consolidation of the Python QA menagerie:

  - isort (import statement sorting)
  - pyupgrade (syntax upgrade for newer Python versions)
  - pylint (general linting)
  - pycln (remove unused imports)
  - pydocstyle (docstring syntax checks)

All of these can be replaced with a single ruff call. Ruff consolidates the rules from all these tools into a comprehensive and non-overlapping corpus, and removes the burden of having to find the right invocation order.

What's missing for ruff to be the gold standard, is to adopt features from:

  - autopep8 (wraps long comments)
  - docformatter (docstring auto-formatting)
  - black (deterministic code formatting)
  - blacken-docs (applying black on Python code blocks in documentation)

woodruffw · on Feb 14, 2023

ruff is really nice! I'm a very happy user of it on a handful of projects: it's easily 10x faster than the PyCQA tools (flake8, etc.), and has decent (and growing) coverage for most of their checks. It also integrates nicely with GitHub Actions' annotation support, which has been a boon for pointing junior devs to correctable errors.

pauleveritt · on Feb 14, 2023

Funny enough, we did a PyCharm webinar with Charlie just a couple of hours ago. https://www.youtube.com/watch?v=jeoL4qsSLbE

pityJuke · on Feb 14, 2023

Wow, this project progressed quickly. In 5 months it convinced pandas and SciPy to convert? Impressive!

adamc · on Feb 14, 2023

Doesn't seem to cover much of pylint's ground, though. Pylint is slow but I've found a lot of errors using it.

pushedx · on Feb 14, 2023

Do you have any examples of violations that pylint catches that Ruff doesn’t catch? Genuinely curious.

charliermarsh · on Feb 14, 2023

There's an issue tracking parity here: https://github.com/charliermarsh/ruff/issues/970

Though it's probably a bit conservative. Some rules are implemented as straight one-to-one ports from Pylint, and those are easy to check off, but others are Pylint rules that we've already implemented under other names, and those have to be tabulated on a case-by-case basis.

adamc · on Feb 14, 2023

The issue linked seems to cover it.

Narushia · on Feb 14, 2023

Haven’t yet used Ruff in any project, but it does look very promising. I’m personally waiting for them to reimplement most of Pylint’s checks.

https://github.com/charliermarsh/ruff/issues/970

avdempsey · on Feb 14, 2023

I'm more or less waiting on the same thing, but wonder how many it will be able to pick up. The maintainer states somewhere that many of Pylint's remaining checks involve type-checking and inferring types for less than fully typed code. I've also seen him state somewhere that writing a type-checker is probably out of scope since these are massive projects in their own right. I've not seen any statement about the expected future coverage of Pylint.

Perhaps running Ruff plus a type checker gives us close to what Pylint does today? Pylint is pretty comprehensive, and awesome for that, but I'd love to lint at the speed of Ruff.

stavros · on Feb 14, 2023

I switched to it, and it's amazing. It basically runs instantly. I basically prefer the speed to the checks now, but YMMV.

dshpala · on Feb 14, 2023

"An fast"? Shouldn't this be "A fast", or is it an exception of some sort?

cornstalks · on Feb 14, 2023

First line in the post is "An extremely fast Python linter, written in Rust." I assume it was a copy 'n' paste, but stripping "extremely" kinda broke the grammar.

seanw444 · on Feb 14, 2023

They strip adjectives from the titles? Huh.

cornstalks · on Feb 14, 2023

I don’t know if HN does it automatically or if the OP did it manually.

soperj · on Feb 14, 2023

likely too long, so they removed a word.

kelsolaar · on Feb 15, 2023

Love it, we switched all the main colour-science repos to it a few weeks ago and I’m planning to do the same at work!

abalaji · on Feb 14, 2023

I got a 250x (4.5s -> 16.3 ms) speedup when using `ruff` in a real world use case. Highly recommend folks migrate from the usual combo of flake8 + isort + many flake8 plugins.

dundercoder · on Feb 14, 2023

I love python. I really do. But when I see speedups like this from Rust, Nim, or other compiled languages, it makes me question my life choices.

matsemann · on Feb 14, 2023

I shudder when I look at the amount of pods / resources we use. Not only is it slow, but since the GIL makes python effectively single threaded, you end up instead with loads of separate processes / workers in many contexts, multiplying your need for RAM. And since you can't easily just spin up some background threads, you often end up with loads of different entrypoints also complicating the deployment.

At a previous company using the jvm, we used 6x t3.small instances at AWS to handle a much bigger load than what we now have 100+ pods handling in python.

shicholas · on Feb 14, 2023

imo Python is going to eventually be nice-looking macros for Rust in the near future and you will barely have to change your code. What's not to like about that?!

tccole · on Feb 14, 2023

That’s gonna be nice until you have to do a UDF on a data frame

shicholas · on Feb 14, 2023

(same with Ruby, which is also great)

sorokod · on Feb 14, 2023

Indeed, when a linter for a language is 100x better when written in another language ...

matsemann · on Feb 14, 2023

On our huge codebase in github actions: pylint 177 seconds, flake8 121 seconds, ruff 2 seconds

faitswulff · on Feb 14, 2023

Instead of the stock complaint about titles including "written in Rust," I'd like to acknowledge that the standard of excellence seems to be higher for tools written in Rust than others. Perhaps that's a consequence of their recency - either way, it would be interesting to hear some cases where tooling built in Rust falls short compared to those written in other languages.

travisjungroth · on Feb 14, 2023

I love ruff. Some reasons it ended up being better that aren’t “standard of excellence”.

#1 Second-mover advantage. It got to learn from all the existing tools, skip all that.

#2 Deduplicated work. Three separate linters means parsing the code three times. A “do everything” does it once.

#3 Rust is faster than Python.

These things combine into someone showing up and suddenly making a tool that a lot of people are using to replace 3+ tools. But that doesn’t mean the previous versions are bad. They were necessary to be where we’re at.

nicoburns · on Feb 14, 2023

I imagine a key benefit here is going to be performance, because a lot of Python tooling is written in Python which will be significantly slower.

xyzzy4747 · on Feb 14, 2023

The first rule of Python should be that if you have to use loops, don’t code it in pure Python.

avgcorrection · on Feb 14, 2023

So instead of a side-topic complaint about “written in Rust”, you’re requesting an off-topic discussion about whatever other tooling than this which is both written in Rust and that is bad?

claytonjy · on Feb 14, 2023

Tangential, but do any python users have a tool they like for testing that fully-formed doc comments (e.g. Google or numpy style) are correct, meaning the arg names and types reflect the function signature?

Pydocstyle, pylint, and ruff will all check for some amount of presence and format, but if a type is wrong or an arg missing, you're on your own, AFAICT.

I found pydoctest¹, but it looks somewhat unmaintained, and bugs like "doesn't work with relative imports" makes me hesitant to use it.

¹https://github.com/jepperaskdk/pydoctest
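To make the request concrete, here is a rough sketch of the name-checking half of it, assuming Google-style docstrings with an `Args:` section. The function `undocumented_and_stale_args` is hypothetical and is not taken from pydoctest or any other tool mentioned here. It compares names only, not types, and its line-based matching can be fooled by a wrapped description line that happens to contain a colon.

```python
from __future__ import annotations

import inspect
import re


def undocumented_and_stale_args(func) -> tuple[set[str], set[str]]:
    """Compare a Google-style ``Args:`` section against the real signature.

    Returns (parameters missing from the docstring, documented names that
    are not parameters).
    """
    params = set(inspect.signature(func).parameters) - {"self", "cls"}
    doc = inspect.getdoc(func) or ""  # getdoc() also normalizes indentation

    documented: set[str] = set()
    in_args = False
    for line in doc.splitlines():
        if line.strip() == "Args:":
            in_args = True
            continue
        if in_args:
            # An entry looks like "    name: text" or "    name (type): text".
            entry = re.match(r"\s+(\*{0,2}\w+)\s*(\([^)]*\))?:", line)
            if entry:
                documented.add(entry.group(1).lstrip("*"))
            elif line.strip() and not line.startswith(" "):
                in_args = False  # an unindented line starts the next section

    return params - documented, documented - params
```

Run against a function whose docstring documents `ratio` while the signature says `factor`, it reports `factor` as undocumented and `ratio` as stale.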

charliermarsh · on Feb 14, 2023

I've never used, but have heard good things about darglint (although it's now archived): https://github.com/terrencepreilly/darglint

A lot of folks want us to add that kind of enforcement to Ruff (https://github.com/charliermarsh/ruff/issues/458), and I want to do it, but it's a big project and some other stuff has higher priority :)

claytonjy · on Feb 14, 2023

Thank you for the response Charlie! I also found darglint but yes, even less maintained than pydoctest.

Totally understood this isn't a top priority, but very glad to hear it is potentially in-scope. I've been looking for an excuse to get into rust... we'll see.

charliermarsh · on Feb 14, 2023

Of course. There's at least one Ruff contributor who I know is working on a Python docstring parser, written in Rust, with formal grammars etc. (unlike Pydocstyle which relies on regex). I don't know the status of that project, but it would be very cool to see, and Ruff could of course leverage it :)

tracker1 · on Feb 14, 2023

Interesting to see how many of these types of tools are starting to be written in Rust. I think it's the structure that kind of lends itself to this type of thinking/work, with better performance than tooling in the scripted language(s). Similar to Rome.tools, was amazed at the performance difference from eslint and similar in the JS/TS space.

etra0 · on Feb 14, 2023

I think this is because now one of the fastest languages is also safe. Before, the usual options for writing 'performant code' were usually C or C++, and even though modern C++ has better tools for memory management and safety, it was still scary to most people and you could still have some sort of memory bugs.

Now, we also have Rust on the performant side of things, and the guarantee that 'if it compiles, it'll work' (aside from logic bugs) is quite a big one. Also, considering how helpful the compiler is, and the mentioned guarantees, more people are enthusiastic about having faster things.

onei · on Feb 14, 2023

I think it's less about safe, and more about Rust's other qualities. The language lends itself to writing parsers thanks to its syntax and zero-cost abstractions. Match expressions in particular made parsing a breeze when I last dabbled in writing compilers. The way it makes you think about memory management saves allocations you'd accidentally make in languages with GC. And it probably doesn't hurt that rust/cargo are nice tools that guide developers onto writing similar CLI experiences.

In short, it shines for this problem domain in my experience.

etra0 · on Feb 14, 2023

I agree, I should have phrased it as *I believe it's one of the reasons*. Developer ergonomics matter a lot, too. The compiler is incredibly helpful, it makes these kinds of projects less scary to contributors too, and I agree that pattern matching is fun as hell.

fulafel · on Feb 15, 2023

Indeed. It's not far from eg Ocaml in this respect.

fulafel · on Feb 14, 2023

Lots of safe languages are close enough to C speed, like Ocaml, Go, many JVM languages (w Graal native image if fast startup is needed like here), Ada, Swift, etc. You can still have kinds of memory bugs in all of these incl Rust, esp leaks.

etra0 · on Feb 14, 2023

It might be my ignorance, but I haven't seen any tools written in JVM languages or Swift for these kinds of scenarios, though I haven't looked too hard either.

I do think Go has had success in the same space because it is fast, but IMO the ergonomics in Rust for this problem space are a bit better. Of course, that's just my opinion.

> You can still have kinds of memory bugs in all of these incl Rust, esp leaks.

You're right. Leaks are not covered by Rust's memory safety guarantees, and neither are OOMs, but you can't deny you're in a much safer space than with either C or C++.

tracker1 · on Feb 14, 2023

Well, in the JS space, also seeing tooling come out in Go as well (esbuild) so I don't think it's just Rust... I do think the story to get an executable to a given environment is quite a bit better for Go or Rust than say C# or JVM... Distributing tools in those languages feels dirty by comparison, even if containers make applications much cleaner (in terms of dist/usage).

fulafel · on Feb 15, 2023

Re distribution - for JVM, w GraalVM native image you can do that as well.

For example Babashka is distributed like that (https://github.com/babashka/babashka/releases)

rightbyte · on Feb 14, 2023

The original Unix app philosophy was these piped C programs, right?

Just today I made a simple app in C translating text files to an array you could put in a .c-file, as a piped build tool with a small twist compared to xxd. I thought about doing it in Python first, but I realized it was way simpler to do in C. Both as a user and programmer.

I guess Rust has the same lure.