Hacker News
Ruff: A fast Python linter, written in Rust (ruff.rs)
210 points by goranmoomin on Feb 14, 2023 | 67 comments

Discussed in 2022: https://news.ycombinator.com/item?id=32666035 (46 comments)

Didn't expect to see this here but I'll take this opportunity to say thank you to all the contributors and to anyone that's given Ruff a try.

I'm around if anyone has questions.

First of all, this sounds like an amazing project. My workplace runs isort+flake8 in pre-commit hooks, so making these even 10x faster would be a huge quality of life improvement for us.

Personally, I'm interested to hear from you the specific reasons you can't achieve this kind of performance with CPython.

Usually the major factors are: A) python's generalised data structures (int, list, etc); B) extra overhead of common operations like reading a variable value, calling a function or iterating over a collection; C) no real multithreading (i.e. the GIL); D) lack of control over memory management.

I'd love to know if there's anything else that makes Python that much slower.

It's mostly the reasons you've hit on but I'll try to add some color to them based on my experience with Ruff.

1. The "fearless concurrency" that you get with Rust is a big one though. Ruff has a really simple parallelism model right now (each file is a separate task), but even that goes a long way. I always found Python's multi-processing to be really challenging -- hard to get right, but also, the performance characteristics are often confusing and unintuitive to me.

2. Ruff performs very few allocations, and Rust gives us the level of control to make that possible. (I'd like to perform even fewer...) We tokenize each file once, run some checks over that stream, parse it into an AST, run some checks over that AST, and with a few exceptions, the only allocations outside of that process are for the Violation structs themselves.

3. Related to the above (and this would be possible with CPython too), by shipping an integrated tool, we can consolidate a lot of work that would otherwise be duplicated in a more traditional setup. If you're using a bunch of disparate tools, and they all need a tokenized representation, or they all need the AST, then they're all going to repeat that work. With Ruff, we tokenize and parse once, and share that representation across the linter.

4. Again possible with CPython, but in Ruff, we take a lot of care to only do the "necessary" work on a given invocation. So if you have the isort rules enabled, we'll do the work necessary to sort your imports; but if you don't, we skip that step entirely. It sounds obvious, but we try to extend this "all the way down": so if you have a subset of the pycodestyle rules enabled, we'll avoid running any of the expensive regexes that would be required to power the ignored rules.
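Points 3 and 4 can be sketched in Python terms (the mini-checks and rule codes below are hypothetical, not Ruff's actual implementation): parse once, share the tree across every check, and skip disabled rules entirely.

```python
import ast

SOURCE = "import os\nx = 1\n"

# Hypothetical mini-checks; each receives the already-parsed tree
# instead of tokenizing/parsing the file again itself.
def unused_imports(tree):
    imported = {alias.name for node in ast.walk(tree)
                if isinstance(node, ast.Import) for alias in node.names}
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return [f"unused import: {name}" for name in sorted(imported - used)]

def module_too_long(tree):
    return ["module too long"] if len(tree.body) > 1000 else []

ALL_CHECKS = {"F401": unused_imports, "C901": module_too_long}

def lint(source, enabled):
    tree = ast.parse(source)                # parse once...
    violations = []
    for code, check in ALL_CHECKS.items():
        if code in enabled:                 # ...and never run disabled work
            violations.extend(check(tree))
    return violations

print(lint(SOURCE, enabled={"F401"}))  # -> ['unused import: os']
```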

I'd like to thank everyone involved in making it. I've been using it for a couple of months and absolutely love it.

I was working with my team recently to build a standardized python project boilerplate and came across Ruff - the one thing I really liked, apart from the speed, was that I could finally have my linting config in pyproject.toml, which the flake8 project seems to have some kind of aversion to.
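For reference, a minimal Ruff section in `pyproject.toml` looks something like this (the rule codes are illustrative; see Ruff's docs for the full list):

```toml
[tool.ruff]
line-length = 88
select = ["E", "F"]   # pycodestyle + pyflakes rules
```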

This is how I judge python tools these days. I will not use most tools that offer no ability to configure with pyproject (except pre-commit), and will look for alternatives if you can use pyproject but the docs seem to discourage it.

Biggest missing piece of the workflow now is a modern replacement for Tox/Nox; the only one I see is Hatch, but that replaces a lot more, for better or worse.

I'm a big fan, but be careful: Ruff doesn't fully support current Python versions (>=3.10) yet. Instead of parsing Python directly, Ruff uses RustPython, which still lacks support for the match statement and other modern 3.10+ syntax. If you're OK with that, go ahead and switch to Ruff; you'll never go back.

That's right -- we don't support match statements, and we don't support except* (which is part of 3.11 IIRC), but I _think_ we support everything else. And of course, I intend to support all of those language features.

If interested: CPython moved to a new parser, a PEG parser, in Python 3.9, and part of the motivation was to support language features like pattern matching, which introduced ambiguities in the grammar that the existing parser couldn't handle. For the same reason, they've been non-trivial to implement in the RustPython parser -- they either require clever techniques, or the parser needs to be written as a PEG parser. I am hoping to do the former, but I need to find time to prioritize it.

Oh, that explains why it is hung up on "match item['blah']:" then. I was scratching my head trying to work out what I had done wrong, since the code worked fine.

Love the tool

On their page, they claim "Python 3.11 compatibility" – perhaps they've added it since you formed your impression, or is this "compatibility" word misleading?

The Readme mentions 3.11 compatibility, so that's astonishing.

It supports 3.11 but not all syntax constructs of 3.11.

Can they claim to support it then?

I don't know. But that's not unique to ruff. Plenty of Python linting tools had issues with new Python syntax in the past. Rust's rustfmt still can't format some of the most recent syntax additions, and rust-analyzer has some limitations with new stuff as well.

The benchmarks are impressive, and maybe this is a stupid question, but who lints an entire codebase constantly? In my daily use, I just lint on save, and those are smaller files where I never noticed a difference between different linters.

Aren't linters global? If you change the number of args to a function, you want the linter to tell you all call sites with the wrong number of args right?

That's something I would expect from more advanced static analysis like mypy, but not flake8? Maybe just different use cases under the same category name.

Well a linter can do as much or as little as you want. Certainly having global checks is better than having only local checks and if you can check the whole codebase in under half a second, why not?

Is it doing that though? My understanding from their FAQ and comparisons is that they're intending to replace flake8 and the likes, not mypy. And if they're implementing a subset, then the de-facto standard for the larger per-commit checks is still the one you'll have to run.

Quick test shows ruff doesn't seem to.

Same, but in CI I generally lint with

    pre-commit run --all-files
because it can be much trickier or more error-prone to only lint the correct diff when you're dealing with MRs. I definitely notice the speed hit there, though it usually pales in comparison to running pytest.
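For reference, wiring Ruff into that same workflow is a single hook in `.pre-commit-config.yaml` (the `rev` below is illustrative; pin to a current tag):

```yaml
repos:
  - repo: https://github.com/charliermarsh/ruff-pre-commit
    rev: v0.0.245   # illustrative; use a real released tag
    hooks:
      - id: ruff
```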

Honestly mypy is the slowest one now, which ruff doesn't intend to replace.

folks who work on teams, with frequent diffs, where the linter goes in CI and sometimes catches bugs: see flake8-bugbear et al.

While the main argument for Ruff is speed, its real advantage is the consolidation of the Python QA menagerie:

  - isort (import statement sorting)
  - pyupgrade (syntax upgrade for newer Python versions)
  - pylint (general linting)
  - pycln (remove unused imports)
  - pydocstyle (docstring syntax checks)
All of these can be replaced with a single ruff call. Ruff consolidates the rules from all these tools into a comprehensive and non-overlapping corpus, and removes the burden of having to find the right invocation order.
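Concretely, that consolidation maps to rule-code prefixes in Ruff's configuration; a sketch (prefixes loosely matched to the tools above, check Ruff's rule docs for exact coverage):

```toml
[tool.ruff]
select = [
    "I",    # isort-style import sorting
    "UP",   # pyupgrade-style modernization
    "PL",   # pylint-derived rules
    "F401", # unused imports (pycln's territory, via pyflakes)
    "D",    # pydocstyle checks
]
```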

What's missing for ruff to be the gold standard, is to adopt features from:

  - autopep8 (wraps long comments)
  - docformatter (docstring auto-formatting)
  - black (deterministic code formatting)
  - blacken-docs (applying black on Python code blocks in documentation)

ruff is really nice! I'm a very happy user of it on a handful of projects: it's easily 10x faster than the PyCQA tools (flake8, etc.), and has decent (and growing) coverage for most of their checks. It also integrates nicely with GitHub Actions' annotation support, which has been a boon for pointing junior devs to correctable errors.

Funny enough, we did a PyCharm webinar with Charlie just a couple of hours ago. https://www.youtube.com/watch?v=jeoL4qsSLbE

Wow, this project progressed quickly. In 5 months it convinced pandas and SciPy to convert? Impressive!

Doesn't seem to cover much of pylint's ground, though. Pylint is slow but I've found a lot of errors using it.

Do you have any examples of violations that pylint catches that Ruff doesn’t catch? Genuinely curious.

There's an issue tracking parity here: https://github.com/charliermarsh/ruff/issues/970

Though it's probably a bit conservative. Some rules are implemented as straight one-to-one ports from Pylint, and those are easy to check off, but others are Pylint rules that we've already implemented under other names, and those have to be tabulated on a case-by-case basis.

The issue linked seems to cover it.

Haven’t yet used Ruff in any project, but it does look very promising. I’m personally waiting for them to reimplement most of Pylint’s checks.


I'm more or less waiting on the same thing, but wonder how many it will be able to pick up. The maintainer states somewhere that many of Pylint's remaining checks involve type-checking and inferring types for less than fully typed code. I've also seen him state somewhere that writing a type-checker is probably out of scope since these are massive projects in their own right. I've not seen any statement about the expected future coverage of Pylint.

Perhaps running Ruff plus a type checker gives us close to what Pylint does today? Pylint is pretty comprehensive, and awesome for that, but I'd love to lint at the speed of Ruff.

I switched to it, and it's amazing. It runs practically instantly. I now prefer the speed to the extra checks, but YMMV.

"An fast"? Shouldn't this be "A fast", or is it an exception of some sort?

First line in the post is "An extremely fast Python linter, written in Rust." I assume it was a copy 'n' paste, but stripping "extremely" kinda broke the grammar.

They strip adjectives from the titles? Huh.

I don’t know if HN does it automatically or if the OP did it manually.

likely too long, so they removed a word.

Love it, we switched all the main colour-science repos to it a few weeks ago and I'm planning to do the same at work!

I got a 250x (4.5s -> 16.3 ms) speedup when using `ruff` in a real world use case. Highly recommend folks migrate from the usual combo of flake8 + isort + many flake8 plugins.

I love python. I really do. But when I see speedups like this from Rust, Nim, or other compiled languages, it makes me question my life choices.

I shudder when I look at the amount of pods / resources we use. Not only is it slow, but since the GIL makes python effectively single threaded, you end up instead with loads of separate processes / workers in many contexts, multiplying your need for RAM. And since you can't easily just spin up some background threads, you often end up with loads of different entrypoints also complicating the deployment.

At a previous company using the jvm, we used 6x t3.small instances at AWS to handle a much bigger load than what we now have 100+ pods handling in python.

imo Python is eventually going to be nice-looking macros for Rust, and you will barely have to change your code. What's not to like about that?!

That’s gonna be nice until you have to do a UDF on a data frame

(same with Ruby, which is also great)

Indeed, when a linter for a language is 100x faster when written in another language ...

On our huge codebase in GitHub Actions: pylint 177 seconds, flake8 121 seconds, ruff 2 seconds.

Instead of the stock complaint about titles including "written in Rust," I'd like to acknowledge that the standard of excellence seems to be higher for tools written in Rust than others. Perhaps that's a consequence of their recency - either way, it would be interesting to hear some cases where tooling built in Rust falls short compared to those written in other languages.

I love ruff. Some reasons it ended up being better that aren’t “standard of excellence”.

#1 Second-mover advantage. It got to learn from all the existing tools, skip all that.

#2 Deduplicated work. Three separate linters means parsing the code three times. A "do everything" tool does it once.

#3 Rust is faster than Python.

These things combine into someone showing up and suddenly making a tool that a lot of people are using to replace 3+ tools. But that doesn’t mean the previous versions are bad. They were necessary to be where we’re at.

I imagine a key benefit here is going to be performance, because a lot of Python tooling is written in Python which will be significantly slower.

The first rule of Python should be that if you have to use loops, don’t code it in pure Python.
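To illustrate that rule of thumb: the same reduction, once as a pure-Python loop and once via the C-implemented built-in `sum` (the timing calls just make the difference visible; exact numbers will vary by machine):

```python
import timeit

data = list(range(100_000))

def py_loop(xs):
    # per-iteration bytecode-interpreter overhead adds up
    total = 0
    for x in xs:
        total += x
    return total

def builtin(xs):
    # the loop runs inside C, not the interpreter
    return sum(xs)

assert py_loop(data) == builtin(data)
print(timeit.timeit(lambda: py_loop(data), number=50))
print(timeit.timeit(lambda: builtin(data), number=50))
```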

So instead of a side-topic complaint about "written in Rust", you're requesting an off-topic discussion about other tooling that is both written in Rust and bad?

Tangential, but do any python users have a tool they like for testing that fully-formed doc comments (e.g. Google or numpy style) are correct, meaning the arg names and types reflect the function signature?

Pydocstyle, pylint, and ruff will all check for some amount of presence and format, but if a type is wrong or an arg missing, you're on your own, AFAICT.
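For illustration, a toy sketch of that kind of signature-vs-docstring check (Google-style `Args:` only; the names and regex are illustrative, and real docstrings need a proper parser):

```python
import inspect
import re

def doc_arg_mismatches(func):
    """Compare Google-style Args: entries against the real signature."""
    sig_args = set(inspect.signature(func).parameters)
    doc = inspect.getdoc(func) or ""
    m = re.search(r"Args:\n((?: +\w+.*\n?)+)", doc)
    doc_args = set(re.findall(r"^ +(\w+)\s*[:(]", m.group(1), re.M)) if m else set()
    return {"missing_from_doc": sig_args - doc_args,
            "unknown_in_doc": doc_args - sig_args}

def example(a, b):
    """Do something.

    Args:
        a: first value.
        c: does not exist in the signature.
    """

print(doc_arg_mismatches(example))
# -> {'missing_from_doc': {'b'}, 'unknown_in_doc': {'c'}}
```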

I found pydoctest¹, but it looks somewhat unmaintained, and bugs like "doesn't work with relative imports" make me hesitant to use it.


I've never used it, but have heard good things about darglint (although it's now archived): https://github.com/terrencepreilly/darglint

A lot of folks want us to add that kind of enforcement to Ruff (https://github.com/charliermarsh/ruff/issues/458), and I want to do it, but it's a big project and some other stuff has higher priority :)

Thank you for the response Charlie! I also found darglint but yes, even less maintained than pydoctest.

Totally understood this isn't a top priority, but very glad to hear it is potentially in-scope. I've been looking for an excuse to get into rust... we'll see.

Of course. There's at least one Ruff contributor who I know is working on a Python docstring parser, written in Rust, with formal grammars etc. (unlike Pydocstyle which relies on regex). I don't know the status of that project, but it would be very cool to see, and Ruff could of course leverage it :)

Interesting to see how many of these types of tools are starting to be written in Rust. I think it's the structure that kind of lends itself to this type of thinking/work, with better performance than tooling in the scripted language(s). Similar to Rome.tools, was amazed at the performance difference from eslint and similar in the JS/TS space.

I think this is because now one of the fastest languages is also safe. Before, the usual options for writing 'performant code' were C or C++, and even though modern C++ has better tools for memory management and safety, it was still scary to most people and you could still have some sort of memory bugs.

Now, we also have Rust on the performant side of things, and the guarantee that 'if it compiles, it'll work' (aside from logic bugs) is quite a big one. Also, considering how helpful the compiler is, and the mentioned guarantees, more people are enthusiastic about having faster things.

I think it's less about safety, and more about Rust's other qualities. The language lends itself to writing parsers thanks to its syntax and zero-cost abstractions. Match expressions in particular made parsing a breeze when I last dabbled in writing compilers. The way it makes you think about memory management saves allocations you'd accidentally make in languages with GC. And it probably doesn't hurt that rust/cargo are nice tools that guide developers toward writing similar CLI experiences.

In short, it shines for this problem domain in my experience.

I agree, I should have phrased it as *I believe it's one of the reasons*. Developer ergonomics matters a lot, too. The compiler is incredibly helpful, it makes these kinds of projects less scary to contributors too, and I agree that pattern matching is fun as hell.

Indeed. It's not far from eg Ocaml in this respect.

Lots of safe languages are close enough to C speed, like Ocaml, Go, many JVM languages (w Graal native image if fast startup is needed like here), Ada, Swift, etc. You can still have kinds of memory bugs in all of these incl Rust, esp leaks.

It might be my ignorance, but I haven't seen any tools written in JVM languages or Swift for these kinds of scenarios, but I haven't looked too hard either.

I do think Go has had success in the same space because it is fast, but IMO the ergonomics in Rust for this problem space are a bit better. Of course, that's just my opinion.

> You can still have kinds of memory bugs in all of these incl Rust, esp leaks.

You're right. Leaks are not covered by Rust's memory safety guarantees, and neither are OOMs, but you can't deny you're in a much safer space than with either C or C++.

Well, in the JS space, also seeing tooling come out in Go as well (esbuild) so I don't think it's just Rust... I do think the story to get an executable to a given environment is quite a bit better for Go or Rust than say C# or JVM... Distributing tools in those languages feels dirty by comparison, even if containers make applications much cleaner (in terms of dist/usage).

Re distribution - for JVM, w GraalVM native image you can do that as well.

For example Babashka is distributed like that (https://github.com/babashka/babashka/releases)

The original Unix app philosophy was these piped C programs, right?

Just today I made a simple app in C translating text files to an array you could put in a .c-file, as a piped build tool with a small twist compared to xxd. I thought about doing it in Python first, but I realized it was way simpler to do in C. Both as a user and programmer.
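For comparison, a minimal Python sketch of that xxd-style transform (names are illustrative, not the commenter's actual tool):

```python
# Turn raw bytes into a C array literal suitable for a .c file.
def to_c_array(data: bytes, name: str = "blob") -> str:
    body = ", ".join(f"0x{b:02x}" for b in data)
    return (f"const unsigned char {name}[] = {{ {body} }};\n"
            f"const unsigned int {name}_len = {len(data)};")

print(to_c_array(b"Hi"))
# -> const unsigned char blob[] = { 0x48, 0x69 };
#    const unsigned int blob_len = 2;
```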

I guess Rust has the same lure.
