Bowler: Safe code refactoring for modern Python (pybowler.io)
198 points by jxub on Sept 13, 2018 | 47 comments

Author here: I created Bowler as a "hackamonth" project when I joined Facebook's internal Python Foundation team. We've already used it for a bunch of random codemods that touch a large number of source files throughout our codebase.

Happy to answer any questions you might have!

https://pybowler.io/docs/basics-setup mentions a facebook/bowler github repository, but no such repository exists (or it is private). Where does the source code live?

Edit: Found it — it's currently at https://github.com/facebookincubator/bowler instead.

Good catch, I'll get that fixed! [edit: fixed now!]

Excellent! Thanks for the quick fix.

How does it compare to redbaron? [1]

> RedBaron is a python library and tool powerful enough to be used into IPython solely that intent to make the process of writing code that modify source code as easy and as simple as possible. That include writing custom refactoring, generic refactoring, tools, IDE or directly modifying you source code into IPython with a higher and more powerful abstraction than the advanced texts modification tools that you find in advanced text editors and IDE.

[1] https://github.com/PyCQA/redbaron

I haven't actually used RedBaron, but from glancing at the docs, it would look like there are two primary differences:

1) RedBaron uses a procedural API that modifies state continuously as you call methods on the class, whereas Bowler allows you to express a series of queries/transforms up front, and then execute those all at once on your entire codebase. This also allows Bowler to provide a fluent API, so that you can chain method calls one after another.

2) RedBaron uses a custom AST implementation, which still lists "Python 3.7 support" on its roadmap. Building on "fissix" (a backport of lib2to3) means that Bowler had day one support for Python 3.7 syntax, and already supports the provisional Python 3.8 syntax. This means your refactoring tools will never prevent you from adopting new versions of Python as soon as they are released.

Could this be used to create a macro system for python?

I'm sure it could be used that way, but the compile() function and the ast library (part of the stdlib) are a more direct way to create a macro system:
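As a rough illustration of the ast/compile() route, here is a minimal sketch. The `square(...)` marker and the `SquareMacro` and `expand_and_run` names are made up for this example and are not part of Bowler or the stdlib; only `ast.NodeTransformer`, `ast.parse`, `ast.fix_missing_locations`, and `compile()` are real stdlib pieces.

```python
import ast


class SquareMacro(ast.NodeTransformer):
    """Expand calls to the hypothetical marker `square(x)` into `x * x`."""

    def visit_Call(self, node):
        self.generic_visit(node)  # expand nested uses first
        is_marker = (
            isinstance(node.func, ast.Name)
            and node.func.id == "square"
            and len(node.args) == 1
            and not node.keywords
        )
        if not is_marker:
            return node
        arg = node.args[0]
        # Macro-style expansion: the argument expression appears twice,
        # so it is also evaluated twice at runtime.
        expanded = ast.BinOp(left=arg, op=ast.Mult(), right=arg)
        return ast.copy_location(expanded, node)


def expand_and_run(source):
    """Parse, expand the marker calls, compile, and execute the result."""
    tree = ast.parse(source)
    tree = ast.fix_missing_locations(SquareMacro().visit(tree))
    namespace = {}
    exec(compile(tree, "<macro>", "exec"), namespace)
    return namespace


# `square` is never defined as a function; it only exists at expansion time.
ns = expand_and_run("y = square(3 + 4)")
print(ns["y"])
```

Because the rewrite happens on the tree before compilation, no `square` function ever needs to exist at runtime, which is roughly what distinguishes a macro from an ordinary helper.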


I've built macros using the ast library before and it works, but I also found I could adjust my DSL's grammar just a bit and express the same thing with ordinary Python code. That's why Python macro systems rarely gain much traction.

BTW Bowler looks cool. I expect to try it next time I do a big refactor.

For bytecode level transformations rather than AST level: https://github.com/llllllllll/codetransformer

This was first presented at PyconAU 2018 - https://www.youtube.com/watch?v=9USGh4Uy-xQ

This is akin to the other "codemod" facilities Facebook already uses for large-scale refactoring in busy codebases (particularly, their JS codebases), but for Python (https://github.com/facebook/codemod, https://github.com/facebook/jscodeshift).

Notably, or notoriously, our previous codemod project just used regexes, which could result in ballooning complexity, especially when needing to modify code that might include type annotations. Bowler was designed specifically to allow refactoring against more complicated subjects, such as function signature changes, where you lose all predictability in formatting at both the definition and call site.
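To make the regex problem concrete, here is a small sketch using only the stdlib. It is not how Bowler or codemod work internally; `old_name`, `SOURCE`, and `find_calls` are invented for the example. A regex matches text that merely looks like a call, while a syntax tree finds only real call sites, however they are formatted.

```python
import ast
import re

# Hypothetical source: one real call split across lines, one look-alike in a string.
SOURCE = '''
result = old_name(1,
                  2)
note = "old_name(1, 2) appears in a string"
'''


def find_calls(source, name):
    """Return the line numbers of real calls to `name`, ignoring strings and comments."""
    return [
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == name
    ]


regex_hits = re.findall(r"old_name\(", SOURCE)  # also matches inside the string
call_lines = find_calls(SOURCE, "old_name")     # only the real call

print(len(regex_hits), call_lines)
```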

I could imagine a library of transformers being created using this tool.

For instance we’ve just enabled the pep3101 Flake8 plugin which enforces newstyle string formatting over % formatting. I’d love to see a transformer that automates that refactoring.
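The detection half of such a transformer can be sketched with the stdlib `ast` module. This is only an assumption about how one might start, not an existing Bowler or Flake8 feature: `find_percent_formatting` and `SAMPLE` are hypothetical, and the sketch only catches the case where the left operand of `%` is a string literal.

```python
import ast


def find_percent_formatting(source):
    """Return line numbers where a string literal is the left operand of `%`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.BinOp)
            and isinstance(node.op, ast.Mod)
            and isinstance(node.left, ast.Constant)
            and isinstance(node.left.value, str)
        ):
            hits.append(node.lineno)
    return sorted(hits)


# Line 1 is %-formatting; line 2 is ordinary integer modulo and must be ignored.
SAMPLE = 'greeting = "Hello, %s" % name\nremainder = 7 % 3\n'
hits = find_percent_formatting(SAMPLE)
print(hits)
```

Rewriting the matches into `str.format` calls is the harder part, since the format specifiers and the right-hand operand (single value, tuple, or dict) all need translating.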

This sounds like exactly the sort of transforms and codemods that Bowler was designed to handle. Furthermore, we plan to support more linter-style features, and would like to have it integrated with tools like Flake8 or Phabricator, so that it can simultaneously find lints, and immediately suggest modifications to resolve those lints.

Is this meant to help migration from Python 2 to Python 3, or is it a tool to make Python 3 better? It's unclear after reading the page, which only mentions that it's based on 2to3.

Bowler doesn't have much to do with 2to3, except that they have the same library underneath.

lib2to3, despite the name, is a fairly generic refactoring library.

It was not built for the 2 to 3 migration, as there are plenty of great tools already out there for that, like 2to3 itself or futurize. It was built more for cases where we want to make changes to things like function or method signatures, and update both the definition and all call sites, across our entire codebase. It's still a very rudimentary tool at this point, but it was designed in a way that allows us to check in and reuse many components of the codemod, so that we can extract longer term value from these tools.

How well does this handle dynamic constructs? Eg if I rename a method and elsewhere have

    if hasattr(obj, 'foo'):
does that get caught or not?

No, it would not catch that, because it's not tracing elements through the AST. It would be possible to do some amount of tracing, but it would be impossible to catch and modify all variants of that, since Python is an extremely dynamic language. Imagine something like:

    attr = "foo" if conditionA else "bar"
    obj = A() if conditionB else B()
    if hasattr(obj, attr):
Because of this, Bowler is focused on being more practical, and leans on the expectation that an engineer will be in the loop and validate the resulting diff, run unit tests, etc.
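A runnable sketch of that failure mode, with made-up classes (`Before`, `After`) standing in for the code before and after a hypothetical rename of `foo`: the string `"foo"` is never touched by the rename, so the `hasattr` check quietly stops matching instead of raising an error.

```python
class Before:
    def foo(self):
        return "called foo"


class After:
    # Hypothetical result of an automated rename of `foo` to `renamed`.
    def renamed(self):
        return "called foo"


def describe(obj, use_foo=True):
    # The attribute name is only known at runtime, so a static rename cannot see it.
    attr = "foo" if use_foo else "bar"
    if hasattr(obj, attr):
        return getattr(obj, attr)()
    return "no such attribute"


before = describe(Before())
after = describe(After())
print(before, "|", after)
```

Nothing crashes after the rename; the behavior just changes, which is why a human reviewing the diff and running the tests still matters.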

I apologize, but I'm really having trouble understanding what problem you are referring to. In Python syntax that would be written as:

    if conditionA:
        attr = "foo"
    else:
        attr = "bar"

    if conditionB:
        obj = A()
    else:
        obj = B()

    if hasattr(obj, attr):
I haven't tested Bowler yet but I would think something that refactors code would handle that pretty easily?

A() and B() could return two different classes that both support foo and bar methods. If your refactor is to rename one of those methods for one of the classes this code breaks and it's not trivial to update automatically, is it?

in general, given this is python, A and B could do something completely foul like request data from the internets and `eval(...)` it then return the result, or redefine `True, False = False, True` or redefine

  hasattr = lambda *args: False
or something equally insane and difficult to statically analyse

> Eg if I rename a method and elsewhere have `if hasattr(obj, 'foo'):`

Further, even in the absence of insane dynamic code, it probably isn't possible to automatically make a correct decision without knowing the intent of the original code.

For example, maybe we're renaming the `foo` method of the `Fizz` class to `barr`, and currently the program is written such that `if hasattr(obj, 'foo'):` only happens to be called on `Fizz` instances. Is it the case that the programmer only ever intended that code to be called on `Fizz` instances, or did the programmer intend it should operate on all types that have a `foo` attribute, including old-style `Fizz` instances and new types that might be encountered in future? In the former case, we should rewrite the logic to `hasattr(obj, 'barr')`; in the latter case perhaps we should leave it unchanged, so it will no longer trigger after the refactor.

i do agree it should be possible to automatically flag this as "here be something interesting to consider! please intervene and make a manual decision!"

I think these sorts of cases of obfuscated code are outside the scope of refactoring tools. No one who’s looking to use a refactoring tool has to write or read code that uses eval or redefines True.

> I would think something that refactors code would handle that pretty easily?

Nope. Something that refactors code could potentially identify that as an ambiguous case reliably, but it's impossible to statically refactor code of that sort in the general case. Without external/runtime behavior verification/specification, that is.

>In python syntax that would be ...

That was Python syntax.

See below

  Python 3.6.6 (tags/v3.6.6:4cf1f54eb7, Jul  6 2018, 15:35:19) 
  [GCC 4.8.4] on linux
  Type "help", "copyright", "credits" or "license" for more information.
  >>> conditionA = True
  >>> attr = "foo" if conditionA else "bar"
  >>> print(attr)
  foo
  >>> conditionA = False
  >>> attr = "foo" if conditionA else "bar"
  >>> print(attr)
  bar

I haven't had to use it before, but another tool in that space is "undebt" from Yelp:


Is it possible to use this in vscode ?

That would be awesome.

The possibility is there, but we haven't pursued any specific integrations yet. For obvious reasons, we would most likely work on Nuclide integration first, and something like VSCode support would probably need to come from the community. As Bowler is primarily providing the API framework, most integrations will likely just be plugins that make calls to the Bowler API, or execute the bowler CLI, with appropriate queries.

ditto but for emacs

Seems like more work to me. Simple search and replace is usually sufficient for something like what the intro video shows.

This looks tedious to use directly, but it'd be good to build things like editor plugins on top of. The docs also show that it can do stuff that would not be simple to do with search and replace (e.g. transforming function arguments).

The original Smalltalk refactoring engine didn't take off until it was packaged as the Refactoring Browser. But the real utility in it was making arbitrary scripting of meta-level and syntax driven code transformation very accessible. You can literally be an order of magnitude more powerful with refactorings with a tool like that.

Python is a good language for this sort of tool. There is a lot of potential in harnessing the power of Python to rewrite Python.

For something that simple, I would agree, but the demo is barely scratching the surface. I'd like to make a more in-depth version showing a more complicated transform, such as changing a function signature and having it update the callers across multiple files.

Yeah, maybe some other example would be more convincing. I think this case would still be a good job for search and replace, since you'll need to investigate the surrounding code to get the new data to pass to the function or modify the existing data to support the new format. It's unlikely that you add or remove an argument from a function and nothing else needs to change.

I'm finding Visual Studio Code does an excellent job with "rename symbol" or "change all occurrences".

Rename symbol in the python extension uses Rope, fwiw: https://pypi.org/project/rope/

Funny, I've found it seems to struggle with any references that cross language boundaries.... except with typescript, for some reason.

EDIT: Sorry, I meant 'file' boundary, not 'language' boundary.

Why not, you know, edit the actual sentence instead of adding a thing that says EDIT but leaves the original confusingly unedited?

Because, I guess, it's courteous to the people who already commented on the original, so they don't look like they were talking nonsense.

You make the edit and then you describe the edit, to avoid that.

Ah, good point. (I guess I wasn't confused by the edit - though if I'd been here longer I probably would've been - but I was by your comment!)

Wait, what? You can't just politely walk away from a pointless argument over minutiae like that. You are new around here. These must extend to a minimum thread depth of 12. Please review the guidelines.

Hahaha! Gold. Thanks, you made my day. Sorry you lost mana for it, I'd +100 if I could. Well, I was reading about https://en.wikipedia.org/wiki/Sayre%27s_law on here recently.

Mine was in reference to Python projects - I haven't tried it on cross language references.
