Pyright: Static type checker for Python (github.com/microsoft)
214 points by JacobHenner on March 24, 2019 | 87 comments



I'm excited; this looks better than Pyre. Its being written in TypeScript is, ironically enough, a pretty big deal.

I think they're clearly writing this for VS Code, right? I hope they are at least. Having spent the last decade and a half writing Python code, I have completely stopped enjoying writing Python code (even typed Python) because of how good TypeScript and the VS Code TypeScript experience are. It doesn't even sort of compare.

At this point I just want pretty much the same thing that happened to JS to happen to Python: for MS to make a Python superset with good type hinting (not the atrocious PEP 484 syntax) which compiles down to Python.

… and wasm. Which compiles down to Python and wasm.


Yeah, I think it is for VS Code and Azure Data Studio and I think the tooling team likes it -

* https://github.com/Microsoft/vscode-postgresql

* https://github.com/dbcli/mssql-cli


What's wrong with the pep 484 syntax in your view?


PEP484 and the current state of typings in Python:

- Imports everywhere. Even basic types (list, dict, set) have to be imported

- Basic types don't share the same casing as the actual types they represent (Set vs set, Dict vs dict, List vs list)

- Awful support for enums.

- Syntax is generally ugly. TypeScript got everything right; it's intuitive, clear, and scales pretty well even for complex types. I'd be happy if they just copied TypeScript syntax.

- Types have a runtime impact.
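
To make the first two bullets concrete, here is a minimal sketch in PEP 484 style as described above. The function `index_words` is made up for illustration; the point is only that the generic forms come from `typing` and are cased differently from the builtins they describe.

```python
from typing import Dict, List, Set


def index_words(lines: List[str]) -> Dict[str, Set[int]]:
    """Map each word to the set of line numbers it appears on."""
    # The annotation says Dict/Set, but the runtime objects are dict/set.
    index: Dict[str, Set[int]] = {}
    for lineno, line in enumerate(lines):
        for word in line.split():
            index.setdefault(word, set()).add(lineno)
    return index


print(index_words(["a b", "b c"]))
```

Three imports for one signature, and none of the imported names is the type actually constructed in the body.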


+1, also Union must be imported and can't be simply spelled |.

Although annotations no longer have a runtime cost now that they are lazily evaluated.


Agree. One way to fix it would be to allow you to define the typing in a completely separate file. Right now you can define some of it in a separate file but not all of it. I keep the horrendous typing syntax out of my code by not using it.


Typing in a separate file defeats almost all the benefits of typing, chief among them being able to read the code without also having to open a second source, the documentation. It's almost as bad as going back to stone-age C++ with the header/source split across separate files.

Reading and writing types should be fast, agile, easy and effortless. The only way to achieve that is to keep them as close to the usage as possible.


> - Types have a runtime impact.

Do you mind expanding on this, and what could be done differently?


type annotations are actual objects instantiated at runtime, usually only at module import for non-local annotations, but on each function invocation for locals. this is nice for extensibility, as it makes interacting with and extending the type system as easy as writing any other normal python, but in older versions of python it was extraordinarily slow, particularly with generics (they were previously implemented by creating new actual type objects each time they had arguments applied, to be usable as base classes; this has been corrected in 3.7 with the addition of __mro_entries__). 3.8 is however making annotations evaluated on demand rather than eagerly, pretty much removing all runtime performance impact.
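
A minimal sketch of the on-demand behaviour, assuming the opt-in form `from __future__ import annotations` (PEP 563, available from Python 3.7). The function `f` and the deliberately undefined name are made up for illustration, and `exec` is used only so the future import sits at the top of its own compilation unit.

```python
# Hypothetical module source: the annotation names something that does not exist.
source = '''
from __future__ import annotations

def f(x: ThisNameIsNeverDefined) -> None:
    pass
'''

namespace = {}
# Defining f does not raise NameError, because the annotation is never evaluated.
exec(source, namespace)

# The annotations are stored as plain strings instead of live objects.
annotations = namespace["f"].__annotations__
print(annotations)
```

Since nothing is evaluated at definition time, the cost of building objects like `List[Tuple[...]]` is only paid if something explicitly resolves the strings later, for example via `typing.get_type_hints`.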


Thank you, I've been eyeing typing in Python and appreciate your explanation.


In what sense does

    x: float = 1.0
instantiate?


List[Tuple[Tuple[str, int], Dict[str, str]]] creates a bunch of objects; if you're having a closure these are necessarily created each time the closure is invoked, by evaluating the entire "List[Tuple[Tuple[str, int], Dict[str, str]]]" expression.


Why can’t you close over that type, instantiated outside the closure?

Complicated closures are often more idiomatically written as classes, where the aggregate type would be a class member.


There are two cases to consider. One is where the closure is called multiple times. This is the "by necessity" case, because the enclosed definition is executable and Python must make believe that this definition is re-executed every time the function is called.

If you are just calling the enclosed function then of course this does not happen. Compare:

    def mytype():
        print("!")
        return dict()

    def closure():
        def enclosed(x: mytype()):
            pass
        return enclosed

    print("Calling the closure")
    for i in range(10):
        closure()
        
    print("Calling the enclosed")
    f = closure()
    for i in range(10):
        f(0)
Note that using closures is almost always faster in Python than classes, because accesses to enclosed variables are done through offsets instead of attribute lookup (this gap has been somewhat closed, but is still significant in some cases; arguably almost all of these cases indicate that you are using either the wrong language, or the wrong implementation). So adding overhead to closures can be a somewhat valid reason to avoid using these annotations.
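
The hoisting asked about upthread can be sketched like this, assuming the alias can live at module level. `Record` and the function names are hypothetical. The annotation inside the closure becomes a plain name lookup, so the nested generic is built once rather than on every call to `closure()`.

```python
from typing import Dict, List, Tuple

# Built once, when the module is imported.
Record = List[Tuple[Tuple[str, int], Dict[str, str]]]


def closure():
    # Only the name `Record` is looked up here; no subscription chain is re-run.
    def enclosed(x: Record) -> None:
        pass
    return enclosed


f = closure()
g = closure()
# Both inner functions share the very same alias object.
print(f.__annotations__["x"] is g.__annotations__["x"])
```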


it's not all bad but doing generic types is annoying because you have to do

  from typing import TypeVar
  T = TypeVar('T')
in every module where you want to parametrize a type. (not sure how that works with 3.6+'s unevaluated annotations. that's what my linter thinks)

also callable types are really ugly... this:

  Callable[[Foo, Bar], Baz]
vs something like `(Foo, Bar) -> Baz` is just yuck

putting constraints on a type is pretty ugly too iirc, i think it looks something like

  F = TypeVar('F', bound=Foo)
  def bar(a: F, b: F) -> F: ...
  
hacking type annotations into the language's existing syntax is hard :(


I actually find this a feature.

If you have a function that takes `(Foo, Bar) -> Baz`, what you really have is a function that takes in a FoobarBazifier.

So name your function-interface:

    FoobarBazifier = Callable[[Foo, Bar], Baz]

    def my_func(cb: FoobarBazifier) -> Baz:
        ...
This is pythonic, in the same way that preferring named functions over lambdas (and therefore not having a single-character lambda syntax) is pythonic. It's controversial and people dislike it, but it is nice.

It also has the side effect of letting you subclass the interface if you want to (via NewType), so that if you have multiple functions that match the same signature but fulfill different interfaces, you can name the interfaces differently and use them in different places.

Java mostly enforces this because you have to define an explicit interface and class for your callbacks (or you did until Java 8). This is essentially the same thing, but with 1 line of boilerplate instead of 25 across 3 files.


import typing as ta -> ta.List...


Microsoft, Facebook, and Google all have their own Python type checkers now.

* https://github.com/google/pytype

* https://github.com/facebook/pyre-check


Don't forget the one from Dropbox:

* https://github.com/python/mypy


The one blessed by Guido, you mean.


MyPy is not owned by Dropbox. They use it and many significant contributors work there, so it is easy to get this impression. But it was started before that happened.


Just in line with the zen of python: There should be one-- and preferably only one --obvious way to do it.


I just installed it. It gives red squiggly lines under variables that haven't been assigned (typos) and for incorrect methods on variables it can type-deduce. This is without me adding any typing-hints to the script I'm working on.

Only issue I see immediately is it does not use the default sys.path that the vs-code-configured python interpreter uses, so it cannot resolve some of the packages I'm using.

But overall, very nice.


Microsoft, please make one for Ruby! C# vet and lover of VS and VS Code, now working on a legacy Ruby/Rails code base and hating a new life of lifecycle callbacks, fat controllers, fat models, magic methods, and un-upgradeable gems. Ugh.


Stripe has one called Sorbet[0] (StrangeLoop2018 had a good talk[1] on it) that hasn't been released yet but hopefully will be soon.

There's also soutaro/steep[2] which looks really promising and is available now.

[0]: https://sorbet.run

[1]: https://www.youtube.com/watch?v=uFFJyp8vXQI (slides: https://sorbet.run/talks/StrangeLoop2018)

[2]: https://github.com/soutaro/steep


Rails has so much metaprogramming that static typing isn't really going to get you very far. Type checkers and linters won't know what the hell is going on.


+1. ruby is by a good degree the hardest language to statically analyze that i've ever worked significantly with. type anns fit python relatively well due to its philosophy being pretty much the direct opposite of ruby's: "explicit > implicit" vs "having a conversation"

ed: to the degree that the man himself is adamantly against them in his language (except as perf hints) https://bugs.ruby-lang.org/issues/9999#note-13


Django has quite a lot of "magic" as well. There is work being done on a mypy plugin to support this magic in addition to type stubs. It is already quite usable and supports type-checking many common patterns: https://github.com/mkurnikov/django-stubs

See also mypy's recent blog post on plugins (which are also used for supporting Python 3.7's dataclasses, which are quite dynamic as well): http://mypy-lang.blogspot.com/2019/03/extending-mypy-with-pl...


Django and Django REST framework make such heavy use of metaclasses that I gave up on using mypy for a Python project. My code was littered with annotations and stubs and it still wasn't catching obvious type errors, and I had to fall back to using `Any` in way too many places. Couple that with the horrible typing syntax and the fact that mypy was slow, kept crashing with segfaults and often wouldn't even signal type errors on fully typed functions, and the result was a lot of effort for almost no gain.

If I compare that experience to when I switched from JS to TypeScript, it's like night and day. TypeScript is designed to work well with development tools and already existing libraries, while Python's static types feel designed by someone who never worked with static types while completely ignoring code patterns used in existing libraries.


Stripe is working on one called sorbet. The playground[0] is available, but to my knowledge the gem hasn't dropped just yet.

[0] https://sorbet.run/


It's a shame Crystal isn't taking off that much. I use it for some personal projects and it's such a joy to work with thanks to the static typing; unfortunately the community is a bit too small. A statically typed Ruby is just the best thing I could wish for personally.


> Pyright is written in TypeScript and runs within node. It does not require a Python environment or imported third-party packages to be installed.

I know which of those environments I would prefer... it's not node. It makes total sense for VS Code though, so fair enough.


As someone who has worked with Eric Traut (the primary author of this from what I can see) in a previous life, I will say you can take this to the bank any day of the week. One of the best developers out there


How does he decide what to work on?


Is mypy not open source? Why not collaborate? Mypy performance improvements are something that would be welcomed.


Mypy performance is quite good in daemon mode for incremental updates. The recently released mypy-mypyc (AOT-compiled version) gave me a 6x performance improvement on typechecking a large project.


Yeah, the daemon is way better for sure.


It's the second item in the README - one of the goals is no dependency on Python.


Isn't "No Dependency on Python Environment" a strange goal? Surely if you were going to depend on anything for a Python tool it would be that you've got Python installed?

I could understand if the goal was "to easily run within VS Code".


Ah, I just searched for "mypy" and saw the performance bit. Seems kind of odd, I see the vscode justification but if you're running this, having a dependency on Python seems like a low bar?


I imagine the biggest driver/customer here is VS Code and then it becomes about adding such a dependence to VS Code which they probably don't want.


I could see this being nice if you work within VSCode, have to juggle many Python versions, and already have Node installed, but I've personally had no issues with speed or configurability with mypy and pyenv. I think it's an interesting project, but probably half-baked at this point.


Flame mode on.

Can anybody explain to me why people seem to prefer using unsafe and interpreted/slow languages (Python, Ruby, JavaScript... which is only fast now because companies have invested millions in it) and then spend a ton of resources writing "type checkers" for these inherently unsafe languages (TypeScript, Pyre, Pyright)? Why don't they simply use languages that are already safe and fast by design (Rust, OCaml...)?


Rust is not fast by design.

OCaml is kind of but still not as fast as Ruby for me at least.

I spent a lot of time learning Haskell and Scala and I'm still learning. And they never caught up with ruby on rails in terms of velocity for me.

One big thing is about the feedback loop. Typesafe so what? Type is type, it's not either value or business invariant. I'm not convinced as long as you're not writing proofs in dependent-type languages like Idris. So, I still prefer a REPL that I can be sure about in no time, and watching tests run every 0.1s after I change the code.

One can say interpreted languages are optimized for developing because when you change the code, it can be just run once -- there's no point of compiling it because in the next second you'll be changing it.

And compiled languages are optimized for the runtime, because it's compiled it's ready to be run many times.

Of course, there are JITs for interpreted languages and interpreted statically typed languages, but the Haskell REPL is still pretty slow compared to Slime. And for Rails, you can actually touch any runtime entity in the console.


>it's not either value or business invariant. I'm not convinced as long as you're not writing proofs in dependent-type languages like Idris.

Types are proofs in OCaml as well, you don't need dependent types to have Curry-Howard correspondence. You could even prove stuff in Java. And you can ensure invariants with types in OCaml well enough.


There are many, many libraries that I work with that are written in Python. Things for which a single script was once sufficient, then a few of them, and eventually multiple thousands of lines of code.

I'm not going to spend years reimplementing that same logic in some other language. I'm going to build on top of it. Maybe I'll need to make changes to some of the underlying code.

Better tooling - which is exactly what type checkers are - makes my life much, much easier and makes me much more productive.

Besides, all these languages exist for a reason - you choose the tool appropriate for the problem. "safe and fast" does not make a language automatically better for a problem.


The most important aspect of development in most companies is the developer him/herself. It’s typically also the most expensive resource tied to development. At least if you look at it from a management perspective.

This means that it’s extremely risky to build things in hipster languages because the pool from which you can hire is extremely small. It also means you’ll have to spend a lot of money building tools and libraries that are “just” available in more popular languages.

On top of that, most companies don’t need to scale things to the point that Netflix does, and even if you do, PHP was good enough to run Facebook before PHP got fast, Python was good enough to run Reddit, Node is good enough to run Netflix and Ruby was good enough to run Github.

I mean, why would you ever choose Rust for your project from a business/management perspective? To save a few bucks on iron while increasing your development costs by hundreds of thousands and making vacant positions impossible to fill?

On the flip side, why would you bother joining a hipster language as a developer? Right now 35% of the jobs in my country are for C#, another 35% are for Java, 25% of them are for PHP or Python, and the remaining 5% are for every other language, but typically C/C++ for robotics. JavaScript/TypeScript is involved in quite a lot of the positions, but almost no one is doing a JS backend. There isn’t a single OCaml or Rust position available in my entire country. I’m not sure there has ever been a Rust-related position.

Management doesn’t care about technical reasons, and it never will. Developers tend to go where the jobs are, even if they have hobby projects somewhere else.

Maybe Rust will somehow manage to break through; Python did, after all. But I think Python had a huge amount of help from being the replacement for Java at many universities. I don’t see that happening for Rust.


You say 70% of jobs are C#/Java. These are statically typed languages. And they are not hipster. So I guess you kind of agree with the parent comment?


Maybe, I’m not personally opinionated in either direction. We use more and more JS and Python in our shop, and we have had no issues with dynamic types.

On the other hand, that’s very likely because every one of us had been doing either Java or C# for 15+ years before we started doing anything serious in dynamic languages.

So from a technical perspective I think dynamic languages are equal, maybe even better. From a management perspective though, you have to consider what happens when you hire a developer who didn’t grow up with static types.


Each time a discussion on the question "Is Ruby/Rails still relevant in [this year]" comes up around here, an answer to the effect of "Ruby/Rails is still the fastest way to go from zero to working prototype/product" ranks highly.

Python and Javascript have different standout attributes but it comes back to the same key point: most people building regular apps/sites/products care primarily about building something functional as fast as possible.

It would make sense that productivity would be a primary concern in the earlier stages when it's not yet known whether a product idea is viable.

But type safety might become a consideration later as the development team becomes bigger, the codebase becomes more complex, and production reliability becomes an important consideration.


Ruby on Rails - fastest from zero to working prototype, fastest from working prototype to unmaintainable nightmare.

No one should be surprised when the "language/framework for non-programmers" results in a codebase that was clearly made by non-programmers.


I see from another comment that you’re struggling with a difficult-to-maintain Rails codebase, but this comment is needlessly contemptuous and risks making you seem arrogant. It also sounds like the kind of thing that some smug, newly-converted Ruby/Python programmers would say about PHP (I plead guilty on behalf of my 10-ish-years-ago self).

I haven’t seen anyone call Ruby/Rails a "language/framework for non-programmers". Nobody asserts that someone can or should try to build a professional-grade app or site without having a solid understanding of programming fundamentals.

It’s just a matter of what you optimise for.

Rails makes sense in startup land, where it’s more likely than not that what you’re building won’t turn out to be popular, so it’s best to test the concept as fast as possible to avoid wasting any more time than you need to.

You can easily rebuild in other languages if you’re lucky enough that scale, performance and maintainability become major problems. I’ve seen that happen at several companies including Twitter and Airbnb, and it makes perfect sense.


Is it just me or did no one else parse this as a joke project because pyrite == fool's gold?


Project logo suggests they're aware of it.


I think this is great! Static typing is such a blessing. This one looks very high quality. Static typing helps me read code and helps the compiler do static analysis. What's not to like?


Unnecessary annotations, and compile step. No runtime guarantees. More work to rewrite and iterate although tooling makes up for it. Easier to write bad code due to less need to understand the interfaces. Harder to write any code although that might be a feature.


I don't know if any of that is true. The annotations actually help me to understand the code. I actually find it easy to rewrite and iterate with type annotations, which are like machine-checked comments. I think the purpose of the tool is to make it harder to write "bad" code; unless you mean "bad" in the aesthetic sense, and even then I can't see how that is true either. In general I find it easier to write code since it's easier to keep track of what the data looks like.

What do you mean by interfaces?

But then this is an argument that I don't have much time for, and too much time has been spent on it already. Let's just say that if many major companies felt the need to invest in a type system for popular languages that don't have one, then there might be something to it.


By interfaces I mean the code you mingle with, like functions and methods. When you are half forced to read at least the function signature and/or the manual in order to understand and use it, it will lead to fewer bugs. The better you understand something, and the more people who read it, the fewer bugs. And as a by-product you optimize for readability and comprehension. I think a major reason why companies chose to invest in static typing is that so many developers are used to it, coming from Java and other languages. Not necessarily static typing itself, but the tooling that comes with it.


A type system is not a replacement for understanding. Wherever did you get that idea? The annotations help you to understand the code; they don't make it so you don't have to understand it.

I'd say Python is about as popular as Java, so I don't see companies investing a bunch of money to make Java programmers more comfortable when they could just hire an army of Python developers if they wanted to. I just think they see value in the static analysis that types provide. Read about why Facebook added types to PHP; they don't mention your hypothesis.

I don't mind programming in languages with dynamic type systems. I find it just as fun. But I do think it's easier when there is a way to easily keep track of the type of thing being operated on. I mean, the data is what is actually being operated on; the more I can tell about the data from reading the code, the better. The most difficult systems I have ever worked on were the ones where unshaped globs of data were being shuffled around in hash tables and heterogeneous sequences. I would always have to use the debugger to find out what was going on at a given point. The code was dispatching on value and type. It was rough.


I think static typing is needed in low- and mid-level programming languages to help the optimizer and make sure values don't go out of bounds. Funnily enough, higher-level languages like JavaScript (and probably Python too, I don't know Python) are strictly typed, e.g. you can never go out of bounds and get overflow errors, have dangling pointers, etc. When you already have strict typing, there's less need for static typing.

If the statically typed language is compiled to a dynamic language, you will lose most of the performance gains too, for example Dart is almost like JavaScript-with-static-typing, but it's twice as fast because the compiler can actually make use of the static types.

I also think static typing bolted onto a dynamic, higher-level language does push you into an object-oriented paradigm. And I think the language becomes uglier with annotations. And it makes people lazy, so they name things x, y, z because the type annotation fills in the rest. But who cares about syntactic preferences, code readability and naming things :) The real reason you use static typing is to catch bugs ... But the best way to catch bugs is to understand the code, and have many people read it. When I watch weaker programmers, the workflow goes something like this:

-I want to do foo ... autocomplete ... fooBaz, fooBar, fooFoo ... hmm, what can I do with foo ... autocomplete ... foo.bar sounds like what I need, it takes a baz, new Baz() ... damnit how do I get rid of this null exception ...

I did what you suggested and looked up why Facebook created Hack, from a release post (1): "but still spend time looking up mundane method names in documentation"

They don't have time to read and understand code. :P I sometimes jokingly say "the code is the documentation". Jokingly because I know that is unacceptable, but most of the time I actually do mean it. :P

"We didn’t want to slow the PHP workflow, so we came up with a new approach to reconcile instantaneous feedback with type safety."

They addressed one of the issues with a compile step, eg slower feedback loop.

1: https://code.fb.com/developer-tools/hack-a-new-programming-l...


> They don't have time to read and understand code.

Where does it say that? It actually says the opposite quite clearly, if you cared to post the whole quote and not just the part that supports what you would like it to.

> They addressed one of the issues with a compile step, eg slower feedback loop.

Can you explain to me how faster feedback from static types, a concept which the article reiterates again and again, leads to a slower feedback loop? Clearly the article and many more like it say that the feedback loop actually speeds up with static analysis, since you don't have to run the code to get the feedback.

I'm not sure if you actually want to objectively reason.


It's not wise for me to continue the discussion if you are going to resort to mudslinging. I cannot simply explain something at the sandpit level, especially not to someone in a highly emotional state.

[this part contains sarcasm] If you read between the lines in the article, and I've read it in other Facebook articles too (between the lines), understanding code is a problem for Facebook, it's against their principle of "go fast and break things". Good that they are not into the self driving cars business :P [/this part contains sarcasm]

Traditional web development is a bit different from traditional development with long compile times. In PHP, for example, the code is evaluated on the go, for every request. Development goes like this: make a change in a .php file, (upload the file), reload the browser to see what happened. If you add a type checker in between, before you see what happened, it will slow down the development process. This is how static types and type checking slow down development: by adding a compilation step. Facebook claims they fixed it, but it's still a concern that needs to be addressed when adding static types to other interpreted languages.

I do acknowledge that tooling allows you to see type errors directly in the editor/IDE before you reload the browser. You can however get that without static typing, via inference (type annotations do make building such a tool easier). It's however likely that the bug would be discovered during manual or automatic testing anyway, so you have to evaluate whether the added dev-ops complexity from changing the language to add static typing, and adding a compilation step, is actually worth it.


Actually the most common php workflow I've seen is: Edit php files directly in production, save, refresh. If you see a blank page (php syntax error, returns HTTP 500) start sweating and quickly check apache error.log. Repeat until syntax error is fixed (and initial feature is implemented).

Or to rephrase an old proverb:

Real php developers do it on production only.


This has gotten silly.


Wut?

No modern type system gives you runtime guarantees (C++, Java, Rust all erase in various ways).

Interfaces are more clear.


In JS I do my own (runtime) checks. People make fun of me and call it a poor man's type checker. But I actually think I'm better than a typechecker at giving nice error messages and knowing where more rigorous checks are needed. When you already do tests and defensive programming, static typing adds very little; it just gets in the way. Requiring others to skim through the code they are calling in order to understand it is, I think, a feature of dynamic languages. The better you understand the code, the fewer errors you will make.
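
For readers who haven't seen this style, here is a rough sketch of hand-written runtime checks, in Python rather than JS to match the rest of the thread. The function `transfer` and its rules are invented for illustration and are not taken from anyone's real code.

```python
def transfer(amount, account_id):
    """Validate arguments by hand and fail with a descriptive message."""
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if not isinstance(amount, int) or isinstance(amount, bool):
        raise TypeError(
            f"amount must be an int (cents), got {type(amount).__name__}: {amount!r}"
        )
    if amount <= 0:
        raise ValueError(f"amount must be positive, got {amount}")
    if not isinstance(account_id, str) or not account_id:
        raise TypeError(f"account_id must be a non-empty str, got {account_id!r}")
    return {"account": account_id, "amount": amount}


print(transfer(500, "acct-1"))
```

The value check (`amount <= 0`) is the kind of thing a static annotation alone would not express; the cost is that mistakes only surface when the code path actually runs.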


I still don't understand adding type checkers to dynamic languages beyond JavaScript, where there was no other alternative. I view the gradual type checker craze as an admission that dynamically typed languages are a failed experiment. So why would you not just switch to a language that's designed with static typing in mind?


In addition to the (extremely pertinent) answer above, there are advantages to dynamically typed languages in some cases, so it shouldn’t be surprising/confusing that sometimes “dynamic typing plus optional type checking with zero run-time overhead” is the best (or at least very good) of both worlds.


Existing code bases.


think about it the other way, and it's an admission that statically typed languages are a failed experiment because sometimes you really want to tell the typechecker "look, i can't prove this at compile time but it is correct", i.e. not have types in your code.

the designer of the stanza language has a good writeup on why stanza uses optional types: http://lbstanza.org/optional_typing.html


Well not every statically typed language is so absolutist. In C/C++ you are perfectly welcome to cast things if you know you can prove things the typechecker can't.
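
Python's gradual typing has the same kind of escape hatch in `typing.cast`. A minimal sketch follows; `load_settings` and the JSON shape are assumptions made up for the example.

```python
import json
from typing import Dict, cast


def load_settings(raw: str) -> Dict[str, int]:
    # The checker cannot prove what shape the parsed JSON has.
    data = json.loads(raw)
    # cast() does nothing at runtime; it only tells the checker what to assume.
    return cast(Dict[str, int], data)


settings = load_settings('{"retries": 3}')
print(settings)

# No conversion or validation happens: the value passes through untouched.
unchecked = cast(int, "not an int")
print(repr(unchecked))
```

As with a C cast, the burden of being right moves to the programmer: a wrong `cast` silences the checker without making the code correct.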


Ecosystems


Can this be run in real time in VSCode? How hard would it be to write a vim plugin for this? I haven't done any vim scripting in a while - is it possible nowadays to run a process asynchronously, so that you could essentially run this tool on every keystroke?


I believe that vim supports the language server protocol via a (some?) plugin(s), and that Pyright provides an LSP mode, so you should be able to use them together. Note that you would be running vim and node to write python code, which seems like a lot of different languages to me, but I’m mostly an emacs person, so...


I don't know about regular vim, but you could do this with NeoVim.


Now that we have type hints and static typing in Python, I hope to see an open source Python compiler that improves performance and produces single-file binaries.


Nuitka has been around for a few years:

https://nuitka.net/

AIUI, this was basically a one-person effort for a long time, but the project is now growing.


Look up PyOxidizer.

But also look at MyPy's internal MyPyC if you want something that uses type information for some speedups.


I wish there was one too, but programmer-oriented types are not the same thing as machine-oriented types. Compiling to fast code requires the latter: types like u32, not Dict[str, int]. Current Python typing efforts tend to focus on programmer-oriented types.


No idea if this improves performance (and it's definitely not a compiler), but Pyz produces single files from your Python code: https://pypi.org/project/pyz/


But why does each company have one of their own[0]?

[0]news.ycombinator.com/item?id=19473944


A comparison with mypy would be helpful.


From the README

> Pyright is typically 5x or more faster than mypy and other type checkers that are written in Python. It is meant for large Python source bases. It can run in a “watch” mode and performs fast incremental updates when files are modified.


Any extensions for emacs?


Does this typecheck matrix operations?


is there a way to get any of these python typecheckers to automatically annotate my code?


Yes -- see pytype.



