
Open-sourcing Sorbet: a fast, powerful type checker for Ruby - abhorrence
https://sorbet.org/blog/2019/06/20/open-sourcing-sorbet
======
the_duke
It's been funny to watch how more and more static type systems are getting
bolted on to dynamically typed languages in recent years.

Typescript (with stellar adoption), native type annotation support in Python,
Sorbet, PHP 7, Elixir + Dialyzer, ...

I wonder why there isn't a popular gradually typed language that natively
allows writing both dynamic and type-safe code, allowing quick prototyping and
then gradual refactor to type safety.

I guess in part because it's a big challenge to come up with a coherent type
system that allows this, the bifurcation in the ecosystem, and often a
somewhat leaky abstraction. Eg in Typescript you will often still run into
bugs caused by malformed JSON that doesn't fit the type declaration, badly or
insufficiently typed third party libraries, ....

Google's Dart is the only recent, somewhat popular language (only due to
Flutter) that allows this natively - that I can think of right now.

I do think such a language would be very beneficial for the current landscape
though, and projects like this show there is a clear need.

Edit: just remembered another option: Crystal. Also Julia, as pointed out
below.

~~~
tptacek
For whatever it's worth and without wanting to start a language war (I like
Python just fine), I think Python/Ruby-style typing is a false economy for
prototyping. There are a lot of things that make Go slower to write than Ruby,
but mandatory typing isn't one of them. Rather, Ruby's total lack of typing
makes it _harder_ to write: you effectively end up having to write unit tests
just to catch typos.

I wonder whether the perception that type safety slows down Ruby (or ES6)
development comes from the fact that the type systems are bolted on after the
fact.

~~~
zapzupnz
I've never quite understood the notion that languages like Python and Ruby are
amenable to fast prototyping. Ultimately, you wind up just having to carry
complicated type information in your head rather than write it all down in
your code, all whilst the compiler utterly fails to help you in any way in
case your memory ain't what it used to be.

I'm sure those languages and their ilk have advantages for prototyping, but I
agree, mandatory typing in other languages isn't a burden. If you already have
to reason about what arguments are acceptable in functions, what objects can
receive what messages and what those messages should contain, you already have
typing — just inefficiently stored in short term memory.

Those who are the biggest proponents of these languages as useful for
prototyping are also _not_ the ones who rewrite their code in type-safe
languages, so being able to add type annotations for the sake of their
colleagues who eventually have to turn these prototypes into code upon which a
team can collaborate can only be a useful thing.

~~~
nemothekid
> _I've never quite understood the notion that languages like Python and Ruby
> are amenable to fast prototyping._

I've thought about this for a while and I think the "fast prototyping"
reputation is an accidental feature that sticks because of history.

Python and Ruby were just better languages than Java 1.4, C++, Perl and PHP
were at the time. They were so much better for not only prototyping, but
writing - it was easier to get something working and iterate in Python than it
was in Java. And not only were they better, they were better at the _right
time_ when the internet exploded.

Now, it largely feels like languages like Java and newer languages like Go
have caught up, but Python and friends still enjoy the feature of "fast
prototyping." I think this is because, having gotten so popular, they now
have a massive ecosystem of libraries and idioms that makes it so easy to
build applications.

~~~
kyllo
Java and Go today are much better to work with than Java was in, say, the
early 2000s, but to realize the productivity boost of a dynamic language
within a statically typechecked language, you need type inference and a REPL.

You still don't get this with Java or Go, but you certainly do with ML-like
languages (Haskell, SML, OCaml, F# etc.)

A lot of Haskell developers will tell you that writing Haskell feels a lot
like a functional Python, and they use Haskell for scripting as well as
application development.

~~~
cutler
Until Java gets data classes it will make you miserable. Lambdas are clunky in
Java at best. "Much better to work with" than Java 1.3 is setting a pretty low
bar.

~~~
philwelch
> Until Java gets data classes it will make you miserable.

[https://projectlombok.org/features/Data](https://projectlombok.org/features/Data)

------
simplify
A fascinating part about Sorbet is it did not have to introduce any additional
syntax to Ruby (unlike TypeScript, for example). This really speaks to the
expressiveness of Ruby. Very cool.

~~~
est31
Was that additional syntax in TypeScript actually _necessary_ for type
inference? Or is it rather to avoid API hazards when you change some internal
code and suddenly the API of your library breaks because the inferred type has
changed?

~~~
idle_zealot
You can use TypeScript in a mode that only uses type inference and doesn't
require type annotations or definitions. It works surprisingly well.

~~~
matt_kantor
Additionally, TypeScript will parse JSDoc comments into type annotations:
[https://www.typescriptlang.org/docs/handbook/type-
checking-j...](https://www.typescriptlang.org/docs/handbook/type-checking-
javascript-files.html#jsdoc-types-are-used-for-type-information)

------
castwide
Coincidentally, I announced the beta version of a Ruby type checker in
Solargraph two days ago:
[https://github.com/castwide/solargraph/issues/192](https://github.com/castwide/solargraph/issues/192)

It has a few overlapping features with Sorbet, with one major difference being
that Solargraph type checking relies on YARD documentation instead of
annotations.

~~~
hit8run
Love the work you’re doing on Solargraph! Thx for it.

------
anonova
> To enable static checking with srb, add this line (called a sigil) to the
> top of your Ruby file:

> # typed: true

Isn't this called a directive/pragma? A sigil is a symbol on a name.

Either way, I'm excited to see this finally out after seeing the past
presentations on it.

~~~
ptarjan
Thanks for pointing that out. It can be called all of those. We liked sigil
from its connotation:

> Google defines sigil as, “an inscribed or painted symbol considered to have
> magical power,” and we like to think of types as pretty magical

[https://sorbet.org/docs/static#fn1](https://sorbet.org/docs/static#fn1)

~~~
dochtman
That's a pretty unintuitive use, since "sigil" is more commonly used (in
programming languages) as a single _symbol_ , as in a non-alphanumeric
character that's used as some kind of syntax.

[https://en.wikipedia.org/wiki/Sigil_(computer_programming)](https://en.wikipedia.org/wiki/Sigil_\(computer_programming\))

~~~
sdegutis
Right so the $ in PHP or @ in Perl would be a sigil, (and even the $ and % in
QBasic, am I dating myself?) and the # in this example and all C code and some
Swift code would be a pragma. Seems pretty cut and dry.

~~~
e12e
And the $ and @ in _ruby_...

------
hartator
Awesome work.

    
    
        sig {params(person: Person).returns(Integer)}
        def name_length(person)
    

Not sure if I dig the syntax. Furthermore, "arguments" seems to be the official
name for method arguments, not "parameters", e.g. `ArgumentError`. `params` also
feels like it's linked to Rails `params` variable in controllers. It can be
confusing.

Something like this will also feel more Rubyist:

    
    
        def name_length person: Person, return: Integer
    

But it probably requires a deeper hack or a change in MRI.

~~~
ptarjan
Thanks for the idea.

We used `params` because Method#parameters was what they called it in the
standard library. I actually had it as `args` originally until someone pointed
this out. [https://ruby-doc.org/core-2.6.3/Method.html#method-i-
paramet...](https://ruby-doc.org/core-2.6.3/Method.html#method-i-parameters)

As for the syntax change, we are actually on our 8th iteration of the syntax.
We really wanted this to NOT be a fork of Ruby so finding something compatible
was very important. For example that's why it has the weird `sig {` syntax
too, we didn't want to have to cause load-time and cyclic dependencies from
adding type signatures.

~~~
hartator
> We used `params` because Method#parameters was what they called it in the
> standard library

Super interesting. We should probably have been consistent in naming
parameters vs. arguments in stdlib. It's too late though!

~~~
steveklabnik
On a super pedantic level, "parameters" are the names that you write in the
function definition, and "arguments" are the values you pass as parameters.

    
    
      def name_length(person)
    
      steve = Person.new
      name_length(steve)
    

Here, 'person' is a parameter, and 'steve' is an argument.

Most programmers use them interchangeably.
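
A minimal Ruby sketch of the distinction, assuming a hypothetical `Person`
struct with a `name` field (neither is from the thread). It uses the standard
library's `Method#parameters`, which reports the names written in the
definition:

```ruby
# Hypothetical domain object, assumed for illustration only.
Person = Struct.new(:name)

# `person` is the parameter: the name written in the definition.
def name_length(person)
  person.name.length
end

# Method#parameters lists [kind, name] pairs; :req is a required positional.
p method(:name_length).parameters  # => [[:req, :person]]

# `steve` is the argument: the value passed at the call site.
steve = Person.new("Steve")
p name_length(steve)               # => 5
```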

~~~
learc83
We called those 2 terms "formal parameters", and "actual parameters" if I
remember my programming language concepts class from college correctly.

~~~
sdegutis
I've never been to programming college but in 10 years I have always heard
them as def(parameters) and call(arguments). Probably a lot of that was when
reading about how VMs and compilers work, though.

------
FpUser
I've never understood the so called advantages of dynamic typing. To me it
looks like a land mine in one's project waiting to blow at run time. And what
for? Do developers code so fast that the time spent on typing something like
"int i" will provide any real savings? Now vendors are trying to patch those
with bolted on top syntax extensions/derived languages that need to be
transpiled. What a mess.

~~~
jeremycw
When you're consuming JSON that has deep nesting, arrays that contain multiple
types, etc. Something that may be two lines of code in a dynamic language
could be as much as 100 lines in some static typed languages.

~~~
choward
Sure, you could write one line of code to read it in as maps and arrays, but
at some point you need to validate your input from the outside world to
convert it to your domain objects. Using a dynamically typed language doesn't
magically make that problem go away.
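
A rough sketch of that validation step in Ruby, assuming a hypothetical `User`
domain object with `name` and `age` fields (the names are made up for
illustration). Parsing is one line, but the shape still has to be checked
before the data becomes a domain object:

```ruby
require "json"

# Hypothetical domain object; field names are assumptions.
User = Struct.new(:name, :age)

# Parse raw JSON, then validate its shape before building a User.
def parse_user(raw)
  data = JSON.parse(raw)
  raise ArgumentError, "expected a JSON object" unless data.is_a?(Hash)

  name = data["name"]
  age = data["age"]
  raise ArgumentError, "name must be a String" unless name.is_a?(String)
  raise ArgumentError, "age must be an Integer" unless age.is_a?(Integer)

  User.new(name, age)
end

user = parse_user('{"name": "Ann", "age": 30}')
puts user.name  # prints "Ann"
```

The checks are the same ones a static language would force at the boundary;
here they are simply written by hand and run at runtime.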

------
whycombagator
Related, Square wrote a great article: "RubyKaigi and the Path to Ruby 3"[0].
The section titled "Static Analysis" gives a high-level comparison of Sorbet and Steep.

[0] [https://developer.squareup.com/blog/rubykaigi-and-the-
path-t...](https://developer.squareup.com/blog/rubykaigi-and-the-path-to-
ruby-3/)

~~~
sickcodebruh
It’s my understanding that the Sorbet team is involved with bringing types to
Ruby 3. I’m unclear on whether it will be Sorbet itself or if it’s elements of
it. Can’t dig up the source right now, maybe someone can corroborate this?

~~~
darkdimius
This is true, we're part of a single working group that's working on types for
Ruby3.
[https://twitter.com/darkdimius/status/1119130004313350144](https://twitter.com/darkdimius/status/1119130004313350144)
has some detail

------
jtms
Though I haven’t yet used it for anything in production, I think if I were
starting something greenfield and wanted “Ruby with static types” I would go
with Crystal. I really enjoy writing it and the performance you can get is
quite a significant boost over Ruby.

~~~
sickcodebruh
I’d still go Ruby. A language’s ecosystem and community are as much factors in
why someone should choose or avoid it as its syntax. Both of those things are
fantastic for Ruby — I’d argue that they’re some of its best features, in
fact. Crystal? Not so much.

~~~
jtms
I’m a long time veteran of Ruby and someone who deployed production Rails apps
in EARLY v1. I absolutely love and adore it and it’s by far my favorite
language to work with. That being said, when I can write in a very stunningly
similar language and get 10 to 100x performance with very little extra effort
I am going to strongly consider it when deciding on my stack. Also the
ecosystem for crystal is not terrible at all. I think it’s a great project and
shouldn’t be ignored because “the ecosystem”

~~~
sickcodebruh
I also think Crystal is a great project that shouldn’t be ignored because of
its smaller ecosystem. The performance boost is nothing to sneeze at and I
wasn’t suggesting that Crystal’s ecosystem is terrible, but it’s nowhere near
Ruby’s.

EVERY problem has been solved in Ruby. (Edit: Every problem that isn’t
hamstrung by Ruby’s technical limitations. It’s certainly not the right tool
for every job.) There’s a gem or a blog post or a service for everything and
it will probably work very well. The language is stable and predictable. It
won’t be as fast as we might want it to be but it’ll probably work with
minimal effort out of the box. You’ll be able to put it into production with
little to no fuss and it’ll just work. If it doesn’t just work, you’ll have no
problem finding resources to get it resolved. I don’t think you can say any of
that about Crystal.

Maybe that stuff doesn’t matter for every individual or every project but for
me, they’re significant enough that I think they should come up whenever
anyone tries to compare the two.

~~~
petepete
I'd agree with this. It comes down to the project and scope. For tasks where
correctness is paramount it's hard to argue against Crystal, but for most
apps, most of the time, Ruby and Rails are sufficient.

------
dajonker
I have mixed feelings about adding type annotations to an existing project.
IDEs become easier to use, you can avoid certain bugs, refactoring becomes a
bit less error-prone. But this comes at a cost: you need a very high type
coverage, which means you need to rewrite a lot of code to deal with the
different style of polymorphism. It's very likely that you end up with code
that looks as if it were written in a statically typed language but without
any of the performance benefits of such a language.

------
mark_l_watson
Thanks for this. Major contribution to the Ruby community!!

~~~
ptarjan
Thank you! Ruby has been kind to us, we'd like to be kind back.

------
hderms
I'm interested in whether `.rbi` files are going to be the only official route
for typing in Ruby, and if so, how that would end up impacting Sorbet?

~~~
hirundo
It'd be nice to have an option to put that data inline.

~~~
ptarjan
Sorbet supports both formats. You can see the inline syntax on
[https://sorbet.run/](https://sorbet.run/)

------
bakery2k
Does anyone know what Matz thinks of Sorbet? He has previously been opposed to
adding type annotations to Ruby [1].

This is in sharp contrast to Python, where Guido has overwhelmingly embraced
type annotations.

[1] [https://bugs.ruby-lang.org/issues/9999#note-13](https://bugs.ruby-
lang.org/issues/9999#note-13)

~~~
baroffoos
I saw some news saying that ruby 3 would have types built in.

------
mbell
I wonder what the reason for not supporting structural typing is. It seems
like a very natural fit for Ruby.

~~~
ptarjan
We believe that the main goal of a typechecker is to give you nice error
messages. We've found giving names to things makes it easier for folks to
reason about their errors, and introducing names for interfaces isn't that
onerous at Stripe or with our beta testers.

We aren't opposed to supporting it eventually, but we'd like to see how it goes
with the current form first.

------
itake
The "Getting Started" link at the bottom of the page is broken:

[https://sorbet.org/blog/2019/06/20/docs/adopting](https://sorbet.org/blog/2019/06/20/docs/adopting)

~~~
ptarjan
Great find! Fixing now, thanks.

------
poorman
I've used contracts any time this type of thing was necessary.
[https://github.com/egonSchiele/contracts.ruby](https://github.com/egonSchiele/contracts.ruby)

~~~
wpride
We use Contracts too and are in the process of transitioning to Sorbet. In
addition to the same runtime type checking as Contracts, Sorbet offers static
type checking (and will re-use your runtime signatures in its static
analysis).

------
imhoguy
Excellent work! I wonder if somebody already tried to run it against Rails
codebase.

~~~
ksec
Both Shopify and Kickstarter are on Rails.

I wonder why Github didn't join the private Beta?

~~~
heartbreak
Maybe their forked version of Ruby had something to do with it.

~~~
WA9ACE
IIRC last time I poked at their enterprise image and dumped the code there
were bits of custom type annotations. That was around 2-3 years ago.

~~~
ptarjan
The parser for Sorbet was actually entirely given to us by GitHub. They were
fabulous partners early on in the project and we're grateful for their
contributions.

~~~
heartbreak
Understated, but awesome! Thanks for the insight!

------
poorman
The dependency on bazel is very off-putting to me. After having tried it for
other projects and watched as rapid breaking backwards incompatible changes
were made to the tool, I'm opting out of anything requiring it.

~~~
darkdimius
We only use it at build time; as a user, it should be invisible to you.

------
maxfurman
I'm having trouble finding the implementation of `sig` - could someone please
point me to the right file? Thanks. I'm very curious how they pulled this off.

~~~
jade12
Here it is:

[https://github.com/sorbet/sorbet/blob/master/gems/sorbet-
run...](https://github.com/sorbet/sorbet/blob/master/gems/sorbet-
runtime/lib/types/sig.rb)

As you'll notice, `sig` doesn't actually do anything.

~~~
maxfurman
So....then....how does Sorbet use the type signature provided to sig?

~~~
andolanra
Sorbet has two components, broadly speaking: the typechecker, which is a
standalone application, and the runtime, which is a Ruby gem. The typechecker
can parse Ruby code and find the _sig_ blocks in the code to extract type
information from them, and it then can use this type information to perform
type-checking without ever loading a Ruby interpreter, to say nothing of the
actual Ruby code in question. This is intended to happen in a CI pass or a
pre-processing step, but it is entirely offline.

On the other hand, when you run your code, the _sig_ blocks may or may not be
used. There's a lot of machinery in
[https://github.com/sorbet/sorbet/blob/master/gems/sorbet-
run...](https://github.com/sorbet/sorbet/blob/master/gems/sorbet-
runtime/lib/types/private/methods/_methods.rb) that handles understanding what
a _sig_ means and installing a wrapped version of a method that does type-
checking on entry and exit. The intention is that the standalone _sorbet_
executable should be used during development, but it can't catch every error,
so the runtime system will double-check types at runtime, which is especially
helpful when some parts of your code are typed and other parts are untyped,
and control flow passes back and forth between those sections: the runtime
will ensure that you don't accidentally pass an object with an unintended
runtime type into code with static type expectations.
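
The wrapping idea can be sketched in plain Ruby. This is a toy illustration,
not sorbet-runtime's actual implementation: `TinySig`, `tiny_sig`, and
`Greeter` are hypothetical names, and it only handles positional parameters.
It records a signature, then uses the `method_added` hook to replace the next
defined method with a wrapper that checks types on entry and exit:

```ruby
# Hypothetical toy checker; NOT how sorbet-runtime is implemented.
module TinySig
  # Remember a signature; it applies to the next method defined.
  def tiny_sig(params:, returns:)
    @pending_sig = { params: params, returns: returns }
  end

  # Ruby calls this hook whenever an instance method is defined.
  def method_added(name)
    sig = @pending_sig
    return super unless sig
    @pending_sig = nil # clear first so define_method below is not re-wrapped

    original = instance_method(name)
    define_method(name) do |*args|
      # Check each positional argument on entry.
      sig[:params].values.zip(args).each do |type, arg|
        unless arg.is_a?(type)
          raise TypeError, "#{name}: expected #{type}, got #{arg.class}"
        end
      end
      result = original.bind(self).call(*args)
      # Check the return value on exit.
      unless result.is_a?(sig[:returns])
        raise TypeError, "#{name}: expected to return #{sig[:returns]}"
      end
      result
    end
    super
  end
end

class Greeter
  extend TinySig

  tiny_sig(params: { name: String }, returns: Integer)
  def name_length(name)
    name.length
  end
end

p Greeter.new.name_length("Steve")  # => 5
```

Calling `Greeter.new.name_length(42)` raises `TypeError` in this sketch, which
is the kind of runtime double-check described above for values arriving from
untyped code.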

------
sebastianconcpt
I still wonder what problem exactly static typing fixes. Is it only me, or am
I too used to dynamic tech without correctness issues?

~~~
SmooL
Well, one thing: when you work on enough code for long enough you start to
forget what's what. Static typing lets you jump in and immediately know the
shape of your data at any point in the code, without having to re-trace
execution manually

~~~
sebastianconcpt
I know what you mean. I code in ways that compensate for that problem.
Actually minimizing my need to remember anything but intentions. Smalltalk
rewards that style vehemently for example. And I use Smalltalk and JavaScript
in that style and I don't feel any less productive or particularly vulnerable
to correctness issues. While in a TypeScript project I'm working on I do feel a
productivity hit (not in the good sense) and I (and my teammates) keep asking
ourselves "what TS is helping us with, really"? So far the only thing I can
say is that TypeScript is a fantastic solution for all the problems that it
creates itself.

~~~
SmooL
I'm genuinely curious how you code in such a way that compensates for the
problem. I'm unfamiliar with Smalltalk but very familiar with JS/Typescript.

I'm sure that if a tight-knit group held strongly to certain naming conventions,
then you could convey structure through the semantics. My experience though
has been that anything that relies heavily on "holding strongly to convention
and form" falls quickly to business speed, forgetfulness, and laziness.

As a personal example, a couple of coworkers and I were working on a codebase
in JS, and there was some function that helped manipulate user objects. There
were however two types of user objects, that were _very_ structurally similar,
but with slight variances. Because of the similarities, people started calling
user functions with one type that were meant for the other type, and it would
work. Future modifications would inevitably cause failures, as unexpected
parameters were being passed in.

~~~
sebastianconcpt
In point 10 here I've written about that [https://www.quora.com/What-is-your-
review-of-Smalltalk-progr...](https://www.quora.com/What-is-your-review-of-
Smalltalk-programming-language/answer/Sebastian-Sastre)

About the problem of "falls quickly to business speed, forgetfulness, and
laziness": are you guys using peer reviews with the four-eyes principle?

------
jedisct1
Very cool stuff!

------
ctrlaltdylan
!remindme in 1 month to check out Sorbet case study posts

------
battletested
I really don't like the current movement of introducing static typing in
dynamic typed languages. Why did we create dynamic typed languages after all?
Because we know the pain of having to type everything and especially the pain
of converting one type into another.

In C you mainly need static types because you cannot put 32 bits into a 16 bit
CPU register, or you cannot do pointer arithmetic on values of different
types, etc.. But that is not the reason why we want types in dynamically typed
languages. We just want to prevent passing incompatible types as argument to a
receiving method for example. And by adding static typing to dynamically typed
languages we actually invalidate these languages entirely, full circle. First
we create a dynamically typed language, then we change that into a statically
typed language that transpiles back to a dynamically typed language, which is
terribly inefficient, why would you want that? I have never been able to
convince the proponents, they appear to be in some kind of higher state,
having found the holy grail.

Almost all the benefits of static typing added to dynamic languages can be
achieved by a better and smarter IDE. All these new 'typed' languages with all
their own issues are only temporary, I expect. We keep changing and rewriting
while thinking we're doing it 'the right way' now.. Like Typescript, how long
will that live? Flow is deprecated already. The very best thing of the C
language is that it is still C, and that is fucking awesome for C developers,
after all those years they can still write in the language they master.

~~~
jldugger
> Almost all the benefits of static typing added to dynamic languages can be
> achieved by a better and smarter IDE.

That does what? Typecheck things? You'd probably want a tool or library to do
that, so it's pretty fortunate that Stripe bothered to provide those to us.

> We keep changing and rewriting while thinking we're doing it 'the right way'
> now..

Humanity has yet to rescue type inference from the forge's fire on Mount
Olympus. After Milner tricked Hindley in a lunchtime debate about tabs versus
spaces and code quality, Hindley punished us mere mortals by placing type
inference in the fires of heavily recursive forges mortals dare not touch.

