Pycket: A Tracing JIT For a Functional Language (2015) [pdf] (indiana.edu)



https://github.com/pycket/pycket - development is active.

> There are currently two different modes of evaluation, which we refer to as OLD and NEW. The NEW Pycket uses linklets and bootstraps Racket using the expander linklet exported by Racket (version 7+). The OLD Pycket, on the other hand, uses Racket's binary to fully expand the program, generates JSON ASTs, and evaluates them.

More on linklets: https://docs.racket-lang.org/reference/linklets.html Status of Racket-on-Chez (2018-Jan): http://blog.racket-lang.org/2018/01/racket-on-chez-status.ht...
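
For a flavor of what a linklet looks like, here is a minimal sketch using the API from the docs linked above (the exact API may vary across Racket versions):

  #lang racket/base
  (require racket/linklet)

  ;; A linklet is a compilation unit below modules: explicit imports,
  ;; explicit exports, and a body of definitions.
  (define l
    (compile-linklet
     '(linklet ()        ; no imports
               (double)  ; exports
        (define (double x) (* 2 x)))))

  (define inst (instantiate-linklet l '()))
  ((instance-variable-value inst 'double) 21)  ; => 42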

Back in 2015, a previous thread[2] mentioned a Pixie issue[1], which was closed with "I worked a ton on this, but ran into time issues before the Strangeloop conference. At Strangeloop I got talking to one of the authors of Pycket. Unfortunately it sounds like their implementation isn't much better than mine. The amount of load that is put on the JIT due to this sort of interpreter is quite large. In short, I'd rather have a mutable fast interpreter than an immutable mostly fast interpreter. Shelving this work for now. The PR will still exist, but I'm not convinced the features are worth the cost." Pixie's most recent commit seems to be from about a year ago.

My brief unsuccessful search for current Pycket benchmarks did turn up this[3] 2017 thread/paper on the Nash (Guile) tracing JIT.

[1] https://github.com/pixie-lang/pixie/issues/312 [2] https://news.ycombinator.com/item?id=9545628 [3] https://news.ycombinator.com/item?id=13773796


I haven't checked in on Pycket's development in a while, but much of the recent work has gone into supporting linklets. Last I checked, the performance story for the Scheme and Shootout benchmarks used in the paper hadn't changed much.

There are some benchmarks which evaluate Pycket's performance when applied to gradual typing [1]. Pycket does a pretty good job of optimizing away the overheads of gradual typing (or at least of Racket's implementation strategy), though it by no means eliminates the performance costs.

[1] https://dl.acm.org/citation.cfm?id=3133878


With the Racket core currently being rewritten in Chez Scheme, how will this affect Pycket? Will regular Racket become fast enough to preclude using Pycket?


Performance varies a lot in Racket-on-Chez. For some programs, such as tight loops over simple data, Chez provides a lot of the same benefit as Pycket, but Pycket is often still faster. Here's a simple loop that sums the integers below 100000000, with timings for all three variants of Racket:

  [samth@huor:~/sw/plt (master) plt] pycket-c -t count.rkt 
  cpu time: 83 real time: 83 gc time: 0
  4999999950000000
  [samth@huor:~/sw/plt (master) plt] racket -t count.rkt 
  cpu time: 226 real time: 227 gc time: 0
  4999999950000000
  [samth@huor:~/sw/plt (master) plt] racketcs -t count.rkt 
  cpu time: 170 real time: 174 gc time: 2
  4999999950000000
Chez is faster than regular Racket, but not as fast as Pycket.
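
For reference, count.rkt here is presumably something like the following type-specialized loop (a hypothetical reconstruction; the actual file isn't shown in the thread):

  #lang racket/base
  ;; count.rkt (hypothetical): sum the integers below 100000000.
  ;; (in-range n) lets the for/fold macro compile a direct counting loop.
  (time
   (for/fold ([sum 0]) ([i (in-range 100000000)])
     (+ sum i)))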

However, for other programs the difference is much bigger. If we use generic iteration instead of type-specialized iteration, we get these numbers:

  [samth@huor:~/sw/plt (master) plt] racket -t count.rkt 
  cpu time: 2883 real time: 2882 gc time: 0
  4999999950000000
  [samth@huor:~/sw/plt (master) plt] racketcs -t count.rkt 
  cpu time: 1266 real time: 1266 gc time: 0
  4999999950000000
  [samth@huor:~/sw/plt (master) plt] pycket-c -t count.rkt 
  cpu time: 74 real time: 74 gc time: 0
  4999999950000000
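The generic variant presumably looks something like this (again a hypothetical reconstruction): iterating over a bare integer value forces the `for` macro through the generic sequence protocol instead of the specialized `in-range` path.

  #lang racket/base
  ;; Same sum, but the for/fold macro only sees a plain expression,
  ;; so each step goes through generic sequence dispatch.
  (define n 100000000)
  (time
   (for/fold ([sum 0]) ([i n])
     (+ sum i)))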
Finally, on programs with contracts and gradual types, one of our focus areas for Pycket, Racket-on-Chez is currently substantially slower than the current Racket VM.



Thank you so much for all of that!

I'll have to look over this in the coming weekend. It's efforts such as these that keep me coming back to Racket again and again.


One of the things that the Pycket team has been trying to do is use JIT compilation to cut down on the runtime cost of Racket's contract system. This is not an area where Chez Scheme can compete.

I don't really know the current state of Pycket beyond that, though.


Saw this benchmark comparison between RPython and Graal the other day: http://stefan-marr.de/papers/oopsla-marr-ducasse-meta-tracin...

Is there a Scheme implementation on GraalVM?


Wonder how this compares to mainstream JS JITs?

Reading just a bit, they say Pycket includes the general features needed for languages like JS, but it sounds like someone would need to implement a JS front end. Any volunteers?


RPython is a framework for building trace-based JITs. Its selling point is that it automatically produces a JIT from a high-level specification of your language (basically, you implement an interpreter for JS in RPython and the framework spits out a JIT).

In JavaScript land there is less room for high-level frameworks, because browser vendors are willing to throw thousands of man-hours at JIT development. They go through this trouble partly to avoid the runtime overhead of meta-tracing.

Additionally, JS JITs have moved toward a method-based rather than trace-based approach over the years, for whatever reason.


I've been thinking along these lines as well. One of my dirty secrets is that I'm jealous of Rust's zero-cost iterators. I want those in JS and Lua.

One way to get them is to write an interpreter for a specialized subset of the language, and generate traces for each part of your program that uses e.g. map and reduce. It seems possible to specialize the generated code so that the overhead is essentially zero, but I haven't thought about it deeply enough yet.

There must be a way to get Rust's zero-cost abstractions without the steep penalty of static typing plus borrow-checker annotations.


Macros, perhaps?

Tracing JITs can actually have a hard time with map and reduce, because they can't always tell that different calls to map should be compiled with different traces. If you try to use the same trace for map(f) and map(g), you are going to fall out of the trace often.

For an example of this issue coming up see https://github.com/luafun/luafun/issues/32
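
A hypothetical Racket sketch of the problem (my-map and big-list are made up names): the single loop inside the map implementation serves every caller, so the trace recorded for the first callback guards on the function and side-exits when a different one shows up.

  #lang racket
  ;; One shared loop inside the implementation of map (hypothetical):
  (define (my-map f lst)
    (for/list ([x (in-list lst)])
      (f x)))

  (define big-list (range 1000000))
  (my-map add1 big-list)  ; tracer records the loop with f = add1
  (my-map sub1 big-list)  ; guard on f fails: side exit, a second trace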


Whoa. Why does that happen?

> The first call does indeed compile down to a tight loop of around 10 instructions. However, the second call compiles down to around 62 instructions - even though it is exactly the same code!

That seems like a bug rather than a consequence of tracing JITs, doesn't it?

Macros are one possibility, but they require global knowledge at compile time, which is at odds with the dynamic nature of scripting languages.

Suppose you want to write a function `f` which takes some object `x` and calls `str(x)` on it. If you know the type of `x` at compile time, and know it has a `str()` method, it's a simple matter to substitute `x.str()` for `str(x)`. Otherwise, the code would look like this:

  (define (str x)
    (if (and (obj? x) (get x 'str))
        ((get x 'str) x)  ; call the object's own str method
        (stringify x)))   ; else stringify x as normal
i.e. you'd have to write a `str` function that checks the type of `x` at runtime, incurring a cost.

A strongly typed language is one possibility, but it seems like we could do better by tracing at runtime. When you know that `x` has a method `.str()`, shouldn't it be possible to replace all instances of `str(x)` with `x.str()` at the call sites themselves? Just recompile all the functions that call `str(x)`. Then you can even inline the str() method's code directly into the call site.
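
A rough sketch of what that call-site rewrite could look like (all names hypothetical): the recompiled caller keeps a guard and falls back to the generic function when the assumption breaks.

  ;; before: generic dispatch on every call
  (str x)

  ;; after the runtime observes that x is always a point (hypothetical):
  (if (point? x)
      (point-str x)   ; direct, inlinable call to the known method
      (str x))        ; otherwise deoptimize to the generic path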


This is indeed a consequence of tracing. The problem is that traces are associated with loops in the program, and since the map function contains only one loop, all traces for map are associated with the loop in its implementation.

When you manually write a loop, there is only one 'body', so a tracing JIT turns that into a single or small number of traces.

For the loop inside of map, you need to produce side exits and new traces for each function passed in. The more different functions map is used with, the slower it gets. This is made worse by the fact that traces starting from side exits tend not to be optimized nearly as well.

Pycket has this same issue with its built-in (or hand-written) looping _functions_ like map. Fortunately, Racket provides many useful looping macros, which allow Pycket to generate a unique trace for each loop, ameliorating this problem somewhat.
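
In Racket terms, a hypothetical sketch of the difference: each expansion of a looping macro like for/list is a distinct loop in the program and gets its own trace, whereas every use of map funnels through the one loop in its implementation.

  ;; two distinct loops -> two independent, specialized traces
  (for/list ([x (in-list xs)]) (add1 x))
  (for/list ([x (in-list xs)]) (sub1 x))

  ;; one shared loop inside map -> traces collide, side exits multiply
  (map add1 xs)
  (map sub1 xs)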


> For the loop inside of map, you need to produce side exits and new traces for each function passed in.

Is it true to say that if two functions have the same body and arglist, and capture no variables, then you should be able to reuse the same traces?

This doesn’t happen very often in real code, but it’s useful to understand the problem.


In theory, if you call map again with an identical function, it should still fall inside the original trace. Tracing JITs effectively inline all function calls that happen inside the trace, so if two functions contain the same body they would be indistinguishable.

But there are all sorts of reasons why this might not be happening in the bug report I linked to. It might be due to the definitions of the map and reduce functions (luafun adds lots of features to them, so it's not just a simple, straightforward loop), or it could be due to something about LuaJIT (it is a complex piece of software, after all).

The only way to know for sure would be to examine the traces with "luajit -jdump".


I think that in the case of a JITed function in a dynamic language, whether the body is the "same" or not depends on how the interface used by the function is monomorphized, which in turn depends on the trace.


Racket's loops are almost always zero-cost, and that is achieved by specifying the type you are iterating over (in-list, in-string, etc.). This is generally fine, since when you want to iterate over something you probably already know its type.

That is done with macros and inlining. I wrote similar macros for Guile, which has a very handy source-to-source optimiser.
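
A small illustration of that specialization (hypothetical snippet): the in-list form tells the `for` macro the sequence type at expansion time, so the loop needs no runtime dispatch.

  ;; specialized: expands to a direct car/cdr loop over the list
  (for/sum ([x (in-list xs)]) x)

  ;; generic: falls back to the runtime sequence protocol
  (for/sum ([x xs]) x)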


The macro and type issues are orthogonal. As sabauma pointed out, the tricky bit is that you need one while statement per trace.



