
Stalin: a global optimizing compiler for Scheme - malisper
https://github.com/barak/stalin
======
dang
Can anyone give an example of one of the optimizations that make this compiler
so fast, and how it's connected to the whole-program approach—i.e., what
invariants it's able to exploit that aren't available to more traditional
compilers?

~~~
srean
One aggressive optimization that differentiates Stalin (no pun intended, ref
StalinGrad) somewhat, is its inlining of callbacks. For example a generic
numeric multidimensional integral library would take a callback that
implements the function that needs to be integrated. This function is
typically computed several hundreds of thousands of times in a very tight hot
loop incurring a call overhead on each one. In a C library such callbacks
cannot be inlined as it is compiled ahead of time. The whole program analysis
paradigm helps a lot in determining the callback function and inlining it in
place, this triggers more opportunities for more conventional optimizations.

I think a lot of this can be done now with C++'s template metaprogramming, or
in fact using D's CTFE which is quite a pleasure to use. Had Java been not so
broken for numeric stuff, its late inlining facilities would have helped too.
I have played with this a long time ago and at that time nothing I tried, (not
even Fortran) could match it for the same algorithm. Although my fortran would
have left a lot to be desired. I had very little idea about what I was doing
with the piece of Fortran code.

There is a Fortran name for what I was using, it eludes me now, but the
integrator was essentially a coroutine which would yield control to the
function computer and resume integrating once it was computed. Its quite
amazing that Fortran has coroutine facilities, they call it something else.

EDIT:

Java is not smart about using SSE and such SIMD instructions, this is really a
pity. C#, F# are better about this unfortunately its a Microsoft only thing,
yes I am aware of Mono. The other problem is that JAVA is over specified. Java
compilers have very little room to maneuver. For example arguments of a
function are supposed to be evaluated left to right, there goes an opportunity
for low level parallelization. C compilers makes no such guarantees and can in
principle parallelize such instances. The other problem is that unless one
uses low level loops, Java produces just way too many temporaries which on one
hand stresses the garbage collector and on the other wastes time instantiating
and filling the temporaries only to be garbage collected away. In Java the
only form of polymorphism is runtime polymorphism via virtual functions.
Compile time polymorphism can be a lot more efficient. For example if
accessing (i,j)th entry of matrix is a virtual function that all but ensures
that your FLOPS will sink like a stone. Have it as a CRTP in C++ all that will
get inlined, and if you are lucky unrolled and then SIMD'ized. That said Java
numerics has improved quite a bit lately.

Regarding C's whole program analysis, my comment was about the state of the
affairs then, but even now owing to the semantics of the language whole
program analysis in C is orders of magnitude more difficult than in Functional
languages. The latter has much more semantic information to play with. This
was supposed to be mitigated some with the introduction of __restrict__ but it
did not quite deliver on the promise. But cat'ing the entire code to one file
and compiling it does definitely speed up the runtime, poor man's whole
program analysis ! Now you do have support for whole program analysis and link
time optimization that this is no longer such a necessity.

~~~
Totient
> Had Java been not so broken for numeric stuff

I'm curious, what makes Java an especially poor choice for numerically
intensive computing? Is the JIT penalty too high, or is it something else
entirely?

~~~
andrewchambers
JIT isn't a penalty, if anything late code generation lets you make faster
code because hot code paths are known.

My guess is that there are problems with things like the way all objects are
on the heap, and probably other things too.

------
bendlas
Stalin is a beautiful proof of how much a compiler can optimize even "dynamic"
languages under a closed-world assumption. This works because the compiler
basically just infers all the types (EDIT: that it can, and that's sometimes
much more precise than would be expressible in C, hence more optimization
opportunities)

Another example of that, to me, is the google closure compiler for javascript
(in advanced mode).

The PR situation stemming from stalin's name is a bit unfortunate, but you can
always fork and rename it, if you want to work with it, right?

Also, I'm pretty glad that I live in a time where people are free to argue
whether jokes about genocidal dictators are acceptable and in the place where
I can read them doing so, from the other side of the globe.

------
1ris
I (think) stalin isn't maintained since forever; how does it compare to
contemporary compilers? Are there any similar projects for other dynamic
languages?

~~~
tsuyoshi
The closest is probably MLton, a compiler for Standard ML.

~~~
jessaustin
Named, I'm assuming, for John Milton? That so-called poet and all-around
horrible person that even Samuel Johnson considered an "acrimonious and surly
republican"? It's his _Paradise Lost_ we have to thank for the scourge of
"blank verse"! Oh, the humanity! Also he supported and served under Cromwell,
of all people! Surely we can throw this horribly-named compiler on the
dustpile of computer science history!?

~~~
maxharris
Milton never murdered between 20 and 60 million people.

~~~
_delirium
Only because there weren't that many people in the vicinity! A large portion
of those who _did_ have the misfortune to be there got killed, though;
estimates are that about 15-25% of the Irish population was killed.
[https://en.wikipedia.org/wiki/Cromwellian_conquest_of_Irelan...](https://en.wikipedia.org/wiki/Cromwellian_conquest_of_Ireland)

------
dang
The topic here is seminal work on whole-program compilation. Comments about
this work, or about whole-program compilation or Scheme compilation in
general, are welcome.

Indignation over the name is off-topic. So is indignation about indignation
over the name, and so on.

~~~
malisper
Is there anyway you, or some other moderator, could repost this in an
environment that allows a proper discussion? Something like enabling the
"pending comments" system.

~~~
dang
The best thing would be to post substantive on-topic comments which other
users could then upvote. The absence of solid discussion leaves an opening for
ephemeral conniptions.

------
ericcoleman
What does "LaHaShem HaAretz U'Mloah" and "Tam V'Nishlam Shevah L'El Borei
Olam" at the begging and end of source files?

~~~
bradleyjg
That's transliterated Hebrew. The first one means "The world and everything in
it belongs to God", the second "The end, praise God, creator of the world."

No idea why they are there.

~~~
minopret
Yes. Those are traditional mottos from the way Jewish religious books are
used. My wild guess is that they are more likely a sort of inside joke here,
but it is conceivable that they are intended in all seriousness. Anyway, a
person may write in the beginning of a book not only their own name, to mark
their ownership of the book, but also that motto averring in humility that
despite human claims to possessions there is one true owner of all the world.
The second motto is used by the author of the book to express not only the end
of the book's text, but also the author's hope that the goal of the book is
worthy and is fulfilled.

------
zitterbewegung
A long time ago I actually tried to compile this and it took forever. I think
I will try it on my new computer.

------
srean
It is really sad to see commentary around such a marvel that is Stalin to be
exclusively around the political correctness of its name.

Stalin is an amazing piece of software. Its quite unbelievable how much it can
optimize code written in a dynamically typed language like Scheme to give C
code run for its money on speed for things like numerical integration etc. All
those who keep chanting the trope that you need to get low level for speed,
that high level languages cannot be fast, would do well to study it. I am glad
to see it on Github.

Compile times are near epic, but run times are worth marveling, if not for
anything else but to see that such a thing is possible

@bio4m Can we drop that sanctimonious tone please. @vomituddle does have a
point, it tickles me a great deal to think USA, a country responsible for the
political extermination of one ethnicity and the enslavement of another has
this chip on its shoulder as if they are the shining epitome of
freedom/democracy that everyone should bow to. Christsake ! get real. Had
Hitler won, it is not inconceivable that the Godwin's law would have been
quite different, it might as well have picked some US slave runner founder.

Finally, had Stalin been not named Stalin how else would one have StalinGrad,
a derived tool for gradient based optimization. Its worth it many times over
just for that pun.

EDIT: Yeah the downvotes and flags, they solve so many problems. Funny to have
one of the few first comment about the software to be voted and flagged down
below vapid calls to political correctness.

@scarmig that lynched guy was someones grandparent too, same applies to native
Americans women and children burned in their enclaves with escape routes
sealed (if you, and I dont mean scarmig in person, are not familiar its quite
instructive to find out how the mystic river got its name, such incidents were
hardly one off). I myself have family ties with one of the more recent mass
upheavals (if I may avoid the genocide term) but from a different part in
space and time. On a somewhat different note, its a strange small world,
Stalin, Hitler and native Americans ! I remember your comment from a few days
ago.

~~~
scarmig
One person's political correctness is another's "that man murdered my wife's
great grandfather."

Yes, really. "And you lynch negroes" notwithstanding. (And, yes, I'm also that
guy who is more than happy to puncture American propagandistic
historiography.)

------
idlewords
Was 'Hitler' already taken or something?

~~~
robert-boehnke
Yeah, that's the Homo-Iconic Templating Language for ERlang.

~~~
aaronbrethorst
Couldn't actually tell if you were joking. Relieved to discover you were...

------
pinaceae
oh you nerds.

HN doesn't even let me post the alternative names as the post gets
autodeleted. so some names are really, really bad - but hey, let's offend
people for fun.

as if the whole sexism debate wasn't enough.

as to why it is more offensive than genghis khan, i guess it needs to be
explained to the ignorant neckbeard crowd: there are enough people still alive
today that suffered due this mass murderer.

if even russia does not use his name for say naming an aircraft carrier (as
opposed to g.w.bush) you should know it is bad.

this name was picked on purpose, to offend. not a common name, as some other
ignorant shits in here have claimed.

~~~
elementai
The name is a joke, from wikipedia, "Stalin brutally optimizes". Very much
true. You know, it would be quite an undertaking to find russian tech guys
offended by this name.

And I've also found
[https://github.com/n4kz/stalin.js](https://github.com/n4kz/stalin.js) by a
russian guy \- Joseph Stalin => stalin.js \- Badass mustache compiler

~~~
pinaceae
russians not offended is one thing, you'll find more people offended by it in
the surrounding ex-colonies.

just like the people most offended by nazi glorification do not live in
germany. just like people offended by war japan glorification do not live in
japan. just like people offended by confederate glorification are not white.
sexism, not male, etc etc etc.

is it that hard of a concept? empathy towards others?

------
jessaustin
Holy crap HN has its tight underpants on this evening. Relax folks!

~~~
cnp
In the "name" of millions and millions and millions of dead and those
brainwashed through terror, I gotta say no, don't relax. It's not funny in the
least.

~~~
kazinator
Stalin is also just a Russian surname; probably craploads of them in the
Moscow phone book.

Randomly googling for "Igor Stalin", I found a Facebook page and YouTube
channel.

Are there famous Stalins besides the well-known tyrant? Wikipedia informs me
that "The Stalin" were a Japanese punk band, and some rapper called Jovan
Smith took on J. Stalin as a stage name, and an Indian politician called M. K.
Stalin. All of these names are derived from the infamous dictator, though. So
it's very hard to connect anything which uses that name to anything else.

"The Stalin" chose their name evidently because _" Joseph Stalin is very hated
by most people in Japan, so it is very good for our image."_ LOL, facepalm

~~~
yomritoyj
Stalin is not a surname. It is a nom de guerre that means “man of steel”.

~~~
kazinator
I see. "Nom de guerre".

------
argc
Can't we just maintain our sense of humor about things like this and move on?
The project is old, the dictator is dead... And its not like naming a project
after Stalin is "endorsing" him. It's merely personifying the potentially
boring nature of program by calling it "brutal" and "aggressive".

Now I have to think about whether I would be ok with other potentially
offensive names... Boko Haram, the packet capturing tool. It even has the
'Islamic State' functionality that removes HTTP headers from their respective
packet bodies. Hmm..

~~~
gooseyard
just wait until I unveil my Caliphate static analysis tool next week

------
enupten
Does anyone know how this compares to SBCL ?

Pity about the name though. It would've been more appropriate to name it on
one of the "victorious" tyrants of our age. Perhaps one of "Columbus",
"Churchill", "Reagan", or for that touch of infamy "Nixon". Pfft.

~~~
_delirium
SBCL doesn't currently have whole-program optimization. Its focus has been on
standard Common Lisp, which is quite dynamic and can't make the kind of
closed-world assumption used by whole-program optimization (i.e. we know the
entire program at compile-time). It has quite a lot of optimizations, but they
operate function-at-a-time.

Its predecessor, CMUCL, did have a mode called "block compilation" that has
some similarities: [http://common-lisp.net/project/cmucl/doc/cmu-
user/compiler-h...](http://common-lisp.net/project/cmucl/doc/cmu-
user/compiler-hint.html#toc178). It isn't currently present in SBCL; it was
axed as part of the CMUCL->SBCL cleanup. I've occasionally heard suggestions
of adding similar functionality to SBCL, but I'm not sure there's any active
work on it.

~~~
lispm
'whole program optimization' is also kind of unpractical, since these
compilers are typically very slow.

Lucid CL had two compilers, a fast incremental one and a slow one, capable of
block compilation. KCL (the early Lisp to C compiler) used block compilation.
LispWorks should also support it in some form.

Usually 'block compilation' in Lisp means compiling a set of functions
together, instead of compiling each function individually. This block
compilation for example then can reduce function call overhead.

In its more primitive for a single file of source is seen as a block. But
compilers also support block compilation of several files together.

Block compilation in Common Lisp was and is used mostly for application
delivery, where the program is static and dynamic features can be removed.
Thus a delivered program is more static, but also 'faster'.

~~~
_delirium
> Block compilation in Common Lisp was and is used mostly for application
> delivery

The main place I've personally encountered people who care about block
compilation in CL isn't so much delivery (as in, to an external client) as
getting it ready to run "for real", mostly numerical stuff that is being
prepared to run in batch mode on a giant dataset. I guess that's a kind of
deployment though, even if the operator and developer is the same research
group.

I could be imagining things, but one downside I _think_ I see in not having
block compilation in SBCL is that people writing numerical code targeting it
have a tendency to pack as much as possible into flet and labels, using one
big function as the poor man's compilation block (since _within_ a single
function most of CL's dynamic features aren't in play).

I tend not to like code like that when hacking on it. Besides often reading
less clearly and sometimes leading to duplicate or near-duplicate code,
separate functions are more flexible: they can be dynamically redefined while
tinkering with things, they can be called on their own, etc. Of course, that's
precisely why the compiler can't perform certain optimizations on full calls.
But I think people would be more free with the defun if they knew the compiler
would take care of optimizing everything when deployed.

~~~
lispm
> as getting it ready to run "for real"

That's often called 'delivery' in Lisp. It means creating an executable
application or library. It does not mean the actual act of shipping something
to external clients.

If you compile your code to create an application which then is used in
'production' (internal or external), this is the process of 'delivery'.

[http://franz.com/support/documentation/current/doc/delivery....](http://franz.com/support/documentation/current/doc/delivery.htm)

[http://www.lispworks.com/documentation/lw61/DV/html/delivery...](http://www.lispworks.com/documentation/lw61/DV/html/delivery.htm)

------
bio4m
That's a terrible name. The man committed genocide and numerous other crimes
against humanity. Using that legacy to try and get a laugh is horrible.

~~~
vomitcuddle
> The man committed genocide and numerous other crimes against humanity.

the amerikan founding fathers have committed acts of genocide (and numerous
other crimes) against Native Americans and Africans and yet someone decided it
would be a good idea to print them on dollar bills

~~~
bduerst
Since when has losing an arm justified losing a leg?

~~~
mathetic
Well, that is not the point though. He is just pointing out the hypocrisy.

~~~
bduerst
Then I guess I missed something. It seemed like they were downplaying genocide
because people who committed it are printed on dollar bills.

~~~
mieses
I think the negative reaction with has to do with his overly aggressive
mustache, not so much the intentional murder of 30-40 million. Not that
numbers, scale, proportion, context, or substance should matter to this crowd.

------
jastanton
"Stalin - a STAtic Language ImplementatioN", "stalin _brutally_ optimizing
Scheme compiler", "An _Aggressive_ Scheme Compiler"

The naming seems like a stretch to me, and I don't appreciate the theme to be
honest. If you're going for any sort of recognition of the brand then this is
going to get buried under results for Joseph Stalin.

Very poorly named I would have expected something better coming for Purdue
University. I'm sure they wouldn't appreciate this much either.

~~~
thebooktocome
There's plenty of bad taste wandering around Purdue's campus. Most recently,
multiple professors were criticized for joking about the Cousins shooting as
it was happening.

------
pinaceae
awesome name!

would like to see more of this for hardcore stuff like this:

eternal virgin

smelly neckbeard

aspy social retard

and then, let's make it clearer in case my sarcasm needs a tag :

were nigger, kike, wetback already taken?

yes, words matter you fucking morons. needs to be explained to a profession
that relies on precise language aka code to accomplish anything.

------
1ris
Oh HN, only whines about lack of political correctness, nothing technical.
Sometimes you are better.

------
skwuent
Please change the name.

~~~
ryeguy
This is 8 years old.

~~~
shiro
Got to be much older than that---I think Stalin predates R5RS (1998).

I also vaguely remember that the name was chosen as a counter to Dylan
(DYnamic LANguage), but can't find the reference now.

