Hacker News new | past | comments | ask | show | jobs | submit login
Stalin: a global optimizing compiler for Scheme (github.com/barak)
85 points by malisper on Aug 22, 2014 | hide | past | favorite | 70 comments



Can anyone give an example of one of the optimizations that make this compiler so fast, and how it's connected to the whole-program approach—i.e., what invariants it's able to exploit that aren't available to more traditional compilers?


You could have a look at Siskind's paper, Flow-Directed Lightweight Closure Conversion. Warning, it's not exactly light reading:

  ftp://ftp.ecn.purdue.edu/qobi/fdlcc.pdf
Although the subject is nominally closure conversion, he describes how flow analysis is used to propagate information about values around the compiler's model of the program. Because the analysis is on a whole program basis it need not be very conservative - in a whole program compiler you never have to give up and do something the slow way because a value might be used in a place you can't see.

This aggressive flow analysis gives the compiler enough information to pick nice flat C-like representations, inline objects inside other objects, use more specialised calling conventions, etc.

For example, a whole program compiler might determine that the elements of an array are never compared with eq or have their mutable parts assigned to, which could allow the elements to be inlined into the array. Doing this is tough or impossible in a traditional compiler because the contents of files that have yet to be compiled is unknown, forcing the compiler to assume the worst.

Pretty neat stuff.


One aggressive optimization that differentiates Stalin (no pun intended, ref StalinGrad) somewhat, is its inlining of callbacks. For example a generic numeric multidimensional integral library would take a callback that implements the function that needs to be integrated. This function is typically computed several hundreds of thousands of times in a very tight hot loop incurring a call overhead on each one. In a C library such callbacks cannot be inlined as it is compiled ahead of time. The whole program analysis paradigm helps a lot in determining the callback function and inlining it in place, this triggers more opportunities for more conventional optimizations.

I think a lot of this can be done now with C++'s template metaprogramming, or in fact using D's CTFE which is quite a pleasure to use. Had Java been not so broken for numeric stuff, its late inlining facilities would have helped too. I have played with this a long time ago and at that time nothing I tried, (not even Fortran) could match it for the same algorithm. Although my fortran would have left a lot to be desired. I had very little idea about what I was doing with the piece of Fortran code.

There is a Fortran name for what I was using, it eludes me now, but the integrator was essentially a coroutine which would yield control to the function computer and resume integrating once it was computed. Its quite amazing that Fortran has coroutine facilities, they call it something else.

EDIT:

Java is not smart about using SSE and such SIMD instructions, this is really a pity. C#, F# are better about this unfortunately its a Microsoft only thing, yes I am aware of Mono. The other problem is that JAVA is over specified. Java compilers have very little room to maneuver. For example arguments of a function are supposed to be evaluated left to right, there goes an opportunity for low level parallelization. C compilers makes no such guarantees and can in principle parallelize such instances. The other problem is that unless one uses low level loops, Java produces just way too many temporaries which on one hand stresses the garbage collector and on the other wastes time instantiating and filling the temporaries only to be garbage collected away. In Java the only form of polymorphism is runtime polymorphism via virtual functions. Compile time polymorphism can be a lot more efficient. For example if accessing (i,j)th entry of matrix is a virtual function that all but ensures that your FLOPS will sink like a stone. Have it as a CRTP in C++ all that will get inlined, and if you are lucky unrolled and then SIMD'ized. That said Java numerics has improved quite a bit lately.

Regarding C's whole program analysis, my comment was about the state of the affairs then, but even now owing to the semantics of the language whole program analysis in C is orders of magnitude more difficult than in Functional languages. The latter has much more semantic information to play with. This was supposed to be mitigated some with the introduction of __restrict__ but it did not quite deliver on the promise. But cat'ing the entire code to one file and compiling it does definitely speed up the runtime, poor man's whole program analysis ! Now you do have support for whole program analysis and link time optimization that this is no longer such a necessity.


> In a C library such callbacks cannot be inlined as it is compiled ahead of time.

C compilers can do that with whole-program optimization:

1. Clone the function taking the callback, and set that parameter constant so the value of the callback is known.

2. Change the indirect calls to direct calls.

3. Inline the new direct calls.

Of course, the library has to be statically linked, but that's not a problem for scientific users.

Remember, if you optimize something yourself, just make friends with a compiler engineer and get them to add it!


> Had Java been not so broken for numeric stuff

I'm curious, what makes Java an especially poor choice for numerically intensive computing? Is the JIT penalty too high, or is it something else entirely?


JIT isn't a penalty, if anything late code generation lets you make faster code because hot code paths are known.

My guess is that there are problems with things like the way all objects are on the heap, and probably other things too.


Stalin is a beautiful proof of how much a compiler can optimize even "dynamic" languages under a closed-world assumption. This works because the compiler basically just infers all the types (EDIT: that it can, and that's sometimes much more precise than would be expressible in C, hence more optimization opportunities)

Another example of that, to me, is the google closure compiler for javascript (in advanced mode).

The PR situation stemming from stalin's name is a bit unfortunate, but you can always fork and rename it, if you want to work with it, right?

Also, I'm pretty glad that I live in a time where people are free to argue whether jokes about genocidal dictators are acceptable and in the place where I can read them doing so, from the other side of the globe.


I (think) stalin isn't maintained since forever; how does it compare to contemporary compilers? Are there any similar projects for other dynamic languages?


According to this blog post from 2009[0], it was slightly faster than C compiled with the GCC and almost 15x faster than Chicken Scheme. I have tried looking for more rigorous benchmarks but have been unable to find any.

[0] http://justindomke.wordpress.com/2009/02/23/the-stalin-compi...


The closest is probably MLton, a compiler for Standard ML.


Named, I'm assuming, for John Milton? That so-called poet and all-around horrible person that even Samuel Johnson considered an "acrimonious and surly republican"? It's his Paradise Lost we have to thank for the scourge of "blank verse"! Oh, the humanity! Also he supported and served under Cromwell, of all people! Surely we can throw this horribly-named compiler on the dustpile of computer science history!?


Milton never murdered between 20 and 60 million people.


Only because there weren't that many people in the vicinity! A large portion of those who did have the misfortune to be there got killed, though; estimates are that about 15-25% of the Irish population was killed. https://en.wikipedia.org/wiki/Cromwellian_conquest_of_Irelan...


Pretty sure it was a joke.


Standard ML is not a dynamic language.


No, but it does whole-program optimization.


The topic here is seminal work on whole-program compilation. Comments about this work, or about whole-program compilation or Scheme compilation in general, are welcome.

Indignation over the name is off-topic. So is indignation about indignation over the name, and so on.


Excuse me a moment. You can say that "indignation about the name is off topic", that doesn't make it so. You are twisting language to suit your whims.

If you had the guts you could say that indignation about the name is not welcome, or will be censored, or is a distraction and an uninteresting subject here on HN. But the project has a name, and the connotations of that name are relevant, as with every project. If the project were named "the impeach obama compiler" or "darren wilson" or "gas the jews" the idea that issues related to the name were "off topic" would be even less tenable than it is now.

I enjoy the maturity of the commenters on HN, and of the mods as well. But please have the intellectual honesty and backbone to be upfront about what you're doing.


Downvoted for this? Thankless job indeed.

Thank you.

(I will say, substantive discussion for a topic like this seems hard. The people really interested in this are those who haven't heard of it, which means they will have little input. From what I've seen, topics like this usually grow slowly as experienced people post interesting anecdotes or general PL tangents, and others hop on).


Is there anyway you, or some other moderator, could repost this in an environment that allows a proper discussion? Something like enabling the "pending comments" system.


The best thing would be to post substantive on-topic comments which other users could then upvote. The absence of solid discussion leaves an opening for ephemeral conniptions.


What does "LaHaShem HaAretz U'Mloah" and "Tam V'Nishlam Shevah L'El Borei Olam" at the begging and end of source files?


That's transliterated Hebrew. The first one means "The world and everything in it belongs to God", the second "The end, praise God, creator of the world."

No idea why they are there.


Yes. Those are traditional mottos from the way Jewish religious books are used. My wild guess is that they are more likely a sort of inside joke here, but it is conceivable that they are intended in all seriousness. Anyway, a person may write in the beginning of a book not only their own name, to mark their ownership of the book, but also that motto averring in humility that despite human claims to possessions there is one true owner of all the world. The second motto is used by the author of the book to express not only the end of the book's text, but also the author's hope that the goal of the book is worthy and is fulfilled.


A long time ago I actually tried to compile this and it took forever. I think I will try it on my new computer.


It is really sad to see commentary around such a marvel that is Stalin to be exclusively around the political correctness of its name.

Stalin is an amazing piece of software. Its quite unbelievable how much it can optimize code written in a dynamically typed language like Scheme to give C code run for its money on speed for things like numerical integration etc. All those who keep chanting the trope that you need to get low level for speed, that high level languages cannot be fast, would do well to study it. I am glad to see it on Github.

Compile times are near epic, but run times are worth marveling, if not for anything else but to see that such a thing is possible

@bio4m Can we drop that sanctimonious tone please. @vomituddle does have a point, it tickles me a great deal to think USA, a country responsible for the political extermination of one ethnicity and the enslavement of another has this chip on its shoulder as if they are the shining epitome of freedom/democracy that everyone should bow to. Christsake ! get real. Had Hitler won, it is not inconceivable that the Godwin's law would have been quite different, it might as well have picked some US slave runner founder.

Finally, had Stalin been not named Stalin how else would one have StalinGrad, a derived tool for gradient based optimization. Its worth it many times over just for that pun.

EDIT: Yeah the downvotes and flags, they solve so many problems. Funny to have one of the few first comment about the software to be voted and flagged down below vapid calls to political correctness.

@scarmig that lynched guy was someones grandparent too, same applies to native Americans women and children burned in their enclaves with escape routes sealed (if you, and I dont mean scarmig in person, are not familiar its quite instructive to find out how the mystic river got its name, such incidents were hardly one off). I myself have family ties with one of the more recent mass upheavals (if I may avoid the genocide term) but from a different part in space and time. On a somewhat different note, its a strange small world, Stalin, Hitler and native Americans ! I remember your comment from a few days ago.


One person's political correctness is another's "that man murdered my wife's great grandfather."

Yes, really. "And you lynch negroes" notwithstanding. (And, yes, I'm also that guy who is more than happy to puncture American propagandistic historiography.)


Was 'Hitler' already taken or something?


Yeah, that's the Homo-Iconic Templating Language for ERlang.


Couldn't actually tell if you were joking. Relieved to discover you were...


oh you nerds.

HN doesn't even let me post the alternative names as the post gets autodeleted. so some names are really, really bad - but hey, let's offend people for fun.

as if the whole sexism debate wasn't enough.

as to why it is more offensive than genghis khan, i guess it needs to be explained to the ignorant neckbeard crowd: there are enough people still alive today that suffered due this mass murderer.

if even russia does not use his name for say naming an aircraft carrier (as opposed to g.w.bush) you should know it is bad.

this name was picked on purpose, to offend. not a common name, as some other ignorant shits in here have claimed.


The name is a joke, from wikipedia, "Stalin brutally optimizes". Very much true. You know, it would be quite an undertaking to find russian tech guys offended by this name.

And I've also found https://github.com/n4kz/stalin.js by a russian guy - Joseph Stalin => stalin.js - Badass mustache compiler


russians not offended is one thing, you'll find more people offended by it in the surrounding ex-colonies.

just like the people most offended by nazi glorification do not live in germany. just like people offended by war japan glorification do not live in japan. just like people offended by confederate glorification are not white. sexism, not male, etc etc etc.

is it that hard of a concept? empathy towards others?


It's geek sociopathy at its finest. Or maybe that's the "Oh, come on, stop talking about the deliberately chosen name that could have been anything else and talk about the technical details!" attitude.

It's especially farcical given that without the name and the resulting offended people, butthurt Russians, and random assholes yammering about "political correctness", this post would have probably never made the front page.


Holy crap HN has its tight underpants on this evening. Relax folks!


In the "name" of millions and millions and millions of dead and those brainwashed through terror, I gotta say no, don't relax. It's not funny in the least.


There are a hundred other causes you could campaign against, yet you choose a compiler that you undoubtedly will never use for a language you probably don't even know. Even if you are, in the one-in-a-million chance, an avid enough scheme user to want an optimized compiler, then just don't use this compiler and move on. Stop trying to create a witch hunt for something that won't matter to you in 24 hours.


You'll have a pretty hard time if you ever decide to visit China.


Not relaxing doesn't help anybody and not making a single thing better.


Stalin is also just a Russian surname; probably craploads of them in the Moscow phone book.

Randomly googling for "Igor Stalin", I found a Facebook page and YouTube channel.

Are there famous Stalins besides the well-known tyrant? Wikipedia informs me that "The Stalin" were a Japanese punk band, and some rapper called Jovan Smith took on J. Stalin as a stage name, and an Indian politician called M. K. Stalin. All of these names are derived from the infamous dictator, though. So it's very hard to connect anything which uses that name to anything else.

"The Stalin" chose their name evidently because "Joseph Stalin is very hated by most people in Japan, so it is very good for our image." LOL, facepalm


Stalin is not a surname. It is a nom de guerre that means “man of steel”.


I see. "Nom de guerre".


Naming something, even in jest, after one of the most vicious mass murderers in human history is just shameful. Most historians agree that Stalin is responsible for 20 to 60 million non-wartime deaths.

Making jokes about any mass murderer trivializes their victims.


There's a restaurant down the street from me named after Genghis Khan. There's a craft brew named after Tamurlane. I could go on and on.


There's a huge difference: there are a lot of people alive today with relatives whose lives were cut short by twentieth-century mass murderers.


Not to be pedantic, but that is true of Genghis Khan as well.


People make jokes about Hitler all the time. I've seen him appear in various comedies and sketches. I have yet to feel like the holocaust was trivial though.


Making jokes about any mass murderer trivializes their victims.

Uh, no. Comedy is the one realm within human discourse where we're allowed to discuss things and the normal rules don't apply. This has been the case since time immemorial and it's not about to change now.


Can't we just maintain our sense of humor about things like this and move on? The project is old, the dictator is dead... And its not like naming a project after Stalin is "endorsing" him. It's merely personifying the potentially boring nature of program by calling it "brutal" and "aggressive".

Now I have to think about whether I would be ok with other potentially offensive names... Boko Haram, the packet capturing tool. It even has the 'Islamic State' functionality that removes HTTP headers from their respective packet bodies. Hmm..


just wait until I unveil my Caliphate static analysis tool next week


Does anyone know how this compares to SBCL ?

Pity about the name though. It would've been more appropriate to name it on one of the "victorious" tyrants of our age. Perhaps one of "Columbus", "Churchill", "Reagan", or for that touch of infamy "Nixon". Pfft.


SBCL doesn't currently have whole-program optimization. Its focus has been on standard Common Lisp, which is quite dynamic and can't make the kind of closed-world assumption used by whole-program optimization (i.e. we know the entire program at compile-time). It has quite a lot of optimizations, but they operate function-at-a-time.

Its predecessor, CMUCL, did have a mode called "block compilation" that has some similarities: http://common-lisp.net/project/cmucl/doc/cmu-user/compiler-h.... It isn't currently present in SBCL; it was axed as part of the CMUCL->SBCL cleanup. I've occasionally heard suggestions of adding similar functionality to SBCL, but I'm not sure there's any active work on it.


'whole program optimization' is also kind of unpractical, since these compilers are typically very slow.

Lucid CL had two compilers, a fast incremental one and a slow one, capable of block compilation. KCL (the early Lisp to C compiler) used block compilation. LispWorks should also support it in some form.

Usually 'block compilation' in Lisp means compiling a set of functions together, instead of compiling each function individually. This block compilation for example then can reduce function call overhead.

In its more primitive for a single file of source is seen as a block. But compilers also support block compilation of several files together.

Block compilation in Common Lisp was and is used mostly for application delivery, where the program is static and dynamic features can be removed. Thus a delivered program is more static, but also 'faster'.


> Block compilation in Common Lisp was and is used mostly for application delivery

The main place I've personally encountered people who care about block compilation in CL isn't so much delivery (as in, to an external client) as getting it ready to run "for real", mostly numerical stuff that is being prepared to run in batch mode on a giant dataset. I guess that's a kind of deployment though, even if the operator and developer is the same research group.

I could be imagining things, but one downside I think I see in not having block compilation in SBCL is that people writing numerical code targeting it have a tendency to pack as much as possible into flet and labels, using one big function as the poor man's compilation block (since within a single function most of CL's dynamic features aren't in play).

I tend not to like code like that when hacking on it. Besides often reading less clearly and sometimes leading to duplicate or near-duplicate code, separate functions are more flexible: they can be dynamically redefined while tinkering with things, they can be called on their own, etc. Of course, that's precisely why the compiler can't perform certain optimizations on full calls. But I think people would be more free with the defun if they knew the compiler would take care of optimizing everything when deployed.


> as getting it ready to run "for real"

That's often called 'delivery' in Lisp. It means creating an executable application or library. It does not mean the actual act of shipping something to external clients.

If you compile your code to create an application which then is used in 'production' (internal or external), this is the process of 'delivery'.

http://franz.com/support/documentation/current/doc/delivery....

http://www.lispworks.com/documentation/lw61/DV/html/delivery...


That's a terrible name. The man committed genocide and numerous other crimes against humanity. Using that legacy to try and get a laugh is horrible.


> The man committed genocide and numerous other crimes against humanity.

the amerikan founding fathers have committed acts of genocide (and numerous other crimes) against Native Americans and Africans and yet someone decided it would be a good idea to print them on dollar bills


Since when has losing an arm justified losing a leg?


Well, that is not the point though. He is just pointing out the hypocrisy.


Then I guess I missed something. It seemed like they were downplaying genocide because people who committed it are printed on dollar bills.


I think the negative reaction with has to do with his overly aggressive mustache, not so much the intentional murder of 30-40 million. Not that numbers, scale, proportion, context, or substance should matter to this crowd.


"Stalin - a STAtic Language ImplementatioN", "stalin _brutally_ optimizing Scheme compiler", "An _Aggressive_ Scheme Compiler"

The naming seems like a stretch to me, and I don't appreciate the theme to be honest. If you're going for any sort of recognition of the brand then this is going to get buried under results for Joseph Stalin.

Very poorly named I would have expected something better coming for Purdue University. I'm sure they wouldn't appreciate this much either.


There's plenty of bad taste wandering around Purdue's campus. Most recently, multiple professors were criticized for joking about the Cousins shooting as it was happening.


It is even less appropriate in the light of the fact that Stalin was one of the most brutally incompetent tyrants ever.


awesome name!

would like to see more of this for hardcore stuff like this:

eternal virgin

smelly neckbeard

aspy social retard

and then, let's make it clearer in case my sarcasm needs a tag :

were nigger, kike, wetback already taken?

yes, words matter you fucking morons. needs to be explained to a profession that relies on precise language aka code to accomplish anything.


Oh HN, only whines about lack of political correctness, nothing technical. Sometimes you are better.


Please change the name.


This is 8 years old.


Got to be much older than that---I think Stalin predates R5RS (1998).

I also vaguely remember that the name was chosen as a counter to Dylan (DYnamic LANguage), but can't find the reference now.


It's over 20 years old. Doesn't quite predate the other Stalin though.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: