

Haskell Snap Framework templating 3000x faster with new release - LukeHoersten
http://snapframework.com/blog/2012/12/9/heist-0.10-released

======
jlouis
5 microsecs translates to 5000 nanos. Assuming roughly one cycle per nanosecond
and an average CPI of 0.5, that's about 10000 instructions to render the
template, or time for 30-50 main memory fetches. This isn't too shabby.

But the numbers before the improvement. Ouch. Those were bad.

~~~
Peaker
I'm not sure measuring the number of _CPU instructions_ makes much sense
anymore.

In my experience micro-optimizing code, the majority of time is spent waiting
for memory (bandwidth waits or even worse: latency costs).

~~~
jlouis
Precisely. This is why I mentioned the 30-50 mems. A typical main memory fetch
is somewhere in the 100-200ns range on modern hardware. I tend to measure the
effectiveness of an algorithm based on, roughly, how it accesses memory.

Caches play a role as well, of course. L1 is around 1ns, sometimes even less.
If you _hit_ L2 it is in the vicinity of 5-10ns, which easily translates to
some 30-40 instructions on modern hardware.

It also tells you that hunting for faster execution by compressing
instructions down is not going to buy you much extra speed nowadays. The
key to fast programs is data representation. Good data representation.
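
Not from the thread, but the data-representation point can be made concrete in Haskell itself. A toy sketch, assuming nothing beyond base: the same sum over the same list, where the only difference is whether the fold accumulates a chain of thunks (extra heap objects and pointer chasing) or keeps the accumulator evaluated.

```haskell
import Data.List (foldl')

-- Lazy left fold: without optimization, each step allocates a thunk,
-- so the accumulator becomes a chain of heap objects chased at the end.
sumLazy :: [Int] -> Int
sumLazy = foldl (+) 0

-- Strict left fold: the running total stays evaluated, so the only
-- memory traffic is walking the list itself.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0

main :: IO ()
main = print (sumLazy [1..1000], sumStrict [1..1000])
```

Both compute the same value; the difference is purely in how the intermediate state is represented in memory.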

------
lrem
I wanted to ask how that is even possible, then read:

> However, we realized that a lot of the transformations could be done at load
> time and preprocessed to an intermediate representation.

Lazy computation has its disadvantages. Still, impressive gain.

~~~
icarus127
> Lazy computation has its disadvantages. Still, impressive gain.

After reading the article I don't see anything about lazy evaluation. What am
I missing?

~~~
mcherm
I don't think you ARE missing anything. From my reading, the gains are
obtained by pre-computing the string concatenation a single time (rather than
while rendering) for all cases in which it can be.

It's similar to moving something from runtime to compile time (although those
distinctions don't quite apply here).
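
A toy sketch of that idea, with made-up types rather than Heist's actual code: collapse adjacent static pieces once at load time, so each render touches far fewer parts.

```haskell
-- A toy template: static text interleaved with named holes.
data Part = Static String | Hole String
  deriving (Eq, Show)

-- Rendering concatenates every part on every call.
renderNaive :: [(String, String)] -> [Part] -> String
renderNaive env = concatMap fill
  where
    fill (Static s) = s
    fill (Hole n)   = maybe "" id (lookup n env)

-- "Load time" pass: merge adjacent static runs once, up front.
preprocess :: [Part] -> [Part]
preprocess (Static a : Static b : rest) = preprocess (Static (a ++ b) : rest)
preprocess (p : rest)                   = p : preprocess rest
preprocess []                           = []

main :: IO ()
main = putStrLn (renderNaive [("x", "!")] (preprocess parts))
  where parts = [Static "a", Static "b", Hole "x", Static "c"]
```

After `preprocess`, every render walks the merged parts instead of re-concatenating each static fragment.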

~~~
SilasX
> I don't think you ARE missing anything. From my reading, the gains are
obtained by pre-computing the string concatenation a single time (rather than
while rendering) for all cases in which it can be.

Lazy evaluation: waiting until a computation _needs_ to be done to perform it.

Problem here: it was inefficient to do a particular computation at the very
moment before it was needed.

So, how is this not a lazy evaluation problem?

~~~
dons
Laziness is a specific property of how variable binding and application works
in a language, which is not at issue here.

Unless I am mistaken, they didn't change the template language evaluation
strategy from call-by-name to call-by-value.

They did change the implementation from an interpreter to a compiler, though.
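
The interpreter-versus-compiler distinction can be sketched with a toy template type (not Heist's real one). Both functions below have identical semantics; the compiled version just performs the tree traversal once, up front, and returns a closure that only does the per-render work.

```haskell
data Tmpl = Text String | Var String | Seq [Tmpl]

type Env = [(String, String)]

-- Interpreter: walks the whole tree on every render.
interpret :: Tmpl -> Env -> String
interpret (Text s) _   = s
interpret (Var n)  env = maybe "" id (lookup n env)
interpret (Seq ts) env = concatMap (`interpret` env) ts

-- Compiler: walks the tree once; the returned function closes over
-- the results, so per-render work is just running the closures.
compile :: Tmpl -> (Env -> String)
compile (Text s) = \_   -> s
compile (Var n)  = \env -> maybe "" id (lookup n env)
compile (Seq ts) = let fs = map compile ts   -- traversal happens here, once
                   in \env -> concatMap ($ env) fs

example :: Tmpl
example = Seq [Text "Hello, ", Var "name", Text "!"]

main :: IO ()
main = do
  let fast = compile example     -- "load time"
  putStrLn (interpret example [("name", "world")])
  putStrLn (fast [("name", "world")])
```

Note that neither version changes the evaluation strategy of anything; it is the same computation staged differently.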

~~~
lrem
I had the same line of thinking as SilasX. Conceptually, the change is that
instead of deferring work to the last moment available, it is now done
immediately. This is the very difference between lazy and eager computation.
As I'm not a Haskeller, I didn't immediately realize the term's strong
association with language features. Sorry for the confusion.

------
primigenus
I find it interesting that Heist looks to be inspired by or based on XSLT, yet
XSLT is not mentioned anywhere in the documentation. Is it just a happy
coincidence?

~~~
mightybyte
My inspiration for Heist came from Lift's template system. That and FBML.
Heist is essentially a generalized system for building domain specific markup
languages.

------
LukeHoersten
Independent of Haskell, Heist is one of my favorite HTML/XML template engines
so it's great to see such huge advancements.

~~~
riffraff
could you expand on why it is that you favor it?

I ask because I like the idea of a stateless xmlish template language, but I
wonder what this offers over the zillion existing solutions.

"Separates view and business logic" and "enables DRY design" are valuable
goals, but most template languages have them.

~~~
LukeHoersten
Good question. A few reasons:

1. Heist allows you to define your own HTML/XML tags in the host language
(Haskell in this case). This means you're only dealing with (an extended) XML
document when doing layout and design so all the normal XML tools still work.

2. Some popular template engines _try_ to separate logic and design but end
up letting you cheat a little. Any time you want/have to cheat and put logic
in the template, it really was a shortcoming of the template engine. In Heist,
you can't cheat but you never want to.

3. The reason #1 and #2 work is because Heist's "recursively applied splices"
is just the right abstraction. My HTML templates end up looking just as pretty
as well factored Haskell code. Heist makes the perfectionist in me happy.
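
Points 1 and 3 can be sketched with a toy model. This is simplified and hypothetical, not Heist's real types (real splices run in a monad over xmlhtml nodes), but it shows the shape of recursively applied splices: a splice maps a tag to replacement nodes, and the output is walked again so splices compose.

```haskell
data Node = Elem String [Node] | TextN String
  deriving (Eq, Show)

-- A splice: given the node for its tag, maybe produce replacements.
type Splice = Node -> Maybe [Node]

-- Walk the tree; when a tag has a splice, substitute and re-walk the
-- result, so splices can expand into tags handled by other splices.
applySplices :: [(String, Splice)] -> [Node] -> [Node]
applySplices sps = concatMap go
  where
    go n@(Elem tag kids) =
      case lookup tag sps >>= ($ n) of
        Just replacement -> applySplices sps replacement
        Nothing          -> [Elem tag (applySplices sps kids)]
    go t = [t]

-- A hypothetical <greeting/> tag defined in the host language.
greeting :: Splice
greeting _ = Just [Elem "p" [TextN "Hello"]]

main :: IO ()
main = print (applySplices [("greeting", greeting)]
                           [Elem "div" [Elem "greeting" []]])
```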

In short, I would say you're right here: '"Separates view and business logic"
and "enables DRY design" are valuable goals, but most template languages have
them.' But just because most template languages have these goals doesn't mean
they've achieved their goals. Heist, in my experience and opinion, does
achieve these goals.

~~~
mifrai
From their compiled heist docs [1]:

 _There are two things that compiled Heist loses: the ability to bind new
splices on the fly at runtime and splice recursion/composability._

I haven't checked or read the doc thoroughly, but if it's what I think it
means - all we get is hierarchical splices. Which is still a lot, but it's not
quite as magical.

[1]: <http://snapframework.com/docs/tutorials/compiled-splices>

~~~
mightybyte
We still keep some of the magic by allowing you to run the old "interpreted"
style splices at load time. These don't have access to dynamic data, but they
do have recursion/composability. This combination retains most of the power
while allowing a huge speed increase. It just means that to take advantage of
both you have to structure things in a certain way.

At this point it seems to me that this structure also ends up being a
desirable one for organizational reasons. But the jury is still out on
whether there will be reasons to want more. We're aware that there might
be good reasons to support this extra power and I have a pretty good idea of
how it would be implemented. But I want to get more people using it in the
real world before we address that issue.

------
diggan
"Built for speed from the bottom up. Check out some benchmarks." from the
frontpage of snapframework.com

"When we originally wrote Heist, speed was not our goal." from the link
submitted here

Feels like a contradiction; couldn't they just say that it turned out fast
enough on the first try?

~~~
dbpatterson
I think that comment is referring to the framework / server, which is quite
fast. Heist is a templating system authored by the same people that was not
intended to be fast (though now it is). It is somewhat confusing, but I don't
think the work they did on the framework/server (which is totally separate
from Heist) should be discounted as "fast enough on the first try."

~~~
mightybyte
Correct. We always marketed Heist as a more experimental part of the framework
as a whole. The server and associated API was initially our primary focus.

------
andrewcooke
so why did the api have to change? couldn't the compilation be done on first
use? is the api change simply a change of the encapsulating monad (guessing
wildly)?

not trying to bash haskell, but i think there's an interesting q about how
well it (or any other language) can hide changing implementation details
(particularly major ones like a compilation phase) behind an unchanging api.

or maybe that would have been possible, but the api changed for other reasons
(the general cleanup)?

really interesting article btw. would have loved more detail... an explanation
of introducing compilation in haskell with example would be pretty cool
(pretty sure either pg or norvig has written one - with lisp - that i vaguely
remember reading years ago).

~~~
mightybyte
The changes were significant because Heist isn't just an API. It's an
inversion of control where you provide routines that get run for various parts
of your DOM. They used to be functions that took a node and returned a list of
nodes. In order to do the optimizations that we wanted to do we had to change
the type signature of the callbacks that the users write to something that
took a node and returned a special data structure.
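
A rough sketch of that signature change, with hypothetical simplified types rather than Heist's actual API: the old callback returns nodes and has to run on every render; the new one runs once at load time and returns chunks, so the static parts are already finished strings before any request arrives.

```haskell
data Node = Elem String [Node] | TextN String

type Env = [(String, String)]

-- Old style: node in, nodes out, executed on every render.
type OldSplice = Node -> [Node]

-- New style: executed once at load time, yielding a structure that
-- separates finished static text from deferred runtime work.
data Chunk = StaticC String | DynamicC (Env -> String)
type NewSplice = Node -> [Chunk]

-- A new-style splice for a hypothetical <userName/> tag.
userNameSplice :: NewSplice
userNameSplice _ =
  [ StaticC "<b>"
  , DynamicC (\env -> maybe "" id (lookup "userName" env))
  , StaticC "</b>" ]

-- Per-request work is only the dynamic chunks.
render :: Env -> [Chunk] -> String
render env = concatMap go
  where
    go (StaticC s)  = s
    go (DynamicC f) = f env

main :: IO ()
main = putStrLn (render [("userName", "Alice")]
                        (userNameSplice (Elem "userName" [])))
```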

We actually did preserve the old API, so you can migrate without making
significant changes to your code. Most of those changes are because of the
general cleanup. So maybe my statement about big breaking changes was
misleading. They're big breaking changes IF you want the performance increase.
Otherwise things still work the way they did before. In fact, the process of
implementing this refactoring impressed upon me that the old paradigm was even
more important than I initially realized.

If you're interested in more detail, check out the rest of the docs linked at
the end. They describe the concepts in more detail with a focus on how to use
them. In January I will also be giving a presentation to the New York Haskell
Users Group (<http://www.meetup.com/NY-Haskell/>) about some of the things I
learned while implementing this new approach and merging it back into the
original Heist code base.

------
papsosouid
But this doesn't apply to splices that need data at runtime, like say pulled
from a database right? Isn't that typically going to be 95% of your splices?
The performance increase seems a bit overstated if it only applies to splices
that are just simple substitutions.

~~~
mightybyte
This might apply to 95% of splices, but not 95% of your template. This
particular benchmark does show the best case, but the typical case of a few
dynamic splices will not affect things much, because the page is still getting
converted into a concatenative style and a ton of the splice processing of
things like <bind>, <apply>, etc. is happening at load time.
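
The concatenative idea can be sketched like this (assumed internals, not Heist's actual representation): after load-time processing, the page is a flat list of chunks in which all the static text, including everything produced by load-time splice expansion, is already concatenated. Per-request cost then tracks the number of dynamic splices rather than the size of the page.

```haskell
-- A page after load-time processing: mostly finished static text,
-- with a few dynamic actions interleaved.
data Chunk = Static String | Dynamic (IO String)

page :: [Chunk]
page =
  [ Static "<html><body><h1>"
  , Dynamic (pure "Hello from the database")  -- stand-in for a real query
  , Static "</h1></body></html>" ]

-- Per-request rendering: run the dynamic actions and concatenate.
renderPage :: [Chunk] -> IO String
renderPage = fmap concat . mapM run
  where
    run (Static s)  = pure s
    run (Dynamic a) = a

main :: IO ()
main = renderPage page >>= putStrLn
```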

