Something I took for granted for the longest time about Haskell (which remains the only language I know of with the feature) is the ability to write user-defined "rewrite rules". You can say: "okay, GHC, I know for a fact that if these functions are used together in such-and-such a way, you can replace the whole expression with a call to this (faster, hand-fused) fooBar instead":

    {-# RULES "foo/bar" forall x. foo x (bar x) = fooBar x #-}
Now, anywhere a user of this code writes something like

    foo [1,2,3] (bar [1,2,3])

GHC will silently rewrite it to fooBar [1,2,3] at compile time.
You can also "specialize": substitute a faster implementation when a function is used at a specific type. For example:
    import Data.Bits (shiftL)

    timesFour :: Num a => a -> a
    timesFour a = a + a + a + a

    timesFourInt :: Int -> Int
    timesFourInt x = x `shiftL` 2   -- multiplying by four is a left shift by two

    {-# RULES "timesFour/Int" timesFour = timesFourInt #-}
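When you don't have a hand-written fast version and just want GHC to compile and optimize the generic definition at a particular type, there's also a dedicated pragma for that:

    {-# SPECIALIZE timesFour :: Int -> Int #-}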
High-performance Haskell libraries like vector, bytestring, text, pipes, or conduit capitalize on this feature, among other techniques. When GHC compiles code written against such a library, optimization proceeds roughly like this:
- rule #1 fires somewhere
- it rewrites the code into something that matches rule #2, "clearing the way" for it to fire
- rule #2 fires
- rule #3 fires
- rule #1 fires again
- rule #4 fires
and so on, triggering a "cascade" of optimizations.
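To make "cascade" concrete, here's the classic map-fusion rule firing twice on a nested call (GHC's real list fusion is routed through foldr/build these days, but this is the traditional illustration):

    {-# RULES "map/map" forall f g xs. map f (map g xs) = map (f . g) xs #-}

    -- map f (map g (map h xs))
    --   ==>  map (f . g) (map h xs)    -- "map/map" fires on the outer pair...
    --   ==>  map ((f . g) . h) xs      -- ...exposing a new match, so it fires again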
The promise of Haskell is that we already have a "sufficiently smart compiler": today, with good libraries, GHC is capable of turning clear, high-level, reusable functional code with chains of function compositions and folds and so on into tight, fast loops.
I must add, though, that getting rewrite rules to fire in cascades to get "mad gainz" requires one to grok how the GHC inliner/specializer works.
Data.Vector also uses an internal representation, called a "bundle", that makes fusion explicit and hence predictable (inevitable, even). But the bundle machinery relies on rewrite rules too; e.g. the module implementing it contains this rule:
"zipWithM xs xs [Vector.Stream]" forall f xs.
zipWithM f xs xs = mapM (\x -> f x x) xs #-}
It is very interesting, though, and is a good avenue to explore. I've got some reading to do :)
In practice, most of the libraries that include tons of rewrite/specialise/inline (ab)use are either "core" libraries (like vector/bytestring) or have a large userbase (e.g. Conduit), and rules don't really change too much from version to version, so this has never actually had the detrimental effects that "silent replacement of code" might have in the worst case.
This might sound similar to apologetics for dynamically-typed languages: the only real answer to your question is that rewrite rules are ultimately a way to improve performance, and they come with caveats usually associated with replacing clearer algorithms with faster ones. (I'm thinking of the adaptive sort from the C++ STL, which iirc uses different sorting algorithms for differently-sized containers to maximize expected performance. It's not exactly intuitive that vectors of size 100 and vectors of size ~10000 are sorted differently, is it?)
Of course, the only verifiably-correct way to do these things is through the use of machine-checkable proofs of the transforms. The well-known "foldr/build" rule that many optimizations of linked-list functions reduce to, for instance, has been proven outside of GHC, and there are usually similar proofs for other rules known to the library author. The appeal of dependently-typed languages is how they are a nice vehicle for the inclusion of those proofs in library code: if the compiler can't verify that your rule makes sense, it can complain instead. You then end up supplying a proof, or saying "just believe me, okay?" ;)
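For the curious, the rule itself is tiny. Roughly as it appears in GHC.Base (where it's spelled "fold/build", and where build g = g (:) []):

    {-# RULES "fold/build"
          forall k z (g :: forall b. (a -> b -> b) -> b -> b).
          foldr k z (build g) = g k z
      #-}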
: "In practice"? Seriously? Next thing I know I'll be parroting "in the real world" and so on...
> Using just these two rules, I saw a 225% increase in the number of nodes processed per second. Under the right circumstances, GHC can even outperform C code by applying these kinds of inter-procedural optimizations.
Also, we have QuickCheck, which allows us to check properties by randomly generating testcases. The post I linked mentions this.
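For instance, here's a toy property (my own, not from the post) checking that the map-fusion law preserves semantics for fixed f and g:

    import Test.QuickCheck

    -- the rewrite must not change observable results
    prop_mapMap :: [Int] -> Bool
    prop_mapMap xs = map f (map g xs) == map (f . g) xs
      where
        f = (+ 1)
        g = (* 2)

    main :: IO ()
    main = quickCheck prop_mapMap   -- tries 100 random lists by default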
    let x = (await grabSomeIntegers()).map(double).filter(biggerThan(3)).map(subtract(1));
How do Haskell library authors deal with that?
    x = do
      ints <- grabSomeIntegers
      return (ints & map (* 2) & filter (> 3) & map (subtract 1))
    x = fmap (map (subtract 1) . filter (> 3) . map (* 2)) grabSomeIntegers

(Note map (subtract 1) rather than map (- 1): Haskell parses (- 1) as negative one, not as a section.)
... right? Nope, that doesn't have to be the case: Haskell supports more than just your $grandparent's parameterized types!
The vector package uses something called data families to have different in-memory representations for vectors of different types, without "letting go of the DRY" at all.
Here, Vector a is a data family: how a value of type Vector a "looks" in memory depends on a. So you'll notice a bunch of lines of the form
    newtype instance Vector Int16 = V_Int16 (P.Vector Int16)
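If you want to play with the mechanism itself, here's a minimal self-contained sketch (toy names, not vector's actual internals):

    {-# LANGUAGE TypeFamilies #-}
    import Data.Word (Word8)

    -- the in-memory shape of Box a depends on the concrete a
    data family Box a
    newtype instance Box Int  = BoxInt Int      -- stored as a bare Int
    newtype instance Box Bool = BoxBool Word8   -- Bool packed into a byte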
Written using vector, code like your example stays perfectly DRY; it would look something like this (untested code!):
    import Data.Function ((&))
    import qualified Data.Vector.Unboxed as V

    combobulate :: (Num a, Ord a, V.Unbox a) => IO (V.Vector a)
    combobulate = do
      ints <- grabSomeIntegersFromNetwork
      return (ints & V.map (* 2) & V.filter (> 3) & V.map (subtract 1))
The key is the data families: the types expose a simple, consistent API, hiding the actual guts of the memory representation from the user. The library author is free to perform whatever ugly/evil/unsafe manipulations she feels like behind the scenes on the internal representations, or not.
 Defining numeric operations on animals is left as an exercise to the careful reader.
The specialization example is the same as template specialization in C++.
> The specialization example is the same as template specialization in C++.
This sounds very interesting to me, because you're saying it's accomplished entirely in the type system: do you have an example or a reference you can point me to?
I'm guessing there's a type-level tag for an algorithm, and the compiler is told how to combine tags corresponding to a sequence of operations on vectors that looks a certain way. Something like do_the_thing<t, DotWith<Self>, HasFastNorm<t>> = do_the_thing<Norm<t>>? (just thinking aloud in pseudo-C++, nothing to see here, move along) In that case, it would be closer to something using a function on types, which Haskell has in the form of type families.
What you have towards the end of that post is actually something like
    type family Vector a where
      Vector (Ptr a) = ...
      Vector a       = ...
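A compilable toy version of that shape (right-hand sides made up by me) is a closed type family; the equations are tried top to bottom, which is what lets the Ptr case overlap the general one:

    {-# LANGUAGE TypeFamilies #-}
    import Foreign.Ptr (Ptr)

    type family Rep a where
      Rep (Ptr a) = Int   -- toy: represent pointers as machine ints
      Rep a       = [a]   -- toy: everything else as a list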
I still don't know how the linear algebra/Eigen examples might work, though.