Just scanning the code, I think showing C++ rocks is pretty much a no-brainer (ducks to avoid food fight) but part of what's happened here, I'll bet, is that you've stayed within the default heap size allocated by the compiler.
It's just a nit, and it's an easy fix too, but I wanted to point that out. The .NET examples are probably doing some memory-safe allocations on each trip around the loop, while the C++ is just burning through what it already has.
Also there's another point that needs to be drawn out: your imperative code is much cleaner because you solved the problem functionally first. Most imperative programmers wouldn't write anything that looked like what you did. OO guys would still be constructing object graphs. The language you choose plays a major role in how you solve problems.
As for the F# speed difference, I've struggled with staying with F# or moving on to OCaml. Right now, I think I'd rather have more libraries and slower speed, so I'm staying with F#. For some reason OCaml seems to be a tougher language to pick up -- the community is a bit scattered and finding help on easy topics isn't easy (at least for me). Plus I like the fact that a lot of stuff developed on Windows for .NET can just plop over onto Linux and still work. That's worth a bit of speed.
And in any case, even if it weren't, clean code lets you move fairly easily between OCaml and F#.
- says from the start that the solutions aren't optimal
- shares the code that gives the quoted results right away
- makes it easy to verify/check/contribute
The first thing I notice, though (looking at the functional F# code), is that it uses exceptions for control flow. It seems the author uses a 'NoMoreWork' exception as a kind of break out of a loop.
While my F# is rusty, I doubt that this is a good idea; it's probably neither beautiful nor fast.
Edit: Another couple of comments. Things that weren't immediately obvious to me after reading the blog entry alone:
- The "make benchmark" target really just passes one board with two moves to an executable, just like the time commands before it. So what we're measuring is the startup time of the process (native vs. managed/CLR) plus the cost of finding a single (first) move.
- The whole engine is designed around the concept of 'I pass the whole board as arguments'. So the driver seems to compute a string representation of the board after each move and _create a new process of the engine_. So yes, this is a bad idea for managed code, or for anything that could otherwise benefit from a JIT.
- You are right that "time" also measures the startup time of the binaries; but in this case, where the execution time for F# is on the order of ten seconds, the comparisons are still valid and useful, especially for seeing what kind of impact switching to imperative style has (10->8, 1.7->1.4, etc.).
- Passing the whole board as arguments has little (if any) effect on the execution time: e.g. you can see for yourself that if you pass NO arguments (i.e. clean board) the time ratios between languages remain the same. In a game like Score4, the human has to think anyway - and you can see that using C++, even the nasty cmd-line interface leads to response times of less than a second.
- About F# exceptions: in the absence of "break"... can I do anything else to abort a loop early?
Thanks for your feedback, much appreciated.
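On the question of leaving a loop early without "break" or an exception: one common option is to write the loop as a recursive function that simply stops recursing once it has an answer. Below is a minimal sketch in OCaml (the F# version is nearly identical, also using "let rec"). The names score_of_column and first_winning, and the scores themselves, are made up for illustration and are not taken from the Score4 code.

```ocaml
(* Hypothetical stand-in for the real evaluation:
   the score obtained by dropping a piece into a column. *)
let score_of_column col = if col = 4 then 1000 else col

(* Scan columns lo..hi and stop at the first one whose score reaches
   the threshold. Once a hit is found the function returns without
   recursing further, so no exception is needed to leave the "loop". *)
let rec first_winning lo hi threshold =
  if lo > hi then None
  else if score_of_column lo >= threshold then Some lo
  else first_winning (lo + 1) hi threshold

let () =
  match first_winning 0 6 1000 with
  | Some col -> Printf.printf "winning column: %d\n" col
  | None -> print_endline "no immediate win"
```

The recursive call is in tail position, so this runs in constant stack space, much like the loop it replaces.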
One way to give some advantage back to F#, though, would be to parallelize your code. Last I checked, F# lets you do this more easily thanks to constructs like async, agents and the .NET parallel libraries, and more effectively too, since OCaml has a global lock (a GIL).
List.map (abMinimax (not maximizeOrMinimize) (otherColor color) (depth-1))
|> List.map snd
You can replace (map |> filter) with a fold.
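To make the suggestion concrete, here is a small sketch of collapsing a map followed by a filter into a single fold. It is written in OCaml; in F# the fold would be List.fold rather than List.fold_left. The data and the "keep positive scores" test are hypothetical and only stand in for whatever the real pipeline filters on.

```ocaml
(* Hypothetical (column, score) pairs, as a move evaluator might return them. *)
let scored = [ (0, 2); (1, 8); (2, -3); (3, 8) ]

(* Two passes: first extract the scores, then filter them. *)
let two_pass =
  scored |> List.map snd |> List.filter (fun s -> s > 0)

(* One pass: the fold extracts each score and keeps it only if it
   passes the test. fold_left builds the result in reverse order,
   so it is reversed once at the end. *)
let one_pass =
  scored
  |> List.fold_left (fun acc (_, s) -> if s > 0 then s :: acc else acc) []
  |> List.rev
```

Both versions produce the same list; the fold just avoids building the intermediate one.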
allData |> List.sortBy getScore |> List.rev |> List.head
I also replaced the sort with a fold... and there was no speed improvement (the lists are so small it made no difference).
Oh well, what can you do? :-)
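For anyone curious what "replace the sort with a fold" looks like, here is a rough OCaml sketch (in F#, List.fold or List.maxBy would play the same role). get_score mirrors the getScore name above, but its definition and the sample data are assumptions made for the example.

```ocaml
(* Hypothetical (column, score) pairs and score accessor. *)
let get_score (_, s) = s
let all_data = [ (0, 2); (1, 8); (2, 5) ]

(* Sort by descending score and take the head: O(n log n). *)
let best_by_sort =
  all_data
  |> List.sort (fun a b -> compare (get_score b) (get_score a))
  |> List.hd

(* Single fold that carries the best entry seen so far: O(n). *)
let best_by_fold =
  match all_data with
  | [] -> invalid_arg "best_by_fold: empty list"
  | first :: rest ->
    List.fold_left
      (fun best x -> if get_score x > get_score best then x else best)
      first rest
```

With only seven columns to compare, the difference between O(n log n) and O(n) is tiny, which fits the "no speed improvement" observation. Note that the two versions may break ties between equal scores differently.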
The minimax looks like:
evaluate = maximize . maptree static . prune 5 . gametree
The alphabeta is more complicated, but also fits well within a pageful.
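For readers who don't speak Haskell, the shape of that pipeline can be sketched in OCaml. This is only an illustration of the composition, not the code being discussed: the tree type, the tiny hand-built tree and the identity "static" function are all assumptions. OCaml is strict, so the sketch also assumes a finite tree that is already built, whereas the Haskell version can prune a lazily generated one.

```ocaml
(* A rose tree: a value at each node plus a list of subtrees. *)
type 'a tree = Node of 'a * 'a tree list

(* Cut the tree off below the given depth. *)
let rec prune depth (Node (v, kids)) =
  if depth = 0 then Node (v, [])
  else Node (v, List.map (prune (depth - 1)) kids)

(* Apply f to every node value. *)
let rec maptree f (Node (v, kids)) = Node (f v, List.map (maptree f) kids)

(* Minimax over a tree of static scores: leaves keep their score,
   inner nodes alternate between the max and min of their children. *)
let rec maximize (Node (v, kids)) =
  match kids with
  | [] -> v
  | k :: ks ->
    List.fold_left (fun best t -> max best (minimize t)) (minimize k) ks
and minimize (Node (v, kids)) =
  match kids with
  | [] -> v
  | k :: ks ->
    List.fold_left (fun best t -> min best (maximize t)) (maximize k) ks

(* Hypothetical static evaluation: node values are already scores. *)
let static score = score

(* Same order as the Haskell composition, read right to left. *)
let evaluate t = maximize (maptree static (prune 5 t))

(* Tiny hand-built tree: two moves for us, two replies each. *)
let example =
  Node (0, [ Node (0, [ Node (3, []); Node (5, []) ]);
             Node (0, [ Node (2, []); Node (9, []) ]) ])
```

Here evaluate example is 3: the opponent would answer the first move with 3 and the second with 2, so the first move is the better one.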
sh -c "time ./bin.release/score4 o53 y43 -debug"
Depth 7, placing on 0, score:2
Depth 7, placing on 1, score:8
Depth 7, placing on 2, score:8
Depth 7, placing on 3, score:8
Depth 7, placing on 4, score:8
Depth 7, placing on 5, score:8
Depth 7, placing on 6, score:2
Your Haskell code:
sh -c "time ./score4.bin o53 y43 -debug"
Depth 7, placing on 0, score 0
Depth 7, placing on 1, score 8
Depth 7, placing on 2, score 8
Depth 7, placing on 3, score 8
Depth 7, placing on 4, score 8
Depth 7, placing on 5, score 8
Depth 7, placing on 6, score 0
I'll try to see why your code miscalculates on the two borders, but I don't speak Haskell so don't expect much :-)
Results (with gcc 4.2 -O3 -mtune=native -DNDEBUG, head Go -B, ocamlopt 3.12.0 -unsafe -rectypes -inline 1000):
This is on a MacBook Pro with a 2.53 GHz Core 2 Duo running OS X 10.5.8.
Go source: http://pastie.org/2199969
(With a trivial optimization, the Go version can be improved by ~13%, but I decided not to do that because it'd be a slightly different algorithm.)
May want to use that instead.
My scoring function must have been weak though. I remember being frustrated that I could still beat the algorithm. It was completely blind to traps that didn't matter in the near term, but that decided the game later when the board was filling up.