
Stockfish Wins Chess.com Computer Championship - wglb
https://www.chess.com/news/view/stockfish-wins-chess-com-computer-championship
======
tromp
Unfortunately, Stockfish is not doing too well in the more widely recognized (though unofficial) TCEC world computer chess championship at
[http://tcec.chessdom.com/live.php](http://tcec.chessdom.com/live.php), due to
lackluster performance against the weaker engines. With just over 3 rounds to
go, it looks like it's out of the final:

    
    
      N Engine           Rtng  Pts  Gm     SB Ho   Ko   St   Fi   Ch   Gi   Bo   An
     1 Houdini 6.02     3184 17.0  25 187.25 ···· =01  ===  ==1= 111= 1=1= 1=1  =1==
     2 Komodo 1959.00   3232 17.0  25 183.25 =10  ···· ==0  ==1= =1=1 11=  =1=1 1=11
     3 Stockfish 051117 3228 15.0  24 173.25 ===  ==1  ···· =1=  ==== =11= =1=  1===
     4 Fire 6.2         3112 13.5  25 152.25 ==0= ==0= =0=  ···· ==1  ===1 =1=  1==1
     5 Chiron 251017    3013  9.5  25 114.00 000= =0=0 ==== ==0  ···· ===  ==== ===
     6 Ginkgo 2.01      3052  9.5  25 110.00 0=0= 00=  =00= ===0 ===  ···· ==== 1==
     7 Booot 6.2        3091  9.0  24 104.75 0=0  =0=0 =0=  =0=  ==== ==== ···· ===
     8 Andscacs 0.921   3100  8.5  25 107.25 =0== 0=00 0=== 0==0 ===  0==  ===  ····
    

It had better win its current game against Booot to keep some very slim
chances. Which it now seems to be doing...

~~~
tephra
Worth noting that Stockfish is actually the only engine that hasn't lost a
game. The lackluster performance is just that it isn't beating the weaker
engines, something that Komodo and Houdini have done (though they have also
lost against stronger engines).

~~~
umanwizard
So can we conclude that Stockfish just plays more "drawishly" than the other
top-tier engines? Or is this effect just random chance so far?

~~~
thom
This is true; Stockfish expects its opponent to play as well as itself
(outside of the 'contempt' setting; I'm not sure if that was used or tuned per
game). So it won't play edgy lines that invite blunders if they are even a
centipawn worse than what it thinks is the 'correct' line.
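
The effect can be sketched in a few lines of Python. This is a toy model, not Stockfish's actual implementation; all names here are invented:

```python
# Toy model of "contempt": score a draw as slightly bad for the engine,
# so it prefers a marginally riskier winning attempt over a dead-equal
# line. Not Stockfish's real code; everything here is made up.

def draw_score(contempt_cp):
    """Value (in centipawns) the engine assigns to a drawn position."""
    return -contempt_cp

def pick_line(candidate_lines, contempt_cp=0):
    """Pick the best-scoring line; a score of None means a forced draw."""
    def value(line):
        _, score_cp = line
        return draw_score(contempt_cp) if score_cp is None else score_cp
    return max(candidate_lines, key=value)[0]

lines = [
    ("safe repetition", None),  # forced draw
    ("sharp pawn push", -1),    # engine thinks: 1 centipawn worse
]
```

With contempt 0, the draw (0 cp) beats the sharp line (-1 cp); with contempt 10, the draw is valued at -10 cp and the engine takes the risk.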

Funnily enough, I feel there's a lesson for startup people in this. I have
seen many people (myself included) talk themselves out of big, bold bets just
because they are very good at thinking up reasons something should never work.

~~~
edraferi
Oh man, now I have to add a “contempt” parameter to everything I make!

------
glinscott
Congratulations to Stockfish! The community is amazing, and the patches keep
on flowing. The sheer number of ideas is pretty incredible. If you are
interested in contributing, head over to
[http://tests.stockfishchess.org/tests](http://tests.stockfishchess.org/tests).
You can submit a test, and it will be run on a virtual cluster of
user-donated machines.

It's been over four years since I put fishtest up, and in that time, there
have been over 20,000 tests submitted. The really cool thing is that this
distributed testing framework is only possible with an open source engine. So
instead of being a disadvantage (everyone can read your ideas), it turns into
an advantage!

~~~
octalmage
This is super cool! Are there any posts about how the fishtests work? I love
reading about how people solve interesting testing problems.

~~~
glinscott
There are a ton scattered around :).

Here is the announcement of fishtest on the talkchess forum:
[http://talkchess.com/forum/viewtopic.php?t=47885&highlight=s...](http://talkchess.com/forum/viewtopic.php?t=47885&highlight=sprt)

Initial discussion of the introduction of SPRT into fishtest, which led to a
dramatic increase in our ability to measure improvements in self-play, in a
statistically sound manner:
[https://groups.google.com/forum/?fromgroups=#!searchin/fishc...](https://groups.google.com/forum/?fromgroups=#!searchin/fishcooking/sprt|sort:date/fishcooking/cgMSMBUC7JQ/fkdjLTiDuwMJ)

SPRT background here:
[https://en.wikipedia.org/wiki/Sequential_probability_ratio_t...](https://en.wikipedia.org/wiki/Sequential_probability_ratio_test)

Basically, we use a two-phase test to maximize testing resources: first a
short time control test (15s/game) using more lenient SPRT termination
criteria, then a long time control test (60s/game) using more stringent
criteria. That, combined with setting the SPRT bounds to let us measure
2-3 Elo improvements, has meant that Stockfish's progress consists almost
entirely of genuine improvements. Previously, when developing an engine,
you'd make 10 changes, and if you were lucky, 2 or 3 would be good enough to
make up for the other bad or neutral ones.
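
A rough sketch of that kind of SPRT gate, in Python. The real fishtest implementation models game outcomes more carefully; the Gaussian approximation and the variance estimate here are simplified, and the numbers are illustrative:

```python
import math

def elo_to_score(elo):
    """Expected game score for a given Elo advantage (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def sprt_llr(wins, losses, draws, elo0, elo1):
    """Approximate log-likelihood ratio of H1 (strength elo1) vs H0 (elo0)
    for a win/draw/loss record, via a Gaussian approximation."""
    n = wins + losses + draws
    if n == 0:
        return 0.0
    score = (wins + 0.5 * draws) / n              # observed mean score
    var = (wins + 0.25 * draws) / n - score ** 2  # per-game score variance
    if var <= 0:
        return 0.0
    s0, s1 = elo_to_score(elo0), elo_to_score(elo1)
    return n * (s1 - s0) * (2 * score - s0 - s1) / (2 * var)

def sprt_state(llr, alpha=0.05, beta=0.05):
    """Stop as soon as the LLR crosses a Wald bound, else keep playing."""
    if llr >= math.log((1 - beta) / alpha):
        return "accept H1"   # patch looks like a real improvement
    if llr <= math.log(beta / (1 - alpha)):
        return "accept H0"   # patch rejected
    return "continue"        # not enough games yet
```

The appeal of the sequential design is that clearly good or clearly bad patches terminate after few games, so the cluster's time concentrates on the borderline ones.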

If you look at the graphs on [http://www.sp-cc.de/](http://www.sp-cc.de/), you
can see that it just keeps getting better, one small improvement at a time.

------
sethbannon
Incredible that the best engine (Stockfish) is now open source, and the best
server/app (Lichess) is also open source. Well done, chess community!

~~~
a13n
Why is Lichess better than chess.com? I've used both and can't really say one
is the "best".

~~~
colmvp
I really wish I could customize Lichess to have an aesthetic similar to
Chess.com's chessboard/pieces. I prefer Lichess, but I've found that
Chess.com's aesthetic sounds and looks like a classic wooden board without
being super cheesy, whereas the iconography used on Lichess feels very
stock.

~~~
thom
The Lichess mobile app, at least, has themes that can make it as ugly as
Chess.com, if that's what you're after. :)

------
Someone
_”One major facet of the tournament format was that there were no opening
books”_

Code and data are equivalent (for example, instead of an opening book that
says “if you play White, open with the pawn to e4,” one could have an
evaluation function that says “after White's first move, having a pawn on e4
is worth a million points”), so how did they enforce that rule?
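
The point can be made concrete with a toy evaluation function. This is purely illustrative; the function names and the position representation are invented:

```python
# Toy illustration of hiding an "opening book" inside an evaluation
# function: the preferred move is data either way. All names invented.

OPENING_BONUS = 1_000_000  # absurd score for the booked move

def material_balance(position):
    """position maps squares to pieces; uppercase = White, lowercase = Black."""
    values = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9, "K": 0}
    total = 0
    for piece in position.values():
        sign = 1 if piece.isupper() else -1
        total += sign * values[piece.upper()]
    return total

def evaluate(position, ply):
    score = material_balance(position)  # the "honest" heuristic part
    # Disguised book: on move one, reward a white pawn on e4 so heavily
    # that any search will always "discover" 1.e4.
    if ply == 1 and position.get("e4") == "P":
        score += OPENING_BONUS
    return score
```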

~~~
chki
Hardcoding an opening would be considered playing with an opening book.

Regarding the enforcement of this rule: as all the chess engines that
competed are pretty well known (and some are even open source), I'm pretty
sure the organizers (rightfully) trust the programmers not to do anything
against the rules.

~~~
gamegoblin
I wonder, if someone were to come up with a neural-network-based, AlphaGo-
style engine that was competitive with these other top engines, how they'd
deal with these sorts of rules. An "opening book" of sorts would likely be
baked into the neural nets, unable to be turned on or off.

AlphaGo, for example, has definitely played well-known human openings (and
innovated on them) despite having no opening book.

~~~
Recursing
All these top chess engines definitely play well-known human openings and
innovate on them; those openings just happen to be the best sequences of
moves in certain situations (as far as we know).

The rule is against "explicit" hardcoded opening books, to force the engines
to calculate the best move each time, to encourage variation, and (I think) to
allow the developers to focus on building stronger engines instead of managing
huge opening books.

The "encourage variation" part didn't really succeed in this format, as they
all tend to converge to the same openings.

~~~
thom
It's sort of funny to consider that our opening books are the product of
several hundred years of incredibly slow, inefficient Monte Carlo tree search
done by humans playing over wooden boards. Seems odd to deny that to engines
that can knock it all out in a weekend anyway.

~~~
slphil
This isn't true, though. There are a lot of openings where the humans turn out
to be right anyway, even though the computer thinks it has found a marginally
better move. Humans have a better intuitive understanding of how certain
openings create endgame possibilities, and if I remember correctly the
combination of a Super GM and an engine is still markedly stronger than an
engine alone.

Engines have contributed substantially to modern opening books, but they
haven't supplanted the existing knowledge. Humans turned out to be wrong about
many sharp lines (which were refuted by computers), and computers can find
really interesting ideas in many positions (which would be nearly impossible
for a human to find), but the old human-approved best openings are still
standing tall after the engine revolution.

~~~
gamegoblin
> I remember correctly the combination of a Super GM and an engine is still
> markedly stronger than an engine alone.

This may have been true when engines were still only marginally stronger than
humans, but I haven't seen any evidence that it's currently true. A few years
ago, Nakamura + Rybka (formerly the best program) lost to Stockfish.

~~~
Recursing
At the time, Rybka was no longer one of the strongest engines, and
"correspondence chess" (humans + computers) is still played.

The strongest players are not GMs, as far as I know, and a very important part
of those games is trying to force positions where the opponent's engine might
make a slight mistake.

Here is an interview with the world champion:
[https://en.chessbase.com/post/better-than-an-engine-leonardo...](https://en.chessbase.com/post/better-than-an-engine-leonardo-ljubicic-1-2)

~~~
thom
Do you know of any evidence that these players+engines can beat engines alone,
instead of each other?

------
chesscom
Keep in mind, this was a “Rapid” event with quicker-than-normal time controls.
It was meant to be an exciting event for fans and to test the engines at
quicker speeds that humans can both follow and enjoy. It certainly produced
exciting games (in my opinion)!

------
Jabbles
Without wishing to denigrate the achievements of all the people involved in
this tournament, nor the entertainment aspect, I'd like to ask:

What is the statistical power of a 90-game round robin? Would this be a
publishable result with p < 0.05 (or the new 0.005) against the null
hypothesis that Stockfish and Houdini (2nd place) were of equal skill?

~~~
ykler
I don't know if you can really put a p-value on this result without a more
specific null hypothesis, but anyway it looks like this tournament result
provides extremely weak evidence that Stockfish is better than Houdini. In the
round robin component, Stockfish and Houdini played two games against each
other, each winning one and losing one. In the "superfinal", they had 15
draws, Stockfish won 3 games, and Houdini won 2 games.
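
For what it's worth, a quick sign test on the decisive superfinal games bears this out. This is a toy calculation that simply ignores the draws:

```python
from math import comb

def sign_test_p(wins, losses):
    """One-sided binomial (sign) test: probability of seeing at least this
    many wins among the decisive games if the two engines were truly equal."""
    n = wins + losses
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n

# Superfinal decisive games: Stockfish 3 wins, Houdini 2 (plus 15 draws).
p = sign_test_p(3, 2)  # p = 0.5: no evidence against equal strength
```

A 3-2 split among five decisive games is exactly as likely as a coin flip would suggest, so nothing remotely near p < 0.05.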

------
nindalf
> That said, there was some serious computer science behind the event, as each
> engine played from a powerful Amazon Web Services computer.

Love the subtle product placement :D

------
tim333
I wonder how a machine learning approach like AlphaGo zero would do.

~~~
ZirconiumX
We actually had an engine which was based on neural networks, called Giraffe.
It used an alpha-beta search with a neural network evaluation. The Computer
Chess Rating List put it at 2500 Elo, which is a strong human level. It was
very slow, searching thousands of positions per second compared to the
millions that even weak programs can do, but it's widely agreed that the NNs
were worth about 300-400 elo - if Giraffe could search at Stockfish speeds,
and given the rule of thumb that a doubling of speed is worth 70 elo, that's
2500 + 70 * log2(3000000/3000) = 3270 elo. That puts it in the top 20.
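
Spelling out that rule-of-thumb extrapolation (the speed figures are only the rough orders of magnitude from the comment):

```python
import math

def extrapolated_elo(base_elo, base_nps, target_nps, elo_per_doubling=70):
    """Rule-of-thumb extrapolation: ~70 Elo per doubling of search speed."""
    doublings = math.log2(target_nps / base_nps)
    return base_elo + elo_per_doubling * doublings

# Giraffe: ~2500 Elo at ~3,000 nodes/sec; top-engine speed ~3,000,000.
estimate = extrapolated_elo(2500, 3_000, 3_000_000)
# log2(1000) is about 9.97 doublings, so estimate is roughly 3198 Elo.
```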

Sadly, the developer was pinched by Google to work at DeepMind. We suspect
this was to help work on AlphaGo.

~~~
supermdguy
Just curious, who's "we"?

~~~
ZirconiumX
I was referring to the computer chess community as "we". My apologies.

------
gregorymichael
If you're into chess, I'd encourage you to check out the last of three
annotated games in that article (Stockfish v Houdini). Stockfish mates on move
174; human players would likely have agreed a draw by move 40.

~~~
sleepyandlazy
When I read your comment, I thought the reason Stockfish won was that the
computer found an innovative tactic a human would not see.

Watching the game, it looks like if a human had been playing instead of the
Houdini computer, they would easily have been able to force a draw instead of
a loss.

Nonetheless, as an intermediate chess player, I found the games fun to watch
because of how different computers' play style can sometimes be from humans'.

------
openfuture
According to the commentators, Houdini had more impressive wins, and the game
that decided the tournament (the third game shown in this article) was
supposed to be a draw but somehow got blundered away due to the contempt
setting.

Basically, it was very close, and Houdini may actually be slightly better,
especially at the abstract heuristic stuff. Although Houdini creates good
positions for itself, Stockfish excels at brute force and somehow always
manages to turn things around as the game transitions into the endgame.

------
josephernest
Is it only this code and nothing else?

[https://github.com/mcostalba/Stockfish/tree/ONE_PLY/src](https://github.com/mcostalba/Stockfish/tree/ONE_PLY/src)

Is there no DB or machine learning inside?

~~~
ZirconiumX
It's actually [https://github.com/official-stockfish/Stockfish/tree/master](https://github.com/official-stockfish/Stockfish/tree/master)

No, SF doesn't use machine learning, it's entirely heuristic.

~~~
josephernest
Doesn't it use a database? From the README:

> If the engine is searching a position that is not in the tablebases (e.g. a
> position with 7 pieces)

and it talks a lot about tablebases. What are those?

~~~
ZirconiumX
Okay, I'll concede that one; my apologies. A tablebase is essentially a
precomputed endgame solution for all positions with at most a given number of
pieces. Likewise, it has an opening book, which is a database of moves for
the first few turns of the game.

It's still a heuristic searcher when it's not using those, though.
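
A toy illustration of the lookup idea. Real tablebases (like the Syzygy set Stockfish can probe) index full positions, not just material, and store distance metrics alongside the win/draw/loss result; everything in this sketch is invented:

```python
# Toy "tablebase": a precomputed map from piece sets to exact results.
# Uppercase pieces are White's, lowercase are Black's. Real tablebases
# index full positions (placements, side to move), not just material.

TABLEBASE = {
    ("K", "Q", "k"): "win",   # KQ vs lone king: winning for White
    ("K", "R", "k"): "win",   # KR vs lone king: winning for White
    ("K", "k"): "draw",       # bare kings: dead draw
}

def probe(pieces):
    """Return the exact result if the material is in the tablebase,
    or None so the engine falls back to heuristic search."""
    key = tuple(sorted(pieces))
    return TABLEBASE.get(key)
```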

------
znpy
Where did the good old Fritz go ?

~~~
noir_lord
Fritz the engine hasn't been very strong for a few years; Fritz the UI (which
also lets you use other engines) is still the best thing out there for
learning/play.

They seem to have focused more on making the Fritz engine a good tool for
human analysis rather than the strongest possible engine.

Not sure what the new stuff is like; the last one I bought was Fritz 12, some
time ago.

------
rurban
Houdini is still my favorite engine, though. It plays much more interestingly
than Stockfish, which plays like, well, a stockfish.

------
TwoBit
$1000 prize money? Can't they crowdsource something more than that?

~~~
mattnewton
These people are all skilled programmers doing high-speed analytics. They
aren't doing it for the money.

------
tranv94
Good to know I’ve been using the best to destroy my opponents on chess.com

~~~
ZenoArrow
Isn't that cheating?

~~~
slphil
Yes, and I've already reported him to chess.com. :)

