I think one major piece is consistently missed in those articles. Yes, the chances of him hitting 2900 do not depend only on his performance against other players, but on their Elo as well. If the next 25 players in the world were all 2800+, Carlsen would already be 2900+.
Thus, Carlsen's best chances largely depend on Elo inflation widening the Elo distribution itself.
One example of this: Louis de Bourdonnais, who was #1 in the world in the 1800s, saw his Elo rating jump from 2600 to 2650 mostly as a result of the next 10 players moving their average Elo from 2350 to 2450.
His winrate against them never changed. And what is required for 2350 players to go higher? That the players below their skill level also raise their Elos, giving the top 10 more points when beating them. The more players are FIDE-rated, the more Carlsen's chances will increase, but without a sufficient supply of 2700s, those at 2750 won't rise in Elo, and so on.
The Elo system didn't exist until 100 years after Labourdonnais. There are several attempts to extend it backwards, but they're not well defined. It doesn't make sense to say Labourdonnais went "from 2600 to 2650" without changing his playing strength: it's an artifact of some particular guy's methodology to estimate historical ratings.
Elo takes match results and spits out ratings. Why does it matter that it was invented after the matches took place, as long as the game records are accurate? The timing of the games is irrelevant.
I would also argue that there is a source of ELO deflation going on. As the top level players all get better, theory evolves, people generally play better chess.
A player with a rating of 2750 today might have beaten Kasparov in his prime, simply because chess has evolved.
In a way, it's similar to sports. Plenty of people today might have beaten Carl Lewis at 100m 30 years ago, but they would probably not be considered greater athletes.
So in order to break 2900, it would help if several other players got ratings of 2800-2850, but only if they did so without actually playing much better chess.
Or he could just have a temporary lucky/confident streak, and gain 40 rating points in a few tournaments.
> In a way, it's similar to sports. Plenty of people today might have beaten Carl Lewis at 100m 30 years ago, but they would probably not be considered greater athletes.
Elo doesn't measure absolute strength but relative strength. It's not relevant whether the top players get better; what matters is their win rates against each other.
The fact that chess has evolved is irrelevant because it applies to all players: from Carlsen to beginners, everyone has the same tools.
"If the following 25 players in the world where all 2800+ Carlsen would already be 2900+."
Why are you assuming that Carlsen would always win against other players if the other players are that much stronger? Maybe if the following 25 players were all 2800+ then Carlsen would take more losses.
Because, as already explained, Elo measures relative strength, not absolute strength. If he's the better player, he's going to beat them more often than not, while not losing all the points he currently loses for drawing.
It's worth noting that one of the authors of the cited paper [1] (Nick Polson) is infamous for claiming (incorrectly) a proof of the Riemann hypothesis [2].
This case is somewhat strange because he's not the usual mathematical crank; he's actually a quite well-recognized economics researcher who is clearly mathematically competent. But for some reason he persists. (If you search his name, you can find his YouTube channel where he lectures on his RH work...)
"A Hadamard factorization of the Riemann Xi-function is constructed to characterize the zeros of the zeta function."
Like, dude, you are claiming to solve a Millennium problem, you're gonna have to put a little more effort into explaining, ya know, the breakthrough...
We care about the Riemann Hypothesis not for the theorem itself, but because the methods used to solve it will surely be revolutionary and it's those methods that are important, not the theorem.
That's what makes it a Millennium problem. So a paper like this immediately shows the author doesn't even understand why the problem is important, let alone its solution.
> Like, dude, you are claiming to solve a Millennium problem, you're gonna have to put a little more effort into explaining, ya know, the breakthrough...
In the abstract? If papers needed to cover everything in the abstract, they would need to have an abstract for the abstract, and then you’d just end up complaining about the abstract abstract not being complete…
It's odd that people can't work on something in which they're interested and talk about it publicly without being labeled a "crank".
Sorry, there's nothing holy about the Riemann Hypothesis; anyone can add their input to the problem, especially on ArXiV and/or YouTube. You know, that's how discussions can start, and sometimes people want to discuss things with others.
It seems like you might think only certain people should work on these special problems, and only if they do it correctly according to you.
Your comment is just an ad hominem, so why don't you tell us why you're actually mentioning this? Do you think the math is wrong in the relevant material from the article because of his RH-related musings on ArXiV, or do you have an anti-RH-researcher bias that you want to share with the world? Are you just hating on someone?
It's not the discussion that makes you a crank. It's publishing a paper claiming you have solved something that you haven't even begun to understand. It's bypassing peer review. It's ignoring all the literature out there and claiming that is somehow a virtue.
You have an interesting idea? Go ahead, please share it! But it's not productive to just start with "Look everybody, I solved it!"
It just wastes everyone's time. For example, this paper. It would take a lot of time to sit down and work through the math until I find a specific error in it. But the barren abstract and reference sections strongly suggest that would be a huge waste of my time and energy.
Mochizuki pulled this stunt with the ABC conjecture. Wasted YEARS of mathematicians' time just to arrive at the conclusion it was all elaborate mathematical smoke and mirrors. The guy is an egotistical asshole with a messiah complex, and managed to burn a lot of PhD students by chasing his red herring. Not cool.
I think a lot of people don't appreciate how open mathematicians are, in general, to work "outside the field" - it's especially baked into the DNA through lessons like Ramanujan's.
This guy, though, knows what the right way to present the idea would be, but for some reason (ego? knowing it's weak/broken? pique?) chose not to.
Further, in certain cases (P =? NP is one I know of) it has been proven that certain types[0] of proofs cannot possibly work -- and yet people keep offering "proofs" of those bound-to-be-wrong types.
[0] Not sure if 'types' is the right word? Using it colloquially. Maybe 'classes' would be more accurate?
It's subtle, because some of the impossibility proofs are effectively saying that any real proof needs to be somehow sensitive to whether it's in an imaginary universe with oracles, so they can _fail_ in them!
It really is mind-bending stuff -- I have only the vaguest idea of how these proofs could work, but you summarized it beautifully.
As an aside, this is also perhaps also one of the best motivating examples of the idea of "oracles" in math. ("Ok, assume we can do $IMPOSSIBLE_THING... now what?" :) )
> It's not the discussion that makes you a crank. It's publishing a paper claiming you have solved something that you haven't even begun to understand. It's bypassing peer review. It's ignoring all the literature out there and claiming that is somehow a virtue.
You've effectively said that people can't post things on ArXiV unless they're up to your unstated standards; otherwise, they're just "cranks".
Also, no one is "bypassing peer review" by posting on ArXiV and/or YouTube, nor have I seen anyone claim "a virtue" of any sort. Where are you getting all this? From the abstracts?
> It just wastes everyone's time. For example, this paper. It would take a lot of time to sit down and work through the math until I find a specific error in it. But the barren abstract and reference sections strongly suggest that would be a huge waste of my time and energy.
Well, since you've now admitted that you haven't read his work, it seems like everything you said earlier really must be coming from the abstracts alone.
> It just wastes everyone's time. For example, this paper. It would take a lot of time to sit down and work through the math until I find a specific error in it. But the barren abstract and reference sections strongly suggest that would be a huge waste of my time and energy.
Whose time is wasted? The random people who volunteer to read his ArXiV submissions and/or YouTube videos? Really, the only waste of time I've seen is your ad hominem comment in a HN post that isn't even about the author you're blatantly criticizing.
> Mochizuki pulled this stunt with the ABC conjecture. Wasted YEARS of mathematicians' time just to arrive at the conclusion it was all elaborate mathematical smoke and mirrors. The guy is an egotistical asshole with a messiah complex, and managed to burn a lot of PhD students by chasing his red herring. Not cool.
And that's why we must all attack Mochizuki whenever we see his name, right? Do you know Nick? Is he egotistical? Does he have a messiah complex? Are you just going on a tangent now? Are you just math trolling?
Not trolling, I am serious. It took Peter freakin' Scholze to finally settle the debate over Mochizuki's work. What came out of it? What else could Scholze have been working on instead of spending his time finding the incredibly subtle gaps in the logic Mochizuki used to construct his false theory?
What Mochizuki did was wrong because he was utterly uncooperative with the mathematics community, he would answer questions only with more papers that never addressed the concerns being raised, and he was offended that so much scrutiny was applied.
Has Nick made an effort to educate people on the incredible breakthrough he has made? How many lectures has he given on it? Any conference videos on YouTube? Because I'll watch them. Mochizuki is an extreme example, so I'm sure Nick isn't trying to do anything wrong. He should just retract the paper and keep working on it. An interested volunteer can tell him what is specifically wrong, and that's all that needs to happen.
> It is bad practice to post incorrect results and not retract them when this is pointed out.
And it's good practice to take shots at people whenever you see their name?
Also, no, there's no "practice" that says you can't keep a mistake posted on ArXiV or YouTube. That just sounds like something you've made up to justify attacking someone.
Peer review isn't a great modern invention. It's better to publish things out in the open and allow discussion of them. We perform code reviews out in the open and that has proven exceptionally successful, so I don't see why we wouldn't encourage the same for scientific research. The fact that peer reviews happen in private makes the whole process opaque to outsiders, and it simply creates artificial barriers.
If you have valid criticisms of something, you can make them in public and help everyone involved grow.
Unfortunately, if you aren't already part of the in-crowd, getting peer review at all can be very difficult, if you have any type of unorthodox idea. Because, you guessed it, you get labelled as a "crank", so why bother wasting time reviewing your paper?
He has continued to update the paper (as recently as last year), it has not been retracted, and he appears to maintain the proof is correct on his YouTube channel.
We all make mistakes, and no one would really care if he admitted this. What makes the case notable is that he has enough mathematical training that he should be able to recognize the proof is wrong, especially after the errors are uncovered by others and communicated to him. Continuing to assert the proof is correct after errors have been found is bizarre.
> He has continued to update the paper (as recently as last year), it has not been retracted, and he appears to maintain the proof is correct on his YouTube channel.
You know that ArXiV and YouTube aren't considered "journals" and don't have a similar requirement for "retractions". It sounds like you think he should hide his shame; otherwise he deserves your attacks.
> We all make mistakes, and no one would really care if he admitted this. What makes the case notable is that he has enough mathematical training that he should really be able to recognize his proof is wrong, especially after the errors are uncovered by others. Continuing to assert the proof is correct after errors have been found is bizarre.
People probably shouldn't, and don't, care, because he's not affecting them in any way with his RH-related ArXiV and YouTube musings.
More importantly, where are there people showing him how his proof is wrong, and where is he outright denying their points/proofs? It seems like I'm missing a link or two, because I haven't seen any of these things, yet you're referring to them as though they're apparent and damning.
My guess is that he simply doesn't get feedback about this stuff, and he probably doesn't even care that much, because this is just a set of ideas with which he likes to work. I've seen no signs of a charged or "high stakes" math community engagement, and definitely not the kind that deserves these unwarranted personal criticisms.
> You know that ArXiV and YouTube aren't considered "journals" and don't have a similar requirement for "retractions".
> My guess is that he simply doesn't get feedback about this stuff, and he probably doesn't even care that much, because this is just a set of ideas with which he likes to work.
Having every right to be a jackass doesn't make you not a jackass.
I believe knowingly disseminating incorrect results is wrong. This is a violation of fundamental academic standards; we would not tolerate it in any other scientific field. Do you disagree?
Arxiv's submission policy says: "Submissions to arXiv should be topical and refereeable scientific contributions that follow accepted standards of scholarly communication." (https://arxiv.org/help/submit)
It is against the prevailing norms of scholarly communication to publish results with serious errors known to the author.
I agree YouTube does not have this policy. I still find incorrectly claiming a proof of RH on YouTube distasteful, for similar reasons.
I personally know a mathematician who has pointed out the mistakes to him. But also, Polson is good enough at math himself that he should be aware of these points. I would criticize him just the same if he knowingly published false economics results.
> Arxiv's submission policy says: "Submissions to arXiv should be topical and refereeable scientific contributions that follow accepted standards of scholarly communication." (https://arxiv.org/help/submit)
If you're implying that he violated ArXiV's submission policy, then you're going to need to stretch the definitions of those words a bit, as you attempted to do.
> I agree YouTube does not have this policy. I still find incorrectly claiming a proof of RH on YouTube distasteful, for similar reasons.
I get that you have all these personal opinions/takes, but ArXiV and YouTube don't appear to be justifying them.
> I personally know a mathematician who has pointed out the mistakes to him. But also, Polson good enough at math himself that he should be aware of these points. I would criticize him just the same if he knowingly published false economics results.
OK, so where did they post these discussions with Nick so that we can all clearly see that he's a bad or stupid person, as you're implying? Your comments assume that this is all common knowledge or apparent, but it isn't.
Are we just supposed to take your word for it? Well, I happen to know Nick, and, from my experiences with him, I have no reason to believe any of the things you're saying and/or implying.
I don't think Nick Polson is a bad or stupid person. I never said this. In fact, as I said above, I believe he is quite mathematically competent. But even competent people make mistakes.
Norms of scholarly communication are not a personal opinion.
You have also curiously avoided my question about whether it is improper to knowingly disseminate incorrect results.
The whole 2900 thing is so strange. Elo is a relative scale. The number only makes sense in relation to the ratings of others. You could shift everyone or scale it differently and it wouldn’t change anything. What a poor goal.
I view the whole thing more as a criticism of the sorry state of FIDE and the world championship than an actual serious goal.
One person reaching 2900, when it's not achieved by changing the scale, shows that their relative position to others is a new record. It's a worthy psychological goal, like breaking a four-minute mile even though a minute and a mile are made-up quantities, or scoring 300 in bowling, when 10 frames is arbitrary.
Pretty much every record in every sport is arbitrary, yet people like to set goals for motivation.
> their relative position to others is a new record.
This doesn't seem necessarily true to me. If Magnus had been in his prime in, say, the 1980s, it's not clear he could ever have gotten so close to 2900, since every other super-GM had a lower Elo, which would make his progress slower than it is now. Not only do we not know how an average 2022 super-GM compares to a 1980 super-GM, we also don't know how Magnus compares to a 1980 super-GM. E.g. Tal achieved his peak Elo of 2705 in 1980, but at that time Korchnoi had an Elo of 2695 (a difference of 10), whereas Magnus currently has 2864 and #2 Ding Liren has 2808 (a difference of 56). But Magnus' peak rating of 2872 was only 30 higher than Caruana's peak rating of 2842 during the same period.
> Not only do we not know how an average 2022-super-GM compares to a 1980-super-GM
I suspect this is false. Chess is one of the few endeavors where we have decent historical records. Combine this with the fact that computers can now analyze lines a very long way, and I suspect that you can do things like compare how often GMs missed a superior line of play.
By most measures, the GMs of the past played weaker chess than our current GMs. The current GMs benefit from having started at younger ages with a larger body of chess theory and super-strong computers to work with and against. I suspect that the GMs of the past would be as strong as the GMs of the present given those advantages. However, they didn't have them and, consequently, the chess itself was weaker.
Side note: I adore Mikhail Tal's "This position looks interesting and complex. Banzai!" games. However, most of them would get him absolutely creamed against the current top players. And to be fair--even the top players of his own time tended to refute them. However, Tal's games are often way more exciting than everybody else's.
Elo is a relative measure though, so it's more like winning a highly competitive mile race by 5 or 10 seconds, when the gap is usually under a second.
It's still impressive as heck.
One tempting comparison is Usain Bolt, but it feels a little different to me because he dominates both objectively (WR) and versus his competitors.
Isn’t a world record also relative? I mean, it’s literally “the lowest time (greatest height, longest distance, highest score, etc.) that’s ever been achieved, compared to other human beings in the same event”. 2900 for Carlsen is basically a world record Elo score. I don’t see any arguments in T&F that a world record decathlon score, for example, wouldn’t be a worthy goal.
A 4-minute mile represents the same milestone regardless of the other competitors in the race, not relative to their performance. You can win said mile by a minute, or come last; if your time is 3:xx.xx, then you've broken that milestone.
The decathlon works the same way: the points scoring there isn't relative like Elo is, it's done by performance in an individual event. You don't get any extra points for winning a given decathlon event; there's a scoring scale for your marks regardless of position.
The exact same performance will differ in how it affects two players' Elo depending on both players' ratings. If all players' ratings were 100 points higher with all else equal, the top players' ratings would rise as well. If everyone in my mile runs 10 seconds faster, my time doesn't also get 10 seconds faster.
Basically, one record is an individual performance mark. The other is a ranking, which is a different kind of metric entirely.
With all that in mind, chasing a world record Elo is still a worthy goal IMO. The competitive chess scene is old and robust enough that it remains a meaningful benchmark, even if it is a relative measure.
Yes but the examples you mentioned are (in practice) absolute and can be reached independently of how others are performing. In that sense, ELO _is_ different (as also explained in another answer, he can reach 2900 more easily if others start performing worse).
Ok, but the point is the same - a huge number of human records and skills are scored against a moving backdrop of current competitors.
Federer getting grand slams - all against local in time competitors - is impressive. Had he played 50 years ago, he'd have likely gotten more. Had he played in 50 years, likely less, expecting competition to improve.
Same for NBA records - all against a moving, relative backdrop.
Same for wrestling, baseball, hockey, and on and on and on.
Just because your score is relative to others does not make reaching some milestone, especially this one, less impressive.
Also, it's easier to have a higher ELO if others do too, not the other way around.
Actually, he can reach his goal more easily if the people he wins against are closer to his score, so it helps if a few of them start performing better.
I wonder if it is a valid strategy for him to drop out of World Championship title contests, to allow others to increase their scores by not losing against him. I would have thought that wouldn't move the needle much, but I guess he needs everything there is to get to 2900.
The point being made is that it's a relative-to-others metric, and so you can't use it to make an absolute ranking.
There are going to be team compositions that didn't win an NBA championship but were stronger than ones that did, simply because at the time the competition was stronger and so they weren't the best. So if you use number of championships to generate a ranking of all team compositions, there's going to be a lot of debate that the teams aren't actually ranked by strength, since some of the teams that racked up championships did so when other teams were weak (I know less about the NBA, but this becomes the case for the NHL's Original Six era; the competition was so weak that some players didn't even know they had been drafted, because it wasn't something you cared about).
To get back to the original point, your Elo rating is relative to your peers. The idea is that for a given point differential you have an x% chance of beating them. So the point being made is that you can't use Elo to compare anything besides two players who can physically play each other, since taking a player from the past with an Elo of n and having them play a current player with an Elo of n won't match the x% chance.
To counter some of the other points (bowling and the mile): these are absolute measures, not relative ones. If somebody ran a 4-minute mile in the past, you can expect them to run a 4-minute mile in the future. This doesn't mean they'll win the same percentage of races as they did, which is what Elo is more about. Same thing with bowling: if somebody consistently bowled a 280 in the past, you can expect them (ignoring lane/oil changes) to bowl 280 in the future, but if the competition now bowls 290, that player is going to suck instead of being a legend. So the bowler of the past may have had a very high Elo, but when taken into the future they will (after many losses) have a lower Elo despite having the same performance.
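To make the "relative, not absolute" point concrete, here is a minimal sketch using the standard logistic Elo formula (the function name and the example numbers are mine, not from the thread): expected scores depend only on the rating difference, so shifting every rating by the same amount changes nothing.

```python
# Elo expected score depends only on the rating difference, so adding a
# constant to everyone's rating leaves every prediction unchanged.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

print(expected_score(2864, 2808))   # a 56-point gap at the top, ~0.58
print(expected_score(2964, 2908))   # everyone +100: exactly the same number
print(expected_score(1556, 1500))   # the same 56-point gap at club level: same again
```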
I don't think your comment is in conflict with the other's points, but Magnus Carlsen seems to be arguing that he wants to be the .9th best basketball player in the world rather than the 1st best basketball player in the world.
And the ambition to reach unprecedented strength in a sport is emphatically not an 'unserious' or 'poor' goal: it's the motivation of all sporting champions.
2900 might seem arbitrary, but it's a symbolic level under the rating system we have.
Is the top end deflating? If deflation comes from young high-variance players, does that affect the players at the very top? They might be good enough to consistently beat the young players.
It's comparing players of the same rating today vs. the past, using the chess engine Stockfish to analyze how well players at a certain Elo performed. For example, a player with an Elo of 2000 today is stronger than a player with an Elo of 2000 back in the 90's.
I still can't understand how chess ratings work --even after 'RTFM' and having people try and explain it to me.
Far far below the lofty heights of Carlsen & co., I play a fair bit on Lichess. Sometimes I'll play someone with a similar rating to myself and beat them and will earn 3 or 4 points. Then we'll have a re-match which I lose and they'll gain 12 or 15 points.
I just can't fathom how them beating me earns them more points than I earn from beating them --if we were on the same or similar ratings to start with. But it almost always seems to pan out this way.
[And I'm not talking about opponents with a '?' after their rating, which signifies they've not played enough games for the rating to be accurate yet. I know that, in those circumstances, ratings can move hugely in either direction].
I've also noticed that there doesn't seem to be any relation between the circumstances of a win and the points awarded. For example, shouldn't a win or draw when playing black earn more points than when playing white --given that white had the advantage of moving first?
And what about the manner of victory? It seems there are no more rating points to be gained from snatching a victory where both players ended up down to a single pawn each and the winner just managed to queen one square ahead of the loser... than there are from completely decimating your opponent.
> Far far below the lofty heights of Carlsen & co., I play a fair bit on Lichess. Sometimes I'll play someone with a similar rating to myself and beat them and will earn 3 or 4 points. Then we'll have a re-match which I lose and they'll gain 12 or 15 points.
> I just can't fathom how them beating me earns them more points than I earn from beating them --if we were on the same or similar ratings to start with. But it almost always seems to pan out this way.
Lichess works on a different system than professional ratings. It takes into account the RD (sort of a standard deviation on your rating). Playing against a player with a more stable rating will cause your rating to change more, and vice-versa. Professional ratings are always assumed to be stable.
> I've also noticed that there doesn't seem to be any relation between the circumstances of a win and the points awarded. For example, shouldn't a win or draw when playing black earn more points than when playing white --given that white had the advantage of moving first?
This is something that is sometimes used as a tiebreaker in tournaments. Sort of an away-goals rule, if you're familiar with soccer. But as far as ratings go, in the long run you're bound to play ~50% of your games with white and ~50% with black, so it evens itself out.
> And what about the manner of victory? It seems there are no more rating points to be gained from snatching a victory where both players ended up down to a single pawn each and the winner just managed to queen one square ahead of the loser... than there are from completely decimating your opponent.
I'm sorry but this makes absolutely no sense. There are three results: Win - Loss - Draw, everything else is just discourse around the game.
Endgames are something players trade into, reducing to a won endgame is part of the skill involved in the game. Why would that count as less of a win?
Also, material is a pretty stupid way to determine how close a game was. Should I get less rating points because I won with a queen sac?
Lichess doesn't use ELO rating but a different rating system. And yes, I don't understand that difference either.
About white versus black, there is some consensus that after about 20 moves, white's advantage is gone. I doubt that statement is supported by statistics. I guess the rating just assumes everyone plays white and black just as often, so it will even out in the end. (edit) By the way, if you are strong with white and weak with black, how should your ELO rating be calculated? It is only one number, not two. Assuming an average seems fine with me.
And the way a win is made, those are for the beauty contests :)
A simple way to see an Elo calculation is as both players placing a bet; the K factor is the number of points in the jar. If the K factor is 25, you have 1600 Elo and I have 1400 Elo, you might put roughly 19 points in the jar while I put in about 6. Winner takes the jar; on a draw it is split in half. The difference between a win and a draw is always half the K factor. When the two players have different K factors, the calculation for each player will be different, so it's not zero-sum.
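Here is a minimal sketch of that jar picture using the standard logistic Elo formula (the function names are mine; the K factor and ratings are just the example values above):

```python
# The "betting jar" view of a standard Elo update. Each player stakes
# K * (their expected score); the winner takes the jar, a draw splits it.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_deltas(rating_a: float, rating_b: float, score_a: float, k: float = 25):
    """Rating changes for A and B given A's result (1 = win, 0.5 = draw, 0 = loss)."""
    e_a = expected_score(rating_a, rating_b)
    return k * (score_a - e_a), k * ((1 - score_a) - (1 - e_a))

e = expected_score(1600, 1400)
print(25 * e, 25 * (1 - e))         # stakes in the jar: about 19 vs 6 points
print(elo_deltas(1600, 1400, 1))    # 1600 wins:  about (+6.0, -6.0)
print(elo_deltas(1600, 1400, 0.5))  # draw:       about (-6.5, +6.5), exactly K/2 less than a win
```

With the same K on both sides the two changes cancel out; as noted above, different K factors are what break that zero-sum property.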
They say that Glicko 2 is more accurate than Glicko 1, and they reference a kaggle leaderboard. On that private leaderboard, I only see "Glicko benchmark" and "Real Glicko," so I don't see how they made that claim. I find it dubious that Glicko 2 would be more accurate out of sample.
Yes!!! I've long been frustrated by this. I'm around 1800 if I play at certain times, but quickly get down to 1700 when playing at others. For some reason 1700s are stronger during night.
And something even worse: sometimes it seems that lower rated players are consistently stronger, so if you fall below some threshold, you get stuck there. On the other hand, if you can get above it, you suddenly win and win...
> About white versus black, there is some consensus that after about 20 moves, white's advantage is gone. I doubt that statement is supported by statistics.
That statement is supported by statistics; however, the problem with that statement is that the average chess game is about 25 moves.
except in top-level classical play.
But it's not relevant, because virtually all experts agree that chess is a theoretical draw. It's extremely difficult to prove it, however.
I wonder if there's even a way to be sure of a minimum length? Like suppose I'm White, I know how to play Chess perfectly and so does my opponent. So (if the popular assumption is correct) we will eventually draw, but how many moves can I draw the game out for, assuming my opponent isn't willing to concede, but is trying to shorten the game ?
depends on the position, from not a lot, to 100+ moves.
but don't forget that there are forced draws in chess, namely, repeated position 3 times (doesn't have to be consecutive), 50 move rule (50 moves without a pawn move or a capture), and insufficient material.
> Sometimes I'll play someone with a similar rating to myself and beat them and will earn 3 or 4 points. Then we'll have a re-match which I lose and they'll gain 12 or 15 points.
It's probably a sign that they have a new account. If a player has played very few games, then his rating is only approximate; it's likely his true rating is far from his current rating. The rating added or subtracted after a win or loss is deliberately made large so he can quickly get to his true rating. Once he reaches his true rating, he will presumably win and lose the same number of games and stay at that rating.
Lichess and Chess.com both use (variants of) Glicko instead of Elo. The first step is to calculate the rating deviation. A high deviation means that the player hasn't settled on a given rating, and this factors into the calculation of the new rating. The sites have chosen this system so that higher (or lower) rated players who make a new account quickly get close to their "true" rating.
The Glicko[0] rating system is different from Elo[1], and it's what Lichess and Chess.com use.
In Elo, the rating gain for one player equals the rating loss for the other, because each player's rating is the only independent variable.
Glicko also calculates a kind of uncertainty, the 'rating deviation', which is based on the number of games played in some period of time. The idea is that the ratings of players who have not played in a while are more likely to be wrong.
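For the curious, here is a rough single-game sketch of the original Glicko-1 update, following Glickman's published formulas (the variable names and example numbers are mine; Lichess actually uses Glicko-2, which adds a volatility term on top of this):

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant, ~0.0057565

def g(rd: float) -> float:
    """Discount factor: the less certain the opponent's rating, the less one result tells us."""
    return 1 / math.sqrt(1 + 3 * (Q ** 2) * (rd ** 2) / math.pi ** 2)

def glicko1_update(r, rd, r_opp, rd_opp, score):
    """New (rating, RD) after a single game with result `score` (1, 0.5 or 0)."""
    e = 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))   # expected score
    d2 = 1 / ((Q ** 2) * (g(rd_opp) ** 2) * e * (1 - e))   # information carried by this game
    new_rd = math.sqrt(1 / (1 / rd ** 2 + 1 / d2))
    new_r = r + Q * (new_rd ** 2) * g(rd_opp) * (score - e)
    return new_r, new_rd

# A settled 1800 player (RD 50) losing to a provisional 1800 player (RD 300)
# drops only a few points, while the provisional player who won jumps a lot:
print(glicko1_update(1800, 50, 1800, 300, 0))   # roughly (1795, 50)
print(glicko1_update(1800, 300, 1800, 50, 1))   # roughly (1948, 228)
```

On this model, a high-RD opponent's result moves your rating less (through the g() discount), while their own rating, with its large RD, moves much more, which matches the asymmetry described in the question above.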
About the manner of victory, let me give an example from the game of go, which, being points-based, makes this case more straightforward to evaluate. In go it's either a win or a loss; there is no draw. You can win by one point or by one hundred, and you still score only one point in the tournament table. Why is that?
Because if you count yourself to be losing by five points, you could either resign (I'm assuming a game between strong players) or start playing more adventurous moves to turn the game around. Adventurous moves usually make for a larger win, let's say fifty points. Should players be punished for that and only play games that minimize the difference between the players' scores? How boring. I guess this applies to chess too. Sacrifice, sacrifice, win. Or sacrifice, sacrifice, mistake, loss. I wouldn't punish that mistake so much more than any other one.
In Baduk gambling in Korea, you pay for points, so you might expect people to just minimize the difference to pay out less, but they end up fighting all game instead. This is because the payout is capped on resignation: it might be cheaper to resign if you're too far behind (why count and pay more?).
On chess websites, gains and losses swing more wildly until you play enough games to stabilise. It's a modified Elo system, or Elo-adjacent. This way people reach their "true" rating much quicker, e.g. a grandmaster who signs up will reach 2400 over a smaller number of games than if he could only gain 2 or 3 points per game. The same goes for the beginner who starts at 1500 but should be 800; he'll reach that quickly and then play against people around his ability.
If you have roughly the same rating, both of you should be gaining about the same amount on a win. However, many rating systems have some kind of aging factor, meaning the more games you play, the more stable your rating gets.
So it could be that your opponent has played significantly fewer games than you. However, if he loses, he should also lose more points than you would.
The color you play has no influence on the rating numbers. And also the type of win/draw/loss does not matter at all. Ratings are just statistics of your results.
Elo also depends on how many games you have played in a given time format.
It is likely that the person with the 12/15-point gains has played fewer games, so his delta is higher than yours. It takes tens of games before you start gaining "normal" amounts of points.
Even with the hot streak in 2019, his performance rating was 2889, which is not enough. It could have continued for a thousand games and it would not have been enough. He will need more wins against high rated players, even more wins than in 2019.
Quote from the article: "...won every tournament he took part in, scored 32 wins, 47 draws and not a single loss, with a rating performance of 2889."
> In this simulation. Magnus reached the rating in 1600 simulated trajectories out of 2000. If Magnus Carlsen continues showing the same performance as he did during the 2019 period, he has 80% chance of reaching 2900.
Yes, I read that too. There are not many details, like what kind of simulations were run.
Also, "the same performance" has a different meaning from TPR, I guess. With a TPR of 2889 you will not reach 2900, afaik. Unless one tournament is played with a TPR of 2920 and the next one with a TPR of 2840. But then it's a fleeting 2900 :) I guess that will do for Carlsen :)
He is fighting an uphill battle though, because nobody has a higher rating than him, and there's only so many points you can gain from winning games against lower rated players.
Giving up the world championship improves his (very slim) chances of reaching 2900, because he doesn't have to save his preparation for those 12 important games. Magnus often plays offbeat (bad, basically) openings, to take his real prep completely out of the picture.
Not mentioned in the article: if Carlsen improves his play. Presumably that’s totally possible, since he will free up enormous amounts of focus previously spent researching the few possible challengers to his championship title.
I think the 5% odds are an underestimation, because Carlsen will change his focus now and won't spend time preparing for the World Championship. He will likely also prioritize risky wins vs safer draws.
> He will likely also prioritize risky wins vs safer draws.
It seems kind of unfair that this is the strategy needed to increase ELO. If the safest strategy to guarantee a win in a tournament is winning a single game and drawing the rest, then you shouldn't be penalized for that in your ELO, versus playing risky and winning more but also losing some.
It's not the strategy needed to increase your rating on average. It's the strategy needed to target high variance in search of reaching a far-outlying value.
Considering there are quite a few players from the young generation who reached 2700+ recently, I don't see him reaching his goal, and even 5% looks optimistic.
The problem is that the new generation is probably less affected psychologically when playing against him, and I suspect Carlsen has a higher risk of losing to them compared to the old guard.
Elo is always a fascinating thing. I waste far too much time on boardgamearena.com, which has a system where people start at 0; 100 is, I think, equivalent to chess's 1500, except that it becomes a floor rating. I “retire” from a game when I can get to 300. I got up to 285 recently in cribbage and was the 86th-rated player at the game, while in other games, being a rated player requires a score of 400 or more.
"Elo suggested scaling ratings so that a difference of 200 rating points in chess would mean that the stronger player has an expected score (which basically is an expected average score) of approximately 0.75, and the USCF initially aimed for an average club player to have a rating of 1500."
I guess that means that Magnus has an expected score of roughly 0.25^((3585 - 2864)/200) = 0.00675 against Stockfish 15, which is basically 1 in 200 games?
That is not quite the right calculation. To see this, try plugging 0.75 into the same formula to get Stockfish's expected score. The result is about 0.3545. If this were the correct formula, then the two expected scores should sum to 1, but in this case we only get 0.3612.
Instead, you should convert the 0.25 to "odds" form. 0.25 is 1:3 odds, represented by the number 1/3. (1/3)^((3585 - 2864)/200) is about 0.01905 (still in odds form). To convert this back to an expected score you would take 0.01905 / (1 + 0.01905) = 0.0187. So Magnus Carlsen's expected score is 0.0187.
Applying the same method to Stockfish, we have 3:1 odds, which is represented by the number 3. 3^((3585 - 2864)/200) is about 52.48. Converting back to expected score we get 52.48 / (1 + 52.48) = 0.9813. So Stockfish's expected score is 0.9813.
Our sanity check is to add 0.0187 + 0.9813. The result is 1.0, as it should be.
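A minimal sketch of that odds-form calculation, assuming (as the premise above states) that a 200-point gap corresponds to 3:1 odds; the function name and structure are mine:

```python
# Expected score from a rating difference via the odds form: odds multiply
# across 200-point steps, then the odds are converted back to an expected score.

def expected_score_from_odds(rating_diff: float, odds_per_200: float = 3.0) -> float:
    """Expected score for the player who is `rating_diff` points stronger (negative = weaker)."""
    odds = odds_per_200 ** (rating_diff / 200)
    return odds / (1 + odds)

carlsen = expected_score_from_odds(2864 - 3585)    # ~0.0187
stockfish = expected_score_from_odds(3585 - 2864)  # ~0.9813
print(carlsen, stockfish, carlsen + stockfish)     # sums to 1.0, as the sanity check requires
```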
Elo is dependent upon the pool of players that one plays in. Engine Elo ratings have no relationship to human ratings, because no humans play computers under normal conditions. For an example of this phenomenon taken to extremes, Claude Bloodgood [1] was a strong amateur who ended up officially rated as one of the top players in the world (and #2 in the US) simply because he was only playing against a pool of other prison inmates who were in turn playing only against each other. So all his rating reflected was his relative strength in the prison pool.
Computers are definitely much stronger than humans, but the gap is not as big as a 3600 rating suggests. Magnus would certainly be able to eke out plenty of draws, if only because White can create "simplified" (as a euphemism for dead) positions in just about any variation if he really wants. And Magnus regularly plays these sorts of positions literally at the level of supercomputers.
I'd also add that much of the dominance of computers is not based on raw ability alone, but also on psychological issues. Humans can become tilted, intimidated, frustrated, tired, and so on. One of the last major human vs computer events was Kramnik vs Fritz. Kramnik, in a relatively simple position, ended up blundering a mate in 1 with plenty of time on his clock. It's unlikely he would have ever made the same mistake against a human. It's just very difficult to get into the same mindset when playing against a computer as when playing against a human. Chess, in spite of being a game of complete information, is still extremely influenced by psychology.
For a human, the winning chance against current strong engines is basically 0.0%. There are possible drawing lines, but if the engine is configured correctly, the draw chance is also basically 0.0%.
As another commenter calculated, Stockfish's expected score is about 98%. The other ~2% isn't Magnus winning, but rather the two of them drawing. I think no GM is known to have beaten a modern competitive AI in chess, though there are known/recorded instances of draws.
Machine and human Elo ratings aren't directly comparable. No games are played between a full-strength engine and a human anymore. Even Magnus can’t win against a top engine. He might draw if he is lucky.
This is somewhat misleading because there have also been no classical time control human-computer matches since 2006 when Kramnik played against Fritz. That match more or less ended interest in these matches.
Every game was drawn until Kramnik randomly blundered a trivial mate-in-1 in a simple position. That mistake was undoubtedly driven by psychological reasons. Playing against an engine is nothing like playing against a human and it's difficult if not impossible to put oneself in the proper mindset to play well.
The following games were then also drawn until Kramnik decided to do the chess equivalent (when playing against a computer) of going for a hail mary, with black, in the final game to try to even the match score. Suffice to say, that failed and the match ended as a generally disappointing, for everybody, 4-2.
There have been a variety of gimmicky matches since (fast time controls, various odds matches, etc), but nothing serious.
> ...assuming... the mix of opponents are the same in the two periods (2019 and 2020-2022)
This seems like a big issue. There are currently a bunch of really promising young players coming up the ranks who will likely be continuing to improve and be underrated for a while. That's going to make it even harder.
Hopefully the takeover of "Play Magnus" (the company he holds a ~8% stake in, owner of chess24, Chessable, etc.) by chess.com does not affect him.
There is a cringey announcement of the takeover on YouTube, including a TikTok video of Danny Rensch dancing (to put it mildly ...).
The plan seems to be to draw Carlsen into that toxic sphere of influence. Chess streaming and the culture surrounding chess.com seem to be for the lowest common denominator.
Carlsen fit much better into the European culture of chess24, I hope he can ignore the circus and it won't affect his play.
The issue is that he's going to be trying to maximize his chance for a win, and not his expected points. So the model likely underestimates his chance of success.
I wonder why Magnus has fixated on 2900. It's sort of a dumb goal for someone at his level because it's out of his control. He can win a lot of games but if his opponents don't win their games that aren't against him then he's not going to get there.
He's bored not exactly by being the world champion, but by the format, where he has to prep for a long series of games all against just one opponent. He wants to play a variety of players. He wants the world championship to be a tournament with many entrants, rather than last year's champion against just one opponent.
Not by playing at the club level, as pointed out. He'll lose points with every draw against such low-rated players. He needs to play a lot of high-rated tournaments, and not take draws. Exciting!
It sounds like focusing so hard on the rating might be bad for his quality of life, by preventing him playing 'fun' tournaments.
Edit : He can try to game the ratings system by looking for high-rated opponents who he knows will not play as well as their ratings suggest for some reason (like illness or personal issues). But he doesn't seem the sort to enjoy this
It is. If the tournament is not organized by FIDE, or a national chess federation, it won't affect your FIDE rating.
It will also not affect your FIDE rating if the tournament is not organised to the required standard (which is basically nonexistent for under-1800-rated players, and goes all the way up to the required presence of International Arbiters, metal detectors and blood tests for top-level play).
Has Adderall been shown to enhance performance, or is this a preventative measure to protect player health?
What else do they check for? Caffeine is probably helpful and not banned. LSD (microdosing)? Galantamine? If there are nootropics that can legitimately enhance high-level chess play, why aren't we using them on our top scientists and engineers?
Technically he could; I believe FIDE has a minimum rating increase of 1 point for the winner of a game with a >400-point rating advantage over their opponent. But if he broke 2900 by playing in minor tournaments against low-rated amateur opponents, no one would take it seriously. Realistically many, maybe even all, of the top players could push for record Elos by only playing against players well below their level. But not only are the optics bad and the records would be looked at as illegitimate, it would also take a lot of time playing long classical chess games against inferior opponents, which is a major opportunity cost since their time would normally be spent playing in and studying for professional tournaments.
So... he has a rating of 2861 now and 39 points left to 2900.
In theory he plays one 2500 player every month, and wins, for 39 months in a row and we're done? I'd be impressed if he won all of them (he doesn't usually do that).
His most recent game on 2700chess is a draw vs the 2490-rated player Schitco.
The post I was replying to asked if he could play against 1800s and gain rating. Players in the 2500s are much more capable of forcing a draw. It'd be unlikely he'd gain much rating playing against them, because wins wouldn't move the needle much and draws would be very costly.
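As a rough illustration of that trade-off, here is a minimal sketch assuming the standard logistic expected-score formula and K = 10 (the K factor FIDE uses, as I understand it, for players rated 2400+); FIDE actually uses a lookup table, so treat the numbers as approximate:

```python
# Approximate FIDE rating changes for a 2861-rated player facing a 2500-rated
# opponent, using the logistic expected score and K = 10. Illustrative only.

def expected_score(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

K = 10
e = expected_score(2861, 2500)   # ~0.89
print(K * (1.0 - e))             # win:  roughly +1.1 points
print(K * (0.5 - e))             # draw: roughly -3.9 points
```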
No, because he wouldn't get any points for beating opponents that much lower than him. There is no such thing as a fractional rating point, so if the points gain gets below 0.5 it just rounds down to zero.
They would have to be demotivated, and Carlsen would have to never make a mistake causing huge loss of ratings points. And in reality they would be more motivated than by most of their other games, and Carlsen makes mistakes.