Centipawn loss (or simply the engine's evaluation of a position) doesn't take into account how realistically a human could hold a position.
During yesterday's WCC Game 6 the computer evaluation meant little when players were in time trouble. Anything could have happened going into the first time control, despite the game being dead drawn for the first 3.5 hours.
In the final stages the computer again evaluated the game as drawn, but presumed Nepo could defend _perfectly_ for tens of moves without a single inaccuracy. Super GMs can't do that given hours or days, let alone minutes.
Last thought: did anyone else assume this was written in R/ggplot2 at first glance? Seaborn and/or matplotlib look strikingly like ggplot2 nowadays!
You are definitely right about the evaluation; I switched between several streams and I don't think there was anyone saying that they didn't prefer white after around move 39, despite a 0.0 eval for a lot of those positions. But part of the reason the eval is misleading is because it might be reflecting a sequence of "only moves" - where only one move can hold that evaluation and it may be very hard to find some of those moves for black, while white has lots of good moves in each position. While that is a problem with human interpretation of eval, I do not see how it invalidates use of ACPL which is an average across the entire game.
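For concreteness, here is a minimal sketch (my own illustration, not code from the article) of how ACPL is usually computed: each move's centipawn loss is the drop in the engine's evaluation from the mover's point of view, floored at zero, and ACPL is simply the mean of those losses over a player's moves.
```python
def average_centipawn_loss(evals_before, evals_after):
    """Both lists hold engine evaluations in centipawns from the mover's
    perspective, just before and just after each of the player's moves."""
    losses = [max(0, before - after)
              for before, after in zip(evals_before, evals_after)]
    return sum(losses) / len(losses) if losses else 0.0

# Example: per-move losses of 5, 0 and 50 centipawns give an ACPL of ~18.3.
# average_centipawn_loss([20, 35, 10], [15, 35, -40])
```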
I was thinking a bit about this during the game too.
Perhaps alongside centipawn loss (a measure of how many hundredths of a pawn a player loses by making the non-optimal move as determined by a chess AI engine) we could also measure the difficulty of any position.
Stockfish (a popular chess engine) roughly works by constructing a tree of possible moves and evaluating the score according to some heuristic at its maximum depth. The best result at depth n (25 I believe) is considered the best move and incurs 0 centipawn loss.
Perhaps we can define the difficulty of a position by the relative centipawn loss at each preceding depth in the tree? The difficulty of a position is then determined by the depth at which the best move no longer changes.
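A rough sketch of that idea (assuming a local Stockfish binary and the python-chess package; nothing here comes from the article): re-run the analysis at increasing depths and report the shallowest depth from which the best move stops changing.
```python
import chess
import chess.engine

def position_difficulty(fen, max_depth=25, engine_path="stockfish"):
    """Shallowest depth from which the engine's best move no longer changes
    (up to max_depth); a higher value suggests a harder position."""
    board = chess.Board(fen)
    best_at_depth = []
    with chess.engine.SimpleEngine.popen_uci(engine_path) as engine:
        for depth in range(1, max_depth + 1):
            info = engine.analyse(board, chess.engine.Limit(depth=depth))
            best_at_depth.append(info["pv"][0])
    final_best = best_at_depth[-1]
    stable_since = max_depth
    # Walk backwards until the shallower best move differs from the deepest one.
    for depth in range(max_depth - 1, 0, -1):
        if best_at_depth[depth - 1] != final_best:
            break
        stable_since = depth
    return stable_since
```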
This is an interesting thought! A couple of other scattered thoughts I had about this:
- Engine evaluation of a leaf of the tree will always be different and more sophisticated than human heuristics. So there's a problem where a human can't be expected to follow down some lines. Of course, this is always changing, as humans seek to understand engine heuristics better. Carlsen's "blunder" at move 33 was a good example of this, from my memory.
- Maybe there's a difficulty metric like "sharpness": some function of the number of moves which do not incur a significant centipawn loss (a rough sketch of one way to compute this appears after this list). Toward the end of game 6, Carlsen faced relatively low sharpness on his moves, whereas Nepomniachtchi faced high sharpness, and despite the theoretical draw, that difference can prove decisive between humans. This seems like it could interact in interesting ways with your difficulty metric - for example, what does it mean if sharpness is only revealed at high depth?
- It would be interesting to take the tree generated by stockfish, and weight the tree at each node by the probability that a human player would evaluate the position as winning. Then you could give a probability of ending up at each terminal position of the tree. Maybe some sort of deep learning model trained on players previous games? Time controls add such a confounding factor to this, but it would be so interesting to see "wild engine lines" highlighted in real-time.
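Here is one way the "sharpness" idea above could be sketched (again assuming python-chess and a local Stockfish; the threshold and depth are arbitrary choices of mine): ask the engine for a MultiPV evaluation of every legal move and count how many stay within some centipawn band of the best one.
```python
import chess
import chess.engine

def sharpness(fen, threshold_cp=50, depth=18, engine_path="stockfish"):
    """Fraction of legal moves that stay within threshold_cp of the best move.
    A low fraction means the side to move faces a sharp position."""
    board = chess.Board(fen)
    with chess.engine.SimpleEngine.popen_uci(engine_path) as engine:
        infos = engine.analyse(
            board,
            chess.engine.Limit(depth=depth),
            multipv=board.legal_moves.count(),
        )
    scores = [info["score"].relative.score(mate_score=100000) for info in infos]
    best = max(scores)
    playable = sum(1 for s in scores if best - s <= threshold_cp)
    return playable / len(scores)
```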
In my opinion, neural network engines like Alpha Zero or Leela Zero do a much better job of assessing how difficult a position is to hold. They also report the evaluation in a completely different manner, win/draw/loss probability as opposed to centipawn loss.
For example, in yesterday's game Stockfish was often giving a drawn evaluation (0.00) where Leela Chess gave a win probability of 30%+. I was posting about this during the game.
https://twitter.com/nik_king_/status/1466794534214504454?s=2...
Stockfish also uses a neural network for evaluation.
Yes, but at its heart it's still classical alpha-beta search rather than Monte Carlo.
True, it works differently though. And Stockfish does not compute win/draw/loss probabilities as part of its eval: it converts centipawns to WDL using an exponential fit based on Stockfish vs Stockfish games. So the draw percentage in Leela is a lot more interesting and useful.
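As an illustration only (this is not Stockfish's actual fitted model, and the coefficient below is a placeholder I made up), the kind of centipawn-to-win-probability mapping being described is a sigmoid whose steepness would be calibrated on engine self-play results:
```python
import math

def win_probability(cp, k=0.004):
    """Map a centipawn evaluation to an approximate win probability.
    k would be fitted from engine self-play games; 0.004 is a placeholder."""
    return 1.0 / (1.0 + math.exp(-k * cp))

# win_probability(0) == 0.5; win_probability(100) is roughly 0.6
```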
> In the final stages the computer again evaluated the game as drawn, but presumed Nepo could defend perfectly for tens of moves without a single inaccuracy.
I agree that it's not very useful to compare with tablebases, especially given the "30 seconds added per move" regime this was played under by the time they reached the position.
However, I don't think the tablebases even have enough information to indicate how close to losing a theoretically drawn position is. So I don't think this required perfect accuracy to defend (defining "inaccuracy" as any move for Black that either makes it take longer to reach a draw or moves to a losing position; that, I think, is the most reasonable definition).
And exhaustion also starts to play a role after playing for something like 7 hours; at some point mistakes will start to slip in.
Why isn't the time left for each player an input to the evaluator? It shouldn't assume that everyone has plenty of time!
I've thought about this, too. Sadly, times only started being recorded in the last 20 years and last time I tried I couldn't find a large dataset with the times included.
Chess.com did do one study and found that a large percentage of mistakes occur in moves 36-40, because in some time controls additional time is added at move 40.
This is merely the seaborn "darkgrid" style option. You need to set it explicitly if you want this effect.
Precision is a murky concept in chess because it is not a solved game. First, if the move doesn't change the best-play result, can it really be called imprecise? Only in terms of practical chances.
And if we are talking about practical chances, why should we rely on computer-centric evaluation? If a human has to choose between a move that wins in theory but requires finding 40 best moves or losing, and a move that is a theoretical draw but forces the opponent to find 40 best moves or lose, what should the human choose?
What is even the ACPL of a move from a tablebase? There is no value, it is either a win, a draw or a loss. So while the whole idea behind this exercise is intuitively appealing and certainly captures some sense behind the idea of accuracy, it should be taken with a grain of salt.
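To make the tablebase point concrete, here is a small sketch (assuming python-chess and Syzygy files downloaded to a hypothetical local directory): the probe returns only a win/draw/loss class, never a centipawn value.
```python
import chess
import chess.syzygy

def tablebase_verdict(fen, tb_path="./syzygy"):
    """Return the tablebase classification for a position with few pieces."""
    board = chess.Board(fen)
    with chess.syzygy.open_tablebase(tb_path) as tb:
        wdl = tb.probe_wdl(board)  # from the side to move's perspective
    return {2: "win", 1: "cursed win", 0: "draw",
            -1: "blessed loss", -2: "loss"}[wdl]
```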
> Precision is a murky concept in chess because it is not a solved game
It's ironically also a murky concept for the opposite reason. In some openings the analysis of GMs goes so deep that they can fairly often play almost exclusively computer-aided prep. There's a big difference between a 40-move game in that kind of theoretical position vs off-beat games.
Depending on the style of the players and how open the games are it gets pretty complicated to figure out what precision actually implies for the games in any real sense.
Yep, agree with this. It was surprising to me to find out how large the error bars were in computer evaluations of their own games. In the last TCEC superfinal, for example, the majority of the draws saw at least one position with an evaluation higher than +/-1.0, and 3 of the 26 decisive games came after a 0.0 evaluation. I assume that margin should be even bigger in human games, so it's hard to see what there is to learn (outside some fun lines) from looking at these numbers outside of opening prep.
As for the tablebase question, it would be nice to see win/forced-draw probabilities from engines instead of the increasingly artificial material evaluation.
_If we'd used a different chess engine, even a weaker version of the same one, such as Stockfish 12, it may have found the 2018 World Championship the most accurate in history (assuming both players prepared and trained using Stockfish 12 in 2018)._
This would be a really good follow-up experiment. If the theorized result really happens, we would have strong evidence that players are "overfitting" to their training chess engine. It would also be interesting to see how stable the historical figures look between different engines.
Yes "Alan Turing, was the first recorded to have tried, creating a programme called Turochamp in 1948."
But also
"Since 1941 Zuse worked on chess playing algorithms and formulated program routines in PlankalkĂźl in 1945."
https://www.chessprogramming.org/Konrad_Zuse
I'm most impressed by Capablanca's jump in highly accurate play back in 1921, which would not be surpassed for another 60 years.
Capablanca preferred playing quiet, positional chess, often patiently nursing tiny endgame advantages into a win. It's generally much easier to play accurately in simple endgames than complex middlegames, and that's a big factor in his low ACPL score. (I don't mean to take anything away from him - he was probably the best ever at that style of play.)
In part high accuracy means that the opponents are approximately of equal strength. The way people win games in chess is by complicating the position so much that it forces your opponent to make a mistake while you are able to handle it.
If the accuracy is high, not only does it mean that the players are good, it also means that they don't ask each other serious questions. Put any human against Stockfish and, I am sure, their ACPL will increase dramatically.
Capablanca without any doubt. Also Morphy.
In chess, ACPL roughly works like goals scored (or conceded) in football. Goals are scored when the defending team makes mistakes. A team that is a master of defense will concede few goals, but will also score few, since defending well requires playing cautiously. It's the same with attacking, aggressive teams: they both score and concede more goals than the average.
Why not use something similar to AlphaGo Zero to carefully analyze the chess games of a deceased player until it is able to mimic their decisions?
It could bring many players "back to life". It would be even possible to watch "impossible matches" like Kasparov vs Capablanca!
You'd be training purely on games with outdated theory, in which case the engine would lose to those trained from more modern repertoires. Or you'd let it learn through self-play after initially showing it the human games, in which case it would probably quickly lose the identifiable stylistic aspects of its initial training.
The point isn't to make an unbeatable chess player, it's to 'bring them back to life'.
But what I mean is Kasparov would destroy Capablanca. Even outside of what one might consider "raw chess talent", he was drawing on decades of better theory and would deploy that knowledge. It would be hard to simulate Kasparov as if he were taught chess in Capablanca's time (maybe not impossible and a fascinating project, I just don't see how you'd do it).
> It would be hard to simulate Kasparov as if he were taught chess in Capablanca's time (maybe not impossible and a fascinating project, I just don't see how you'd do it).
I don't think they were suggesting that's the result they wanted - if you could somehow magically reanimate Capablanca in real life and pit him against peak Kasparov, he might lose badly.
A neural net having the same outcome is essentially what's being asked for. Kasparov raised on Capablanca's era chess or vice versa would be unrecognizably different players, and I don't think anybody expects an AI to simulate their soul.
Fair enough. I don't think this is as interesting an experiment as people think though. Nobody wants to see Morphy on zero points but that's what would happen.
I already want to take this back because Morphy probably would pick up points if we ran a tournament on the basis of all official and unofficial world champions, plus people with openings named after them. But the correlation between date of peak and performance would be extremely high.
That seems like a complex problem. People don't play enough chess games in their lives to produce a dataset on the scale required for neural networks. You would need to train a general chess engine and then tweak it using few-shot learning. But I doubt it could capture the high-level ideas behind a player's style unless someone comes up with a smart architecture for that.
Chess.com has the "personality bots" that supposedly play with the style of various well-known players, streamers, and GMs.
But I remember watching Hikaru Nakamura stream once playing through each of these bots (and beating them fairly easily). He commented that several of the bots were doing things the real players would never do, both in style and even the opening move (1.e4 for a player that almost always opens 1.d4)
It was fairly early after the personality bots came out, so maybe they've fixed it by now.
Chessmaster 3k had that feature. But I was never good enough in chess to evaluate how well it worked. Still, I thought about the simplest method:
- get a chess-playing algorithm (I think it would probably work well with minimax or MCTS) with many tunables
- use a genetic algorithm to adjust the tunables of the first algorithm, using how similarly it plays (make it choose a move on positions from a database of games by said player) as the goal function
Doesn't seem terribly complicated to do, but don't know how similar to a human it would play.
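A sketch of the goal function described above (assuming positions and the player's actual moves are available as (FEN, move) pairs, and that a candidate engine is a callable from FEN to move; all names here are mine): score a parameter set by how often the tuned engine reproduces the human's move, then let the genetic algorithm maximize that score.
```python
import random

def fitness(candidate_engine, player_positions):
    """candidate_engine: callable fen -> move (UCI string) for one parameter set.
    player_positions: list of (fen, move_uci) pairs from the player's games."""
    matches = sum(1 for fen, move in player_positions
                  if candidate_engine(fen) == move)
    return matches / len(player_positions)

def mutate(params, sigma=0.1):
    """One possible mutation step for real-valued tunables."""
    return [p + random.gauss(0, sigma) for p in params]
```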
My guess on the personality bots is that they set the bot to play at the players' current rating, not training ml based on the games.
I think it would be worth looking at a player's accuracy in terms of their cohort's standard deviation, given that theory is more or less shared across all players. Even then, the best players now have the best teams and computers, so a lot of Magnus's accuracy in this game is a credit to Jan Gustafsson et al. I've been thinking how you might capture the player's accuracy out of their prep, that seems a better measure, but even then you're so often choosing between five +0.0 moves by the middlegame, and you could easily play many totally accurate moves if you didn't feel like agreeing a draw. I know some have looked at Markov models of a player's likelihood of a blunder to analyse this instead.
Personally I've never felt Magnus enjoyed the modern game with as much opening preparation as we have now. It seems like he's only in the last few years invested the time in this, instead of relying on his technique to win even from losing positions. I hope AlphaZero proving that fun positional ideas like pawn sacrifices and h4 everywhere reinvigorated him somewhat during his dominant first half of 2019, so there's still hope the machines haven't just drained the romance from the game, even if their ideas remain dominant.
One of the reasons that poker players prefer tournaments is that they induce them to move away from perfect Nash-equilibrium play and into being exploitable, since someone who sticks to unexploitable play simply doesn't make it into the money as often as someone who takes exploitative risks. Winning 51% of the time means nothing when you need to be in the top 10% to earn anything back.
It seems like just looking at ACPL isn't capturing this correctly. If someone makes a mistake and loses some centipawns, but it induces an even larger mistake in their competitor, that wasn't a mistake, it was a risk.
I'm not sure how meaningful these numbers are. I get around 40-50 ACPL in my games, and I certainly wouldn't have been anywhere near a match for Botvinnik.
Is there a risk that this measure is telling us as much about how likely a match was to contain difficult positions as about how skilled the players were?
For example, Karpov and Kasparov sometimes agreed short draws. I wonder if that is flattering their figures.
Definitely - if there are lots of good moves to be found accuracy will be higher. This is why when you analyze games for suspicion of cheating you cannot look only at the accuracy figure - you have to take into account how challenging the positions are. Lichess and chess.com both do this but they do not tell us how, for obvious reasons.
> Lichess and chess.com both do this but they do not tell us how, for obvious reasons.
Isn't lichess open source?
At the time of publishing, the last decisive game in a World Championship was game 10 of the 2016 World Championship, 1835 days ago, or 5 years and 9 days. Is the singularity being reached, with man and machine minds melding towards inevitable monochromatic matches?
Very very unfortunate timing but still a valid question.
If "accuracy" measures how well a player matches computer chess, then as players continue to study more and more with chess programs, you would expect their play to match the programs more and more.
Personally I find it odd to measure how well the players match the computer program and call it accuracy. The computers do not open the game tree exhaustively, so they give only one approximation of true minimax accuracy.
When Lee Sedol made move 78 in game 4 against AlphaGo, it reduced his accuracy but won him the game.
I don't know if this is a thing, but chess players might also steer the game in a direction/position which their opponent hasn't studied much, but they have. There's a "social" side to this seemingly "mathematical" game, no?
Fabiano Caruana (the previous World Championship challenger) has said that he's happy to find lines where the machines have you slightly behind, purely because they're less likely to have been studied in detail by your opponent. Even with perfect recall of the first 20/30 moves in various lines, players are still going to steer away from some lines based on their and their opponent's strengths (tough against super GMs with few weaknesses, though). So you're definitely right, I think there's a lot of game theory here, albeit much of it settled by your team ahead of the actual match.
This is more so in the opening (the beginning of the game, and separately where engines tend to be a bit less informative) but yes it is definitely part of the chess metagame, and you'll often see commentators talk about whether someone is "still in prep" or has gotten out of it. It often can lead to time advantages if one gets an opponent out of prep.
Move 78 was humanity's last great stand against AI in a board game. Lee Sedol, tired and inspired, reddens AlphaGo's ears with a move plucked from a higher dimension.
It now seems humorous that Kasparov once accused people of helping computers behind the scenes. Now chess masters have been caught huddled in bathroom stalls with their smart phones. Chess commentators choose to willfully ignore chess engines in their presentations, in order to enable our understanding of the analysis. The torch has been passed.
We should be clear: move 78 didn't really work, except that the engine got confused. Other humans and later versions of Go engines can refute it.
This is addressed in the article by the way.
It's strange how many times the article says 'chess software' has improved since (Turing's day, the 1990s, whenever). Sure, the software is better, but six orders of magnitude in hardware performance haven't hurt either.
The hardware improvement has been huge, but on the other hand if you pit Stockfish NNUE against top 1990s software on equal modern hardware, Stockfish would win handily. It's really been both hardware and software improving.
For historical human-to-human games, it would be more interesting to see how well players targeted the weaknesses of their opponents. That skill likely mattered more than absolute accuracy as measured by computers.
Just in case you, like me, were wondering what the word "accurate" means in this context:
https://support.chess.com/article/1135-what-is-accuracy-in-a...
Sorry, I am hijacking this thread. I am on a quest to find the rules of the chess variant Finesse by GM Walter Browne. If anyone knows them:
https://lookingforfinesse.github.io/lookingforfinessevariant...
I would have liked to see this go back far enough to include Morphy, whom Fischer considered "the most accurate player who ever lived." I would be surprised if Stockfish agreed, but it would be interesting to see.
Kenneth Regan's work on Intrinsic Performance Ratings includes estimated ratings for Morphy, which vary widely from event to event but average around 2300, which I think matches the intuitive perception of his strength that modern strong players have.
https://cse.buffalo.edu/~regan/papers/pdf/Reg12IPRs.pdf
(Of course, as with all historical players, he would be stronger if he were re-animated today and exposed to modern principles and openings.)