X-Act
np: Biffy Clyro - Shock Shock
I have been researching a fair but simple rating system for the upcoming Competitor program, and here it is.
It is based heavily on the Glicko system, created by Professor Mark E. Glickman. However, I added a little detail: time, so I'll unashamedly call this rating system the Timed Glicko Rating system, or TGR for short.
Notice that the ShoddyBattle rating system is not Glicko, but Glicko-2. Glicko-2 is an improvement on Glicko since it also implements volatility, while Glicko doesn't. The volatility measures the degree of consistency of the player — the more consistent, the lower it is. However, since Pokemon is a game in which it is practically impossible to be consistent (because it is easy to win or lose unexpectedly due to luck), I deemed that volatility is basically superfluous, and stuck to the simpler Glicko rating.
I think a short explanation of how the Glicko system works would be welcome by lots of people, and hence here it is.
The Glicko system assumes that a player has a rating R and a rating deviation RD. The player would normally perform roughly as expected by his rating R, but sometimes, he has a good day, performing better than his rating would suggest, and sometimes he has a bad day, performing worse than his rating. Glickman assumed that these ‘performance fluctuations’ are logistically distributed as per the rating deviation RD. The logistic distribution is very similar to the normal distribution but is preferred because, from observation of chess games, the logistic distribution follows the probability of a player beating another one better than the normal distribution. As a consequence of the logistic distribution, a player has about a 72% chance of playing at a level within one rating deviation from the rating (i.e. at a rating between R – RD and R + RD), a 14% chance of playing better than that level (i.e. at a rating better than R + RD) and a 14% chance of playing worse than that level (i.e. at a rating worse than R – RD).
It can be seen from the above that if the player’s RD is large, then his performance is more uncertain than if his RD is small. For example, suppose there are two players, Albert and Ben, both having a rating of 1500, but, whereas Albert has a RD of 200, Ben has a RD of 50. This would mean that Albert is expected to play at a level of rating between 1500 – 200 and 1500 + 200, or between 1300 and 1700, whereas Bob would be expected to play at a level between 1500 – 50 and 1500 + 50, or between 1450 and 1550. It can be clearly seen here that Bob’s playing performance is much more certain than that of Albert, even though they have the same rating.
In the Glicko system, the RD becomes smaller the more games that player plays. By playing games, the rating can consequently become more certain, thus lowering the RD. However, if the player stops playing for a long time, then his performance when he returns playing will be more uncertain. Thus, the Glicko system increases the RD of players that are inactive for long periods of time.
Having a low RD also results in a player’s rating changing more slowly, especially if he plays against a player with a high RD. Conversely, having a high RD results in that player’s rating changing more quickly, especially if he plays against a player with a low RD. Consider, for example, that Albert plays against Ben and wins. Albert’s rating would increase by a whopping 86, becoming 1586, while Ben’s rating would decrease only by 6, becoming 1494. If Albert loses, his rating would decrease by 86, becoming 1414, while Ben’s rating would increase by 6, becoming 1506. The reason for why this happens is the following. Since Albert’s rating is much more uncertain than Ben’s, beating Ben would mean that Albert’s rating is supposed to be much more than 1500. On the other hand, Ben’s rating wouldn’t lower by much after Albert beats him because Albert’s performance is very uncertain, and hence little information can be gained from such a loss. The reverse argument would follow if Albert loses to Ben.
The rating system that I am proposing basically follows the same line of thought as what’s been said above. The main differences are in the way the RD changes. The normal Glicko system always increases the RD slightly by the same amount, governed by a constant c, after every match, and subsequently lowers it according to both the player and the opponent’s ratings and RDs. The Timed Glicko Rating system does not increase the RD by the same amount, but by an amount proportional to the time passed between a battle and the one before it. If a player plays frequently, his RD would thus be lowered by more than for a player that plays rarely, because his RD would first only be increased by a relatively small amount and then lowered accordingly.
Another difference is that players whose RD is at least 100 have their rating listed as provisional. Ratings cannot feature in the ladder leaderboard until they are not provisional (or, alternatively, are placed at the very bottom of the leaderboard). Furthermore, every player’s RD is updated once per day depending on how long it took them to play their last battle, so that players that are not playing would see their rating turn provisional.
As was said before, having a low RD makes a player’s rating change slowly. Sometimes, it is so low that the rating does not change appreciably even if that player starts to win or lose a lot of games. However, with SC = 20, the RD won't lower a lot anyway.
The final difference is not exactly related to the Glicko system per se, but to the way in which players play against each other. As in ShoddyBattle, players who wish to play on the ladder wait in a queue and get assigned a player to play against. In the new system, however, the opponent that will be assigned to a player will have between 15% and 85% chance of beating him, governed by an exact formula. This roughly translates to the opponent having a difference in rating of 300 or less, and prevents huge mismatches from occurring.
Here is the TGR algorithm in pseudocode. I have made an implementation of it on Excel, and I know it works well:
The following are the constants used for TGR. SC is the factor by which your deviation increases over time. With this number, the rating of the most ardent of players would still take at most 20 days of inactivity to become provisional. Q is a number that is multiplied by the ratings deviation to convert it to the real standard deviation used in the logistic distribution. PI is also involved in the logistic distribution, since the standard deviation of a logistic distribution is equal to s * sqrt(3) / pi, where s is the parameter of the distribution. P is a constant equal to the expected probability that two equally skilled players' battle result has a deserved outcome.
The following is executed whenever the current time as a fraction of the whole day needs to be found.
The following is called whenever the expected probability that Player1 wins against Player2 needs to be known, given their ratings and deviations.
The following is executed whenever a new player enters the ladder. A new player entering the ladder has rating R=1500 and rating deviation RD=350. The time he joined the ladder is also noted.
The following is the queue system. A player that wishes to play on the ladder is put in a queue, and only plays if there's another player waiting on the ladder whose probability of winning against him is between 15% and 85%. If there's no such player after waiting for 5 minutes in the queue, the player wishing to play will still have his time updated, otherwise his deviation would unfairly increase by the same amount as for the other players who didn't attempt to play.
The following is executed when a battle ends. When a battle ends, both players' ratings are updated.
This is called to update a player’s rating and rating deviation. It is based heavily on Glicko, but takes also into account the time passed from the previous game to the one just played and the luck influence that the battle could have had.
The following is executed every midnight. Every midnight, all the players' rating deviation is increased depending on how long it has been since they last played a game during the last day. It will increase the most if no game was played during the last day.
The following is executed when a player's rating needs to be displayed. This uses the GLIXARE system, unless the deviation is greater than 100, in which case the rating is displayed as 'provisional'.
It is based heavily on the Glicko system, created by Professor Mark E. Glickman. However, I added a little detail: time, so I'll unashamedly call this rating system the Timed Glicko Rating system, or TGR for short.
Notice that the ShoddyBattle rating system is not Glicko, but Glicko-2. Glicko-2 is an improvement on Glicko since it also implements volatility, while Glicko doesn't. The volatility measures the degree of consistency of the player — the more consistent, the lower it is. However, since Pokemon is a game in which it is practically impossible to be consistent (because it is easy to win or lose unexpectedly due to luck), I deemed that volatility is basically superfluous, and stuck to the simpler Glicko rating.
I think a short explanation of how the Glicko system works would be welcome by lots of people, and hence here it is.
The Glicko system assumes that a player has a rating R and a rating deviation RD. The player would normally perform roughly as expected by his rating R, but sometimes, he has a good day, performing better than his rating would suggest, and sometimes he has a bad day, performing worse than his rating. Glickman assumed that these ‘performance fluctuations’ are logistically distributed as per the rating deviation RD. The logistic distribution is very similar to the normal distribution but is preferred because, from observation of chess games, the logistic distribution follows the probability of a player beating another one better than the normal distribution. As a consequence of the logistic distribution, a player has about a 72% chance of playing at a level within one rating deviation from the rating (i.e. at a rating between R – RD and R + RD), a 14% chance of playing better than that level (i.e. at a rating better than R + RD) and a 14% chance of playing worse than that level (i.e. at a rating worse than R – RD).
It can be seen from the above that if the player’s RD is large, then his performance is more uncertain than if his RD is small. For example, suppose there are two players, Albert and Ben, both having a rating of 1500, but, whereas Albert has a RD of 200, Ben has a RD of 50. This would mean that Albert is expected to play at a level of rating between 1500 – 200 and 1500 + 200, or between 1300 and 1700, whereas Bob would be expected to play at a level between 1500 – 50 and 1500 + 50, or between 1450 and 1550. It can be clearly seen here that Bob’s playing performance is much more certain than that of Albert, even though they have the same rating.
In the Glicko system, the RD becomes smaller the more games that player plays. By playing games, the rating can consequently become more certain, thus lowering the RD. However, if the player stops playing for a long time, then his performance when he returns playing will be more uncertain. Thus, the Glicko system increases the RD of players that are inactive for long periods of time.
Having a low RD also results in a player’s rating changing more slowly, especially if he plays against a player with a high RD. Conversely, having a high RD results in that player’s rating changing more quickly, especially if he plays against a player with a low RD. Consider, for example, that Albert plays against Ben and wins. Albert’s rating would increase by a whopping 86, becoming 1586, while Ben’s rating would decrease only by 6, becoming 1494. If Albert loses, his rating would decrease by 86, becoming 1414, while Ben’s rating would increase by 6, becoming 1506. The reason for why this happens is the following. Since Albert’s rating is much more uncertain than Ben’s, beating Ben would mean that Albert’s rating is supposed to be much more than 1500. On the other hand, Ben’s rating wouldn’t lower by much after Albert beats him because Albert’s performance is very uncertain, and hence little information can be gained from such a loss. The reverse argument would follow if Albert loses to Ben.
The rating system that I am proposing basically follows the same line of thought as what’s been said above. The main differences are in the way the RD changes. The normal Glicko system always increases the RD slightly by the same amount, governed by a constant c, after every match, and subsequently lowers it according to both the player and the opponent’s ratings and RDs. The Timed Glicko Rating system does not increase the RD by the same amount, but by an amount proportional to the time passed between a battle and the one before it. If a player plays frequently, his RD would thus be lowered by more than for a player that plays rarely, because his RD would first only be increased by a relatively small amount and then lowered accordingly.
Another difference is that players whose RD is at least 100 have their rating listed as provisional. Ratings cannot feature in the ladder leaderboard until they are not provisional (or, alternatively, are placed at the very bottom of the leaderboard). Furthermore, every player’s RD is updated once per day depending on how long it took them to play their last battle, so that players that are not playing would see their rating turn provisional.
As was said before, having a low RD makes a player’s rating change slowly. Sometimes, it is so low that the rating does not change appreciably even if that player starts to win or lose a lot of games. However, with SC = 20, the RD won't lower a lot anyway.
The final difference is not exactly related to the Glicko system per se, but to the way in which players play against each other. As in ShoddyBattle, players who wish to play on the ladder wait in a queue and get assigned a player to play against. In the new system, however, the opponent that will be assigned to a player will have between 15% and 85% chance of beating him, governed by an exact formula. This roughly translates to the opponent having a difference in rating of 300 or less, and prevents huge mismatches from occurring.
Here is the TGR algorithm in pseudocode. I have made an implementation of it on Excel, and I know it works well:
The following are the constants used for TGR. SC is the factor by which your deviation increases over time. With this number, the rating of the most ardent of players would still take at most 20 days of inactivity to become provisional. Q is a number that is multiplied by the ratings deviation to convert it to the real standard deviation used in the logistic distribution. PI is also involved in the logistic distribution, since the standard deviation of a logistic distribution is equal to s * sqrt(3) / pi, where s is the parameter of the distribution. P is a constant equal to the expected probability that two equally skilled players' battle result has a deserved outcome.
Code:
SC = 20;
Q = ln(10) / 400;
PI = 3.14159265359;
P = ...
Code:
Subroutine DayFrac():
Get current Hours, Minutes, Seconds;
return (Hours * 3600 + Minutes * 60 + Seconds) / 86400;
Code:
Subroutine WinProb(Player1, Player2):
G = 1 / sqrt(1 + 3 * Q^2 * (Player1.RD^2 + Player2.RD^2) / PI^2);
return (1 / (1 + 10^(-G * (Player1.Rating - Player2.Rating) / 400)));
Code:
Create NewPlayer;
NewPlayer.Rating = 1500;
NewPlayer.RD = 350;
NewPlayer.Time = DayFrac();
Code:
Time = DayFrac();
P = 0.35;
While P = 0.35 and DayFrac() < Time + 300/86400 do { [I][300 refers to 300 seconds = 5 mins. This can be changed.][/I]
For every Opponent waiting for a ladder match do {
ProbWin = WinProb(Player, Opponent);
If abs(ProbWin - 0.5) < P then {
Opp = Opponent;
P = abs(ProbWin - 0.5);
}
}
}
If P = 0.35 then {
Player.Time = DayFrac();
display("Sorry, no player is available to battle");
}
else Play Match versus Opp;
Code:
If Player1 won the battle against Player2 then Win = 1 else Win = 0;
Time = DayFrac();
UpdatePlayer(Player1,Player2,Time,Win);
UpdatePlayer(Player2,Player1,Time,1-Win);
Code:
Subroutine UpdatePlayer(Player1,Player2,Time,Win):
PTime = Time - Player1.Time;
PRD = min(sqrt(Player1.RD^2 + PTime * SC^2), 350);
PG = 1 / sqrt(1 + 3 * Q^2 * Player2.RD^2 / PI^2);
PE = (1 / (1 + 10^(-PG * (Player1.Rating - Player2.Rating) / 400)));
XD = abs(Win - WinProb(Player1, Player2)); [I][Deviation from the apriori expected probability of winning and the real outcome after the battle.][/I]
PMERIT = 1 - (1 - P) * XD * (1 + 2 * XD); [I][Quadratic Weighting's assumed probability that the result was on merit.][/I]
V = 1 / (PG^2 * PE * (1 - PE) * Q^2);
Player1.Rating = Player1.Rating + Q * PG * (Win - PE) * (2 * PMERIT - 1) / (1 / PRD^2 + 1 / V);
Player1.RD = 1 / sqrt(1 / PRD^2 + 1 / V);
Player1.Time = Time;
Code:
For every Player on the ladder do {
PTime = 1 - Player.Time;
Player.RD = min(sqrt(Player.RD^2 + PTime * SC^2), 350);
Player.Time = 0;
}
UpdateLeaderboard;
Code:
If Player.RD > 100 then display "Rating is Provisional"
else display "Rating is " + round(10000 / (1 + 10^(((1500 - Player.Rating) * PI / sqrt(3 * ln(10)^2 * Player.RD^2 + 2500 * (64 * PI^2 + 147 * ln(10)^2)))))) / 100 + "%"