
Acceleration in Swiss tournaments with small fields

(The Python code is available to download; it assumes bbpPairings is installed and at times assumes a Linux-like operating system. The file run_swisses.py is the "main" one.)

This page is primarily aimed at people who already understand the title, but I've written a general introduction for those interested, in a vaguely mathsy way, in how such tournaments are constructed. They are very common in chess.

Summary conclusions

I ran 10,000 simulated Swiss tournaments of various playing fields and with various acceleration methods. Of the non-permanent accelerations tested, all had at least two rounds with no acceleration at the end.

*Peculiarities of a particular rating distribution in a field of players can throw up some odd quirks. For example, with the WA Championship field and no acceleration, the top seed wins 47% of the simulated tournaments, while the second seed, only 17 rating points weaker, wins just 40%. With any sort of acceleration applied, these two percentages become much closer, which to my eye is more appropriate. But I can fiddle with the ratings so that the second seed, under acceleration, wins more often than the top seed. I would not draw any general conclusions from quirks like these.

Analysis intro

I've taken my inspiration from IA Otto Milvang, whose analysis (PDF) of the statistics of many simulated Swiss tournaments with differing acceleration methods led to the recommendation by FIDE to use 1 fictitious point for the first three rounds, and 0.5 fictitious points for rounds 4 and 5 (this is now called the Baku system).

Milvang's primary area of study was very large 9-round Swisses where there is a possibility of IM or GM norms being earned. In WA we don't have enough strong players for norms to be possible, and my attention is on small 6-round weekenders. One statistic that I look at which is not covered in Milvang's report is the inequality of the mismatches that commonly arise after the acceleration is switched off.

I ran 10,000 simulated Swiss tournaments, mostly with real or almost-real playing fields from some recent weekend tournaments held in Perth. The pairings were generated by bbpPairings (a command-line tool) using the Dutch option. I wrote a Python script to generate an initial tournament (trfx) file, and then call bbpPairings on that file to generate the pairings. The script then simulates results for those pairings, generates an updated tournament file, calls bbpPairings again for the next round, and so on.

Simulating results requires a model of the probability of a draw – the rating difference merely tells us the expected score, but doesn't say if an expected score of 0.5 means that all games should be drawn, or if all games should be decisive with both players having a 50% chance of winning. I used Milvang's model, whose derivation is described in this document (PDF), which comes complete with C# code that I lifted and adapted for my purposes.
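To make the role of the draw model concrete, here is a minimal Python sketch of turning an expected score plus a draw probability into one simulated result. The function name sample_result and the p_draw argument are hypothetical: p_draw stands in for whatever the draw model supplies, and nothing here reproduces Milvang's formula.

```python
import random

def sample_result(expected, p_draw, rng=random):
    """Return 1.0, 0.5 or 0.0 for the player whose expected score is `expected`.

    p_draw is a placeholder for the output of a draw model (an assumption
    in this sketch, not Milvang's formula).
    """
    # Expected score = P(win) + 0.5 * P(draw), so the draw probability
    # cannot exceed 2 * min(expected, 1 - expected).
    p_draw = min(p_draw, 2 * min(expected, 1 - expected))
    p_win = expected - p_draw / 2
    u = rng.random()
    if u < p_win:
        return 1.0
    if u < p_win + p_draw:
        return 0.5
    return 0.0
```

With expected = 0.5, a p_draw of 1 makes every game a draw and a p_draw of 0 makes every game decisive, which are the two extremes described above; both give the same expected score.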

Measuring inequality

Even though it feels very intuitive that switching off the acceleration leads to mismatches distributed unequally across the top players, I found it surprisingly difficult to come up with a statistic to describe it. For several days, as I was working on this project, my best guess was that such inequality was an illusion, and that unaccelerated Swisses had the same sort of inequality, just hidden better across different rounds.

Eventually though I did think of an inequality measure that made enough sense for me to code it, and once I did so it became immediately obvious that the conventional wisdom is correct, and that acceleration leads to more unequal distributions of easy games for the top players. I am sure that it is possible to improve upon the statistic that I used, which is a hackish, aesthetically displeasing thing, which I will now slowly describe.

The basic unit of the analysis is easy games played by the top players; to save on characters, I call these games "gimmes". What constitutes a gimme? I'm not too bothered here, and I coded several options – an expected score greater than 0.85; an expected score similar to what you'd get in round one of an unaccelerated Swiss; an opponent in the bottom half of the field (by rating). The hope is that, whichever definition of "gimme" is used, the results will generally point in the same direction.

But a conceptual hurdle that immediately arises is that, in any sort of Swiss, losing a game usually means getting an easier opponent in the following round. So analysing the distribution of gimmes as defined above will capture two distinct processes, mixed up together: an unequal distribution of mismatches, and the Swiss draw working as intended. So, I added an extra condition for a gimme: it only counts as one if the player is within half a point of the lead.
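As a rough illustration of the first gimme definition combined with the half-point condition, here is a short Python sketch. The names elo_expected and is_gimme are hypothetical, and the logistic Elo formula is an assumption used only to keep the example self-contained; the downloadable code may compute expected scores differently.

```python
def elo_expected(rating, opp_rating):
    """Logistic approximation to the Elo expected score (an assumption here)."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))

def is_gimme(rating, opp_rating, score, lead_score, threshold=0.85):
    """True if the game is easy AND the player is within half a point of the lead."""
    in_contention = lead_score - score <= 0.5
    return in_contention and elo_expected(rating, opp_rating) > threshold

# A 2200 facing an 1800 while half a point off the lead counts as a gimme;
# the same pairing a full point off the lead does not.
print(is_gimme(2200, 1800, score=2.0, lead_score=2.5))
print(is_gimme(2200, 1800, score=1.5, lead_score=2.5))
```

The other gimme definitions would only change the second half of the test (the expected-score comparison), leaving the in-contention condition as it is.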

The main concern is about inequality of gimmes amongst the top players, and I chose to consider the top six, since in WA-sized events, there are usually about six players who have a realistic shot at winning the tournament. (This is one of the aesthetically displeasing parts of my analysis – ideally I wouldn't hard-code the number of relevant players into it.)

It is tempting at this stage to borrow from economics and calculate a Gini coefficient (often used as a measure of income inequality) for the distribution of gimmes over the top six players. But the Gini coefficient used on the number of gimmes does not satisfy one important property that we'd like the inequality measure to have. Suppose one distribution of gimmes is [1, 0, 0, 0, 0, 0] (i.e., one gimme for one player, none for the others), and another distribution is [2, 0, 0, 0, 0, 0] (two gimmes for one player, none for the others). The second one is clearly more unequal, but the Gini coefficient for both is identical, since in both cases, one player gets 100% of the gimmes. (The Gini coefficient for income inequality doesn't change if everyone's income increases by a constant factor – it's about the fraction of total income earned by each person.)
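The scale-invariance problem is easy to check numerically. The sketch below uses the plain population form of the Gini coefficient (sum of absolute differences over all ordered pairs, divided by 2n times the total); that choice of form, and the function name gini, are assumptions on my part.

```python
def gini(values):
    """Gini coefficient: mean absolute difference divided by twice the mean."""
    n = len(values)
    total = sum(values)
    if total == 0:
        return 0.0  # nobody has anything, so treat as perfectly equal
    diff_sum = sum(abs(a - b) for a in values for b in values)
    return diff_sum / (2 * n * total)

# Both lines print the same value, 5/6 (about 0.833).
print(gini([1, 0, 0, 0, 0, 0]))
print(gini([2, 0, 0, 0, 0, 0]))
```

Doubling every entry doubles both the numerator and the denominator, so the raw gimme counts cannot distinguish the two cases.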

So, in another somewhat displeasing move, I tried to convert the distribution of gimmes into something that a Gini coefficient would be useful for. I idealised away from a tournament involving head-to-head games, and instead modelled my collection of six players as having a coin-flipping contest: whoever gets the most heads in six (number of rounds) coin flips is the winner (possibly tied), and a gimme is a guaranteed heads. (Another way this is un-chess-like is that it treats all the top six players as equal in strength.)

If the number of gimmes is equal for all six coin-flippers, then this is a fair competition, and the distribution of the probability of winning is [1/6, 1/6, 1/6, 1/6, 1/6, 1/6]. The Gini coefficient for this distribution is zero. But if the distribution of gimmes is [1, 0, 0, 0, 0, 0], then (a million simulations tell me) the distribution of the probability of winning becomes [0.389, 0.251, 0.250, 0.251, 0.251, 0.250], and its Gini coefficient is 0.071.

(Note that the sum of probabilities is greater than 1, because often there is a tie for the win. Also, the probabilities should obviously be identical for players with the same number of gimmes, but I didn't try to correct the simulated values in this way, figuring that the differences are small. I don't know if it's feasible to calculate the probabilities exactly, rather than by simulating millions of coin flips.)
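One way an exact calculation might go, assuming each player's total is their gimmes plus an independent binomial count of fair flips: a player wins (possibly tied) when no rival has a larger total, so the win probability is a sum, over that player's possible totals, of the product of the rivals' cumulative probabilities. The function name win_probabilities is hypothetical and is not part of the download.

```python
from math import comb

def win_probabilities(gimmes, rounds=6):
    """Probability that each player finishes (possibly tied) on the most heads."""
    pmfs = []
    for g in gimmes:
        free = rounds - g  # flips that are still genuinely random
        pmf = [0.0] * (rounds + 1)
        for heads in range(free + 1):
            pmf[g + heads] = comb(free, heads) / 2 ** free
        pmfs.append(pmf)

    # cdfs[j][k] = P(player j finishes on at most k heads)
    cdfs = []
    for pmf in pmfs:
        running, cdf = 0.0, []
        for p in pmf:
            running += p
            cdf.append(running)
        cdfs.append(cdf)

    probs = []
    for i, pmf in enumerate(pmfs):
        p_win = 0.0
        for k, p_k in enumerate(pmf):
            others = 1.0
            for j, cdf in enumerate(cdfs):
                if j != i:
                    others *= cdf[k]  # rival j does not beat k heads
            p_win += p_k * others
        probs.append(p_win)
    return probs

print(win_probabilities([1, 0, 0, 0, 0, 0]))
```

For [1, 0, 0, 0, 0, 0] this gives about 0.39 for the first player and about 0.25 for each of the others, in line with the simulated figures, and players with equal gimmes get exactly equal probabilities.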

If the distribution of gimmes is [2, 0, 0, 0, 0, 0], then the distribution of probability of winning is [0.534, 0.218, 0.217, 0.218, 0.218, 0.218], and the Gini coefficient is 0.162. This is promising! It is a bigger number than the 0.071 from earlier.

[1, 1, 1, 1, 1, 0] gives 0.056.
[1, 1, 1, 0, 0, 0] gives 0.113.
[2, 1, 1, 0, 0, 0] gives 0.191.

Your intuitions may differ on which distributions of gimmes ought to be considered more or less unequal, but I think these results are good enough to work with. I let my code run overnight to generate a file of Gini coefficients for all possible distributions of gimmes up to six rounds, so that during the main analysis of the simulated Swiss tournaments, I could count the gimmes for the top six players and then look up the Gini without having to do more calculation.
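A lookup table of this kind can stay small, because the Gini coefficient does not depend on which player holds which count. The sketch below enumerates the distinct distributions for six players and six rounds; keying the table by sorted tuples is my assumption about a reasonable layout, not a description of inequality_ginis.csv, and the function names are hypothetical.

```python
from itertools import combinations_with_replacement

def gimme_distributions(players=6, rounds=6):
    """All distinct gimme counts, sorted descending so player order is ignored."""
    return [tuple(sorted(c, reverse=True))
            for c in combinations_with_replacement(range(rounds + 1), players)]

def lookup_key(gimmes):
    """Normalise observed gimme counts to the same sorted form as the table keys."""
    return tuple(sorted(gimmes, reverse=True))

table_keys = gimme_distributions()
print(len(table_keys))           # number of distinct distributions
print(lookup_key([0, 1, 2, 0, 1, 0]))
```

Ignoring order leaves 924 distinct distributions rather than 7^6 = 117,649 ordered ones, so each entry can be estimated once and then looked up during the main analysis.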

The output file in the download is inequality_ginis.csv, and the code to generate it is in estimate_ginis.py.

Main simulation results

I've run the simulations on four different almost-real playing fields from Perth tournaments played on weekends, and one linear distribution of players. The fields are only "almost" real because I manually changed or entered some ratings: I wanted to avoid the issue of incorrectly seeded players (covered in the next section) to isolate the effects of different acceleration systems even in the ideal case of correctly seeded players.

All simulations were for a six-round tournament (even though the WA Champs goes for seven rounds). In the drop-down menu showing acceleration options, the numbers refer to the number of fictitious points added in each round. One method has a "(0.5)" in it; the brackets mean that the acceleration is only applied if there is at least one bottom-half player still on a perfect score (it is a slight modification of the method proposed at the start of this Chess Chat thread). The fictitious points are added to the top 2*ceil(N/4) players.
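The size of the accelerated group and the score used for pairing can be sketched as follows. The function names and the tuple representation of a schedule such as 1-1-1-0.5 are hypothetical, and the conditional "(0.5)" variant is not modelled here.

```python
from math import ceil

def accelerated_count(n_players):
    """Number of top seeds that receive fictitious points: 2 * ceil(N / 4)."""
    return 2 * ceil(n_players / 4)

def pairing_score(seed, real_score, round_no, n_players, schedule):
    """Score used for pairing: real score plus any fictitious points.

    seed is 1-based (1 = top seed) and round_no is 1-based. schedule lists
    the fictitious points per round, e.g. (1, 1, 1, 0.5); rounds beyond its
    end are treated as unaccelerated.
    """
    bonus = schedule[round_no - 1] if round_no <= len(schedule) else 0
    if seed > accelerated_count(n_players):
        bonus = 0  # outside the accelerated group
    return real_score + bonus

print(accelerated_count(22))                         # 12 players accelerated
print(pairing_score(1, 2.0, 3, 22, (1, 1, 1, 0.5)))  # top seed, round 3
print(pairing_score(13, 2.0, 3, 22, (1, 1, 1, 0.5))) # first non-accelerated seed
```

The fictitious points only affect the score groups used for pairing; the real scores are what count for the final standings.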

Winning percentages can add up to more than 100% because of ties.

There are various definitions available for "gimmes". The "based on round 1" method is defined as follows: take the top six boards of an unaccelerated Swiss, giving Black to the stronger player on each board; take the minimum expected score for the stronger players over those six boards; then subtract 0.05 to define a threshold. The "bottom-half opponent" and "non-accelerated opponent" can be different because sometimes not precisely half the field gets the acceleration (e.g., a 22-player field has 12 players accelerated).

The players table shows the winning percentage across all simulations, and then in subsequent columns isolates those simulated tournaments in which the player had 0, 1, 2, or 3+ gimmes. By looking at how the win percentage goes up with the number of gimmes, you can get a feel for how important the inequality of the gimmes is, but I haven't tried to summarise this with a single statistic.

Instead the inequality indices, being the mean Gini coefficient as described in the previous section, are shown in a third table below. The inequality indices for the first two gimme definitions (expected score threshold of 0.85, and threshold based on round one) are typically higher than those for the third and fourth gimme definitions. I expect (without having investigated) that this is simply because the very top player(s) are much stronger than their opponents, so that even perfectly fair pairings give the strong player a high expected score.

For reasons I haven't worked out (I hope it's not a bug), sometimes the number of gimmes a player gets is the same across a couple of different gimme definitions.

[Interactive results tables, selectable by gimme definition, playing field, and player statistic.]

Mean rating difference across games

Robustness to incorrectly seeded players

The main concern in this context is unrated or severely under-rated strong players being seeded into the bottom half of the field. Intuitively, we would expect that acceleration gives such players a free ride through round one and a freer ride than ideal in round two, and that these easy points give them an advantage relative to an unaccelerated Swiss (where, by contrast, they face an unusually tough draw from round one onwards).

To test this, I replaced the fourth- and fifth-bottom seeds' true strengths by 1800 and 2000, while keeping their original ratings used for seeding. In the results table below, the players are sorted by true strength, so that it's easy to compare the misrated players with correctly-seeded players of similar strength.

My summary is that the mis-seeded players get a small penalty from no acceleration (e.g., the true-2000 player is up to 5 percentage points less likely to finish top 3 than a correctly-seeded 2000 player), and a bonus from acceleration that gets larger the longer the acceleration is maintained. In particular, the gains for the true-2000 player in the 1-1-1-0.5 acceleration are very large, 10 or even 15 percentage points more likely to finish top 3 than a correctly-seeded 2000 player.

Meanwhile, with the 1-1-1-0.5 acceleration, the true-1800 player gets average scores close to those of the correctly-seeded 1900 players.

[Interactive results tables, selectable by gimme definition, playing field, and player statistic.]

Posted 2018-07-14.
