ORDO 1.2.6
Ratings for chess and other games

Miguel A. Ballicora ∗

Ordo [a] is a program designed to calculate ratings of individual chess engines (or players). It
follows a concept similar to the Elo rating [b], but with a different model and algorithm. Ordo
keeps consistency among ratings because it calculates them considering all results at once. In
that respect, it behaves similarly to BayesElo [c]. Ordo is distributed under the GPL license
and binaries are available for GNU/Linux, Windows®, and OS X. In addition, the sources
are portable and could be easily compiled for other systems.

[a] Copyright © 2015 Miguel A. Ballicora
[b] http://en.wikipedia.org/wiki/Elo_rating_system
[c] http://remi.coulom.free.fr/Bayesian-Elo/

Precompiled Files
In this distribution, you may find versions for GNU/Linux (32 and 64 bits) or Windows® (64 and
32 bits). For convenience, you can rename the proper file for your system to ordo (GNU/Linux)
or ordo.exe (Windows®). As an input example, a publicly available file games.pgn is included [1].
A batch file (ordo_example.bat) is included in the Windows® distribution [2]. It is a quick and
great start for users of that operating system.

GNU/Linux compilation and installation
After unzipping the contents, you can type
make
make install

or in Ubuntu
sudo make install

Usage
The input should be a file that adheres to the PGN standard [3]. Based on the results in that file, Ordo
automatically calculates a ranking. The output can be a plain text file and/or a comma separated
value [4] (.csv) file. The .csv file is an interesting option because it can be opened/imported by
most spreadsheet programs. Once imported, the user can choose to format the output externally.

∗ e-mail: mballicora (at gmail dot com)
[1] Taken from the recently discontinued Ingo Bauer's IPON rating list
[2] Kindly prepared by Adam Hair
[3] http://en.wikipedia.org/wiki/Portable_Game_Notation

The simplest way to use Ordo is typing in the command line:
ordo -p games.pgn

which will take the results from games.pgn and output the text ranking on the screen. If you want
to save the results in a file ratings.txt, you can run:
ordo -p games.pgn -o ratings.txt

By default, the average rating of all the individuals is 2300. If you want a different overall average,
you can use the switch -a to set it. For instance, to have an average of 2500, you can do:
ordo -a 2500 -p games.pgn -o ratings.txt

or if you want the results in .csv format, use the switch -c.
ordo -a 2500 -p games.pgn -c rating.csv

If you want both, you can use:
ordo -a 2500 -p games.pgn -o ratings.txt -c rating.csv

Multiple input files
Ordo can use several PGN input files at the same time (limited by the memory of the system),
which means that there is no need to combine them beforehand. Each file can be listed after the
switch -- like this
ordo -a 2500 -p g1.pgn -o ratings.txt -c rating.csv -- g2.pgn g3.pgn "Other games.pgn"

For that reason, the switch -p can be omitted and all files can be listed after --
ordo -a 2500 -o ratings.txt -c rating.csv -- g1.pgn g2.pgn g3.pgn "Other games.pgn"

Another option to input multiple PGN files is to list them in one text file and provide its name
with the switch -P FILE
ordo -a 2500 -o ratings.txt -c rating.csv -P list.txt

where list.txt contains for instance
g1.pgn
g2.pgn
g3.pgn

Name synonyms
Sometimes, the same player (engine) has been named differently in tournaments. The user can
specify which names are actually synonyms, so Ordo will consider them one. The switch -Y FILE
indicates that the file (.csv format) will contain a list of synonyms. Each line has the
following information: main,syn1,syn2 etc. An example of a synonym file with two lines would be:
"Gaviota 1.0","Gaviota v1","gaviota v1.0"
"Stockfish 6","Stockfish 6.0"

In this example, Gaviota 1.0 and Stockfish 6 would be the names used by Ordo. The other ones
will be converted.

[4] http://en.wikipedia.org/wiki/Comma-separated_values

Excluding games
In certain situations, the user may want to use only a subset of the games present in the input
file/s for the calculation. Switches -i FILE and -x FILE are used for this purpose.
Switch -i includes only the games of participants listed in FILE. In this file, each participant
name has to be on a different line. Also, each of those names may or may not be surrounded by
quotes. Both are accepted. For this reason, if a .csv file is provided as a list of participants, only
the first column is read. In addition, -x can be used to exclude games of participants listed in
FILE.

Output information (columns)
The user can select what information is displayed with the switch -U. For instance, -U "0,1,3"
will select and print the "rank and name", "rating", and "points total" columns in that order (each
column has a predefined number, see below). The default output is "0,1,2,3,4,5".
0 rank and name
1 rating
2 error
3 points total
4 games total
5 score percent
6 confidence for superiority
7 won games
8 draw games
9 losses
10 draw rate
11 opponent average rating
12 opponent average error
13 number of opponents
14 diversity (effective number of opponents based on information theory)

Option 14 will be equal to option 13 if the games are equally distributed among the opponents.

OppDiv = e^{-\sum_{i=1}^{N} f_i \ln(f_i)}    (1)

where f_i = n_i / N, N is the total number of games played by a given participant, and n_i is the
number of games played against opponent i.
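As a rough illustration of equation 1, the following Python sketch computes the diversity from a list with the number of games played against each opponent. The function name and the input layout are assumptions made for this example; they are not part of Ordo.

```python
import math

def opponent_diversity(games_per_opponent):
    """Effective number of opponents: exponential of the entropy of f_i = n_i / N."""
    total = sum(games_per_opponent)
    # Opponents with zero games contribute nothing to the sum.
    fractions = [n / total for n in games_per_opponent if n > 0]
    entropy = -sum(f * math.log(f) for f in fractions)
    return math.exp(entropy)

even = opponent_diversity([25, 25, 25, 25])   # games spread evenly over 4 opponents
skewed = opponent_diversity([97, 1, 1, 1])    # almost all games against one opponent
print(even, skewed)
```

With an even spread the result matches the plain number of opponents (column 13); with a skewed spread it drops towards 1.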
Another switch (-b FILE) controls the width of each column in the output and the header of each
column. This file consists of text in which each line has column,width,"HeaderName".
For instance, if you want item 4 (games total, see above) to be 8 characters wide and
be named "Points", you will have to include a line like this:
4,8,"Points"

in a file named columns.txt (or whatever name you choose) and then include the switch
-b columns.txt. Note that the width selected for item 0 (rank and name) will be ignored since it
is automatically adjusted. A default file would look like this (spaces are ignored):
0, 0, "PLAYER"
1, 6, "RATING"
2, 6, "ERROR"
3, 7, "POINTS"
4, 7, "PLAYED"
5, 5, "(%)"
6, 7, "CFS(next)"
7, 4, "W"
8, 4, "D"
9, 4, "L"
10, 5, "D(%)"
11, 7, "OppAvg"
12, 7, "OppErr"
13, 5, "OppN"
14, 7, "OppDiv"

Decimals
The switch -N provides the ability to control the precision displayed for the ratings (and their
errors). For instance, -N2 will output ratings with two decimals. If an extra parameter is provided,
separated by a comma, it will control the number of decimals for the score and draw percentage.
For example, the default is actually -N0,1 or -N "0,1". If a second parameter is not provided,
the default is used.

Anchor
The switch -A will fix the rating of a given player as a reference (anchor) for the whole pool.
ordo -a 2800 -A "Deep Shredder 12" -p games.pgn -o ratings.txt

That will calculate the ratings from games.pgn, save them in ratings.txt, and anchor the engine
Deep Shredder 12 to a rating of 2800. Names that contain spaces should be surrounded by quote
marks as in this example.

White advantage
The switch -w sets the rating advantage for having the white pieces in chess. Alternatively, the (highly
recommended) switch -W lets Ordo calculate it automatically. With this switch we can complete
the above example:
ordo -a 2800 -A "Deep Shredder 12" -p games.pgn -o ratings.txt -W

If the user knows that the white advantage is a value within a certain range, this uncertainty can
be given by the switch -u. A combination of switches -w NUM and -u NUM provides
prior information that Ordo will use to calculate the white advantage. As the number
of games played increases, the prior information will be less and less relevant. Ordo assumes a
Gaussian distribution centered on the value given by -w with a standard deviation given by -u.

Simulation and errors
The switch -s NUM instructs Ordo to perform NUM simulations, virtually replaying the games NUM
times. The results will be randomly re-assigned for each game according to the probabilities
calculated from the ratings. After running the simulations, and based on all those different results,
Ordo calculates standard deviations (errors) for the ratings. For this purpose, an optional switch
is -F value, where value is the % confidence level (the default is 95.0, which is roughly equivalent
to ± 2 standard deviations). The errors displayed are relative to the pool average. However, if one
of the players is anchored, the rest of the errors will be relative to that anchor. In this case, the
anchor error will be zero since it is the point of reference. To get the errors for rating differences
between a given pair of players, the switch -e file.csv should be added. It will generate an error
matrix saved in file.csv.
A reasonable minimum for the number of simulations is about -s 100. The more simulations, the
longer it takes to complete the runs. The errors calculated will be more accurate, but more than
1000 simulations are probably not needed. This is an example of how to use these switches:
ordo -a 2800 -A "Deep Shredder 12" -p games.pgn -o ratings.txt -W -s1000 -e errs.csv
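The correspondence between the confidence level passed with -F and a number of standard deviations can be checked with a small Python sketch. It assumes that the simulated ratings are approximately normally distributed; the manual does not state how Ordo performs this conversion internally, and the function name is hypothetical.

```python
from statistics import NormalDist

def sd_multiplier(confidence_percent):
    """Two-sided normal quantile: number of SDs that cover the given confidence level."""
    tail = (1.0 - confidence_percent / 100.0) / 2.0
    return NormalDist().inv_cdf(1.0 - tail)

print(round(sd_multiplier(95.0), 2))   # close to the "roughly 2" quoted above
```

Under this assumption, a lower confidence level such as -F 68.27 would correspond to about one standard deviation.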

It is important to emphasize that the errors displayed in the output are always against the reference
(anchor). For example, if the anchor is Engine X (Deep Shredder 12 in the example above) set
at 2800, and Engine Y is 2900 with an error of 20, then the interpretation is that the difference
between Y and X is 100 ± 20.
As mentioned above, when no engine is set as anchor the hidden reference is the average of the
pool. For instance, if the average is set to 2500 (default is 2300) and the rating output for Engine
X is 2850 ± 20, the difference between Engine X and the average of the pool is 350 ± 20. That is
how the output should be interpreted. It is incorrect to use this error to estimate relative values
against other engines. For that purpose, the switch -e needs to be provided to obtain a matrix
with the error for every engine-engine match-up.
If an anchor (reference) is provided, but the user wants errors relative to the average of the pool,
the switch -V should be used. This is what other rating software has as default.
ordo -a 2800 -A "Deep Shredder 12" -p games.pgn -o ratings.txt -W -s1000 -e errs.csv -V

In this case, you will see that the rating of Deep Shredder 12 will not have an error of zero.

Parallel calculation of simulations
If the switch -n NUM is used, Ordo will use NUM processors in parallel for the
simulations. This may give a significant speed-up.

Superiority confidence
If simulations have been run, using the switch -C will output a matrix with the confidence for
superiority (CFS) between each pair of players. Each of the numbers is an answer to the question
"What is the maximum confidence I can set the test to show that player x is not inferior to player
y and still obtain the same positive answer?". The matrix file is in comma separated values format,
and it can be opened by any spreadsheet program if it was saved with the *.csv extension. In
addition, if the user provides the switch -J, the CFS values between each player and the next one
in the ranking will be displayed in the output.

Draw rate (for equal opponents)
By default, Ordo considers that the draw rate for evenly matched players is 50%. Internally, it
calculates the draw rate for matches in which one player is stronger than the other. This parameter
does not change the rating results, but it will affect the errors calculated after simulations. Two
switches can control this parameter. First, -d sets the draw rate (which is assumed to be constant
throughout the database). Alternatively, the (highly recommended) switch -D lets Ordo calculate
it automatically. It makes sense if the user wants to calculate more accurate errors, or just for
informative purposes. For instance:
ordo -a 2800 -A "Deep Shredder 12" -p games.pgn -o ratings.txt -W -s1000 -e errs.csv -V -D

will calculate the draw rate and output it at the end of ratings.txt (or the screen, if the switch -o
is omitted). Similarly to the calculation of the white advantage, the user can provide prior
information for the draw rate. A combination of switches -d NUM and -k NUM will do that. Ordo
assumes a Gaussian distribution centered on the value given by -d with a standard deviation given
by -k. As the number of games increases, this information will have less and less impact on the
final result.

Ignore draws (-X switch)
This switch internally ignores all draws from the database as if they had not been played. This is
only present for experimentation, not for a serious rating calculation.

Minimum games
In certain cases, the user may not want the rating list to include players with very few games
played. For that reason, the switch -t NUM gives the program a threshold of minimum
games played for a participant to be included in the final list. The games are still included in the
calculation.

Perfect winners and losers
Players who won all their games (perfect winners) or lost all of them (perfect losers) create problems in
the rating calculation. It is impossible to estimate those ratings accurately because winning all or
losing all games corresponds to a +∞ or −∞ rating, respectively. In addition, the calculation slows down
considerably because it cannot converge. Ordo removes these players automatically
during the calculation, and places them back after convergence has been reached. The rating
assigned to them is a minimum ("floor") for perfect winners and a maximum ("ceiling") for perfect
losers. This is indicated by a > or < symbol in the output text. These limits are established
by calculating the rating they would have had if one of the games had been a draw. For example, if
a player had a performance of 10/10, a proper rating estimate lies between +∞ and the one
corresponding to a performance of 9.5/10. A nice side effect of this technique is that it distinguishes
players with a perfect score who faced different types of opposition or played a different number of games.
Being undefeated for three games is not the same as being undefeated for twenty.
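The "one game turned into a draw" rule can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses the logistic expectancy of eq. 4 with the default scale (76% expectancy at a 202 point difference), a plain bisection search, and hypothetical function names; Ordo's own implementation may differ.

```python
import math

# Assumption: default scale, 76% winning expectancy at a 202 point difference.
BETA = math.log(0.76 / 0.24) / 202.0

def expectancy(ra, rb):
    """Predicted fraction of points of a player rated ra against one rated rb (eq. 4)."""
    return 1.0 / (1.0 + math.exp(-BETA * (ra - rb)))

def rating_for_score(opponent_ratings, score, lo=-5000.0, hi=10000.0):
    """Bisection: rating whose predicted points against the opponents equal `score`."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        predicted = sum(expectancy(mid, r) for r in opponent_ratings)
        if predicted < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# A perfect 10/10 is treated as 9.5/10 to obtain the floor rating.
floor_ten = rating_for_score([2300.0] * 10, 9.5)
# Same idea for 3/3 and 20/20 against identical opposition.
floor_three = rating_for_score([2300.0] * 3, 2.5)
floor_twenty = rating_for_score([2300.0] * 20, 19.5)
print(floor_ten, floor_three, floor_twenty)
```

The floor for twenty straight wins comes out higher than the floor for three, which is the distinction described above.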

Group connections and pathological data
Sometimes, a data set contains players or groups/pools of players that did not play enough games
against the rest. These isolated groups produce meaningless ratings when compared to the general
pool. The -g switch saves a report of how many groups are in this situation. The information
in this report may guide the user to properly link those groups with extra games. Doing so will
stabilize the whole ranking. When the data set is "ill" connected, Ordo will attempt to run by
purging perfect winners and perfect losers. Their ceiling or floor rating will be estimated at the
end (see above). However, a warning will be displayed. When purging those players is not enough
to guarantee a proper connection, a second warning will be issued, and this time the program will
stop and exit with an error code (i.e. non-zero). To force the calculation even in these conditions,
the switch -G should be used. Be careful, this could be slow and the algorithm may not converge.

Multiple anchors
When several players are known to have very accurate ratings, it is possible to assign fixed values
to them. In that case, they will behave like multiple anchors. An example would be:
ordo -p games.pgn -m anchors.csv

where anchors.csv is a file that contains lines like this
"Gull 1.1", 2350.0
"Glaurung 2.2 JA", 2170
"Crafty 23.1 JA", 2000

telling Ordo to fix Gull 1.1, Glaurung 2.2 JA, and Crafty 23.1 JA to 2350, 2170, and 2000,
respectively. The names of the anchors should be present in games.pgn.

Match up information
The switch -j will output to a file information about all the different matches that have been played.
It shows the rating difference (Diff) between those particular players and the standard deviation
(SD) for that difference. These values come from the simulations performed with the switch -s, so
everything is taken into account, not only the information about a particular match. In addition,
there is a column with the confidence that would be needed in order to be able to claim superiority
based on Diff and SD. The column is CFS, confidence for superiority, which plays the same role
as the likelihood of superiority [5]. A fragment of the output is:

   3) Critter 1.4 SSE42    2562 :   2400 (+1467,=772,-161),  77.2 %

   vs.                          :  games (   +,   =,   -),    (%) :   Diff,   SD,  CFS (%)
   Houdini 2.0 STD              :    100 (  23,  54,  23),   50.0 :    -35,    9,      0.0
   Komodo 4 SSE42               :    100 (  33,  44,  23),   55.0 :     +6,   11,     71.0
   Deep Rybka 4.1 SSE42         :    100 (  28,  48,  24),   52.0 :    +23,   10,     99.0
   Stockfish 2.1.1 JA           :    100 (  25,  66,   9),   58.0 :    +44,    8,    100.0

In this example, we can say that Critter 1.4 SSE42 is superior to Deep Rybka 4.1 SSE42 with
99% confidence. We can only say that it is better than Komodo 4 SSE42 with 71% confidence.
The reason is that the rating difference is 6 and the standard deviation is 11.
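The CFS values in this fragment are close to what a normal distribution gives for Diff and SD. The Python sketch below makes that relationship explicit. Treating the rating difference as normally distributed is an assumption of this example, and the function name is hypothetical; the manual does not spell out the formula Ordo applies.

```python
from statistics import NormalDist

def cfs_percent(diff, sd):
    """Probability (in %) that the true difference is positive, assuming N(diff, sd)."""
    return 100.0 * NormalDist().cdf(diff / sd)

rows = [
    ("Houdini 2.0 STD", -35, 9),
    ("Komodo 4 SSE42", 6, 11),
    ("Deep Rybka 4.1 SSE42", 23, 10),
    ("Stockfish 2.1.1 JA", 44, 8),
]
for name, diff, sd in rows:
    print(f"{name:22s} {cfs_percent(diff, sd):6.1f}")
```

Rounded to whole percentages, these values agree with the CFS column of the fragment above.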

Loose anchors with prior information (-y)
Ordo offers an alternative approach to calculate ratings with previous knowledge from the user
(using Bayesian concepts). With the switch -y, the user can provide a file with a list of players
whose ratings will float around an estimated value. Those players will work as loose anchors in
the list. This strategy is useful when the data is scarce and, as a consequence, wild swings could
appear in the ratings. This is what happens at the beginning of a new rating list or tournament.
Ordo accepts an estimated rating for a player, but takes into account how uncertain that value is.
In other words, the user also has to provide the standard error for the estimated value. That means
that the true value is expected to lie within ± the uncertainty provided about 68% of the time. It is
assumed that the estimated rating follows a Gaussian distribution. In Bayesian terms, that constitutes
the prior distribution for the rating of that particular player. For instance, if one line of the file
provided with the switch -y contains
"Houdini 3", 3200, 50

that means Houdini's initial rating is 3200 with an uncertainty of 50. With this approach the user
should make the best educated guess possible; otherwise, the ranking will suffer. Using information
from previous well established rating lists can add stability to the new list and, as games are
added, the contribution of the "previous information" will fade away.

Relative anchors (-r)
Another problem in some engine tournaments is that version upgrades enter with no previous
ratings. However, we know in certain situations that the new version cannot have a very different
rating from the previous one. Therefore, the user can make a good educated guess about the
rating of the new version. For instance, if you know that the new version is within 20 points of
the previous one you can use the -r switch to provide a file with lines like this:
"Bouquet 1.8a", "Bouquet 1.8", 0, 20

That means version 1.8a came after 1.8 and it is estimated to have the same rating (0) with an
uncertainty of 20. With different versions, you can have different lines. An example with Stockfish
may be:
"Stockfish 160913", "Stockfish 4", 0, 20
"Stockfish 4", "Stockfish 250413", 0, 50
"Stockfish 250413", "Stockfish 120413", 0, 20
"Stockfish 120413", "Stockfish 250313", 0, 20

These constitute different relative anchors. When two versions are radically different, you can say
nothing and they will be treated as different engines, or for instance
"Komodo 1063", "Komodo 4534", 0, 1000

The first is a complete rewrite with a parallel search. Thus, the uncertainty of 1000 reflects this
fact and makes both versions virtually disconnected. If you want to include more specific info, you
could say
"Komodo 1063", "Komodo 4534", 160, 100

Here, 160 is the estimate of how much improvement you get by going from 1 core to 16 and
100 represents how uncertain that is.

[5] https://chessprogramming.wikispaces.com/Match+Statistics

Switches
The list of switches provided is:
usage: ordo [-OPTION]

 -h, --help                print this help and exit
 -H, --show-switches       print switch information and exit
 -v, --version             print version number and exit
 -L, --license             print license information and exit
 -q, --quiet               quiet mode (no progress updates on screen)
     --silent              same as --quiet
 -Q, --terse               same as --quiet, but shows simulation counter
     --timelog             outputs elapsed time after each step
 -a, --average=NUM         set rating for the pool average
 -A, --anchor=PLAYER       anchor: rating given by '-a' is fixed for PLAYER
 -V, --pool-relative       errors relative to pool average, not to the anchor
 -m, --multi-anchors=FILE  multiple anchors: file contains rows of
                           "AnchorName",AnchorRating
 -y, --loose-anchors=FILE  loose anchors: file contains rows of
                           "Player",Rating,Uncertainty
 -r, --relations=FILE      relations: rows of
                           "PlayerA","PlayerB",delta_rating,uncertainty
 -R, --remove-older        no output of older 'related' versions (given by -r)
 -w, --white=NUM           white advantage value (default=0.0)
 -u, --white-error=NUM     white advantage uncertainty value (default=0.0)
 -W, --white-auto          white advantage will be automatically adjusted
 -d, --draw=NUM            draw rate value % (default=50.0)
 -k, --draw-error=NUM      draw rate uncertainty value % (default=0.0 %)
 -D, --draw-auto           draw rate value will be automatically adjusted
 -z, --scale=NUM           set rating for winning expectancy of 76% (default=202)
 -T, --table               display winning expectancy table
 -p, --pgn=FILE            input file, PGN format
 -P, --pgn-list=FILE       multiple input: file with a list of PGN files
 -o, --output=FILE         output file, text format
 -c, --csv=FILE            output file, comma separated value format
 -E, --elostat             output files in elostat format (rating.dat,
                           programs.dat & general.dat)
 -j, --head2head=FILE      output file with head to head information
 -g, --groups=FILE         outputs group connection info (no rating output)
 -G, --force               force program to run ignoring isolated-groups warning
 -s, --simulations=NUM     perform NUM simulations to calculate errors
 -e, --error-matrix=FILE   save an error matrix (use of -s required)
 -C, --cfs-matrix=FILE     save a matrix (comma separated value .csv) with
                           confidence for superiority (-s was used)
 -J, --cfs-show            output an extra column with confidence for superiority
                           (relative to the player in the next row)
 -F, --confidence=NUM      confidence to estimate error margins (default=95.0)
 -X, --ignore-draws        do not take into account draws in the calculation
 -t, --threshold=NUM       threshold of games for a participant to be included
 -N, --decimals=a,b        a=rating decimals, b=score decimals (optional)
 -M, --ML                  force maximum-likelihood estimation to obtain ratings
 -n, --cpus=NUM            number of processors used in simulations
 -U, --columns=a,b,...     info in output (default columns are "0,1,2,3,4,5")
 -Y, --synonyms=FILE       name synonyms (comma separated value format). Each
                           line: main,syn1,syn2 or "main","syn1","syn2"
     --aliases=FILE        same as --synonyms FILE
 -i, --include=FILE        include only games of participants present in FILE
 -x, --exclude=FILE        names in FILE will not have their games included
     --no-warnings         suppress warnings of names from -x or -i that do not
                           match names in input file
 -b, --column-format=FILE  format column output, each line from FILE being
                           column,width,"Header"

Memory Limits
Currently, the program can handle an almost unlimited number of games and players. It is only
limited by the memory of the system.

Exit code
When Ordo runs successfully, it exits with code 0. When problems arise (insufficient
memory, database not well connected, empty input, wrong parameters, etc.), Ordo returns a
number that is guaranteed to be non-zero. This can be used in scripts to know whether the
process reached its goal or not. For instance, the following shell script (Linux) will detect whether
processing games.pgn succeeded or not.
#!/bin/sh
# Run Ordo and keep its exit code
./ordo -p games.pgn
exit_code=$?
if [ "$exit_code" -eq 0 ]; then
    echo "Ordo ran properly"
else
    echo "Ordo returned with error: $exit_code"
fi

Ordoprep
A tool is available in another distribution [6] to shrink the PGN file. The output will contain only
the results of the games. In addition, it can discard players that won all games, or lost all games.
Other switches allow the exclusion of players that do not have a minimum performance or played
too few games.
Typical usage is:
ordoprep -p raw.pgn -o shrunk.pgn

which saves in shrunk.pgn a PGN file with only the results. You can add switches like this:
ordoprep -p raw.pgn -o shrunk.pgn -d -m 5 -g 20

where -d tells Ordoprep to discard players with 100% or 0% performance, -m 5 will exclude players
who did not reach a 5% performance, and -g 20 will exclude players with fewer than 20 games. After
all this, shrunk.pgn can be used as input for Ordo.

Model for rating calculation
The model assumes that differences in strength are analogous to differences in levels of energy
(Fig. 1). A lower (more stable) level of energy would represent a stronger player. The analogy
is that a valley is better at attracting water than a mountain top. In physics and chemistry, a
particle or a molecule that can be in two different states can be predicted to be in one or the other
with a certain probability.

Figure 1: Energetic levels as strength levels

The probability to be found at each level is proportional to the Boltzmann factor [7] e^{-\beta E_i}. If N_a is
the number of particles in level A, and N_b is the number of particles in level B, their ratio will be:

\frac{N_a}{N_b} = \frac{e^{-\beta E_a}}{e^{-\beta E_b}} = e^{-\beta (E_a - E_b)}    (2)

β is a constant of the system. The analogy is that we treat the probability of a win landing in
level A or B as the probability of a particle being in A or B. Therefore, after reordering equation
2, the fraction of wins (f_{b,a}) of player B in a match vs. A will be:

f_{b,a} = \frac{N_b}{N_a + N_b} = \frac{1}{1 + e^{-\beta (E_a - E_b)}}    (3)

If we define the strength rating R as the negative value of energy, then R_a = -E_a. For convenience,
we flip the scales so that higher ratings are represented with higher values (Fig. 2),
and the fraction of wins (f_{b,a}) of player B in a match vs. A will be represented by eq. 4.

[6] https://github.com/michiguel/Ordoprep/releases
[7] https://en.wikipedia.org/wiki/Boltzmann_factor

Figure 2: Rating scale

f_{b,a} = \frac{1}{1 + e^{-\beta (R_b - R_a)}}    (4)

This equation has the same form as the logistic function [8]. With this equation we can calculate the
predicted fraction of wins between two players. The predicted performance P_x, or number of wins
of player x among a pool of other players, will be the sum of the predicted fractions
f for each game.

P_x = f_{x,opp(1)} + f_{x,opp(2)} + ... + f_{x,opp(n)} = \sum_{i=1}^{n} f_{x,opp(i)}    (5)

where n is the total number of games played by x and opp(i) is the opponent it faced in game
i. Then:

P_x = \sum_{i=1}^{n} \frac{1}{1 + e^{-\beta (R_x - R_{opp(i)})}}    (6)

[8] http://en.wikipedia.org/wiki/Logistic_function
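Equations 4 and 6 translate almost directly into code. The following Python sketch is a minimal illustration; it assumes the default scale described in the Scale section (76% winning expectancy at a 202 point difference) to fix β, and the function names are hypothetical.

```python
import math

# Assumption: beta calibrated so that a 202 point advantage gives a 76% expectancy.
BETA = math.log(0.76 / 0.24) / 202.0

def expectancy(r_player, r_opponent, beta=BETA):
    """Predicted fraction of wins of the player against the opponent (eq. 4)."""
    return 1.0 / (1.0 + math.exp(-beta * (r_player - r_opponent)))

def predicted_performance(r_player, opponent_ratings, beta=BETA):
    """Sum of the predicted fractions over all games played (eq. 6)."""
    return sum(expectancy(r_player, r, beta) for r in opponent_ratings)

print(expectancy(2502.0, 2300.0))
print(predicted_performance(2400.0, [2300.0, 2350.0, 2500.0, 2500.0]))
```

The list passed to predicted_performance holds one entry per game, so an opponent faced twice appears twice.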

The most likely strength rating values (R) for each player are the ones for which each predicted
performance P_x equals the respective observed performance (O_x) of player x (actual number of
games won by x). Therefore, the goal is to find R values so that the following unfitness (U) score equals
zero, where m is the total number of players, and j is each individual player.

U = \sum_{j=1}^{m} (P_j - O_j)^2    (7)

Finding an adequate procedure to minimize U until it reaches zero is critical for a proper convergence
towards the optimal solution. Ordo fits it in discrete steps (similar to hill climbing [9]),
making those steps smaller and smaller once convergence has been reached. However, those
steps are constrained to certain values to avoid big swings during the calculation. After many
different tests, this procedure was found to be safe and fast.
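To make the goal of eq. 7 concrete, here is a small Python sketch that drives U towards zero on a toy data set. It is not Ordo's procedure: instead of the discrete hill-climbing steps described above, it swaps in a simple damped update in which each rating moves in proportion to O_j - P_j. It also assumes a connected pool with no perfect winners or losers, counts draws as half a point, and uses hypothetical function names.

```python
import math

BETA = math.log(0.76 / 0.24) / 202.0  # assumption: default scale (76% at 202 points)

def expectancy(ra, rb):
    return 1.0 / (1.0 + math.exp(-BETA * (ra - rb)))

def performances(games, rating):
    """Predicted (eq. 6) and observed points per player."""
    predicted = {p: 0.0 for p in rating}
    observed = {p: 0.0 for p in rating}
    for a, b, points_a in games:
        e = expectancy(rating[a], rating[b])
        predicted[a] += e
        predicted[b] += 1.0 - e
        observed[a] += points_a
        observed[b] += 1.0 - points_a
    return predicted, observed

def unfitness(games, rating):
    """U from eq. 7."""
    predicted, observed = performances(games, rating)
    return sum((predicted[p] - observed[p]) ** 2 for p in rating)

def fit_ratings(games, average=2300.0, iterations=2000):
    """games: list of (player_a, player_b, points_a), points_a being 1, 0.5 or 0."""
    players = sorted({p for a, b, _ in games for p in (a, b)})
    played = {p: 0 for p in players}
    for a, b, _ in games:
        played[a] += 1
        played[b] += 1
    rating = {p: average for p in players}
    for _ in range(iterations):
        predicted, observed = performances(games, rating)
        for p in players:
            # BETA * n / 2 is at least twice dP/dR, so every step is damped.
            rating[p] += (observed[p] - predicted[p]) / (BETA * played[p] / 2.0)
        # Re-center so that the pool average stays at the requested value.
        shift = average - sum(rating.values()) / len(players)
        for p in players:
            rating[p] += shift
    return rating

games = ([("A", "B", 1.0)] * 7 + [("A", "B", 0.0)] * 3 +
         [("B", "C", 1.0)] * 6 + [("B", "C", 0.0)] * 4 +
         [("A", "C", 1.0)] * 8 + [("A", "C", 0.0)] * 2)
ratings = fit_ratings(games)
print(ratings, unfitness(games, ratings))
```

The re-centering step plays the role of the -a switch: only rating differences are determined by the results, so the average has to be fixed separately.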

Scale
Chess players are accustomed to the Elo rating. Traditionally, it has been based on a normal
(Gaussian) distribution, which is the one that the World Chess Federation (FIDE) still uses [10].
Here, the default value of β was chosen to resemble the Elo scale. For that reason, the rating
difference when the winning expectancy is 76% has been set to 202 rating points. This parameter
can be modified with the switch -z, and the overall scale can be displayed with the switch -T.
The model is valid if the strength assigned to the individual players is additive like energy. If
we know the strength differences between A→B and B→C, we should be able to calculate A→C
as A→B + B→C. Then, this should accurately predict the results of a match between A and C.
Empirical observations seem to suggest that those estimations are reasonable, at least within a
certain range.
Certain theoretical assumptions have been made to account for the existence of draws. One of them is that
the actual draw rate remains similar throughout the rating scale. Empirically, this is a reasonable
approximation for most cases.

White advantage calculation
The rationale to calculate the white advantage (W_{adv}) is that the expected outcome for white
should be as close as possible to the actual white performance. In other words, the number of
points obtained by white (W_p) should be the same as the number of points expected to be obtained
by white (W_e).

E = (W_p - W_e)^2    (8)

Therefore, the optimum W_{adv} is the one that minimizes E, which is the overall error squared in
equation 8.

W_e = \sum_{i=1}^{n} Expectancy(R_{W_i} + W_{adv}, R_{B_i})    (9)

Here, n is the total number of games, and R_{W_i} and R_{B_i} are the ratings (in game i) of white and black,
respectively. Expectancy is actually equation 4.

W_e = \sum_{i=1}^{n} \frac{1}{1 + e^{-\beta (R_{W_i} + W_{adv} - R_{B_i})}}    (10)

Then, combining 8 and 10

E = \left( W_p - \sum_{i=1}^{n} \frac{1}{1 + e^{-\beta (R_{W_i} + W_{adv} - R_{B_i})}} \right)^2    (11)

W_{adv} is calculated iteratively, until E is minimized. This calculation assumes that W_{adv} is relatively
constant throughout the database. Once W_{adv} is obtained, the ratings are re-calculated. The
procedure continues until the numbers stabilize.

[9] http://en.wikipedia.org/wiki/Hill_climbing
[10] http://en.wikipedia.org/wiki/Elo_rating_system
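One way to picture the minimization of eq. 11 is the Python sketch below. It keeps the ratings fixed and searches for W_adv by bisection, which works because W_e grows monotonically with W_adv, so E is smallest where W_e equals W_p. The search method, the default scale used for β, and the function names are assumptions of this example; the manual only says that Ordo iterates, and it also re-calculates the ratings afterwards, which this sketch omits.

```python
import math

BETA = math.log(0.76 / 0.24) / 202.0  # assumption: default scale (76% at 202 points)

def expectancy(r_white, r_black):
    return 1.0 / (1.0 + math.exp(-BETA * (r_white - r_black)))

def white_advantage(games, lo=-1000.0, hi=1000.0):
    """games: list of (white_rating, black_rating, white_points). Returns W_adv."""
    white_points = sum(points for _, _, points in games)         # W_p
    for _ in range(100):
        mid = (lo + hi) / 2.0
        expected = sum(expectancy(rw + mid, rb) for rw, rb, _ in games)  # W_e, eq. 10
        if expected < white_points:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# 100 games between equally rated players in which white scores 55 points.
games = ([(2300.0, 2300.0, 1.0)] * 35 +
         [(2300.0, 2300.0, 0.5)] * 40 +
         [(2300.0, 2300.0, 0.0)] * 25)
w_adv = white_advantage(games)
print(w_adv)
```

For this toy input the result is simply the rating difference whose expectancy is 55%.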

Draw rate model
To estimate the probability of a draw in a single game, the model from Fig. 2 needs to be expanded
to have an extra "draw state" (Fig. 3).

Figure 3: Rating scale introducing an extra state for draws

The draw rate does not affect the rating calculation, or the performance for each player in the
simulations. However, it affects the relative distribution of wins, losses, and draws simulated,
which has an influence on the errors calculated. Therefore, to have a more realistic simulation and
an accurate estimation of the errors we need to predict the probability of a draw. But the draw
rate is not uniform, as it depends on the rating difference between the opponents. Thus, the draw
rate depends on two parameters, D_{eq} (draw rate when the two opponents are of equal strength)
and ∆R. Ordo assumes that D_{eq} is relatively constant throughout the database. If we know D_{eq},
the following equation

E = \sum_{m=1}^{M} (D_m - N_m D_{exp}(\Delta R_m + W_{adv}, D_{eq}))^2    (12)

will give E as the overall error in the estimation of D_{eq}. Here, m is the match number, M is the
total number of matches, N_m is the number of games played in match m, ∆R_m is the rating
difference in that particular match, D_m is the number of draws observed, and W_{adv} is the white
advantage. D_{exp} is a function that gives the draw rate expected given a certain ∆R and D_{eq}. Note
that here a match is considered to be any series of games between two opponents with the same colors.
In other words, it is any set of games with the same opponents and conditions. With this
equation, D_{eq} is adjusted iteratively, re-evaluating D_{exp} each time, until E is minimized. To apply
this algorithm we need the function D_{exp}. In the following sections we show how to calculate the
draw rate when opponents are of equal strength and later from a given p and D_{eq}. From ∆R, the
performance expected (p) can be directly calculated.

Draw rate between opponents of equal strength
We an model the draw rate by introdu ing an extra draw state (Fig. 3). This is a derivation of
the equation that relates draw rate (D) and δ .
1=W +D+L

(13)

Here, W , D, and L are the respe tive win, draw, and loss rates. Sin e the opponents are of equal
strength, W equals L.
1=2W +D

(14)

Based on the assumption that the probabilities of the different levels are proportional to the
Boltzmann factor11 e^{-βE_i}, the following ratio can be established (R_i = -E_i; higher ratings mean
lower "energy levels").

\frac{D}{W} = \frac{e^{\beta R_D}}{e^{\beta R_W}} = e^{\beta (R_D - R_W)} = e^{\beta \delta}    (15)

Replacing into eq. 14

1 = e^{\beta \delta} W + 2W    (16)

W = \frac{1}{e^{\beta \delta} + 2}    (17)

11 https://en.wikipedia.org/wiki/Boltzmann_factor

Combining with eq. 14 we obtain D_eq, which is the draw rate when both players are equally
strong. This value depends on δ.

D = D_{eq} = 1 - \frac{2}{e^{\beta \delta} + 2} = \frac{e^{\beta \delta}}{e^{\beta \delta} + 2}    (18)
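
As a quick numerical illustration of equation 18, the sketch below maps the product βδ to D_eq and back. The function names are hypothetical, and the product βδ is passed as a single number because the manual does not fix a value for β in this section.

```python
import math


def d_eq_from_gap(beta_delta: float) -> float:
    """Equation 18: D_eq = e^(beta*delta) / (e^(beta*delta) + 2)."""
    e = math.exp(beta_delta)
    return e / (e + 2.0)


def gap_from_d_eq(d_eq: float) -> float:
    """Inverse of equation 18: beta*delta = ln(2 * D_eq / (1 - D_eq))."""
    if not 0.0 < d_eq < 1.0:
        raise ValueError("d_eq must lie strictly between 0 and 1")
    return math.log(2.0 * d_eq / (1.0 - d_eq))
```

With δ = 0 the three states are equally likely, so `d_eq_from_gap(0.0)` gives one third.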

Draw rate from p (performance) and D_eq
Performance (p) is the fraction of the total points obtained by a player in a given number of games.
It is defined by this simple relationship.

p = W + D/2 ; \quad W = p - D/2    (19)

To define D_eq, we are going to assume it is constant, regardless of the absolute strength of each
individual. We then have three possible states, W (win), D (draw), and L (loss), in which the
state D is separated by δ from the average of the levels W and L. In this scenario, and reordering
eq. 18 we have:

\frac{1 - D_{eq}}{2 D_{eq}} = e^{-\beta \delta}    (20)

For convenience we will call e^{-βδ} = φ, then

\frac{1 - D_{eq}}{2 D_{eq}} = \phi    (21)

D_eq is the rate when R_W and R_L are at the same level. If R_W and R_L change, and the draw level
remains at the same distance δ from the average of R_W and R_L, the equations that relate the
probabilities for each state are:

R_{avg} = \frac{R_W + R_L}{2} ; \quad x = R_W - R_{avg} = R_{avg} - R_L    (22)

\frac{W}{D} = e^{\beta (x - \delta)} = e^{\beta x} e^{-\beta \delta}    (23)

\frac{D}{L} = e^{\beta (x + \delta)} = e^{\beta x} / e^{-\beta \delta}    (24)

For convenience, if we call e^{-βδ} = φ as we did before we get

\frac{W}{D} = e^{\beta x} \phi ; \quad \frac{D}{L} = e^{\beta x} / \phi    (25)

therefore

\frac{W L}{D^2} = \phi^2 ; \quad L = \frac{\phi^2 D^2}{W}    (26)

combining this equation with eq. 13 and reordering:

0 = W^2 + D W - W + \phi^2 D^2    (27)

replacing W with eq. 19 we obtain

0 = (p - D/2)^2 + D (p - D/2) - (p - D/2) + \phi^2 D^2    (28)

expanding, simplifying, and reordering leads to

0 = (4 \phi^2 - 1) D^2 + 2D + 4 (p^2 - p)    (29)
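
The manual does not show the intermediate algebra between eqs. 28 and 29. Written out term by term, it could read as follows:

```latex
\begin{align*}
0 &= (p - D/2)^2 + D(p - D/2) - (p - D/2) + \phi^2 D^2 \\
  &= p^2 - pD + \tfrac{D^2}{4} + pD - \tfrac{D^2}{2} - p + \tfrac{D}{2} + \phi^2 D^2 \\
  &= \left(\phi^2 - \tfrac{1}{4}\right) D^2 + \tfrac{D}{2} + (p^2 - p)
\end{align*}
```

Multiplying the last line by 4 gives 0 = (4φ^2 - 1)D^2 + 2D + 4(p^2 - p), which is eq. 29.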

replacing with eq. 21

0 = \left( \left( \frac{1 - D_{eq}}{D_{eq}} \right)^2 - 1 \right) D^2 + 2D + 4 (p^2 - p)    (30)

Solving this quadratic equation, we obtain the predicted draw rate (D) between two given opponents,
as long as we know the predicted performance (p) and the draw rate between equally matched
opponents (D_eq). This value is the D_exp that is plugged into eq. 12.
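
A minimal Python sketch of solving eq. 30 is shown below. It is not taken from Ordo's source. The function name `draw_rate`, the choice of root, and the special case for D_eq = 0.5 (where the quadratic coefficient vanishes and the equation becomes linear) are assumptions made for the example.

```python
import math


def draw_rate(p: float, d_eq: float) -> float:
    """Solve eq. 30, a*D^2 + 2*D + c = 0, for the draw rate D."""
    if not 0.0 < d_eq < 1.0:
        raise ValueError("d_eq must lie strictly between 0 and 1")
    a = ((1.0 - d_eq) / d_eq) ** 2 - 1.0  # coefficient of D^2
    c = 4.0 * (p * p - p)                 # constant term, between -1 and 0
    if abs(a) < 1e-12:
        # D_eq = 0.5: the quadratic term vanishes, leaving 2*D + c = 0
        return -c / 2.0
    # Root of the quadratic that is non-negative for p in [0, 1]
    return (-1.0 + math.sqrt(max(0.0, 1.0 - a * c))) / a
```

For evenly matched opponents (p = 0.5) the function returns D_eq itself, and for p = 1 it returns zero, as the model requires.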

Draw rate and win rate relationship
Reordering eq. 27 we obtain

D^2 = \phi^{-2} W (1 - W - D)    (31)

Note that this relationship is equivalent to the basic assumption used by Davidson12 to develop
his draw model

D = \nu \sqrt{W L}    (32)

Here, ν = φ^{-1} (so that ν^2 = φ^{-2}) and L = 1 - W - D. Shawul and Coulom showed that this
relationship is superior for chess engines when compared to other alternatives13. Replacing φ in
eq. 31 with eq. 21 we obtain

D^2 = \left( \frac{2 D_{eq}}{1 - D_{eq}} \right)^2 W (1 - W - D)    (33)

12 Equation 2.5 in http://stat.fsu.edu/techreports/M169.pdf
13 https://dl.dropboxusercontent.com/u/55295461/elopapers/elopapers/ChessOutcomes.pdf

Equation 33 is the one used by Ordo to obtain the draw rate for any pair of opponents as a function
of win probability (W) and draw rate for equal opponents (D_eq).
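
Equation 33 is implicit in D, since D appears on both sides. One way to evaluate it, shown here only as a sketch and not necessarily the way Ordo does it, is to rearrange it into the quadratic D^2 + kWD - kW(1 - W) = 0 with k = (2 D_eq / (1 - D_eq))^2 and take the non-negative root. The function name `draw_rate_from_win` is hypothetical.

```python
import math


def draw_rate_from_win(w: float, d_eq: float) -> float:
    """Solve eq. 33 for D given the win probability W and D_eq."""
    if not 0.0 < d_eq < 1.0:
        raise ValueError("d_eq must lie strictly between 0 and 1")
    k = (2.0 * d_eq / (1.0 - d_eq)) ** 2
    kw = k * w
    # Non-negative root of D^2 + k*W*D - k*W*(1 - W) = 0
    return (-kw + math.sqrt(kw * kw + 4.0 * kw * (1.0 - w))) / 2.0
```

As a consistency check, equal opponents have W = (1 - D_eq) / 2, and the function then returns D_eq.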

Draw rate calculation
The rationale for calculating the draw rate for equal opponents (D_eq) is that the expected number
of draws should be as close as possible to the actual number of draws in the database.
In other words, the number of draws observed (D_obs) should be the same as the number of draws
expected (D_exp).

E = (D_{obs} - D_{exp})^2    (34)

Therefore, the optimum D_eq is the one that minimizes E, which is the overall squared error in
equation 34.

D_{exp} = \sum_{i=1}^{n} D_i    (35)

Here, n is the total number of games, and D_i is the probability of a draw for game i. From equation
30, D_i could be solved as

D_i = \frac{-1 + \sqrt{1 - 4 (p_i^2 - p_i) \left( 4 \left( \frac{1 - D_{eq}}{2 D_{eq}} \right)^2 - 1 \right)}}{4 \left( \frac{1 - D_{eq}}{2 D_{eq}} \right)^2 - 1}    (36)

where p_i is the expected performance for white for each game, and could be calculated from
equation 4 as

p_i = \frac{1}{1 + e^{-\beta (R_{W_i} + W_{adv} - R_{B_i})}}    (37)

R_Wi and R_Bi are the ratings (in game i) of white and black, respectively. Once D_eq is estimated,
p_i and D_i are calculated (equations 36 and 37) for each game to obtain D_exp and E (equations 34
and 35). The optimum value of D_eq is the one that minimizes E, and it is calculated iteratively. This
calculation assumes that D_eq is relatively constant throughout the database. Once D_eq is obtained,
the ratings are re-calculated, as is done with W_adv. The procedure continues until the numbers
stabilize.
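
The inner loop described above (equations 34 to 37) can be sketched as follows. The manual only says that D_eq is "calculated iteratively", so the ternary search used here is an assumption, as are all function names and the treatment of β as a plain parameter. The outer loop that re-calculates the ratings is not shown.

```python
import math


def expected_score(r_white: float, r_black: float, w_adv: float, beta: float) -> float:
    """Equation 37: expected performance of white in one game."""
    return 1.0 / (1.0 + math.exp(-beta * (r_white + w_adv - r_black)))


def draw_probability(p: float, d_eq: float) -> float:
    """Equation 36: predicted draw rate for expected performance p."""
    a = 4.0 * ((1.0 - d_eq) / (2.0 * d_eq)) ** 2 - 1.0
    if abs(a) < 1e-12:
        # D_eq = 0.5: equation 30 becomes linear in D
        return 2.0 * (p - p * p)
    return (-1.0 + math.sqrt(max(0.0, 1.0 - 4.0 * (p * p - p) * a))) / a


def squared_error(d_eq: float, scores, draws_observed: float) -> float:
    """Equations 34 and 35: (D_obs - sum of D_i)^2."""
    d_exp = sum(draw_probability(p, d_eq) for p in scores)
    return (draws_observed - d_exp) ** 2


def fit_d_eq(scores, draws_observed: float, iters: int = 100) -> float:
    """Ternary search for the D_eq that minimizes E (search method assumed)."""
    lo, hi = 1e-6, 1.0 - 1e-6
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if squared_error(m1, scores, draws_observed) < squared_error(m2, scores, draws_observed):
            hi = m2  # minimum lies to the left of m2
        else:
            lo = m1  # minimum lies to the right of m1
    return 0.5 * (lo + hi)
```

Here `scores` is the list of p_i values, one per game, obtained from `expected_score` with the current ratings. The search assumes that E has a single minimum in D_eq, which holds when the expected number of draws grows steadily with D_eq.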

Rating calculation with prior information
When the user provides Ordo with loose anchors, relative anchors, a white advantage uncertainty,
or a draw rate uncertainty, the calculation is performed by a maximum-likelihood estimation. In
those cases, for each game the probability of the given outcome (W, D, or L) is calculated and
the logarithm of this value is accumulated. This constitutes an unfitness score
that needs to be minimized. In addition to this score, the logarithms of the probabilities for
each loose anchor, relative anchor, white advantage, and draw rate are accumulated. An overall
minimization yields optimum values for the ratings of each player and for each of the above-mentioned
parameters. Note that adding the logarithms of the probabilities is analogous to multiplying
the probabilities.
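
The game part of such a score could look like the sketch below. It is an illustration under stated assumptions, not Ordo's implementation: the names are hypothetical, the prior terms for anchors, white advantage, and draw rate are left out, and the logarithms are negated so that a smaller score means a better fit.

```python
import math


def outcome_probabilities(p: float, d: float):
    """Win, draw, and loss probabilities for white, using W = p - D/2 (eq. 19)."""
    w = p - d / 2.0
    return w, d, 1.0 - w - d


def unfitness(games) -> float:
    """Sum of -log(probability of the observed outcome) over all games.

    Each game is a tuple (p, d, outcome), where outcome is 'W', 'D', or 'L'
    from white's point of view. Prior terms are omitted in this sketch.
    """
    index = {"W": 0, "D": 1, "L": 2}
    total = 0.0
    for p, d, outcome in games:
        prob = outcome_probabilities(p, d)[index[outcome]]
        total -= math.log(prob)
    return total
```

Because the logarithms are summed, `math.exp(-unfitness(games))` equals the product of the individual outcome probabilities, which is the likelihood of the observed results.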

Forcing maximum likelihood
Another way to force Ordo to perform a maximum-likelihood estimation to calculate the ratings
is to provide the switch -M. This option is generally a bit slower and probably not necessary,
since the output should be nearly identical with perfect convergence, but it is a good feature for
comparison and debugging.

Acknowledgments
Adam Hair has extensively tested and suggested valuable ideas.

License
ordo 1.2.6
Copyright (c) 2015 Miguel A. Ballicora
Ordo is a program for calculating ratings of engine or chess players

Ordo is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
Ordo is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with Ordo. If not, see .



