The cross-entropy tuner

The cross-entropy tuner uses the cross-entropy method described in [CE]:

[CE] G.M.J-B. Chaslot, M.H.M. Winands, I. Szita, and H.J. van den Herik.
Cross-entropy for Monte-Carlo Tree Search. ICGA Journal, 31(3):145-156.

Caution

The cross-entropy tuner is experimental. It can take a very large number of games to converge.

The tuning algorithm

The algorithm is not described in detail in this documentation. See [CE] section 3 for the description. The tuner always uses a Gaussian distribution. The improvement suggested in section 5 is not implemented.
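For orientation only, here is a minimal, self-contained sketch of the loop described in [CE] section 3, with one independent Gaussian per parameter. It is not the tuner's own code. The toy `evaluate` function stands in for playing `batch_size` games against the opponent, `run_ce` is an illustrative name, and the distribution update assumes the usual smoothed rule in which `step_size` (α) blends the elite statistics with the previous distribution.

```python
import random


def evaluate(params):
    """Hypothetical stand-in for playing games: higher is better.

    The 'best' optimiser parameters for this toy score are (1.0, -2.0).
    """
    target = (1.0, -2.0)
    return -sum((p - t) ** 2 for p, t in zip(params, target))


def run_ce(distribution, number_of_generations, samples_per_generation,
           elite_proportion, step_size, rng):
    """Cross-entropy loop over a list of (mean, variance) pairs."""
    elite_count = max(1, int(round(elite_proportion * samples_per_generation)))
    for _ in range(number_of_generations):
        # Draw each candidate's optimiser parameters from the current
        # Gaussians (random.gauss takes a standard deviation, not a variance).
        samples = [
            [rng.gauss(mean, variance ** 0.5) for mean, variance in distribution]
            for _ in range(samples_per_generation)
        ]
        # Keep the best-scoring candidates as the elite.
        samples.sort(key=evaluate, reverse=True)
        elites = samples[:elite_count]
        # Smoothed update of each parameter's mean and variance.
        new_distribution = []
        for i, (old_mean, old_variance) in enumerate(distribution):
            values = [elite[i] for elite in elites]
            mean = sum(values) / len(values)
            variance = sum((v - mean) ** 2 for v in values) / len(values)
            new_distribution.append((
                step_size * mean + (1.0 - step_size) * old_mean,
                step_size * variance + (1.0 - step_size) * old_variance,
            ))
        distribution = new_distribution
    return distribution


final = run_ce([(0.0, 4.0), (0.0, 4.0)], number_of_generations=20,
               samples_per_generation=100, elite_proportion=0.1,
               step_size=0.8, rng=random.Random(1))
print(final)
```

In the real tuner each evaluation is a batch of games, which is why convergence can need so many games.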

The parameter model

The parameter values taken from the Gaussian distribution are floating-point numbers known as optimiser parameters.

These parameters can be transformed before being used to configure the candidate (see section 3.3, Normalising Parameters, in [CE]). The transformed values are known as engine parameters. The transformation is implemented by a Python transform function defined in the control file.

Reports show engine parameters (formatted using each Parameter's format setting), together with the mean and variance of the corresponding optimiser parameter distribution, in the form mean~variance.
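To illustrate the two kinds of parameter, the sketch below follows a single value through the model using the rave_weight_initial settings from the sample control file. The two-decimal mean~variance text is an assumed layout; the tuner's actual reports may print these numbers differently.

```python
import random


def exp_10(f):
    # Same transform as the sample control file: the optimiser value is
    # log_10 of the engine value.
    return 10.0 ** f


# Distribution for rave_weight_initial in the first generation.
mean, variance = -1.0, 1.5

rng = random.Random(0)
# Optimiser parameter: a draw from the Gaussian. random.gauss takes a
# standard deviation, so pass the square root of the variance.
optimiser_value = rng.gauss(mean, variance ** 0.5)

# Engine parameter: the transformed value that make_candidate receives.
engine_value = exp_10(optimiser_value)

# Text a report might show (hypothetical layout).
engine_text = "I: %4.2f" % engine_value
distribution_text = "%.2f~%.2f" % (mean, variance)
print(engine_text, distribution_text)
```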

Sample control file

Here is a sample control file, illustrating most of the available settings for a cross-entropy tuning event:

competition_type = "ce_tuner"

description = """\
This is a sample control file.

It illustrates the available settings for the cross entropy tuner.
"""

players = {
    'gnugo-l10' : Player("gnugo --mode=gtp --chinese-rules "
                         "--capture-all-dead --level=10"),
    }

def fuego(max_games, additional_commands=[]):
    commands = [
        "go_param timelimit 999999",
        "uct_max_memory 350000000",
        "uct_param_search number_threads 1",
        "uct_param_player reuse_subtree 0",
        "uct_param_player ponder 0",
        "uct_param_player max_games %d" % max_games,
        ]
    return Player(
        "fuego --quiet",
        startup_gtp_commands=commands+additional_commands)

FUEGO_MAX_GAMES = 1000

def exp_10(f):
    return 10.0**f

parameters = [
    Parameter('rave_weight_initial',
              # Mean and variance are in terms of log_10 (rave_weight_initial)
              initial_mean = -1.0,
              initial_variance = 1.5,
              transform = exp_10,
              format = "I: %4.2f"),

    Parameter('rave_weight_final',
              # Mean and variance are in terms of log_10 (rave_weight_final)
              initial_mean = 3.5,
              initial_variance = 1.5,
              transform = exp_10,
              format = "F: %4.2f"),
    ]

def make_candidate(rwi, rwf):
    return fuego(
        FUEGO_MAX_GAMES,
        ["uct_param_search rave_weight_initial %f" % rwi,
         "uct_param_search rave_weight_final %f" % rwf])

board_size = 9
komi = 7.5
opponent = 'gnugo-l10'
candidate_colour = 'w'

number_of_generations = 5
samples_per_generation = 100
batch_size = 10
elite_proportion = 0.1
step_size = 0.8

Control file settings

The control file settings are similar to those used in playoffs.

The competition_type setting must have the value "ce_tuner".

The players dictionary must be present as usual, but it is used only to define the opponent.

The matchups setting is not used. The following matchup settings may be specified as top-level settings (as usual, board_size and komi are compulsory):

All other competition settings may be present, with the same meaning as for playoffs.

The following additional settings are used (they are all compulsory):

candidate_colour

String: "b" or "w"

The colour for the candidates to take in every game.

opponent

Identifier

The player code of the player to use as the candidates’ opponent.

parameters

List of Parameter definitions (see Parameter configuration).

Describes the parameters that the tuner will work with. See The parameter model for more details.

The order of the Parameter definitions determines the order of the arguments passed to make_candidate, and the order in which parameters are listed in reports and game records.

make_candidate

Python function

Function to create a Player from its engine parameters.

This function is passed one argument for each candidate parameter, and must return a Player definition. Each argument is the output of the corresponding Parameter’s transform.

The function will typically use its arguments to construct command-line options or GTP commands for the player. For example, using command-line options:

def make_candidate(param1, param2):
    return Player(["goplayer", "--param1", str(param1),
                   "--param2", str(param2)])

or using GTP commands sent at startup:

def make_candidate(param1, param2):
    return Player("goplayer", startup_gtp_commands=[
        ["param1", str(param1)],
        ["param2", str(param2)],
        ])

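To make the argument convention concrete, here is a sketch of how one candidate's optimiser parameters could be turned into a Player: each transform is applied in Parameter order and the results are passed positionally to make_candidate. `Parameter` and `Player` below are simplified hypothetical stand-ins for the control-file objects, and `build_candidate` is an illustrative helper, not part of the tuner.

```python
from collections import namedtuple

# Simplified, hypothetical stand-ins for the control-file objects.
Parameter = namedtuple("Parameter", ["code", "transform"])
Player = namedtuple("Player", ["command", "startup_gtp_commands"])


def exp_10(f):
    return 10.0 ** f


parameters = [
    Parameter("rave_weight_initial", exp_10),
    Parameter("rave_weight_final", exp_10),
]


def make_candidate(rwi, rwf):
    return Player("fuego --quiet", [
        "uct_param_search rave_weight_initial %f" % rwi,
        "uct_param_search rave_weight_final %f" % rwf,
    ])


def build_candidate(optimiser_values):
    # One engine parameter per Parameter, in the order they are defined.
    engine_values = [p.transform(v) for p, v in zip(parameters, optimiser_values)]
    return make_candidate(*engine_values)


candidate = build_candidate([-1.0, 3.0])
print(candidate.startup_gtp_commands)
```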
number_of_generations

Positive integer

The number of times to repeat the tuning algorithm (number of iterations or T in the terminology of [CE]).

samples_per_generation

Positive integer

The number of candidates to make in each generation (population_size or N in the terminology of [CE]).

batch_size

Positive integer

The number of games played by each candidate.
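Because every candidate plays batch_size games, the length of a run is the product of three settings. A quick check with the values from the sample control file, assuming every scheduled game is played once:

```python
number_of_generations = 5
samples_per_generation = 100
batch_size = 10

games_per_generation = samples_per_generation * batch_size
total_games = number_of_generations * games_per_generation
print(games_per_generation, total_games)  # 1000 5000
```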

elite_proportion

Float between 0.0 and 1.0

The proportion of candidates to select from each generation as ‘elite’ (the selection ratio or ρ in the terminology of [CE]). A value between 0.01 and 0.1 is recommended.
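The selection ratio translates into a number of elite candidates per generation. The sketch below assumes the count is rounded to the nearest integer and never drops below one; how the tuner itself rounds is not stated here, so treat `elite_count` as a hypothetical helper.

```python
def elite_count(samples_per_generation, elite_proportion):
    # Hypothetical rounding rule: nearest integer, but at least one elite.
    return max(1, int(round(samples_per_generation * elite_proportion)))


print(elite_count(100, 0.1))   # 10, as in the sample control file
print(elite_count(100, 0.01))  # 1
print(elite_count(30, 0.01))   # 1 (0.3 would otherwise round to 0)
```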

step_size

Float between 0.0 and 1.0

The rate at which to update the distribution parameters between generations (α in the terminology of [CE]).
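Assuming the usual smoothed update of the cross-entropy method, where μ_elite and σ²_elite are the mean and variance of the elite candidates' optimiser parameters, each parameter's distribution moves as follows. The worked line plugs in the sample file's step_size of 0.8 with an illustrative old mean of -1.0 and elite mean of 0.0.

```latex
\mu_{t+1} = \alpha\,\mu_{\mathrm{elite}} + (1-\alpha)\,\mu_t, \qquad
\sigma^2_{t+1} = \alpha\,\sigma^2_{\mathrm{elite}} + (1-\alpha)\,\sigma^2_t

\alpha = 0.8,\ \mu_t = -1.0,\ \mu_{\mathrm{elite}} = 0.0
\;\Rightarrow\; \mu_{t+1} = 0.8 \times 0.0 + 0.2 \times (-1.0) = -0.2
```

Under this rule a step_size of 1.0 replaces the distribution with the elite statistics outright, while values near 0.0 make it move very slowly.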

Caution

I can’t find the value used for this anywhere in the paper, so I can’t recommend one.

Parameter configuration

A Parameter definition has the same syntax as a Python function call: Parameter(arguments). Apart from code, the arguments should be specified using keyword form (see Sample control file).

The code, initial_mean, and initial_variance arguments are required.

The arguments are:

code

Identifier

A short string used to identify the parameter. This is used in error messages, and in the default for format.

initial_mean

Float

The mean value for the parameter in the first generation’s distribution.

initial_variance

Float >= 0

The variance for the parameter in the first generation’s distribution.

transform

Python function (default identity)

Function mapping an optimiser parameter to an engine parameter; see The parameter model.

Example:

def exp_10(f):
    return 10.0**f

Parameter('p1', initial_mean = …, initial_variance = …,
          transform = exp_10)

If the transform is not specified, the optimiser parameter is used directly as the engine parameter.
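Transforms are ordinary Python functions, so other mappings work the same way. As a hypothetical example (the parameter code 'playouts' and the rounding rule are illustrative, not taken from any engine), an engine option that must be a positive integer could be derived by rounding and clamping the optimiser value:

```python
def positive_int(f):
    # Round the optimiser value to the nearest integer, and never go below 1.
    return max(1, int(round(f)))


# In a control file (hypothetical parameter):
# Parameter('playouts', initial_mean = 100.0, initial_variance = 400.0,
#           transform = positive_int)

print(positive_int(4.6))   # 5
print(positive_int(-3.2))  # 1
```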

format

String (default "<code>: %s", where <code> is the parameter's code)

Format string used to display the parameter value. This should include a short abbreviation to indicate which parameter is being displayed, and also contain %s, which will be replaced with the engine parameter value.

You can use any Python conversion specifier instead of %s. For example, %.2f will format a floating point number to two decimal places. %s should be safe to use for all types of value. See string formatting operations for details.

Format strings should be kept short, as screen space is limited.

Examples:

Parameter('parameter_1',
          initial_mean = 0.0, initial_variance = 1.0,
          format = "p1: %.2f")

Parameter('parameter_2',
          initial_mean = 5000, initial_variance = 250000,
          format = "p2: %d")
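Format strings are applied with Python's % operator, so their output can be checked directly. This sketch assumes an untransformed (float) engine value for parameter_2, in which case %d truncates rather than rounds:

```python
p1_format = "p1: %.2f"
p2_format = "p2: %d"
# The default format, "<code>: %s", for a parameter whose code is 'parameter_1'.
default_format = "parameter_1: %s"

print(p1_format % 0.12345)       # p1: 0.12
print(p2_format % 5123.9)        # p2: 5123 (%d truncates a float)
print(default_format % 0.12345)  # parameter_1: 0.12345
```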

Reporting

Currently, there aren’t any sophisticated reports.

The standard report shows the parameters of the current Gaussian distribution, and the number of wins for each candidate in the current generation.

After each generation, the details of the candidates are written to the history file. The candidates selected as elite are marked with a *.

Changing the control file between runs

Some settings can safely be changed between runs of the same cross-entropy tuning event:

batch_size: safe to increase
samples_per_generation: not safe to change
number_of_generations: safe to change
elite_proportion: safe to change
step_size: safe to change
make_candidate: safe to change, but don’t alter play-affecting options
transform: not safe to change
format: safe to change