SGF support
New in version 0.7.
Gomill’s SGF support is intended for use with version FF[4], which is specified at http://www.red-bean.com/sgf/index.html. It has support for the game-specific properties for Go, but not those of other games. Point, Move and Stone values are interpreted as Go points.
The gomill.sgf module provides the main support. This module is independent of the rest of Gomill.
The gomill.sgf_moves module contains some higher-level functions for processing moves and positions, and provides a link to the boards module.
The gomill.sgf_grammar and gomill.sgf_properties modules are used to implement the sgf module, and are not currently documented.
Examples
Reading and writing:
>>> from gomill import sgf
>>> g = sgf.Sgf_game.from_string("(;FF[4]GM[1]SZ[9];B[ee];W[ge])")
>>> g.get_size()
9
>>> root_node = g.get_root()
>>> root_node.get("SZ")
9
>>> root_node.get_raw("SZ")
'9'
>>> root_node.set("RE", "B+R")
>>> new_node = g.extend_main_sequence()
>>> new_node.set_move("b", (2, 3))
>>> [node.get_move() for node in g.get_main_sequence()]
[(None, None), ('b', (4, 4)), ('w', (4, 6)), ('b', (2, 3))]
>>> g.serialise()
'(;FF[4]GM[1]RE[B+R]SZ[9];B[ee];W[ge];B[dg])\n'
Recording a game:
g = sgf.Sgf_game(size=13)
for move_info in ...:
    node = g.extend_main_sequence()
    node.set_move(move_info.colour, move_info.move)
    if move_info.comment is not None:
        node.set("C", move_info.comment)
with open(pathname, "w") as f:
    f.write(g.serialise())
See also the show_sgf.py and split_sgf_collection.py example scripts.
Sgf_game objects
SGF data is represented using Sgf_game objects. Each object represents the data for a single SGF file (corresponding to a GameTree in the SGF spec). This is typically used to represent a single game, possibly with variations (but it could be something else, such as a problem set).
An Sgf_game can either be created from scratch or loaded from a string.
To create one from scratch, instantiate an Sgf_game object directly:
- class
gomill.sgf.Sgf_game(size[, encoding="UTF-8"]) size is an integer from 1 to 26, indicating the board size.
The optional encoding parameter specifies the raw property encoding to use for the game.
When a game is created this way, the following root properties are initially set: FF[4], GM[1], SZ[size], and CA[encoding].
To create a game from existing SGF data, use the Sgf_game.from_string() classmethod:
- classmethod
Sgf_game.from_string(s[, override_encoding=None]) Return type: Sgf_game
Creates an Sgf_game from the SGF data in s, which must be an 8-bit string.
The board size and raw property encoding are taken from the SZ and CA properties in the root node (defaulting to 19 and "ISO-8859-1", respectively). Board sizes greater than 26 are rejected.
If override_encoding is present, the source data is assumed to be in the encoding it specifies (no matter what the CA property says), and the CA property and raw property encoding are changed to match.
Raises ValueError if it can’t parse the string, or if the SZ or CA properties are unacceptable. No error is reported for other malformed property values. See also Parsing below.
Example:
g = sgf.Sgf_game.from_string(
    "(;FF[4]GM[1]SZ[9]CA[UTF-8];B[ee];W[ge])",
    override_encoding="iso8859-1")
To retrieve the SGF data as a string, use the serialise() method:
-
Sgf_game.serialise([wrap]) Return type: string
Produces the SGF representation of the data in the Sgf_game.
Returns an 8-bit string, in the encoding specified by the CA root property (defaulting to "ISO-8859-1").
See Transcoding below for details of the behaviour if the CA property is changed from its initial value.
This makes some effort to keep the output line length to no more than 79 bytes. Pass None in the wrap parameter to disable this behaviour, or pass an integer to specify a different limit.
The complete game tree is represented using Tree_node objects, which are used to access the SGF properties. An Sgf_game always has at least one node, the root node.
The root node contains global properties for the game tree, and typically also contains game-info properties. It sometimes also contains setup properties (for example, if the game does not begin with an empty board).
Changing the FF and GM properties is permitted, but Gomill will carry on using the FF[4] and GM[1] (Go) rules. Changing SZ is not permitted (but if the size is 19 you may remove the property). Changing CA is permitted (this controls the encoding used by serialise()).
Convenience methods for tree access
The complete game tree can be accessed through the root node, but the following convenience methods are also provided. They return the same Tree_node objects that would be reached via the root node.
Some of the convenience methods are for accessing the leftmost variation of the game tree. This is the variation which appears first in the SGF GameTree, often shown in graphical editors as the topmost horizontal line of nodes. In a game tree without variations, the leftmost variation is just the whole game.
-
Sgf_game.get_last_node() Return type: Tree_node
Returns the last (leaf) node in the leftmost variation.
-
Sgf_game.get_main_sequence() Return type: list of Tree_node objects
Returns the complete leftmost variation. The first element is the root node, and the last is a leaf.
-
Sgf_game.get_main_sequence_below(node) Return type: list of Tree_node objects
Returns the leftmost variation beneath the Tree_node node. The first element is the first child of node, and the last is a leaf.
Note that this isn’t necessarily part of the leftmost variation of the game as a whole.
-
Sgf_game.get_main_sequence_above(node) Return type: list of Tree_node objects
Returns the partial variation leading to the Tree_node node. The first element is the root node, and the last is the parent of node.
-
Sgf_game.extend_main_sequence() Return type: Tree_node
Creates a new Tree_node, adds it to the leftmost variation, and returns it.
This is equivalent to get_last_node().new_child()
Convenience methods for root properties
The following methods provide convenient access to some of the root node’s SGF properties. The main difference between using these methods and using get() on the root node is that these methods return the appropriate default value if the property is not present.
-
Sgf_game.get_size() Return type: integer
Returns the board size (19 if the SZ root property isn’t present).
-
Sgf_game.get_charset() Return type: string
Returns the effective value of the CA root property (ISO-8859-1 if the CA root property isn’t present).
The returned value is a codec name in normalised form, which may not be identical to the string returned by get_root().get("CA"). Raises ValueError if the property value doesn’t identify a Python codec.
This gives the encoding that would be used by serialise(). It is not necessarily the same as the raw property encoding (use get_encoding() on the root node to retrieve that).
-
Sgf_game.get_komi() Return type: float
Returns the komi (0.0 if the KM root property isn’t present).
Raises ValueError if the KM root property is present but malformed.
-
Sgf_game.get_handicap() Return type: integer or None
Returns the number of handicap stones.
Returns None if the HA root property isn’t present, or if it has value zero (which isn’t strictly permitted).
Raises ValueError if the HA property is otherwise malformed.
-
Sgf_game.get_player_name(colour) Return type: string or None
Returns the name of the specified player, or None if the required PB or PW root property isn’t present.
-
Sgf_game.get_winner() Return type: colour
Returns the colour of the winning player.
Returns None if the RE root property isn’t present, or if neither player won.
-
Sgf_game.set_date([date]) Sets the DT root property, to a single date.
If date is specified, it should be a datetime.date. Otherwise the current date is used.
(SGF allows DT to be rather more complicated than a single date, so there’s no corresponding get_date() method.)
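As a rough illustration of the behaviour described for get_winner() above, the winner can be read off the first character of an FF[4] RE value ("B+R", "W+3.5", "0", "Draw", "Void" and so on). The helper name winner_from_result is hypothetical, and this is only a sketch of the idea, not Gomill's code:

```python
def winner_from_result(result):
    """Return 'b', 'w', or None for an SGF RE value (hypothetical helper)."""
    # A missing or empty result means there is no recorded winner.
    if not result:
        return None
    # FF[4] results for a won game start with "B+" or "W+".
    first = result[0].lower()
    if first in ("b", "w"):
        return first
    # "0", "Draw", "Void", "?" and anything else: neither player won.
    return None
```

With this sketch, winner_from_result("B+R") gives 'b', while a draw or a missing result gives None.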
Tree_node objects
- class
gomill.sgf.Tree_node A Tree_node object represents a single node from an SGF file.
Don’t instantiate Tree_node objects directly; retrieve them from Sgf_game objects.
Tree_node objects have the following attributes (which should be treated as read-only):
-
owner The Sgf_game that the node belongs to.
-
parent The node’s parent Tree_node (None for the root node).
Tree navigation
A Tree_node acts as a list-like container of its children: it can be indexed, sliced, and iterated over like a list, and it supports the index method. A Tree_node with no children is treated as having truth value false. For example, to find all leaf nodes:
def print_leaf_comments(node):
    if node:
        for child in node:
            print_leaf_comments(child)
    else:
        if node.has_property("C"):
            print node.get("C")
        else:
            print "--"
Property access
Each node holds a number of properties. Each property is identified by a short string called the PropIdent, eg "SZ" or "B". See Property list below for a list of the standard properties. See the SGF specification for full details. See Parsing below for restrictions on well-formed PropIdents.
Gomill doesn’t enforce SGF’s restrictions on where properties can appear (eg, the distinction between setup and move properties).
The principal methods for accessing the node’s properties are:
-
Tree_node.get(identifier) Returns a native Python representation of the value of the property whose PropIdent is identifier.
Raises KeyError if the property isn’t present.
Raises ValueError if it detects that the property value is malformed.
See Property types below for details of how property values are represented in Python.
See Property list below for a list of the known properties. Any other property is treated as having type Text.
-
Tree_node.set(identifier, value) Sets the value of the property whose PropIdent is identifier.
value should be a native Python representation of the required property value (as returned by get()).
Raises ValueError if the identifier isn’t a well-formed PropIdent, or if the property value isn’t acceptable.
See Property types below for details of how property values should be represented in Python.
See Property list below for a list of the known properties. Setting nonstandard properties is permitted; they are treated as having type Text.
-
Tree_node.unset(identifier) Removes the property whose PropIdent is identifier from the node.
Raises KeyError if the property isn’t currently present.
-
Tree_node.has_property(identifier) Return type: bool Checks whether the property whose PropIdent is identifier is present.
-
Tree_node.properties() Return type: list of strings Lists the properties which are present in the node.
Returns a list of PropIdents, in unspecified order.
-
Tree_node.find_property(identifier) Returns the value of the property whose PropIdent is identifier, looking in the node’s ancestors if necessary.
This is intended for use with properties of type game-info, and with properties which have the inherit attribute.
It looks first in the node itself, then in its parent, and so on up to the root, returning the first value it finds. Otherwise the behaviour is the same as get().
Raises KeyError if no node defining the property is found.
-
Tree_node.find(identifier) Return type: Tree_node or None
Returns the nearest node defining the property whose PropIdent is identifier.
Searches in the same way as find_property(), but returns the node rather than the property value. Returns None if no node defining the property is found.
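The ancestor search used by find() and find_property() can be pictured with a small stand-alone model. The Node class and its props dict below are hypothetical stand-ins for Tree_node (they are not part of Gomill); only the search order follows the description above:

```python
class Node(object):
    """Hypothetical minimal node: a parent link plus a dict of properties."""
    def __init__(self, parent=None, **props):
        self.parent = parent
        self.props = dict(props)

def find(node, identifier):
    # Look in the node itself, then its parent, and so on up to the root.
    while node is not None:
        if identifier in node.props:
            return node
        node = node.parent
    return None

def find_property(node, identifier):
    # Same search, but return the value and raise KeyError if nothing is found.
    found = find(node, identifier)
    if found is None:
        raise KeyError(identifier)
    return found.props[identifier]
```

In this model a game-info property such as PB set on the root is visible from any node beneath it, which is the situation the real methods are intended for.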
Convenience methods for properties
The following convenience methods are also provided, for more flexible access to a few of the most important properties:
-
Tree_node.get_move() Return type: tuple (colour, move)
Indicates which of the B or W properties is present, and returns its value.
Returns (None, None) if neither property is present.
-
Tree_node.set_move(colour, move) Sets the B or W property. If the other property is currently present, it is removed.
Gomill doesn’t attempt to ensure that moves are legal.
-
Tree_node.get_setup_stones() Return type: tuple (set of points, set of points, set of points)
Returns the settings of the AB, AW, and AE properties.
The tuple elements represent black, white, and empty points respectively. If a property is missing, the corresponding set is empty.
-
Tree_node.set_setup_stones(black, white[, empty]) Sets the AB, AW, and AE properties.
Each parameter should be a sequence or set of points. If a parameter value is empty (or, in the case of empty, if the parameter is omitted) the corresponding property will be unset.
-
Tree_node.has_setup_stones() Return type: bool
Returns True if the AB, AW, or AE property is present.
-
Tree_node.add_comment_text(text) If the C property isn’t already present, adds it with the value given by the string text.
Otherwise, appends text to the existing C property value, preceded by two newlines.
Board size and raw property encoding
Each Tree_node knows its game’s board size, and its raw property encoding (because these are needed to interpret property values). They can be retrieved using the following methods:
-
Tree_node.get_size() Return type: int
-
Tree_node.get_encoding() Return type: string This returns the name of the raw property encoding (in a normalised form, which may not be the same as the string originally used to specify the encoding).
An attempt to change the value of the SZ property so that it doesn’t match the board size will raise ValueError (even if the node isn’t the root).
Access to raw property values
Raw property values are 8-bit strings, containing the exact bytes that go between the [ and ] in the SGF file. They should be treated as being encoded in the node’s raw property encoding (but there is no guarantee that they hold properly encoded data).
The following methods are provided for access to raw property values. They can be used to access malformed values, or to avoid the standard escape processing and whitespace conversion for Text and SimpleText values.
When setting raw property values, any string that is a well-formed SGF PropValue is accepted: that is, any string that doesn’t contain an unescaped ] or end with an unescaped \. There is no check that the string is properly encoded in the raw property encoding.
-
Tree_node.get_raw_list(identifier) Return type: nonempty list of 8-bit strings Returns the raw values of the property whose PropIdent is identifier.
Raises KeyError if the property isn’t currently present.
If the property value is an empty elist, returns a list containing a single empty string.
-
Tree_node.get_raw(identifier) Return type: 8-bit string Returns the raw value of the property whose PropIdent is identifier.
Raises KeyError if the property isn’t currently present.
If the property has multiple PropValues, returns the first. If the property value is an empty elist, returns an empty string.
-
Tree_node.get_raw_property_map() Return type: dict: string → list of 8-bit strings Returns a dict mapping PropIdents to lists of raw values.
Returns the same dict object each time it’s called.
Treat the returned dict object as read-only.
-
Tree_node.set_raw_list(identifier, values) Sets the raw values of the property whose PropIdent is identifier.
values must be a nonempty list of 8-bit strings. To specify an empty elist, pass a list containing a single empty string.
Raises ValueError if the identifier isn’t a well-formed PropIdent, or if any value isn’t a well-formed PropValue.
-
Tree_node.set_raw(identifier, value) Sets the raw value of the property whose PropIdent is identifier.
Raises ValueError if the identifier isn’t a well-formed PropIdent, or if the value isn’t a well-formed PropValue.
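The well-formedness rule for raw values stated earlier in this section (no unescaped ] and no trailing unescaped \) can be checked with a single left-to-right scan. The function name is_wellformed_propvalue is hypothetical and this is a sketch of the rule, not Gomill's own check; it works on ordinary Python strings for simplicity:

```python
def is_wellformed_propvalue(s):
    """Return True if s has no unescaped ']' and no trailing unescaped backslash."""
    escaped = False
    for ch in s:
        if escaped:
            # The previous character was a backslash, so this one is literal.
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == "]":
            return False
    # Ending in the 'escaped' state means the value ends with a lone backslash.
    return not escaped
```

A value that fails this check is the kind that set_raw() and set_raw_list() are documented to reject with ValueError.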
Tree manipulation
The following methods are provided for manipulating the tree:
-
Tree_node.new_child([index]) Return type: Tree_node
Creates a new Tree_node and adds it to the tree as this node’s last child.
If the optional integer index parameter is present, the new node is inserted in the list of children at the specified index instead (with the same behaviour as list.insert()).
Returns the new node.
-
Tree_node.delete() Removes the node from the tree (along with all its descendants).
Raises ValueError if called on the root node.
You should not continue to use a node which has been removed from its tree.
-
Tree_node.reparent(new_parent[, index]) Moves the node from one part of the tree to another (along with all its descendants).
new_parent must be a node belonging to the same game.
Raises ValueError if the operation would create a loop in the tree (ie, if new_parent is the node being moved or one of its descendants).
If the optional integer index parameter is present, the node is inserted in the new parent’s list of children at the specified index; otherwise it is placed at the end.
This method can be used to reorder variations. For example, to make a node the leftmost variation of its parent:
node.reparent(node.parent, 0)
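The loop check described for reparent() amounts to walking up from new_parent and refusing the move if the node being moved is encountered. The sketch below uses a hypothetical Node class with explicit parent and children attributes; it models the documented behaviour only and is not Gomill's implementation:

```python
class Node(object):
    """Hypothetical minimal tree node with a parent link and a child list."""
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

def reparent(node, new_parent, index=None):
    # Walk from new_parent up to the root; meeting 'node' means a loop.
    ancestor = new_parent
    while ancestor is not None:
        if ancestor is node:
            raise ValueError("would create a loop")
        ancestor = ancestor.parent
    # Detach from the old parent, then attach at the end or at 'index'.
    node.parent.children.remove(node)
    node.parent = new_parent
    if index is None:
        new_parent.children.append(node)
    else:
        new_parent.children.insert(index, node)
```

Calling reparent(node, node.parent, 0) in this model moves the node to the front of its parent's child list, mirroring the reordering example above.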
Property types
The get() and set() node methods convert between raw SGF property values and suitable native Python types.
The following table shows how SGF property types are represented as Python values:
| SGF type | Python representation |
|---|---|
| None | True |
| Number | int |
| Real | float |
| Double | 1 or 2 (int) |
| Colour | colour |
| SimpleText | 8-bit UTF-8 string |
| Text | 8-bit UTF-8 string |
| Stone | point |
| Point | point |
| Move | move |
Gomill doesn’t distinguish the Point and Stone SGF property types. It rejects representations of ‘pass’ for the Point and Stone types, but accepts them for Move (this is not what is described in the SGF specification, but it does correspond to the properties in which ‘pass’ makes sense).
Values of list or elist types are represented as Python lists. An empty elist is represented as an empty Python list (in contrast, the raw value is a list containing a single empty string).
Values of compose types are represented as Python pairs (tuples of length two). FG values are either a pair (int, string) or None.
For Text and SimpleText values, get() and set() take care of escaping. You can store arbitrary strings in a Text value and retrieve them unchanged, with the following exceptions:
- all linebreaks are normalised to \n
- whitespace other than line breaks is converted to a single space
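The two exceptions above, together with the escaping that set() applies, can be sketched as follows. The function names are hypothetical, the set of linebreak forms (LF, CR, CRLF, LFCR) is taken from the SGF specification, and this is an illustration rather than Gomill's code:

```python
import re

# The SGF linebreak forms; the two-character forms are tried first.
_LINEBREAK_RE = re.compile(r"\r\n|\n\r|\r|\n")

def normalise_text(s):
    """Normalise linebreaks to '\\n' and other whitespace characters to spaces."""
    s = _LINEBREAK_RE.sub("\n", s)
    # Tab, form feed and vertical tab each become a single space.
    return re.sub(r"[\t\f\v]", " ", s)

def escape_text(s):
    """Escape backslashes and ']' so the text can sit between '[' and ']'."""
    return s.replace("\\", "\\\\").replace("]", "\\]")
```

For example, escape_text applied to a string containing "[for documentation]" escapes only the closing bracket, which matches the raw GC value shown in the examples later in this section.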
get() accepts compressed point lists, but set() never produces them (some SGF viewers still don’t support them).
In some cases, get() will accept values which are not strictly permitted in SGF, if there’s a sensible way to interpret them. In particular, empty lists are accepted for all list types (not only elists).
In some cases, set() will accept values which are not exactly in the Python representation listed, if there’s a natural way to convert them to the SGF representation.
Both get() and set() check that Point values are in range for the board size. Neither get() nor set() pays attention to range restrictions for values of type Number.
Examples:
>>> node.set('KO', True)
>>> node.get_raw('KO')
''
>>> node.set('HA', 3)
>>> node.set('KM', 5.5)
>>> node.set('GB', 2)
>>> node.set('PL', 'w')
>>> node.set('RE', 'W+R')
>>> node.set('GC', 'Example game\n[for documentation]')
>>> node.get_raw('GC')
'Example game\n[for documentation\\]'
>>> node.set('B', (2, 3))
>>> node.get_raw('B')
'dg'
>>> node.set('LB', [((6, 0), "label 1"), ((6, 1), "label 2")])
>>> node.get_raw_list('LB')
['ac:label 1', 'bc:label 2']
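The relationship between Python points and raw SGF coordinates can be inferred from the examples in this document: (2, 3) becomes 'dg' and 'ee' becomes (4, 4) on a 9x9 board. That suggests a point is (row, column), with the column giving the first letter and the row counted from the opposite edge to the SGF second letter. The helpers below are a sketch under that inferred assumption, with hypothetical names:

```python
_LETTERS = "abcdefghijklmnopqrstuvwxyz"

def point_to_sgf(point, size):
    """Convert a (row, col) point to a two-letter SGF coordinate."""
    row, col = point
    if not (0 <= row < size and 0 <= col < size):
        raise ValueError("point out of range for board size")
    # First letter is the column; second letter counts rows from the far edge.
    return _LETTERS[col] + _LETTERS[size - 1 - row]

def sgf_to_point(s, size):
    """Convert a two-letter SGF coordinate to a (row, col) point."""
    if len(s) != 2:
        raise ValueError("expected two letters")
    col = _LETTERS.find(s[0])
    row = size - 1 - _LETTERS.find(s[1])
    if not (0 <= row < size and 0 <= col < size):
        raise ValueError("point out of range for board size")
    return (row, col)
```

The range checks correspond to the statement above that Point values are checked against the board size.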
Property list
Gomill knows the types of all general and Go-specific SGF properties defined in FF[4]:
| Id | SGF type | Meaning |
|---|---|---|
| AB | list of Stone | Add Black |
| AE | list of Point | Add Empty |
| AN | SimpleText | Annotation |
| AP | SimpleText:SimpleText | Application |
| AR | list of Point:Point | Arrow |
| AW | list of Stone | Add White |
| B | Move | Black move |
| BL | Real | Black time left |
| BM | Double | Bad move |
| BR | SimpleText | Black rank |
| BT | SimpleText | Black team |
| C | Text | Comment |
| CA | SimpleText | Charset |
| CP | SimpleText | Copyright |
| CR | list of Point | Circle |
| DD | elist of Point | Dim Points |
| DM | Double | Even position |
| DO | None | Doubtful |
| DT | SimpleText | Date |
| EV | SimpleText | Event |
| FF | Number | File format |
| FG | None or Number:SimpleText | Figure |
| GB | Double | Good for Black |
| GC | Text | Game comment |
| GM | Number | Game |
| GN | SimpleText | Game name |
| GW | Double | Good for White |
| HA | Number | Handicap |
| HO | Double | Hotspot |
| IT | None | Interesting |
| KM | Real | Komi |
| KO | None | Ko |
| LB | list of Point:SimpleText | Label |
| LN | list of Point:Point | Line |
| MA | list of Point | Mark |
| MN | Number | Set move number |
| N | SimpleText | Node name |
| OB | Number | Overtime stones left for Black |
| ON | SimpleText | Opening |
| OT | SimpleText | Overtime description |
| OW | Number | Overtime stones left for White |
| PB | SimpleText | Black player name |
| PC | SimpleText | Place |
| PL | Colour | Player to play |
| PM | Number | Print move mode |
| PW | SimpleText | White player name |
| RE | SimpleText | Result |
| RO | SimpleText | Round |
| RU | SimpleText | Rules |
| SL | list of Point | Selected |
| SO | SimpleText | Source |
| SQ | list of Point | Square |
| ST | Number | Style |
| SZ | Number | Size |
| TB | elist of Point | Black territory |
| TE | Double | Tesuji |
| TM | Real | Time limit |
| TR | list of Point | Triangle |
| TW | elist of Point | White territory |
| UC | Double | Unclear position |
| US | SimpleText | User |
| V | Real | Value |
| VW | elist of Point | View |
| W | Move | White move |
| WL | Real | White time left |
| WR | SimpleText | White rank |
| WT | SimpleText | White team |
Character encoding handling
The SGF format is defined as containing ASCII-encoded data, possibly with non-ASCII characters in Text and SimpleText property values. The Gomill functions for loading and serialising SGF data work with 8-bit Python strings.
The encoding used for Text and SimpleText property values is given by the CA root property (if that isn’t present, the encoding is ISO-8859-1).
In order for an encoding to be used in Gomill, it must exist as a Python built-in codec, and it must be compatible with ASCII (at least whitespace, \, ], and : must be in the usual places). Behaviour is unspecified if a non-ASCII-compatible encoding is requested.
When encodings are passed as parameters (or returned from functions), they are represented using the names or aliases of Python built-in codecs (eg "UTF-8" or "ISO-8859-1"). See standard encodings for a list. Values of the CA property are interpreted in the same way.
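Python's standard codecs module can resolve a codec name or alias to a canonical name, which is one way to compare two spellings of the same encoding. The helper below is a sketch with a hypothetical name; the normalised form it returns is Python's own and is not necessarily the form Gomill reports:

```python
import codecs

def normalise_codec_name(name):
    """Return Python's canonical name for a codec name or alias."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        # Report unknown encodings as ValueError, as the methods above are described as doing.
        raise ValueError("unknown encoding: %r" % (name,))
```

Two aliases of the same codec, such as "latin1" and "ISO-8859-1", give the same result, so the return values can be compared directly.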
Each Sgf_game and Tree_node has a fixed raw property encoding, which is the encoding used internally to store the property values. The Tree_node.get_raw() and Tree_node.set_raw() methods use the raw property encoding.
When an SGF game is loaded from a string, the raw property encoding is taken from the CA root property (unless overridden). Improperly encoded property values will not be detected until they are accessed (get() will raise ValueError; use get_raw() to retrieve the actual bytes).
Transcoding
When an SGF game is serialised to a string, the encoding represented by the CA root property is used. This target encoding will be the same as the raw property encoding unless CA has been changed since the Sgf_game was created.
When the raw property encoding and the target encoding match, the raw property values are included unchanged in the output (even if they are improperly encoded).
Otherwise, if any raw property value is improperly encoded, UnicodeDecodeError is raised, and if any property value can’t be represented in the target encoding, UnicodeEncodeError is raised.
If the target encoding doesn’t identify a Python codec, ValueError is raised. The behaviour of serialise() is unspecified if the target encoding isn’t ASCII-compatible (eg, UTF-16).
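The transcoding rules above can be modelled for a single raw value as follows. The function transcode is a hypothetical illustration using only the standard library; it is written for byte strings and is not Gomill's serialisation code:

```python
import codecs

def transcode(raw, source_encoding, target_encoding):
    """Re-encode one raw property value from the source to the target encoding."""
    # Matching encodings: pass the bytes through untouched, even if invalid.
    if codecs.lookup(source_encoding).name == codecs.lookup(target_encoding).name:
        return raw
    # Otherwise decode (may raise UnicodeDecodeError) and
    # re-encode (may raise UnicodeEncodeError).
    return raw.decode(source_encoding).encode(target_encoding)
```

Note that an improperly encoded value only causes an error in this model when the two encodings differ, which is the distinction the paragraphs above draw.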
Parsing
The parser permits non-SGF content to appear before the beginning and after the end of the game. It identifies the start of SGF content by looking for (; (with possible whitespace between the two characters).
The parser accepts at most 64 letters in PropIdents (there is no formal limit in the specification, but no standard property has more than 2; strings as long as 9 letters have been found in the wild).
The parser doesn’t perform any checks on property values. In particular, it allows multiple values to be present for any property.
The parser doesn’t, in general, attempt to ‘fix’ ill-formed SGF content. As an exception, if a PropIdent appears more than once in a node it is converted to a single property with multiple values.
The parser permits lower-case letters in PropIdents (these are allowed in some ancient SGF variants, and are apparently seen in the wild). It ignores those letters, so for example CoPyright is treated as a synonym for CP and should be retrieved using node.get("CP").
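Two of the parser behaviours described in this section are easy to picture in isolation: locating the start of SGF content and dropping lower-case letters from a PropIdent. The helper names below are hypothetical and the code is a sketch of those two rules only, not the parser itself:

```python
import re

# "(" followed by optional whitespace and then ";".
_SGF_START_RE = re.compile(r"\(\s*;")

def find_sgf_start(s):
    """Return the index of the '(' that begins the SGF content."""
    match = _SGF_START_RE.search(s)
    if match is None:
        raise ValueError("no SGF data found")
    return match.start()

def normalise_propident(s):
    """Drop lower-case letters, so 'CoPyright' is read as 'CP'."""
    return "".join(c for c in s if c.isupper())
```

Anything before the index returned by find_sgf_start corresponds to the non-SGF leading content that the parser is described as permitting.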
The sgf_moves module
The gomill.sgf_moves module contains some higher-level functions for processing moves and positions, and provides a link to the boards module.
-
gomill.sgf_moves.get_setup_and_moves(sgf_game[, board]) Return type: tuple (Board, list of tuples (colour, move))
Returns the initial setup and the following moves from an Sgf_game.
The board represents the position described by AB and/or AW properties in the SGF game’s root node. ValueError is raised if this position isn’t legal.
The moves are from the game’s leftmost variation. Doesn’t check that the moves are legal.
Raises ValueError if the game has structure it doesn’t support.
Currently doesn’t support AB/AW/AE properties after the root node.
If the optional board parameter is provided, it must be an empty Board of the right size; the same object will be returned (this option is provided so you can use a different Board class).
See also the show_sgf.py example script.
-
gomill.sgf_moves.set_initial_position(sgf_game, board) Adds AB/AW/AE properties to an Sgf_game’s root node, to reflect the position from a Board.
Replaces any existing AB/AW/AE properties in the root node.
-
gomill.sgf_moves.indicate_first_player(sgf_game) Adds a PL property to an Sgf_game’s root node if appropriate, to indicate which colour is first to play.
Looks at the first child of the root to see who the first player is, and sets PL if it isn’t the expected player (Black normally, but White if there is a handicap), or if there are non-handicap setup stones.