SGF support
New in version 0.7.
Gomill’s SGF support is intended for use with version FF[4], which is specified at https://www.red-bean.com/sgf/index.html. It has support for the game-specific properties for Go, but not those of other games. Point, Move and Stone values are interpreted as Go points.
The gomill.sgf module provides the main support. This module is independent of the rest of Gomill.
The gomill.sgf_moves module contains some higher-level functions for processing moves and positions, and provides a link to the boards module.
The gomill.sgf_grammar and gomill.sgf_properties modules are used to implement the sgf module, and are not currently documented.
Examples
Reading and writing:
>>> from gomill import sgf
>>> g = sgf.Sgf_game.from_string("(;FF[4]GM[1]SZ[9];B[ee];W[ge])")
>>> g.get_size()
9
>>> root_node = g.get_root()
>>> root_node.get("SZ")
9
>>> root_node.get_raw("SZ")
'9'
>>> root_node.set("RE", "B+R")
>>> new_node = g.extend_main_sequence()
>>> new_node.set_move("b", (2, 3))
>>> [node.get_move() for node in g.get_main_sequence()]
[(None, None), ('b', (4, 4)), ('w', (4, 6)), ('b', (2, 3))]
>>> g.serialise()
'(;FF[4]GM[1]RE[B+R]SZ[9];B[ee];W[ge];B[dg])\n'
Recording a game:
g = sgf.Sgf_game(size=13)
for move_info in ...:
    node = g.extend_main_sequence()
    node.set_move(move_info.colour, move_info.move)
    if move_info.comment is not None:
        node.set("C", move_info.comment)
with open(pathname, "w") as f:
    f.write(g.serialise())
See also the show_sgf.py and split_sgf_collection.py example scripts.
Sgf_game objects
SGF data is represented using Sgf_game objects. Each object represents the data for a single SGF file (corresponding to a GameTree in the SGF spec). This is typically used to represent a single game, possibly with variations (but it could be something else, such as a problem set).
An Sgf_game can either be created from scratch or loaded from a string.
To create one from scratch, instantiate an Sgf_game object directly:
- class gomill.sgf.Sgf_game(size[, encoding="UTF-8"])
  size is an integer from 1 to 26, indicating the board size.
  The optional encoding parameter specifies the raw property encoding to use for the game.
  When a game is created this way, the following root properties are initially set: FF[4], GM[1], SZ[size], and CA[encoding].
To create a game from existing SGF data, use the Sgf_game.from_string() classmethod:
- classmethod Sgf_game.from_string(s[, override_encoding=None])
  Return type: Sgf_game
  Creates an Sgf_game from the SGF data in s, which must be an 8-bit string.
  The board size and raw property encoding are taken from the SZ and CA properties in the root node (defaulting to 19 and "ISO-8859-1", respectively). Board sizes greater than 26 are rejected.
  If override_encoding is present, the source data is assumed to be in the encoding it specifies (no matter what the CA property says), and the CA property and raw property encoding are changed to match.
  Raises ValueError if it can’t parse the string, or if the SZ or CA properties are unacceptable. No error is reported for other malformed property values. See also Parsing below.
  Example:
  g = sgf.Sgf_game.from_string(
      "(;FF[4]GM[1]SZ[9]CA[UTF-8];B[ee];W[ge])",
      override_encoding="iso8859-1")
To retrieve the SGF data as a string, use the serialise() method:
- Sgf_game.serialise([wrap])
  Return type: string
  Produces the SGF representation of the data in the Sgf_game.
  Returns an 8-bit string, in the encoding specified by the CA root property (defaulting to "ISO-8859-1").
  See transcoding below for details of the behaviour if the CA property is changed from its initial value.
  This makes some effort to keep the output line length to no more than 79 bytes. Pass None in the wrap parameter to disable this behaviour, or pass an integer to specify a different limit.
The complete game tree is represented using Tree_node objects, which are used to access the SGF properties. An Sgf_game always has at least one node, the root node.
The root node contains global properties for the game tree, and typically also contains game-info properties. It sometimes also contains setup properties (for example, if the game does not begin with an empty board).
Changing the FF and GM properties is permitted, but Gomill will carry on using the FF[4] and GM[1] (Go) rules. Changing SZ is not permitted (but if the size is 19 you may remove the property). Changing CA is permitted (this controls the encoding used by serialise()).
Convenience methods for tree access
The complete game tree can be accessed through the root node, but the following convenience methods are also provided. They return the same Tree_node objects that would be reached via the root node.
Some of the convenience methods are for accessing the leftmost variation of the game tree. This is the variation which appears first in the SGF GameTree, often shown in graphical editors as the topmost horizontal line of nodes. In a game tree without variations, the leftmost variation is just the whole game.
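As a rough illustration of what "leftmost variation" means, the sketch below walks a toy tree by repeatedly following the first child. The Node class and the main_sequence function are hypothetical stand-ins written for this example; they are not Gomill's Tree_node or its implementation.

```python
class Node(object):
    """Toy tree node (hypothetical; not gomill.sgf.Tree_node)."""

    def __init__(self, name):
        self.name = name
        self.children = []

    def new_child(self, name):
        child = Node(name)
        self.children.append(child)
        return child


def main_sequence(root):
    """Return the nodes of the leftmost variation, from root to leaf."""
    sequence = [root]
    node = root
    while node.children:
        # The leftmost variation always follows the first child.
        node = node.children[0]
        sequence.append(node)
    return sequence


# Tree shape: A -> B, and B has two variations: C -> D (first) and E.
root = Node("A")
b = root.new_child("B")
c = b.new_child("C")
d = c.new_child("D")
e = b.new_child("E")

names = [node.name for node in main_sequence(root)]
```

Here the node E is never visited, because it belongs to the second variation below B.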
- Sgf_game.get_last_node()
  Return type: Tree_node
  Returns the last (leaf) node in the leftmost variation.
- Sgf_game.get_main_sequence()
  Return type: list of Tree_node objects
  Returns the complete leftmost variation. The first element is the root node, and the last is a leaf.
- Sgf_game.get_main_sequence_below(node)
  Return type: list of Tree_node objects
  Returns the leftmost variation beneath the Tree_node node. The first element is the first child of node, and the last is a leaf.
  Note that this isn’t necessarily part of the leftmost variation of the game as a whole.
- Sgf_game.get_main_sequence_above(node)
  Return type: list of Tree_node objects
  Returns the partial variation leading to the Tree_node node. The first element is the root node, and the last is the parent of node.
- Sgf_game.extend_main_sequence()
  Return type: Tree_node
  Creates a new Tree_node, adds it to the leftmost variation, and returns it.
  This is equivalent to get_last_node().new_child()
Convenience methods for root properties
The following methods provide convenient access to some of the root node’s SGF properties. The main difference between using these methods and using get() on the root node is that these methods return the appropriate default value if the property is not present.
- Sgf_game.get_size()
  Return type: integer
  Returns the board size (19 if the SZ root property isn’t present).
- Sgf_game.get_charset()
  Return type: string
  Returns the effective value of the CA root property ("ISO-8859-1" if the CA root property isn’t present).
  The returned value is a codec name in normalised form, which may not be identical to the string returned by get_root().get("CA"). Raises ValueError if the property value doesn’t identify a Python codec.
  This gives the encoding that would be used by serialise(). It is not necessarily the same as the raw property encoding (use get_encoding() on the root node to retrieve that).
- Sgf_game.get_komi()
  Return type: float
  Returns the komi (0.0 if the KM root property isn’t present).
  Raises ValueError if the KM root property is present but malformed.
- Sgf_game.get_handicap()
  Return type: integer or None
  Returns the number of handicap stones.
  Returns None if the HA root property isn’t present, or if it has value zero (which isn’t strictly permitted).
  Raises ValueError if the HA property is otherwise malformed.
- Sgf_game.get_player_name(colour)
  Return type: string or None
  Returns the name of the specified player, or None if the required PB or PW root property isn’t present.
- Sgf_game.get_winner()
  Return type: colour
  Returns the colour of the winning player.
  Returns None if the RE root property isn’t present, or if neither player won.
- Sgf_game.set_date([date])
  Sets the DT root property, to a single date.
  If date is specified, it should be a datetime.date. Otherwise the current date is used.
  (SGF allows DT to be rather more complicated than a single date, so there’s no corresponding get_date() method.)
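The "normalised form" returned by get_charset() can be pictured with Python's standard codecs module. The sketch below is an assumption about how such normalisation could work, not Gomill's actual code; normalise_charset is a hypothetical helper name.

```python
import codecs


def normalise_charset(name):
    """Return Python's canonical codec name for an encoding name.

    Raises ValueError if the name doesn't identify a Python codec.
    """
    try:
        return codecs.lookup(name).name
    except LookupError:
        raise ValueError("unknown encoding: %s" % name)
```

With this approach, different spellings of the same encoding (for example "UTF-8" and "utf8") map to one canonical name, which is why the result may differ from the literal text of the CA property.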
Tree_node objects
- class gomill.sgf.Tree_node
  A Tree_node object represents a single node from an SGF file.
  Don’t instantiate Tree_node objects directly; retrieve them from Sgf_game objects.
  Tree_node objects have the following attributes (which should be treated as read-only):
  - owner
    The Sgf_game that the node belongs to.
  - parent
    The node’s parent Tree_node (None for the root node).
Tree navigation
A Tree_node acts as a list-like container of its children: it can be indexed, sliced, and iterated over like a list, and it supports the index method. A Tree_node with no children is treated as having truth value false. For example, to find all leaf nodes:
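def print_leaf_comments(node):
    if node:
        # The node has children: recurse into each of them.
        for child in node:
            print_leaf_comments(child)
    else:
        # Leaf node: print its comment, or a placeholder if it has none.
        if node.has_property("C"):
            print(node.get("C"))
        else:
            print("--")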
Property access
Each node holds a number of properties. Each property is identified by a short string called the PropIdent, eg "SZ" or "B". See Property list below for a list of the standard properties. See the SGF specification for full details. See Parsing below for restrictions on well-formed PropIdents.
Gomill doesn’t enforce SGF’s restrictions on where properties can appear (eg, the distinction between setup and move properties).
The principal methods for accessing the node’s properties are:
- Tree_node.get(identifier)
  Returns a native Python representation of the value of the property whose PropIdent is identifier.
  Raises KeyError if the property isn’t present.
  Raises ValueError if it detects that the property value is malformed.
  See Property types below for details of how property values are represented in Python.
  See Property list below for a list of the known properties. Any other property is treated as having type Text.
- Tree_node.set(identifier, value)
  Sets the value of the property whose PropIdent is identifier.
  value should be a native Python representation of the required property value (as returned by get()).
  Raises ValueError if the identifier isn’t a well-formed PropIdent, or if the property value isn’t acceptable.
  See Property types below for details of how property values should be represented in Python.
  See Property list below for a list of the known properties. Setting nonstandard properties is permitted; they are treated as having type Text.
- Tree_node.unset(identifier)
  Removes the property whose PropIdent is identifier from the node.
  Raises KeyError if the property isn’t currently present.
- Tree_node.has_property(identifier)
  Return type: bool
  Checks whether the property whose PropIdent is identifier is present.
- Tree_node.properties()
  Return type: list of strings
  Lists the properties which are present in the node.
  Returns a list of PropIdents, in unspecified order.
- Tree_node.find_property(identifier)
  Returns the value of the property whose PropIdent is identifier, looking in the node’s ancestors if necessary.
  This is intended for use with properties of type game-info, and with properties which have the inherit attribute.
  It looks first in the node itself, then in its parent, and so on up to the root, returning the first value it finds. Otherwise the behaviour is the same as get().
  Raises KeyError if no node defining the property is found.
- Tree_node.find(identifier)
  Return type: Tree_node or None
  Returns the nearest node defining the property whose PropIdent is identifier.
  Searches in the same way as find_property(), but returns the node rather than the property value. Returns None if no node defining the property is found.
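The ancestor search described for find() and find_property() can be sketched with a toy node class that stores a parent link and a plain dict of properties. Everything here (the Node class, its constructor, and the sample property values) is hypothetical and only mirrors the documented behaviour; it is not Gomill's code.

```python
class Node(object):
    """Toy node with a parent link and a property dict (hypothetical)."""

    def __init__(self, parent=None, **properties):
        self.parent = parent
        self.properties = properties

    def find(self, identifier):
        """Return the nearest node (self or an ancestor) defining identifier, or None."""
        node = self
        while node is not None:
            if identifier in node.properties:
                return node
            node = node.parent
        return None

    def find_property(self, identifier):
        """Return the nearest value of identifier; raise KeyError if there is none."""
        node = self.find(identifier)
        if node is None:
            raise KeyError(identifier)
        return node.properties[identifier]


root = Node(PB="Black player")
middle = Node(parent=root, VW=[(0, 0)])
leaf = Node(parent=middle, C="a comment")

black_name = leaf.find_property("PB")   # found two levels up, in the root
view_node = leaf.find("VW")             # the nearest node defining VW
```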
Convenience methods for properties
The following convenience methods are also provided, for more flexible access to a few of the most important properties:
- Tree_node.get_move()
  Return type: tuple (colour, move)
  Indicates which of the B or W properties is present, and returns its value.
  Returns (None, None) if neither property is present.
- Tree_node.set_move(colour, move)
  Sets the B or W property. If the other property is currently present, it is removed.
  Gomill doesn’t attempt to ensure that moves are legal.
- Tree_node.get_setup_stones()
  Return type: tuple (set of points, set of points, set of points)
  Returns the settings of the AB, AW, and AE properties.
  The tuple elements represent black, white, and empty points respectively. If a property is missing, the corresponding set is empty.
- Tree_node.set_setup_stones(black, white[, empty])
  Sets the AB, AW, and AE properties.
  Each parameter should be a sequence or set of points. If a parameter value is empty (or, in the case of empty, if the parameter is omitted) the corresponding property will be unset.
- Tree_node.has_setup_stones()
  Return type: bool
  Returns True if the AB, AW, or AE property is present.
- Tree_node.add_comment_text(text)
  If the C property isn’t already present, adds it with the value given by the string text.
  Otherwise, appends text to the existing C property value, preceded by two newlines.
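The behaviour described for add_comment_text() amounts to "set, or append after a blank line". A minimal sketch, operating on a plain dict rather than a real Tree_node (the function below is a hypothetical stand-in, not the Gomill method):

```python
def add_comment_text(properties, text):
    """Set or extend the "C" entry of a plain dict of property values."""
    if "C" in properties:
        # Existing comment: separate the new text with two newlines.
        properties["C"] = properties["C"] + "\n\n" + text
    else:
        properties["C"] = text


props = {}
add_comment_text(props, "First remark.")
add_comment_text(props, "Second remark.")
```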
Board size and raw property encoding
Each Tree_node knows its game’s board size, and its raw property encoding (because these are needed to interpret property values). They can be retrieved using the following methods:
- Tree_node.get_size()
  Return type: int
- Tree_node.get_encoding()
  Return type: string
  This returns the name of the raw property encoding (in a normalised form, which may not be the same as the string originally used to specify the encoding).
An attempt to change the value of the SZ property so that it doesn’t match the board size will raise ValueError (even if the node isn’t the root).
Access to raw property values
Raw property values are 8-bit strings, containing the exact bytes that go between the [ and ] in the SGF file. They should be treated as being encoded in the node’s raw property encoding (but there is no guarantee that they hold properly encoded data).
The following methods are provided for access to raw property values. They can be used to access malformed values, or to avoid the standard escape processing and whitespace conversion for Text and SimpleText values.
When setting raw property values, any string that is a well-formed SGF PropValue is accepted: that is, any string that doesn’t contain an unescaped ] or end with an unescaped \. There is no check that the string is properly encoded in the raw property encoding.
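The well-formedness rule for raw values (no unescaped ], no trailing unescaped \) can be checked with a single pass over the bytes. The function below is a sketch of that rule under the stated definition; is_well_formed_propvalue is a hypothetical name and this is not Gomill's own validation code. It is written for Python 3, where the "8-bit strings" of this document correspond to bytes.

```python
def is_well_formed_propvalue(raw):
    """Return True if raw (bytes) has no unescaped ']' and no trailing unescaped backslash."""
    escaped = False
    for byte in bytearray(raw):
        if escaped:
            # Whatever follows a backslash is escaped, including ']' and '\'.
            escaped = False
        elif byte == 0x5C:  # backslash starts an escape
            escaped = True
        elif byte == 0x5D:  # an unescaped ']'
            return False
    # Still in the escaped state means the value ended with an unescaped backslash.
    return not escaped
```

Note that an unescaped [ is acceptable, and that nothing here looks at the character encoding of the value.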
- Tree_node.get_raw_list(identifier)
  Return type: nonempty list of 8-bit strings
  Returns the raw values of the property whose PropIdent is identifier.
  Raises KeyError if the property isn’t currently present.
  If the property value is an empty elist, returns a list containing a single empty string.
- Tree_node.get_raw(identifier)
  Return type: 8-bit string
  Returns the raw value of the property whose PropIdent is identifier.
  Raises KeyError if the property isn’t currently present.
  If the property has multiple PropValues, returns the first. If the property value is an empty elist, returns an empty string.
- Tree_node.get_raw_property_map(identifier)
  Return type: dict: string → list of 8-bit strings
  Returns a dict mapping PropIdents to lists of raw values.
  Returns the same dict object each time it’s called.
  Treat the returned dict object as read-only.
- Tree_node.set_raw_list(identifier, values)
  Sets the raw values of the property whose PropIdent is identifier.
  values must be a nonempty list of 8-bit strings. To specify an empty elist, pass a list containing a single empty string.
  Raises ValueError if the identifier isn’t a well-formed PropIdent, or if any value isn’t a well-formed PropValue.
- Tree_node.set_raw(identifier, value)
  Sets the raw value of the property whose PropIdent is identifier.
  Raises ValueError if the identifier isn’t a well-formed PropIdent, or if the value isn’t a well-formed PropValue.
Tree manipulation
The following methods are provided for manipulating the tree:
- Tree_node.new_child([index])
  Return type: Tree_node
  Creates a new Tree_node and adds it to the tree as this node’s last child.
  If the optional integer index parameter is present, the new node is inserted in the list of children at the specified index instead (with the same behaviour as list.insert()).
  Returns the new node.
- Tree_node.delete()
  Removes the node from the tree (along with all its descendants).
  Raises ValueError if called on the root node.
  You should not continue to use a node which has been removed from its tree.
- Tree_node.reparent(new_parent[, index])
  Moves the node from one part of the tree to another (along with all its descendants).
  new_parent must be a node belonging to the same game.
  Raises ValueError if the operation would create a loop in the tree (ie, if new_parent is the node being moved or one of its descendants).
  If the optional integer index parameter is present, the new node is inserted in the new parent’s list of children at the specified index; otherwise it is placed at the end.
  This method can be used to reorder variations. For example, to make a node the leftmost variation of its parent:
  node.reparent(node.parent, 0)
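The loop check described for reparent() can be pictured as a walk from new_parent up to the root: if the node being moved is met on the way, the move is refused. The Node class below is a hypothetical toy written for this illustration, not Gomill's Tree_node.

```python
class Node(object):
    """Toy tree node (hypothetical; not gomill.sgf.Tree_node)."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def reparent(self, new_parent, index=None):
        """Move this node (and its descendants) under new_parent."""
        # Walk up from new_parent; meeting self means new_parent is
        # this node or one of its descendants, which would create a loop.
        ancestor = new_parent
        while ancestor is not None:
            if ancestor is self:
                raise ValueError("would create a loop")
            ancestor = ancestor.parent
        if self.parent is not None:
            self.parent.children.remove(self)
        self.parent = new_parent
        if index is None:
            new_parent.children.append(self)
        else:
            new_parent.children.insert(index, self)


root = Node()
a = Node(root)
b = Node(root)
c = Node(a)

# Make b the leftmost variation of its parent.
b.reparent(b.parent, 0)
```

In this toy version the node is detached from its old parent before being inserted, so the index refers to the child list without the moved node.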
Property types
The get() and set() node methods convert between raw SGF property values and suitable native Python types.
The following table shows how SGF property types are represented as Python values:
SGF type | Python representation |
---|---|
None | True |
Number | int |
Real | float |
Double | 1 or 2 (int) |
Colour | colour |
SimpleText | 8-bit UTF-8 string |
Text | 8-bit UTF-8 string |
Stone | point |
Point | point |
Move | move |
Gomill doesn’t distinguish the Point and Stone SGF property types. It rejects representations of ‘pass’ for the Point and Stone types, but accepts them for Move (this is not what is described in the SGF specification, but it does correspond to the properties in which ‘pass’ makes sense).
Values of list or elist types are represented as Python lists. An empty elist is represented as an empty Python list (in contrast, the raw value is a list containing a single empty string).
Values of compose types are represented as Python pairs (tuples of length two). FG values are either a pair (int, string) or None.
For Text and SimpleText values, get() and set() take care of escaping. You can store arbitrary strings in a Text value and retrieve them unchanged, with the following exceptions:
- all linebreaks are normalised to \n
- whitespace other than line breaks is converted to a single space
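The two exceptions above can be sketched as a pair of substitutions. This is an illustration only, assuming the linebreak forms listed in the SGF specification (LF, CR, CRLF, LFCR) and treating tab, vertical tab and form feed as the other whitespace characters; normalise_text is a hypothetical name, and Gomill's own handling may differ in detail.

```python
import re

# Linebreak forms listed in the SGF specification: CRLF, LFCR, CR, LF.
_LINEBREAK = re.compile(r"\r\n|\n\r|\r|\n")
# Whitespace characters other than linebreaks (assumed set: tab, VT, FF).
_OTHER_WHITESPACE = re.compile(r"[\t\v\f]")


def normalise_text(s):
    """Normalise every linebreak to LF and every other whitespace character to a space."""
    s = _LINEBREAK.sub("\n", s)
    # Each whitespace character becomes one space; runs are not collapsed.
    return _OTHER_WHITESPACE.sub(" ", s)
```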
get() accepts compressed point lists, but set() never produces them (some SGF viewers still don’t support them).
In some cases, get() will accept values which are not strictly permitted in SGF, if there’s a sensible way to interpret them. In particular, empty lists are accepted for all list types (not only elists).
In some cases, set() will accept values which are not exactly in the Python representation listed, if there’s a natural way to convert them to the SGF representation.
Both get() and set() check that Point values are in range for the board size. Neither get() nor set() pays attention to range restrictions for values of type Number.
Examples:
>>> node.set('KO', True)
>>> node.get_raw('KO')
''
>>> node.set('HA', 3)
>>> node.set('KM', 5.5)
>>> node.set('GB', 2)
>>> node.set('PL', 'w')
>>> node.set('RE', 'W+R')
>>> node.set('GC', 'Example game\n[for documentation]')
>>> node.get_raw('GC')
'Example game\n[for documentation\\]'
>>> node.set('B', (2, 3))
>>> node.get_raw('B')
'dg'
>>> node.set('LB', [((6, 0), "label 1"), ((6, 1), "label 2")])
>>> node.get_raw_list('LB')
['ac:label 1', 'bc:label 2']
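The point values in these examples are consistent with a 9x9 board and a (row, column) convention in which row 0 is the bottom edge, while SGF counts rows from the top. That convention is inferred from the examples rather than stated here, so treat the sketch below as an assumption; point_to_sgf and sgf_to_point are hypothetical helpers, not Gomill functions, and 'pass' is not handled.

```python
_LETTERS = "abcdefghijklmnopqrstuvwxyz"


def point_to_sgf(point, size):
    """Convert a (row, col) pair to a two-letter SGF point string.

    Assumes row 0 is the bottom edge and col 0 the left edge, while
    SGF counts columns from the left and rows from the top.
    """
    row, col = point
    if not (0 <= row < size and 0 <= col < size):
        raise ValueError("point out of range")
    return _LETTERS[col] + _LETTERS[size - 1 - row]


def sgf_to_point(s, size):
    """Convert a two-letter SGF point string to a (row, col) pair."""
    if len(s) != 2:
        raise ValueError("expected two letters")
    col = ord(s[0]) - ord("a")
    row = size - 1 - (ord(s[1]) - ord("a"))
    if not (0 <= row < size and 0 <= col < size):
        raise ValueError("point out of range")
    return (row, col)
```

Under this assumption, (2, 3) on a 9x9 board maps to 'dg', matching the raw value shown for the B property above.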
Property list
Gomill knows the types of all general and Go-specific SGF properties defined in FF[4]:
Id | SGF type | Meaning |
---|---|---|
AB | list of Stone | Add Black |
AE | list of Point | Add Empty |
AN | SimpleText | Annotation |
AP | SimpleText:SimpleText | Application |
AR | list of Point:Point | Arrow |
AW | list of Stone | Add White |
B | Move | Black move |
BL | Real | Black time left |
BM | Double | Bad move |
BR | SimpleText | Black rank |
BT | SimpleText | Black team |
C | Text | Comment |
CA | SimpleText | Charset |
CP | SimpleText | Copyright |
CR | list of Point | Circle |
DD | elist of Point | Dim Points |
DM | Double | Even position |
DO | None | Doubtful |
DT | SimpleText | Date |
EV | SimpleText | Event |
FF | Number | File format |
FG | None or Number:SimpleText | Figure |
GB | Double | Good for Black |
GC | Text | Game comment |
GM | Number | Game |
GN | SimpleText | Game name |
GW | Double | Good for White |
HA | Number | Handicap |
HO | Double | Hotspot |
IT | None | Interesting |
KM | Real | Komi |
KO | None | Ko |
LB | list of Point:SimpleText | Label |
LN | list of Point:Point | Line |
MA | list of Point | Mark |
MN | Number | Set move number |
N | SimpleText | Node name |
OB | Number | Overtime stones left for Black |
ON | SimpleText | Opening |
OT | SimpleText | Overtime description |
OW | Number | Overtime stones left for White |
PB | SimpleText | Black player name |
PC | SimpleText | Place |
PL | Colour | Player to play |
PM | Number | Print move mode |
PW | SimpleText | White player name |
RE | SimpleText | Result |
RO | SimpleText | Round |
RU | SimpleText | Rules |
SL | list of Point | Selected |
SO | SimpleText | Source |
SQ | list of Point | Square |
ST | Number | Style |
SZ | Number | Size |
TB | elist of Point | Black territory |
TE | Double | Tesuji |
TM | Real | Time limit |
TR | list of Point | Triangle |
TW | elist of Point | White territory |
UC | Double | Unclear position |
US | SimpleText | User |
V | Real | Value |
VW | elist of Point | View |
W | Move | White move |
WL | Real | White time left |
WR | SimpleText | White rank |
WT | SimpleText | White team |
Character encoding handling
The SGF format is defined as containing ASCII-encoded data, possibly with non-ASCII characters in Text and SimpleText property values. The Gomill functions for loading and serialising SGF data work with 8-bit Python strings.
The encoding used for Text and SimpleText property values is given by the CA root property (if that isn’t present, the encoding is ISO-8859-1).
In order for an encoding to be used in Gomill, it must exist as a Python built-in codec, and it must be compatible with ASCII (at least whitespace, \, ], and : must be in the usual places). Behaviour is unspecified if a non-ASCII-compatible encoding is requested.
When encodings are passed as parameters (or returned from functions), they are represented using the names or aliases of Python built-in codecs (eg "UTF-8" or "ISO-8859-1"). See standard encodings for a list. Values of the CA property are interpreted in the same way.
Each Sgf_game and Tree_node has a fixed raw property encoding, which is the encoding used internally to store the property values. The Tree_node.get_raw() and Tree_node.set_raw() methods use the raw property encoding.
When an SGF game is loaded from a string, the raw property encoding is taken from the CA root property (unless overridden). Improperly encoded property values will not be detected until they are accessed (get() will raise ValueError; use get_raw() to retrieve the actual bytes).
Transcoding
When an SGF game is serialised to a string, the encoding represented by the CA root property is used. This target encoding will be the same as the raw property encoding unless CA has been changed since the Sgf_game was created.
When the raw property encoding and the target encoding match, the raw property values are included unchanged in the output (even if they are improperly encoded).
Otherwise, if any raw property value is improperly encoded, UnicodeDecodeError is raised, and if any property value can’t be represented in the target encoding, UnicodeEncodeError is raised.
If the target encoding doesn’t identify a Python codec, ValueError is raised. The behaviour of serialise() is unspecified if the target encoding isn’t ASCII-compatible (eg, UTF-16).
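The rules above can be summarised in a few lines of Python. The sketch is written for Python 3 (where the "8-bit strings" of this document correspond to bytes), and transcode is a hypothetical helper that only mirrors the described behaviour for a single raw value; it is not how serialise() is implemented.

```python
import codecs


def transcode(raw, raw_encoding, target_encoding):
    """Re-encode one raw property value (bytes) for output."""
    try:
        source = codecs.lookup(raw_encoding).name
        target = codecs.lookup(target_encoding).name
    except LookupError:
        raise ValueError("unknown encoding")
    if source == target:
        # Encodings match: the bytes pass through unchanged, unvalidated.
        return raw
    # decode() may raise UnicodeDecodeError (improperly encoded input);
    # encode() may raise UnicodeEncodeError (not representable in the target).
    return raw.decode(source).encode(target)
```

For example, a Latin-1 value containing an accented letter grows by one byte when the target is UTF-8, while an invalid UTF-8 value is only rejected when the two encodings differ.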
Parsing
The parser permits non-SGF content to appear before the beginning and after the end of the game. It identifies the start of SGF content by looking for (; (with possible whitespace between the two characters).
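One way to picture the start-of-content rule is a regular expression that matches an opening parenthesis, optional whitespace, then a semicolon. This is a sketch for Python 3 bytes input; find_sgf_start is a hypothetical helper and the real parser may work differently.

```python
import re

# "(" followed by optional ASCII whitespace and then ";".
_SGF_START = re.compile(rb"\(\s*;")


def find_sgf_start(data):
    """Return the index of the '(' that begins the SGF content, or -1 if none."""
    match = _SGF_START.search(data)
    return match.start() if match else -1
```

A parenthesis in leading free text, such as "(see below)", is skipped because it isn't followed by a semicolon.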
The parser accepts at most 64 letters in PropIdents (there is no formal limit in the specification, but no standard property has more than 2; strings as long as 9 letters have been found in the wild).
The parser doesn’t perform any checks on property values. In particular, it allows multiple values to be present for any property.
The parser doesn’t, in general, attempt to ‘fix’ ill-formed SGF content. As an exception, if a PropIdent appears more than once in a node it is converted to a single property with multiple values.
The parser permits lower-case letters in PropIdents (these are allowed in some ancient SGF variants, and are apparently seen in the wild). It ignores those letters, so for example CoPyright is treated as a synonym for CP and should be retrieved using node.get("CP").
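The effect of ignoring lower-case letters can be shown in one line. normalise_propident is a hypothetical helper illustrating the rule, not a Gomill function.

```python
def normalise_propident(ident):
    """Drop lower-case letters from a PropIdent, keeping only 'A'-'Z'."""
    return "".join(c for c in ident if "A" <= c <= "Z")


short_form = normalise_propident("CoPyright")
```

Identifiers that are already upper-case, such as SZ, are left unchanged.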
The sgf_moves module
The gomill.sgf_moves module contains some higher-level functions for processing moves and positions, and provides a link to the boards module.
- gomill.sgf_moves.get_setup_and_moves(sgf_game[, board])
  Return type: tuple (Board, list of tuples (colour, move))
  Returns the initial setup and the following moves from an Sgf_game.
  The board represents the position described by AB and/or AW properties in the SGF game’s root node. ValueError is raised if this position isn’t legal.
  The moves are from the game’s leftmost variation. Doesn’t check that the moves are legal.
  Raises ValueError if the game has structure it doesn’t support.
  Currently doesn’t support AB/AW/AE properties after the root node.
  If the optional board parameter is provided, it must be an empty Board of the right size; the same object will be returned (this option is provided so you can use a different Board class).
  See also the show_sgf.py example script.
- gomill.sgf_moves.set_initial_position(sgf_game, board)
  Adds AB/AW/AE properties to an Sgf_game’s root node, to reflect the position from a Board.
  Replaces any existing AB/AW/AE properties in the root node.
- gomill.sgf_moves.indicate_first_player(sgf_game)
  Adds a PL property to an Sgf_game’s root node if appropriate, to indicate which colour is first to play.
  Looks at the first child of the root to see who the first player is, and sets PL if it isn’t the expected player (Black normally, but White if there is a handicap), or if there are non-handicap setup stones.