The DOT Language


The following is an abstract grammar defining the DOT language. Terminals are shown in bold font and nonterminals in italics. Literal characters are given in single quotes. Parentheses ( and ) indicate grouping when needed. Square brackets [ and ] enclose optional items. Vertical bars | separate alternatives.
graph : [ strict ] (graph | digraph) ID '{' stmt_list '}'
stmt_list : [ stmt [ ';' ] [ stmt_list ] ]
stmt : node_stmt
| edge_stmt
| attr_stmt
| ID '=' ID
| subgraph
attr_stmt : (graph | node | edge) attr_list
attr_list : '[' [ a_list ] ']' [ attr_list ]
a_list : ID [ '=' ID ] [ ',' ] [ a_list ]
edge_stmt : (node_id | subgraph) edgeRHS [ attr_list ]
edgeRHS : edgeop (node_id | subgraph) [ edgeRHS ]
node_stmt : node_id [ attr_list ]
node_id : ID [ port ]
port : port_location [ port_angle ]
| port_angle [ port_location ]
port_location : ':' ID
| ':' '(' ID ',' ID ')'
port_angle : '@' ID
subgraph : [ subgraph ID ] '{' stmt_list '}'
| subgraph ID

An ID is any alphanumeric string not beginning with a digit, but possibly including underscores; or a number; or any quoted string possibly containing escaped quotes.

An edgeop is -> in directed graphs and -- in undirected graphs.

An a_list clause of the form ID is equivalent to ID=true.

The language supports C++-style comments: /* */ and //. In addition, a line beginning with a '#' character is considered a line output from a C preprocessor (e.g., # 34 to indicate line 34 ) and discarded.

Semicolons aid readability but are not required except in the rare case that a named subgraph with no body immediately preceeds an anonymous subgraph, since the precedence rules cause this sequence to be parsed as a subgraph with a heading and a body.

As another aid for readability, dot allows single logical lines to span multiple physical lines using the standard C convention of a backslash immediately preceding a newline character.