Especially when learning, it is important to have debugging facilities. Luckily, YACC can give a lot of feedback. This feedback comes at the cost of some overhead, so you need to supply some switches to enable it.
When compiling your grammar, add --debug and --verbose to the YACC command line. In the C heading of your grammar file, add the following:
int yydebug=1;
The --verbose flag makes YACC generate the file 'y.output', which explains the state machine that was created.
When you now run the generated binary, it will output a *lot* of what is happening. This includes which state the state machine is currently in, and which tokens are being read.
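Because the trace is so verbose, you may prefer to switch it on only when asked instead of hardcoding yydebug=1. The following is a minimal sketch of that idea, not part of the original examples: the helper name trace_requested and the environment variable PARSER_TRACE are made up for illustration, and the parser must still be built with --debug for yydebug to exist.

```c
#include <string.h>

/*
 * Decide whether parser tracing was requested.
 * `value` is the content of a hypothetical PARSER_TRACE environment
 * variable, or NULL when the variable is not set.
 *
 * Intended use in main(), before calling yyparse():
 *     extern int yydebug;
 *     yydebug = trace_requested(getenv("PARSER_TRACE"));
 */
int trace_requested(const char *value)
{
    if (value == NULL || value[0] == '\0')
        return 0;                      /* unset or empty: stay quiet */
    return strcmp(value, "0") != 0;    /* anything except "0" enables tracing */
}
```

With this in place, the same binary stays quiet by default and only prints the state machine trace when the variable is set.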
Peter Jinks wrote a page on debugging which contains some common errors and how to solve them.
Internally, your YACC parser runs a so-called 'state machine'. As the name implies, this is a machine that can be in several states. Rules then govern the transitions from one state to another. Everything starts with the so-called 'root' rule I mentioned earlier.
To quote from the output from the Example 7 y.output:
state 0
    ZONETOK     shift, and go to state 1
    $default    reduce using rule 1 (commands)
    commands    go to state 29
    command     go to state 2
    zone_set    go to state 3
By default, this state reduces using the 'commands' rule. This is the aforementioned recursive rule that defines 'commands' to be built up from individual command statements, followed by a semicolon, followed by possibly more commands.
This state reduces until it hits something it understands, in this case a ZONETOK, i.e., the word 'zone'. It then goes to state 1, which deals further with a zone command:
state 1
    zone_set -> ZONETOK . quotedname zonecontent   (rule 4)
    QUOTE       shift, and go to state 4
    quotedname  go to state 5
The first line has a '.' in it to indicate where we are: we've just seen a ZONETOK and are now looking for a 'quotedname'. Apparently, a quotedname starts with a QUOTE, which sends us to state 4.
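To make the idea of 'states plus transition rules' concrete, the entries quoted from states 0 and 1 can be written down as a small lookup table in C. This is only a toy sketch: the struct, the function name next_state and the lowercase-to-uppercase symbol names are invented here, and the tables YACC really generates are far larger and encoded differently.

```c
#include <assert.h>
#include <stddef.h>

/* Symbols that appear in the two quoted states. */
enum symbol { ZONETOK, QUOTE, COMMANDS, COMMAND, ZONE_SET, QUOTEDNAME };

/* One table row: when in `state` and seeing `sym`, move to `next`. */
struct transition { int state; enum symbol sym; int next; };

/* Only the transitions listed in the y.output excerpt. */
static const struct transition table[] = {
    { 0, ZONETOK,    1  },   /* token read from the input */
    { 0, COMMANDS,   29 },   /* entered after a reduction */
    { 0, COMMAND,    2  },
    { 0, ZONE_SET,   3  },
    { 1, QUOTE,      4  },   /* token read from the input */
    { 1, QUOTEDNAME, 5  },   /* entered after a reduction */
};

/* Return the state to move to, or -1 if the excerpt lists no entry. */
int next_state(int state, enum symbol sym)
{
    size_t i;

    assert(state >= 0);
    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i].state == state && table[i].sym == sym)
            return table[i].next;
    return -1;
}
```

A result of -1 only means the excerpt shows no matching line. In state 0 the real parser would not fail at that point but fall back to the $default reduction.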
To follow this further, compile Example 7 with the flags mentioned in the Debugging section.
Whenever YACC warns you about conflicts, you may be in for trouble. Solving these conflicts appears to be somewhat of an art form that may teach you a lot about your language. More than you possibly would have wanted to know.
The problems revolve around how to interpret a sequence of tokens. Let's suppose we define a language that needs to accept both these commands:
delete heater all
delete heater number1
To do this, we define this grammar:
delete_heaters:
        TOKDELETE TOKHEATER mode
        {
                deleteheaters($3);
        }
        ;

mode:   WORD
        ;

delete_a_heater:
        TOKDELETE TOKHEATER WORD
        {
                delete($3);
        }
        ;
You may already be smelling trouble. The state machine starts by reading the words 'delete' and 'heater', and then needs to decide where to go based on the next token. This next token can be either a mode, specifying how to delete the heaters, or the name of a heater to delete.
The problem, however, is that for both commands the next token is going to be a WORD. YACC therefore has no idea what to do. This leads to a 'reduce/reduce' warning, and a further warning that the 'delete_a_heater' node is never going to be reached.
In this case the conflict is resolved easily (i.e., by renaming the first command to 'delete heaters all', or by making 'all' a separate token), but sometimes it is harder. The y.output file generated when you pass YACC the --verbose flag can be of tremendous help.
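The second fix, making 'all' a separate token, happens in the tokenizer rather than in the grammar. The sketch below is a hand-written C stand-in for the Lex rules, just to show the effect. TOKALL and the function classify are hypothetical names, and the numeric token values are placeholders for what YACC would normally put in y.tab.h.

```c
#include <string.h>

/* Token codes. TOKALL is the new, hypothetical token; the others are
 * the ones used in the grammar above. */
enum token { TOKDELETE = 258, TOKHEATER, TOKALL, WORD };

/*
 * Map one input word to a token.
 *
 * Once 'all' has its own token, the rules can be told apart by the
 * token that follows TOKDELETE TOKHEATER:
 *     delete_heaters:  TOKDELETE TOKHEATER TOKALL
 *     delete_a_heater: TOKDELETE TOKHEATER WORD
 */
enum token classify(const char *text)
{
    if (strcmp(text, "delete") == 0) return TOKDELETE;
    if (strcmp(text, "heater") == 0) return TOKHEATER;
    if (strcmp(text, "all") == 0)    return TOKALL;    /* no longer a WORD */
    return WORD;                                       /* e.g. "number1" */
}
```

The price of this approach is that 'all' becomes a reserved word: a heater can no longer be named 'all'.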