When lex is used with yacc, however, each lex rule may just return an integer which corresponds to a given token type, defined in a header file output by yacc. For example:
File: logic.l
%{ #include "y.tab.h" %} %% not { return NOT; } or { return OR; } and { return AND; } \( { return LPAREN; } \) { return RPAREN; } false { return FALSE; } true { return TRUE; } \n { return 0; } // End marker . ; // Ignore everything else. %%
In this example, the header generated by yacc is included in the header section of the file; each of the rules just returns one of the token IDs defined in
y.tab.h
.Of special note is the special token which has a return value of 0, which indicates the end of input.
The yacc file looks like this:
File: logic.y
%{ #include%} %token TRUE FALSE AND OR NOT LPAREN RPAREN %% expr : literal | unary | binary | parens { printf("Parsed expr.\n"); } ; literal : TRUE | FALSE { printf("Parsed literal.\n"); } ; unary : NOT expr { printf("Parsed not expr.\n"); } ; binary : expr AND expr | expr OR expr { printf("Parsed binary expr.\n"); } ; parens : LPAREN expr RPAREN { printf("Parsed paren expr.\n"); } ; %% int main() { yyparse(); printf("Called yyparse once; terminating.\n"); } yyerror(s) char *s; { fprintf(stderr, "yyerror called with: %s\n", s); }
As with the lex file, the yacc file has a header section which can contain C code, and then a sequence of rules. In this header section we can include libraries like
stdio.h
, which allows us to use printf
below. After the header section (delimited by
%{ %}
) we have a list of token types. Token type names are traditionally written in all upper-case, although this is just a convention.In this example, expr, literal, unary, binary and parens all match a particular types of expressions. The first rule listed is the "start" symbol; if there is a sequence of tokens that can be parsed using the first rule after the "end marker" token (0), then the expression is considered parsed (accepted); otherwise there is a syntax error.
Below the list of rules we have some C code, which gets copied verbatim into the C source file that yacc outputs. The above lex and yacc file can be made into a working binary by running the commands:
$ lex logic.l $ yacc -d logic.y $ cc -c -o lex.yy.o lex.yy.c $ cc -c -o y.tab.o y.tab.c $ cc -o calc lex.yy.o y.tab.o -ly -ll
Lex by default produces lex.yy.c, as before; yacc produces y.tab.c, and since the option -d is passed, it also produces y.tab.h which contains the token definitions. Then lex.yy.c and y.tab.c can be separately compiled into object files, and these object files can be linked together into a binary.
To make this more convenient, it's a good idea to use a Makefile:
File: Makefile
all: calc clean: rm *.c *.h *.o calc calc: lex.yy.o y.tab.o cc -o calc lex.yy.o y.tab.o -ly -ll lex.yy.o: lex.yy.c y.tab.h y.tab.o: y.tab.c y.tab.h y.tab.h y.tab.c: logic.y yacc -d logic.y lex.yy.c: logic.l lex logic.l
Finally, we can run the program:
$ ./calc true and (not false) Parsed literal. Parsed not expr. Parsed paren expr. Parsed expr. Called yyparse once; terminating.
And if there's a syntax error (no complete expression before the end marker token (0)):
false and Parsed literal. yyerror called with: syntax error Called yyparse once; terminating.
The next step would be to build some kind of parse tree and evaluate it.
No comments:
Post a Comment