When lex is used with yacc, however, each lex rule may just return an integer which corresponds to a given token type, defined in a header file output by yacc. For example:
File: logic.l
%{
#include "y.tab.h"
%}
%%
not { return NOT; }
or { return OR; }
and { return AND; }
\( { return LPAREN; }
\) { return RPAREN; }
false { return FALSE; }
true { return TRUE; }
\n { return 0; } // End marker
. ; // Ignore everything else.
%%
In this example, the header generated by yacc is included in the header section of the file; each of the rules just returns one of the token IDs defined in
y.tab.h.Of special note is the special token which has a return value of 0, which indicates the end of input.
The yacc file looks like this:
File: logic.y
%{
#include
%}
%token TRUE FALSE AND OR NOT LPAREN RPAREN
%%
expr : literal
| unary
| binary
| parens
{ printf("Parsed expr.\n"); }
;
literal : TRUE
| FALSE
{ printf("Parsed literal.\n"); }
;
unary : NOT expr
{ printf("Parsed not expr.\n"); }
;
binary : expr AND expr
| expr OR expr
{ printf("Parsed binary expr.\n"); }
;
parens : LPAREN expr RPAREN
{ printf("Parsed paren expr.\n"); }
;
%%
int main() {
yyparse();
printf("Called yyparse once; terminating.\n");
}
yyerror(s) char *s; {
fprintf(stderr, "yyerror called with: %s\n", s);
}
As with the lex file, the yacc file has a header section which can contain C code, and then a sequence of rules. In this header section we can include libraries like
stdio.h, which allows us to use printf below. After the header section (delimited by
%{ %}) we have a list of token types. Token type names are traditionally written in all upper-case, although this is just a convention.In this example, expr, literal, unary, binary and parens all match a particular types of expressions. The first rule listed is the "start" symbol; if there is a sequence of tokens that can be parsed using the first rule after the "end marker" token (0), then the expression is considered parsed (accepted); otherwise there is a syntax error.
Below the list of rules we have some C code, which gets copied verbatim into the C source file that yacc outputs. The above lex and yacc file can be made into a working binary by running the commands:
$ lex logic.l $ yacc -d logic.y $ cc -c -o lex.yy.o lex.yy.c $ cc -c -o y.tab.o y.tab.c $ cc -o calc lex.yy.o y.tab.o -ly -ll
Lex by default produces lex.yy.c, as before; yacc produces y.tab.c, and since the option -d is passed, it also produces y.tab.h which contains the token definitions. Then lex.yy.c and y.tab.c can be separately compiled into object files, and these object files can be linked together into a binary.
To make this more convenient, it's a good idea to use a Makefile:
File: Makefile
all: calc
clean:
rm *.c *.h *.o calc
calc: lex.yy.o y.tab.o
cc -o calc lex.yy.o y.tab.o -ly -ll
lex.yy.o: lex.yy.c y.tab.h
y.tab.o: y.tab.c y.tab.h
y.tab.h y.tab.c: logic.y
yacc -d logic.y
lex.yy.c: logic.l
lex logic.l
Finally, we can run the program:
$ ./calc true and (not false) Parsed literal. Parsed not expr. Parsed paren expr. Parsed expr. Called yyparse once; terminating.
And if there's a syntax error (no complete expression before the end marker token (0)):
false and Parsed literal. yyerror called with: syntax error Called yyparse once; terminating.
The next step would be to build some kind of parse tree and evaluate it.
No comments:
Post a Comment