CSc 453: Supporting Documents
A [f]lex tutorial (Powerpoint slides)
flex (and lex): Overview Scanner generators: Helps write programs whose control flow is directed by instances of regular expressions in the input stream. \definecolor{blue}{HTML}{CCFFFF}
\begin{tikzpicture}[scale=4]
\node[fill=blue!50,draw,minimum width=2.5cm,minimum height=1.5cm]at(0,0)(v){flex (or lex)};
\draw[-stealth](v.east)--+(.5cm,0)node[align=left,xshift=50]{Output: C code\\implementing a\\scanner:\\
\underline{function}: \tt yylex()\\
\underline{file}: \tt lex.yy.c};
\draw[stealth-](v.west)--+(-.5cm,0)node[align=left,xshift=-50]{Input: a set of\\regular expressions\\+ actions};
\end{tikzpicture} Using flex \definecolor{blue}{HTML}{CCFFFF}
\usetikzlibrary {decorations.pathreplacing}
\begin{tikzpicture}[scale=4]
\fontsize{10}{12}\selectfont
\node[fill=blue!50,draw,minimum width=1.42cm,minimum height=1cm]at(2.75,2.17)(v){flex};
\draw[-stealth](v.east)--+(.1cm,0)node[align=left,xshift=35]{\underline{file}: \tt lex.yy.c\\
\\[-1em]
\quad yylex()\\
\quad \{\\
\quad \dots\\
\quad \}};
\draw[stealth-](v.west)--+(-.1cm,0)node[align=left,xshift=-35](l){lex input spec\\(regexps +\\actions)};
\draw[stealth-](l.south)--+(0,-.3cm);
\begin{scope}[shift={(0,-0.12)}]
\draw[line width=.4] (2.143,1.684) circle (.016);
\draw[line width=.4] (2.192,1.684) circle (.016);
\draw (2.1257,1.6377) arc (-140:-40:0.057);
\node (v1) at (2.1689,1.668) {};
\draw (v1) circle (0.0856);
\draw (2.0351,1.5521) -- (2.1701,1.5137) -- (2.2941,1.5556) ;
\draw (2.1707,1.5843) --(2.1689,1.4287)edge(2.0684,1.2857)--(2.2471,1.2851);
\end{scope}
\draw[-stealth] (2.3386,1.5475)--+(.67,0)node[align=left,xshift=30](sd){driver\\code};
\draw[dashed,yshift=-15.5] (3.12,2.32) rectangle (3.46,1.85);
\draw[dashed] (3.12,2.32) rectangle (3.46,1.85);
\draw (3.3349,1.4) -- (3.46,1.1389)node[yshift=-20,xshift=24]{$\left\{\!\!\begin{array}l\text{main() \{…\}}\\
\quad\text{or}\\
\text{parser() \{…\}}
\end{array}\right.$};
\draw[decorate,decoration={brace,raise=5,amplitude=10}] (3.46,2.32) --+(0,-1);
\draw[-stealth](3.59,1.82)--+(.2,0)[xshift=6]node{compiler};
\draw(3.8088,2.1068) rectangle (4.19,1.544);
\draw[-stealth,xshift=17](3.59,1.82)--+(.2,0);
\node at (2.36,1.7351)[align=left] {user\\supplies};
\end{tikzpicture} flex: input format
An input file has the following structure:
\begin{tikzpicture}[scale=4]
\node[minimum height=1.92cm,minimum width=2.2cm,align=left,draw]{definitions\\
\%\%\\
rules\\
\%\%\\
user code
};
\draw[-stealth] (-0.5,0.1)node[left]{required} -- (-0.22,0.1);
\draw[stealth-] (0.0158,-0.0181)--(0.4,0) node [right] {optional};
\draw[-stealth] (0.3977,0) edge (-0.0224,-0.1111)edge(0.1808,-0.2285)--(0.2237,0.1915);
\node at (-1,-0.4137) {Shortest possible legal flex input:};
\node[minimum width=2.2cm,draw]at(0,-.6){\%\%\hskip2.7em\phantom.};
\end{tikzpicture}
Definitions
•A series of:
▶name definitions, each of the form
name definition
e.g.:
DIGIT [0-9]
CommentStart
"/*"
ID [a-zA-Z][a-zA-Z0-9]*
▶start conditions
▶stuff
to be copied verbatim into the flex output (e.g., declarations, #includes):
–enclosed in %{ … }%, or
–indented
Rules
•The rules portion of the input contains a sequence
of rules.
•Each rule has the form
pattern
action
where:
▶pattern describes a pattern to be matched
on the input
▶pattern must be un-indented
▶action must begin on the same line.
Patterns
•Essentially, extended regular
expressions.
▶<<EOF>> to match “end
of file”
▶Character classes:
–[:alpha:], [:digit:], [:alnum:],
[:space:], etc. (see man page)
▶{name}
where name was defined earlier.
•“start conditions” can be used to specify
that a pattern match only in specific situations.
Example
A flex program to read a file of (positive) integers and compute the average:
%{
#include <stdio.h>
#include <stdlib.h>
%}
dgt [0-9]
%%
{dgt}+ return atoi(yytext);
%%
void main()
{
int val, total = 0, n = 0;
while ( (val = yylex()) > 0 ) {
total += val;
n++;
}
if (n > 0) printf(“ave = %d\n”, total/n);
}
注: The C library function int atoi(const char *str) converts the string argument str to type int .
\usetikzlibrary {decorations.pathreplacing}
\definecolor{blue}{HTML}{669999}
\begin{tikzpicture}[scale=4]
\fontsize{10}{12}\selectfont
\node[align=left]{\tt\%\{\\
\tt\#include <stdio.h>\\
\tt\#include <stdlib.h>\\
\tt\%\}\\
\tt dgt [0-9]\\
\tt\color{cyan}\%\%\\
\tt\{dgt\}+ return atoi(yytext);\\
\tt\color{cyan}\%\%\\
\tt void main()\\
\tt\{\\
\tt\ \ \ int val, total = 0, n = 0;\\
\tt\ \ \ while ((val = yylex()) > 0 ) \{\\
\tt\ \ \ \ \ \ total += val;\\
\tt\ \ \ \ \ \ n++;\\
\tt\ \ \ \}\\
\tt\ \ \ if (n > 0) printf("ave = \%d\textbackslash n", total/n);\\
\tt\}
};
\draw (-0.81,0.4225) ellipse (0.2597 and 0.06);
\draw (-0.5711,0.4393) -- (0.3668,0.6585)node[align=left,right]{Definition for a digit\\(could have used builtin definition [:digit:] instead)};
\draw (-0.4116,0.2093) ellipse (0.674 and 0.0963);
\draw (0.2131,0.2448) -- (0.5722,0.302)node[right,align=left]{Rule to match a number and return its value to\\the calling routine};
\draw[decorate,decoration={brace,amplitude=10}] (1.045,0.0471) -- (1.045,-0.8743);
\draw (1.1338,-0.4095) -- (1.3751,-0.3595)node[align=left,right]{Driver code\\(could instead have been in a separate file)};
\draw[decorate,decoration={brace,amplitude=5}] (-1.1,0.3719)--(-1.1,0.8795);
\draw[decorate,decoration={brace,amplitude=5}] (-1.1,0.139)--(-1.1,0.2485);
\draw[decorate,decoration={brace,amplitude=5}] (-1.1,-0.8971)--(-1.1,0);
\node[rotate=90,color=blue] at (-1.22,0.6187) {definitions};
\node[rotate=90,color=blue] at (-1.22,0.1896) {rules};
\node[rotate=90,color=blue] at (-1.22,-0.4496) {user code};
\end{tikzpicture}
\usetikzlibrary {decorations.pathreplacing}
\definecolor{blue}{HTML}{669999}
\begin{tikzpicture}[scale=4]
\fontsize{10}{12}\selectfont
\node[align=left]{\tt\%\{\\
\tt\#include <stdio.h>\\
\tt\#include <stdlib.h>\\
\tt\%\}\\
\tt{\color{magenta} dgt} [0-9]\\
\tt\color{cyan}\%\%\\
\tt\{{\color{magenta} dgt}\}+ return atoi({\color{magenta}yytext});\\
\tt\color{cyan}\%\%\\
\tt void main()\\
\tt\{\\
\tt\ \ \ int val, total = 0, n = 0;\\
\tt\ \ \ while ((val = {\color{magenta}yylex}()) > 0 ) \{\\
\tt\ \ \ \ \ \ total += val;\\
\tt\ \ \ \ \ \ n++;\\
\tt\ \ \ \}\\
\tt\ \ \ if (n > 0) printf("ave = \%d\textbackslash n", total/n);\\
\tt\}
};\draw[decorate,decoration={brace,amplitude=5}] (-1.1,0.3719)--(-1.1,0.8795);
\draw[decorate,decoration={brace,amplitude=5}] (-1.1,0.139)--(-1.1,0.2485);
\draw[decorate,decoration={brace,amplitude=5}] (-1.1,-0.8971)--(-1.1,0);
\node[rotate=90,color=blue] at (-1.22,0.6187) {definitions};
\node[rotate=90,color=blue] at (-1.22,0.1896) {rules};
\node[rotate=90,color=blue] at (-1.22,-0.4496) {user code};
\draw (-0.949,0.4181) ellipse (0.083 and 0.048);
\draw (-0.9,0.2105) ellipse (0.083 and 0.048);
\draw (-0.8714,0.4412) -- (0.32,0.5785)node[right]{defining and using a name} -- (-0.8556,0.2518);
\draw (-0.0091,0.2091) ellipse (0.1461 and 0.0624);
\draw (0.0746,0.2573) -- (0.32,0.3821)node[right,color=magenta](n){\tt char * yytext;};
\node at(n)[below right,xshift=-2em,yshift=-.5em,align=left]{a buffer that holds the input\\characters that actually match\\the pattern};
\draw (-0.1233,-0.3232) ellipse (0.13 and 0.0605);
\draw (-0.0314,-0.2777) -- (0.6414,-0.0808)node[right](i){Invoking the scanner: \color{magenta}\tt yylex()};
\node at(i)[below right,xshift=-5.5em,yshift=-.5em,align=left]{Each time {\tt yylex()} is called, the\\scanner continues processing\\the input from where it last left\\off.\\Returns 0 on end-of-file.};
\end{tikzpicture} Avoiding compiler warnings If compiled using “gcc –Wall” the previous flex file will generate compiler warnings:
lex.yy.c: … : warning: `yyunput’ defined but not used
lex.yy.c: … : warning: `input’ defined but not used
These can be removed using ‘%option’ declarations in the first part of the flex input file:
%option nounput
%option noinput Matching the Input
•When more than one pattern can match the
input, the scanner behaves as follows:
▶the longest match is chosen;
▶if multiple rules match, the rule
listed first in the flex input file is chosen;
▶if no rule matches, the default is
to copy the next character to stdout.
•The text that matched (the “token”) is
copied to a buffer yytext.
\definecolor{r}{HTML}{FF6600}
\definecolor{b}{HTML}{0000FF}
\begin{tikzpicture}[scale=4]
\fontsize{10}{12}\selectfont
\draw[fill=r!50,draw=red] (0.9892,0.052) ellipse (0.06 and 0.08);
\draw[fill=b!20,draw=blue] (-0.1141,0.3859) ellipse (0.06 and 0.08);
\draw[fill=r!50,draw=red] (-1.9594,0.9496) ellipse (0.06 and 0.08);
\draw[fill=b!20,draw=blue] (-2.6673,0.9463) ellipse (0.06 and 0.08);
\node[align=left,draw](c){
\tt\#include <stdio.h> \color{blue}/* definitions */\\
\tt \color{blue}int main(int argc, char * argv[ ]) \{\\
\tt\color{blue}\ \ \ if (argc <= 1) \{\\
\tt\color{blue}\ \ \ \ \ \ \ printf(“Error!\textbackslash n”); /* no arguments */\\
\tt\ \ \ \}\\
\tt\ \ \ printf(“\%d args given\textbackslash n”, argc);\\
\tt\ \ \ return 0;\\
\}
};
\node at (-0.9404,0.5646) {Input:};
\node[align=left] at (-2.0828,1.0287) {Pattern to match C-style comments: /* … */\\[1em]
\qquad\Large\tt"/*"(.|\textbackslash n)*"*/"};
\node[align=left] at (-2.4375,0.5897) {longest match:\\Matched text shown in \color{blue}blue
};
\draw (-1.9175,0.4409) -- (c.west);
\end{tikzpicture}
Using Start Conditions
•Start conditions let us explicitly simulate finite state machines.
•This lets us get around the “longest match” problem for C-style comments.
FSA for C comments:
\usetikzlibrary{automata, positioning}
\begin{tikzpicture}[->,>=stealth,node distance=2cm]
\fontsize{10}{12}\selectfont
\node[state, initial](i){};
\node[state, right of=i] (s1) {S1};
\node[state, right of=s1] (s2) {S2};
\node[state, right of=s2] (s3) {S3};
\node[state, right of=s3, accepting] (f) {};
\draw(i)edge[above]node{/}(s1)
(s1)edge[above]node{*}(s2)
(s2)edge[loop above]node{non-*}(s2)
(s2)edge[above]node{*}(s3)
(s3)edge[loop above]node{*}(s3)
(s3)edge[below, bend left]node{non-\{/,*\}}(s2)
(s3)edge[above]node{/}(f);
\end{tikzpicture} | flex input:
%x S1, S2, S3
%%
"/" BEGIN(S1);
<S1>"*" BEGIN(S2);
<S2>[^*] ; /* stay in S2 */
<S2>"*" BEGIN(S3);
<S3>"*" ; /* stay in S3 */
<S3>[^*/] BEGIN(S2);
<S3>"/" BEGIN(INITIAL); |
Putting it all together
•Scanner implemented as a function int yylex();
▶return value indicates type of token found (encoded as a positive integer);
▶the actual string matched is available in yytext. •Scanner and parser need to agree on token type encodings
▶let yacc generate the token type encodings
–yacc places these in a file y.tab.h
▶use “#include y.tab.h” in the definitions section of the flex input file. •When compiling, link in the flex library using “-lfl”
|