ZUCC_Principles of Programming Languages and Compilation_Experiment 05: Regular Expressions, Finite Automata, Lexical Analysis
2022-06-24 08:29:00 【The stars don't want to roll】
Lab report for Principles of Programming Languages and Compilation
| Course name | Principles of Programming Languages and Compilation |
|---|---|
| Experiment | Regular expressions, finite automata, lexical analysis |
Experimental content
I. Read the courseware and Chapter 2 of the textbook
Understand the concept of lexical elements (tokens), the kinds of tokens, and their semantic values
- A lexical element (token) is the smallest grammatical unit with independent meaning
- Kinds of tokens: identifiers (variable, function, class names, ...); keywords (if, while, for, function, static); constants (Int, Float, Complex, Boolean, Symbol, ...); operators (+ - * / ^ *= :=); delimiters ([ ] { } ( ) ; ,); string literals ("asf")
- The semantic value of an identifier is usually its index in the symbol table; for keywords, operators and delimiters it may be empty; for constants and strings it is the value itself or a symbol-table index (see the sketch below)
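As a concrete illustration, here is a minimal sketch of a token type in F# (the names are illustrative, not the course's exact definitions): keywords, operators and delimiters carry no semantic value, while identifiers and constants do.

```fsharp
// Minimal illustrative token type: the semantic value, where one is needed,
// is carried by the union case (a name, a number, or a symbol-table index).
type Token =
    | IF | WHILE | FOR                  // keywords
    | PLUS | MINUS | TIMES | ASSIGN     // operators
    | LPAR | RPAR | SEMI                // delimiters
    | ID of string                      // identifier: semantic value = its name
    | CSTINT of int                     // integer constant: semantic value = the number
    | CSTSTRING of string               // string constant: semantic value = the text

// The source fragment   n := n - 1;   as a token stream:
let sample = [ ID "n"; ASSIGN; ID "n"; MINUS; CSTINT 1; SEMI ]
```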
Understand the basic regular expressions and the ways they are combined (choice, concatenation, repetition)
- A regular expression describes the set of strings of a particular form and is used to specify the structure of lexical elements
- If r and s are regular expressions denoting the languages L(r) and L(s), then:
  - r|s is a regular expression denoting the language L(r) ∪ L(s)
  - rs is a regular expression denoting the language L(r)·L(s)
  - r* is a regular expression denoting the language (L(r))*
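To make the three operators concrete, here is a small check in F# using .NET's Regex class (purely an illustration; the course material uses Automata.js and fslex, not .NET regexes):

```fsharp
open System.Text.RegularExpressions

// choice:        a|b     matches "a" or "b"
// concatenation: ab      matches exactly "ab"
// repetition:    (ab)*   matches "", "ab", "abab", ...
let fullMatch pattern s = Regex.IsMatch(s, "^(?:" + pattern + ")$")

printfn "%b" (fullMatch "a|b"   "b")        // true
printfn "%b" (fullMatch "ab"    "ab")       // true
printfn "%b" (fullMatch "(ab)*" "ababab")   // true
printfn "%b" (fullMatch "(ab)*" "aba")      // false
```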
Understand the longest match and rule priority
- When more than one prefix of the input string matches some pattern (several accepting states are reachable), the longest matching prefix is always chosen
- Operator precedence: '*' > concatenation ('.') > '|'; the latter two are left-associative
Understand 2-1 and 2-2 on p. 13 of the textbook
Understand NFAs, DFAs, and DFA minimization
- Deterministic finite automaton (DFA): for a given state and input symbol there is exactly one possible move (no ε-transitions)
- Nondeterministic finite automaton (NFA): for the same state and input symbol there may be several possible moves
- Minimization: treat a set of states as a single state; if the states in a subset have equivalent transitions on every input, those states can be merged
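As a concrete, purely illustrative example of how a DFA executes, here is a table-driven simulation in F#; the automaton accepts binary strings ending in 11 and is not one of the exercises below.

```fsharp
// Table-driven DFA simulation: states are integers, the transition table is a
// Map from (state, input symbol) to the next state.
let delta =
    Map.ofList
        [ (0, '0'), 0; (0, '1'), 1
          (1, '0'), 0; (1, '1'), 2
          (2, '0'), 0; (2, '1'), 2 ]

let start = 0
let accepting = Set.ofList [ 2 ]

// Run the DFA by folding the transition function over the input characters.
// (A symbol outside {'0','1'} would raise a KeyNotFoundException.)
let accepts (input: string) =
    let final = Seq.fold (fun state c -> Map.find (state, c) delta) start input
    Set.contains final accepting

printfn "%b" (accepts "0111")   // true  - ends in 11
printfn "%b" (accepts "1101")   // false - ends in 01
```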
Enter regular expressions into Automata.js and view the results
II. Write the following regular expressions
1. Tabs \t and spaces, e.g. "\t \t"
^(\t|\s)*$
2. C/C++ comments, e.g. // ... and /* a/**/sfasfd */ (network resources may be consulted)
/\*[^*]*\*+([^/*][^*]*\*+)*/
https://blog.csdn.net/zhangpeterx/article/details/88115338
3. String constant, e.g. "this is a string"
^"[a-zA-Z\s]+"$

4. Floating-point number, e.g. 3.14159
^(-?\d+)(\.\d+)?$
https://blog.csdn.net/weixin_39867296/article/details/111611301

Study this example of a URL-matching regular expression:
(https?|ftp|file)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|]
https://www.cnblogs.com/speeding/p/5097790.html
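As a quick sanity check of the expressions above, here is a small F# sketch using .NET's Regex class (an illustration only; the escaping follows .NET regex string syntax):

```fsharp
open System.Text.RegularExpressions

// Spot-check the regular expressions written above against sample inputs.
let check (pattern: string) (input: string) =
    printfn "%s  on  %A : %b" pattern input (Regex.IsMatch(input, pattern))

check @"^(\t|\s)*$"                   "\t  \t"                // 1. tabs and spaces
check @"/\*[^*]*\*+([^/*][^*]*\*+)*/" "a/**/sfasfd */"        // 2. block comment (finds the /**/ inside)
check @"^""[a-zA-Z\s]+""$"            "\"this is a string\""  // 3. string constant
check @"^(-?\d+)(\.\d+)?$"            "3.14159"               // 4. floating-point number
```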
III. Explain, with practical examples (p. 13), the longest-match rule and rule-order priority
Longest-match rule: for the input 'if8', the first two characters could match the reserved word 'if', but all three characters match the identifier pattern; the longer match is preferred, so 'if8' is recognized as an identifier. The same idea appears as greedy matching in regular expressions: matching the pattern a.*b against 'ababab' yields the longest possible match, here the whole string 'ababab'.
Rule-order priority: for the input 'if 89', the prefix 'if' matches both the reserved-word rule and the identifier rule with the same length; the rule listed first wins, so the substring starts with the reserved word 'if'.
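The greediness can be checked directly (an illustration with .NET's Regex, which is also greedy by default):

```fsharp
open System.Text.RegularExpressions

// Greedy repetition takes the longest possible match: against "ababab",
// the pattern a.*b matches the whole string rather than just "ab".
printfn "%s" (Regex.Match("ababab", "a.*b").Value)    // ababab

// A lazy quantifier (.*?) stops at the first possible 'b' instead:
printfn "%s" (Regex.Match("ababab", "a.*?b").Value)   // ab
```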
IV. For the following regular expressions, draw the corresponding NFA/DFA
- (ab|ac)*
- (0|1)*11001*
- (01|10|00)*11
1. Construct the NFA



2. Construct the DFA



3. Minimize the DFA



V. Study code/lex.zip and understand how the lex lexical-analyzer generator works
In a real compiler the lexical analyzer (Lexer) is called by the parser (Parser); see the calcvar code in the later sections for details.
Lex is a lexical-analyzer generator. It lets you describe the patterns of lexical units with regular expressions, so that a specification of the lexical analyzer can be written in this form, and it then generates the corresponding implementation code. The input notation of a Lex program is called the Lex language, and the program that processes it is the Lex compiler; Flex is an alternative implementation of the Lex compiler (the relationship between lex and flex is much like that between cc and gcc).
How a Lex program is used is shown in the figure below:

A Lex program usually consists of the following parts:
- Declarations: variables, manifest constants (identifiers whose value is a number, used to represent token types), and regular definitions.
- Translation rules: each rule has the form pattern { action }. The pattern is a regular expression that may use the regular definitions from the declaration section; the action is a code fragment, usually written in C.
- Auxiliary functions: the functions needed by the actions, possibly including a main function; this code is copied into the generated C code.
How a Lex-generated analyzer works:
The generated program provides a function int yylex() and two variables, yyleng and yytext, for the caller (usually the parser) to use.
When yylex() is called, it reads characters one by one from the input stream pointed to by yyin. Once it has found the longest prefix that matches some pattern Pi, it saves that string in yytext and sets yyleng to its length; this string is the lexeme of the token just recognized.
The analyzer then executes the action Ai associated with pattern Pi, and yylex returns Ai's return value (usually the token type).
https://www.bwangel.me/2019/12/15/flex/
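For comparison, the FsLexYacc counterpart of this yylex-style interface is a generated rule function that the parser calls on a LexBuffer. Below is a minimal hedged sketch of driving it by hand; the names Lexer.Token and Parser.EOF are assumptions about the generated modules (compare the calcvar project in the next section), so this only compiles alongside such generated code:

```fsharp
open FSharp.Text.Lexing   // LexBuffer comes with the FsLexYacc runtime package

// Tokenize a whole string by calling the generated rule function repeatedly,
// the same way a parser calls yylex until end of input.
let tokenize (source: string) =
    let lexbuf = LexBuffer<char>.FromString source
    let rec loop acc =
        match Lexer.Token lexbuf with          // assumed: rule "Token" in the .fsl file
        | Parser.EOF -> List.rev acc            // assumed: EOF token defined in the .fsy file
        | tok        -> loop (tok :: acc)
    loop []
```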
VI. Look at the calcvar language program in the repository https://gitee.com/sigcc/plzoofs
1. Build and run the program; refer to https://gitee.com/sigcc/plzoofs/blob/master/calcvar/README.markdown for how to run it in debug mode

View the tokens and the syntax tree

2. Understand how variable values are bound and stored in the environment
3. Understand the code of the lexical specification file lexer.fsl
Random coding knowledge | Using FSLexYacc, the F# lexer and parser (thanos.codes)
4. Get a preliminary understanding of the code of the grammar specification file parser.fsy
Use dotnet run -vn to see the commands dotnet uses to invoke the lexer/parser generation tools; the output looks like the following (the exact paths differ from machine to machine):
dotnet "C:\Users\gm\.nuget\packages\fslexyacc\10.2.0\build\/fslex/netcoreapp3.1\fslex.dll" -o "Scanner.fs" --module Scanner --unicode Scanner.fsl
...
dotnet "C:\Users\gm\.nuget\packages\fslexyacc\10.2.0\build\/fsyacc/netcoreapp3.1\fsyacc.dll" -o "Parser.fs" --module Parser Parser.fsy
dotnet "C:\Users\shuhe\.nuget\packages\fslexyacc\10.2.0\build\fslex\netcoreapp3.1\fslex.dll" -o "Scanner.fs" --module Scanner --unicode Scanner.fsl

dotnet "C:\Users\shuhe\.nuget\packages\fslexyacc\10.2.0\build\/fsyacc\netcoreapp3.1\fsyacc.dll" -o "Parser.fs" --module Parser Parser.fsy

Understand the parameters in the fsproj project file; they are passed to the commands above:
<FsLex Include="lexer.fsl">
<OtherFlags>--module Lexer --unicode</OtherFlags>
</FsLex>
<FsYacc Include="parser.fsy">
<OtherFlags>--module Parser</OtherFlags>
</FsYacc>
[Translation] Python Lex Yacc manual - P_Chou - cnblogs.com
VII. Study the structure of CLex.fsl and understand the MicroC lexical analyzer
Run the MicroC interpreter and compiler in debug mode (-g); see ReadME.md for the specifics
dotnet build -v n interpc.fsproj

dotnet "C:\Users\shuhe\.nuget\packages\fslexyacc\10.2.0\build\fslex\netcoreapp3.1\fslex.dll" -o "CLex.fs" --module CLex --unicode CLex.fsl

dotnet "C:\Users\shuhe\.nuget\packages\fslexyacc\10.2.0\build\/fsyacc\netcoreapp3.1\fsyacc.dll" -o "CPar.fs" --module CPar CPar.fsy




Understand the meaning of tokens; pay attention to the kinds of tokens and the corresponding semantic values <Type, Value>
"Many books translate the word 'token' as 'mark'; we think a more appropriate translation is 'word symbol'. A token is the 'smallest lexical unit with independent meaning' of a programming language, and in this sense it corresponds to a 'word' in natural language. It is not exactly the same, though, because a token is not only a 'word': it also covers punctuation, operators, delimiters and so on. The translation 'word symbol' is meant to reflect exactly this point, but for brevity we simply use 'word'." (translator's note)
Examples from the token dump below:
- VOID: the type and the semantic value are the same (no separate value)
- NAME "n": the type is NAME, the semantic value is "n"
- CSTINT 1: the type is CSTINT, the semantic value is 1
~ microc>dotnet run -p .\interpc.fsproj -g ex1.c 8
Micro-C interpreter v 1.1.0 of 2021-3-22
interpreting ex1.c ...inputargs:[8]
Token:
VOID, NAME "main", LPAR, INT, NAME "n", RPAR, LBRACE, WHILE, LPAR, NAME "n", GT, CSTINT 0, RPAR, LBRACE, PRINT, NAME "n", SEMI, NAME "n", ASSIGN, NAME "n", MINUS, CSTINT 1, SEMI, RBRACE, PRINTLN, SEMI, RBRACE, EOF

Understand keyword handling
- Identifier-shaped lexemes are passed through the keyword function; reserved words are returned as their own tokens, and NAME is the final fallback case (see the sketch below)
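A sketch of this keyword-filtering pattern (the token cases below mirror the MicroC token dump shown above, but treat the exact list as illustrative and consult CLex.fsl / CPar.fsy for the authoritative one):

```fsharp
// Every identifier-shaped lexeme goes through one function, which returns the
// reserved-word token if the lexeme is a keyword and falls back to NAME otherwise.
type Token =
    | IF | ELSE | WHILE | INT | VOID | PRINT | PRINTLN | RETURN
    | NAME of string

let keyword (s: string) : Token =
    match s with
    | "if"      -> IF
    | "else"    -> ELSE
    | "while"   -> WHILE
    | "int"     -> INT
    | "void"    -> VOID
    | "print"   -> PRINT
    | "println" -> PRINTLN
    | "return"  -> RETURN
    | _         -> NAME s          // not reserved: an ordinary identifier

// keyword "while" = WHILE;   keyword "n" = NAME "n"
```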
Understand comment handling
| "//"   { EndLineComment lexbuf; Token lexbuf }
         // single-line comment: call the handler, passing lexbuf; when it returns,
         // lexbuf has been advanced past the comment, and the Token rule is called
         // again to continue after the comment
| "/*"   { Comment lexbuf; Token lexbuf }
         // multi-line comment /* ... */ : call the Comment rule
and Comment = parse
   | "/*"            { Comment lexbuf; Comment lexbuf }   // nested comment: handle the inner comment, then continue with the outer one
   | "*/"            { () }                               // end of comment: return
   | '\n'            { lexbuf.EndPos <- lexbuf.EndPos.NextLine; Comment lexbuf }   // comment crosses a line: update the line position
   | (eof | '\026')  { failwith "Lexer error: unterminated comment" }              // multi-line comment not closed
   | _               { Comment lexbuf }                    // any other character: keep scanning
and EndLineComment = parse
   | '\n'            { lexbuf.EndPos <- lexbuf.EndPos.NextLine }   // update the end-of-line position and return
   | (eof | '\026')  { () }   // end of file; 26 is the ASCII code of CTRL+Z, also treated as a terminator; return ()
   | _               { EndLineComment lexbuf }   // keep reading the next character from lexbuf
Understand the handling of whitespace and line breaks
| '\n'   { lexbuf.EndPos <- lexbuf.EndPos.NextLine; Token lexbuf }
         // line break; EndPos is an instance of the built-in Position type and
         // records the end position of the current line
Understand the handling of strings (MicroC only does the lexical analysis; strings are not fully implemented)
| '"'    { CSTSTRING (String [] lexbuf) }   // call the String rule, starting with an empty character list
and String chars = parse
  | '"'      { Microsoft.FSharp.Core.String.concat "" (List.map string (List.rev chars)) }
             // end of string: build the result from the character list chars;
             // the list was built with cons (::), which prepends, so List.rev
             // is needed to restore the original order
  | '\\' ['\\' '"' 'a' 'b' 't' 'n' 'v' 'f' 'r']
             { String (cEscape (lexemeAsString lexbuf) :: chars) lexbuf }
             // escape sequence: the source text "\a" reaches the lexer as the
             // two characters '\' and 'a'
  | "''"     { String ('\'' :: chars) lexbuf }
  | '\\'     { failwith "Lexer error: illegal escape sequence" }
  | (eof | '\026')   { failwith "Lexer error: unterminated string" }        // end of file inside a string
  | ['\n' '\r']      { failwith "Lexer error: newline in string" }          // line break inside a string
  | ['\000'-'\031' '\127' '\255']
             { failwith "Lexer error: invalid character in string" }        // ASCII control character inside a string
  | _        { String (char (lexbuf.LexemeChar 0) :: chars) lexbuf }        // any other character: prepend it to the temporary chars list
Understand the handling of escape characters (MicroC only does the lexical analysis; not fully implemented)
// String escape handler
let cEscape s =
    match s with
    | "\\\\" -> '\\'
    | "\\\"" -> '\"'
    | "\\a"  -> '\007'
    | "\\b"  -> '\008'
    | "\\t"  -> '\t'
    | "\\n"  -> '\n'
    | "\\v"  -> '\011'
    | "\\f"  -> '\012'
    | "\\r"  -> '\r'
    | _      -> failwith "Lexer error: impossible C escape"
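Note that the escape reaches cEscape with the backslash still present, i.e. as two source characters. A quick check, assuming the definition above is in scope:

```fsharp
// The argument is the two-character source text of the escape; the result is
// the single character it denotes.
printfn "%A" (cEscape "\\n")      // the newline character
printfn "%A" (cEscape "\\t")      // the tab character
printfn "%A" (cEscape "\\\\")     // a single backslash
```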
VIII. Form groups for the course project and choose a language (Decaf / MicroC / Tiger / ...); start implementing its lexical analyzer (you may also pick a language of your own)
Lexical part of the course project:
- Read the language specification and use fslex to implement its lexical analyzer (the Tiger language specification is in the textbook appendix, p. 360)
- Pay attention to the implementation of nested comments, e.g. /* /* */ */
- Pay attention to the implementation of strings, e.g. "" '' " ' ' "
IX. Reading
The Nature of Computation (Understanding Computation), Chapter 3, the automata part: understand the technique of hand-implementing table-driven NFAs and DFAs
https://bb.zucc.edu.cn/bbcswebdav/users/j04014/books/Understanding.Computation (the book uses Ruby; see Chapter 1 for the language)
Chapter 2 will be used later; preview it in advance
X. (Optional) Read JFLAP-simple tutorial.pdf and use JFLAP to construct the NFA/DFA of Question IV; take screenshots and keep the files. JFLAP is in the soft directory on BB; run it with
java -jar JFLAP.jar