SQL Parsing Practice in Pisa Proxy
2022-06-27 14:26:00 (InfoQ)
1. Background
Parsing approaches:
- LL (top-down)
- LR (bottom-up)
- LALR
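To make the top-down idea concrete, here is a minimal LL-style recursive-descent evaluator for the calculator grammar used later in this article, written in plain Rust without grmtools. It is an illustrative sketch, not part of Pisa Proxy; the `Parser` and `eval` names are hypothetical.

```rust
// Top-down (LL-style) recursive-descent evaluator for the calc grammar.
// LL parsers cannot use left recursion such as `Expr -> Expr '+' Term`,
// so the left-recursive rules are rewritten as loops:
//   Expr   -> Term ('+' Term)*
//   Term   -> Factor ('*' Factor)*
//   Factor -> '(' Expr ')' | INT
struct Parser<'a> {
    bytes: &'a [u8],
    pos: usize,
}

impl<'a> Parser<'a> {
    fn new(input: &'a str) -> Self {
        Parser { bytes: input.as_bytes(), pos: 0 }
    }

    // Skip spaces and tabs, like the `[\t ]+ ;` rule in calc.l.
    fn skip_ws(&mut self) {
        while matches!(self.bytes.get(self.pos).copied(), Some(b' ') | Some(b'\t')) {
            self.pos += 1;
        }
    }

    // Next non-whitespace byte, without consuming it (one token of lookahead).
    fn peek(&mut self) -> Option<u8> {
        self.skip_ws();
        self.bytes.get(self.pos).copied()
    }

    // Expr -> Term ('+' Term)*
    fn expr(&mut self) -> Result<u64, String> {
        let mut value = self.term()?;
        while self.peek() == Some(b'+') {
            self.pos += 1;
            let rhs = self.term()?;
            value = value.checked_add(rhs).ok_or("overflow")?;
        }
        Ok(value)
    }

    // Term -> Factor ('*' Factor)*
    fn term(&mut self) -> Result<u64, String> {
        let mut value = self.factor()?;
        while self.peek() == Some(b'*') {
            self.pos += 1;
            let rhs = self.factor()?;
            value = value.checked_mul(rhs).ok_or("overflow")?;
        }
        Ok(value)
    }

    // Factor -> '(' Expr ')' | INT
    fn factor(&mut self) -> Result<u64, String> {
        match self.peek() {
            Some(b'(') => {
                self.pos += 1;
                let value = self.expr()?;
                if self.peek() != Some(b')') {
                    return Err(format!("expected ')' at byte {}", self.pos));
                }
                self.pos += 1;
                Ok(value)
            }
            Some(c) if c.is_ascii_digit() => {
                let start = self.pos;
                while self.bytes.get(self.pos).map_or(false, |b| b.is_ascii_digit()) {
                    self.pos += 1;
                }
                // The slice holds only ASCII digits, so it is valid UTF-8.
                let text = std::str::from_utf8(&self.bytes[start..self.pos]).unwrap();
                text.parse::<u64>().map_err(|e| e.to_string())
            }
            _ => Err(format!("unexpected input at byte {}", self.pos)),
        }
    }
}

// Parse a whole input and reject trailing tokens.
fn eval(input: &str) -> Result<u64, String> {
    let mut p = Parser::new(input);
    let value = p.expr()?;
    if p.peek().is_some() {
        return Err(format!("trailing input at byte {}", p.pos));
    }
    Ok(value)
}

fn main() {
    println!("{:?}", eval("2 + 3 * (4 + 1)")); // prints Ok(17)
}
```

Each nonterminal becomes a function, and the parser picks an alternative by peeking at a single token. An LR generator such as grmtools instead builds parse tables and accepts the left-recursive grammar unchanged.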
Libraries surveyed:
- antlr_rust
- sqlparser-rs
- nom-sql
- grmtools

2. Using grmtools
- Write the lex and yacc files
calc.l:
%%
[0-9]+ "INT"
\+ "+"
\* "*"
\( "("
\) ")"
[\t ]+ ;
calc.y:
%start Expr
%avoid_insert "INT"
%%
Expr -> Result<u64, ()>:
Expr '+' Term { Ok($1? + $3?) }
| Term { $1 }
;
Term -> Result<u64, ()>:
Term '*' Factor { Ok($1? * $3?) }
| Factor { $1 }
;
Factor -> Result<u64, ()>:
'(' Expr ')' { $2 }
| 'INT'
{
let v = $1.map_err(|_| ())?;
parse_int($lexer.span_str(v.span()))
}
;
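%%
// Functions in this section are in scope for the grammar actions above.
fn parse_int(s: &str) -> Result<u64, ()> {
    match s.parse::<u64>() {
        Ok(val) => Ok(val),
        Err(_) => {
            eprintln!("{} cannot be represented as a u64", s);
            Err(())
        }
    }
}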
- Generate the lexer and parser at build time (build.rs)
use cfgrammar::yacc::YaccKind;
use lrlex::CTLexerBuilder;
fn main() -> Result<(), Box<dyn std::error::Error>> {
CTLexerBuilder::new()
.lrpar_config(|ctp| {
ctp.yacckind(YaccKind::Grmtools)
.grammar_in_src_dir("calc.y")
.unwrap()
})
.lexer_in_src_dir("calc.l")?
.build()?;
Ok(())
}
- Use the generated lexer and parser in the application
use std::env;
use lrlex::lrlex_mod;
use lrpar::lrpar_mod;
// Using `lrlex_mod!` brings the lexer for `calc.l` into scope. By default the
// module name will be `calc_l` (i.e. the file name, minus any extensions,
// with a suffix of `_l`).
lrlex_mod!("calc.l");
// Using `lrpar_mod!` brings the parser for `calc.y` into scope. By default the
// module name will be `calc_y` (i.e. the file name, minus any extensions,
// with a suffix of `_y`).
lrpar_mod!("calc.y");
fn main() {
// Get the `LexerDef` for the `calc` language.
let lexerdef = calc_l::lexerdef();
let args: Vec<String> = env::args().collect();
// Now we create a lexer with the `lexer` method with which we can lex an
// input.
let lexer = lexerdef.lexer(&args[1]);
// Pass the lexer to the parser and lex and parse the input.
let (res, errs) = calc_y::parse(&lexer);
for e in errs {
println!("{}", e.pp(&lexer, &calc_y::token_epp));
}
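// `res` is `Option<Result<u64, ()>>`: `None` if parsing failed,
// `Some(Err(()))` if an action (e.g. `parse_int`) failed.
match res {
    Some(Ok(r)) => println!("Result: {:?}", r),
    _ => eprintln!("Unable to evaluate expression.")
}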
}
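The generated `calc_y::parse` accepts any lexer that implements the `lrpar::NonStreamingLexer` trait, so the tokens do not have to come from `calc.l`: the output of a hand-written lexer can be wrapped with `lrlex::LRNonStreamingLexer::new()` and passed to the parser instead.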
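Roughly speaking, the parser works on token kinds plus byte spans into the input, so a hand-written lexer only has to report those correctly. The std-only sketch below uses hypothetical types (`Tok`, `Lexeme`, `lex`), not lrlex's: it lexes the calc tokens plus an assumed SQL-style single-quoted string token, and takes spans from `char_indices` so they stay valid byte offsets when a string contains multibyte UTF-8 such as Chinese text.

```rust
// Hypothetical token kinds; `Str` stands in for a SQL-style quoted string.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tok {
    Int,
    Str,
    Plus,
    Star,
    LParen,
    RParen,
}

// A token kind plus a byte span (start, length) into the original input.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Lexeme {
    tok: Tok,
    start: usize,
    len: usize,
}

// Returns the lexemes, or the byte offset of the first unexpected character
// (or of the opening quote of an unterminated string).
fn lex(input: &str) -> Result<Vec<Lexeme>, usize> {
    let mut out = Vec::new();
    // char_indices yields byte offsets, so multibyte characters keep spans correct.
    let mut chars = input.char_indices().peekable();
    while let Some((i, c)) = chars.next() {
        let tok = match c {
            ' ' | '\t' => continue,
            '+' => Tok::Plus,
            '*' => Tok::Star,
            '(' => Tok::LParen,
            ')' => Tok::RParen,
            '0'..='9' => {
                // Consume the rest of the digit run (ASCII, one byte each).
                let mut end = i + 1;
                while let Some(&(j, d)) = chars.peek() {
                    if !d.is_ascii_digit() {
                        break;
                    }
                    end = j + 1;
                    chars.next();
                }
                out.push(Lexeme { tok: Tok::Int, start: i, len: end - i });
                continue;
            }
            '\'' => {
                // Consume everything up to and including the closing quote.
                let end = chars.by_ref().find(|&(_, d)| d == '\'').map(|(j, _)| j + 1);
                match end {
                    Some(e) => {
                        out.push(Lexeme { tok: Tok::Str, start: i, len: e - i });
                        continue;
                    }
                    None => return Err(i),
                }
            }
            _ => return Err(i),
        };
        out.push(Lexeme { tok, start: i, len: c.len_utf8() });
    }
    Ok(out)
}

fn main() {
    let src = "'中文' + 1";
    for lx in lex(src).unwrap() {
        // Slicing with the byte span recovers the token text.
        println!("{:?} {:?}", lx.tok, &src[lx.start..lx.start + lx.len]);
    }
}
```

For `'中文' + 1`, the integer starts at byte 11 even though it is only the 8th character, which is why spans must be byte offsets rather than character counts.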
3. Problems encountered
- Shift/reduce conflicts
Building the SQL grammar, grmtools reported:
Shift/Reduce conflicts:
State 619: Shift("TEXT_STRING") / Reduce(literal: "text_literal")
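This kind of conflict is familiar from the classic dangling-else problem, where it is resolved by giving the shorter rule a lower precedence than ELSE with %prec:
%nonassoc LOWER_THEN_ELSE
%nonassoc ELSE
stmt:
IF expr stmt %prec LOWER_THEN_ELSE
| IF expr stmt ELSE stmt
;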
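In the SQL grammar, the conflict comes from `text_literal`, which can absorb a following 'TEXT_STRING' (adjacent string literals are concatenated). After a text_literal, with 'TEXT_STRING' as the lookahead, the parser can either shift the token or reduce to `literal`:
literal -> String: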
text_literal
{ }
| NUM_literal
{ }
...
text_literal -> String:
'TEXT_STRING' {}
| 'NCHAR_STRING' {}
| text_literal 'TEXT_STRING' {}
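...
The fix follows the same pattern: declare a placeholder token with lower precedence than 'TEXT_STRING' and attach it to the `literal: text_literal` alternative with %prec, so the parser shifts 'TEXT_STRING' instead of reducing: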
%nonassoc 'LOWER_THEN_TEXT_STRING'
%nonassoc 'TEXT_STRING'
literal -> String:
text_literal %prec 'LOWER_THEN_TEXT_STRING'
{ }
| NUM_literal
{ }
...
text_literal -> String:
'TEXT_STRING' {}
| 'NCHAR_STRING' {}
| text_literal 'TEXT_STRING' {}
...
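The %prec fix works because of how yacc-style generators resolve a shift/reduce conflict: the precedence of the rule that could be reduced (set with %prec, or by default taken from its last terminal) is compared with the precedence of the lookahead token, and a later declaration line means higher precedence. The sketch below encodes the classic yacc rules in plain Rust as an illustration; it is a simplified model, not grmtools' implementation, and all types are hypothetical.

```rust
// Associativity from %left, %right or %nonassoc.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Assoc {
    Left,
    Right,
    NonAssoc,
}

// A precedence level: a later declaration line gets a higher level.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Prec {
    level: u32,
    assoc: Assoc,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    Shift,
    Reduce,
    Error,
    // No precedence available: classic yacc reports the conflict and shifts.
    ReportedShift,
}

// Compare the precedence of the rule that could be reduced with that of the
// lookahead token, following the classic yacc rules.
fn resolve(rule: Option<Prec>, lookahead: Option<Prec>) -> Action {
    match (rule, lookahead) {
        (Some(r), Some(t)) if t.level > r.level => Action::Shift,
        (Some(r), Some(t)) if t.level < r.level => Action::Reduce,
        // Same level: associativity decides.
        (Some(_), Some(t)) => match t.assoc {
            Assoc::Left => Action::Reduce,
            Assoc::Right => Action::Shift,
            Assoc::NonAssoc => Action::Error,
        },
        _ => Action::ReportedShift,
    }
}

fn main() {
    // %nonassoc 'LOWER_THEN_TEXT_STRING'   (declared first: level 1)
    // %nonassoc 'TEXT_STRING'              (declared second: level 2)
    let lower = Prec { level: 1, assoc: Assoc::NonAssoc };
    let text_string = Prec { level: 2, assoc: Assoc::NonAssoc };
    // Reduce `literal: text_literal %prec 'LOWER_THEN_TEXT_STRING'`,
    // or shift the lookahead 'TEXT_STRING'?
    println!("{:?}", resolve(Some(lower), Some(text_string))); // prints Shift
}
```

Because the placeholder is declared before 'TEXT_STRING', the rule gets the lower level and the parser shifts. The dangling-else declarations work the same way and bind each ELSE to the nearest IF.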
- Problems with SQL containing Chinese characters
4. Optimization
- Running the parser with empty actions (no grammar actions executed; see the Appendix for the test code) gives the following timing:
$ time ./parser
real 0m4.788s
user 0m4.781s
sys 0m0.002s


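- Profiling points at the parse tables that grmtools embeds in the generated code: the grammar and the state table are stored as serialized data (`__GRM_DATA`, `__STABLE_DATA`) and decoded into `grm` and `stable` each time `parse` is called, although this work only needs to happen once.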
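When constant data is decoded inside a hot function, the usual remedy is to decode it once and cache the result. A minimal sketch using `std::sync::OnceLock` (Rust 1.70+); `TABLE_DATA`, `Tables` and `decode_tables` are hypothetical stand-ins for serialized parse tables, not items that grmtools generates.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Counts how often the (hypothetical) decoding step runs.
static DECODE_CALLS: AtomicUsize = AtomicUsize::new(0);

// Stand-in for serialized tables embedded in the binary.
static TABLE_DATA: &[u8] = &[3, 1, 4, 1, 5];

// Hypothetical decoded form of the tables.
struct Tables {
    states: Vec<u32>,
}

// Stand-in for an expensive deserialization step.
fn decode_tables(data: &[u8]) -> Tables {
    DECODE_CALLS.fetch_add(1, Ordering::SeqCst);
    Tables { states: data.iter().map(|&b| b as u32).collect() }
}

// Decode on first use; every later call gets the same shared reference.
fn tables() -> &'static Tables {
    static TABLES: OnceLock<Tables> = OnceLock::new();
    TABLES.get_or_init(|| decode_tables(TABLE_DATA))
}

// A toy "parse" that needs the tables but no longer re-decodes them per call.
fn parse(input: &str) -> usize {
    let t = tables();
    input.len() % t.states.len()
}

fn main() {
    for _ in 0..1_000 {
        parse("select id, name from t where id = ?;");
    }
    println!("decoded {} time(s)", DECODE_CALLS.load(Ordering::SeqCst)); // prints 1
}
```

`OnceLock` is thread-safe, so the cached tables can be shared by concurrent parses without extra locking after initialization.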

- Further analysis: on every parse, an `actions` array is initialized. The array grows with the number of grammar rules, and its elements are `dyn` trait references, so every action call goes through dynamic dispatch at runtime:
::std::vec![&__gt_wrapper_0,
&__gt_wrapper_1,
&__gt_wrapper_2,
...
]
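Replacing the array with a `match` on the rule index turns each action into a direct, statically dispatched call:
match idx {
    0 => __gt_wrapper_0(),
    1 => __gt_wrapper_1(),
    2 => __gt_wrapper_2(),
    ...
    _ => unreachable!(),
}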
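The difference between the two dispatch styles can be shown in a few lines of std-only Rust. The actions here are hypothetical toy functions; the real grmtools wrappers take more arguments, but the dispatch pattern is the same.

```rust
// Hypothetical semantic actions, one per grammar rule.
fn action_0(x: u64) -> u64 {
    x + 1
}
fn action_1(x: u64) -> u64 {
    x * 2
}
fn action_2(x: u64) -> u64 {
    x.saturating_sub(3)
}

// Before: a Vec of trait-object references is built on every call,
// and the chosen action is invoked through dynamic dispatch.
fn run_dyn(idx: usize, x: u64) -> u64 {
    let actions: Vec<&dyn Fn(u64) -> u64> = vec![&action_0, &action_1, &action_2];
    actions[idx](x)
}

// After: a `match` on the rule index. No per-call allocation, and each arm
// is a direct call that the compiler is free to inline.
fn run_match(idx: usize, x: u64) -> u64 {
    match idx {
        0 => action_0(x),
        1 => action_1(x),
        2 => action_2(x),
        _ => unreachable!("unknown rule index {}", idx),
    }
}

fn main() {
    println!("{} {}", run_dyn(1, 21), run_match(1, 21)); // prints 42 42
}
```

Both functions compute the same results; only the per-call setup and the call mechanism differ, and that overhead grows with the number of rules in the dynamic version.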



With these optimizations, the same benchmark takes about 2.7 s instead of 4.8 s of wall-clock time, roughly 44% less:
$ time ./parser
real 0m2.677s
user 0m2.667s
sys 0m0.007s
5. Summary
Appendix
// Benchmark: parse the same statement one million times.
let input = "select id, name from t where id = ?;";
let p = parser::Parser::new();
for _ in 0..1_000_000 {
    let _ = p.parse(input);
}
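Timing the loop with `time` also counts process startup. A sketch of timing inside the program instead, using `std::time::Instant` and `std::hint::black_box` (Rust 1.66+); `parse` here is a hypothetical stand-in for the real `parser::Parser::parse`.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical stand-in for `parser::Parser::parse`; swap in the real parser.
fn parse(input: &str) -> usize {
    input.split_whitespace().count()
}

// Run `f` `iters` times and return the total elapsed wall-clock time.
fn bench<F: FnMut()>(iters: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed()
}

fn main() {
    let input = "select id, name from t where id = ?;";
    let iters = 1_000_000;
    let total = bench(iters, || {
        // black_box keeps the optimizer from removing the unused parse work.
        black_box(parse(black_box(input)));
    });
    println!("total: {:?}, per parse: {:?}", total, total / iters);
}
```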