当前位置:网站首页>[principles of database system] Chapter 5 algebra and logic query language: package, extension operator, relational logic, relational algebra and datalog
[principles of database system] Chapter 5 algebra and logic query language: package, extension operator, relational logic, relational algebra and datalog
2022-07-24 06:05:00 【JKL27】
List of articles
The fifth chapter Algebraic and logical query language
5.1 package Multiset
The definition of package
Definition : That is, multiple sets , Multiset. The same tuple can appear multiple times in a relationship .
Advantages of using packages :
- Improve the efficiency of parallel operation and projection operation .
- To aggregate (SUM, COUNT etc. ) Useful ( Aggregate operations You can't Use the collection method , There will be logical errors ).
Relationship operations on packages
All operations on the package allow tuples to repeat before and after the operation .
matters needing attention :
aggregate Package operation on
If for two sets R、S Perform packet based intersection and difference operations , The result is the same as that of set operation .Because there are no duplicate tuples in the two relationships that are processed objects ,
So there is still no duplicate tuple in the result after crossing or subtracting
But for two sets R and S, Conduct Package based merging , The result of its operation may no longer be a set .
- Be careful : See clearly ( Answer the questions )/ Statement ( Issue ) Is it package mode or collection mode
On the algebraic laws of packets
- Commutative law :S ∪ R = R ∪ S.
Relationship operations on packages :
Package and 、 hand over 、 Bad
set up R and S It's a bag , If tuple t stay R and S Appear in m Secondary sum n Time ( there m and n It can be 0), be :
- R ∪ S Package and operation results : Tuples t appear m+n Time ;
- R ∩ S In the results of the outsourcing operation : Tuples t appear min (m, n) Time ;
- R - S The package difference operation result of : Tuples t appear max(0, m-n) Time .
Projection operation of package
It is basically consistent with the set projection , But there's no need to weigh it again .
Package selection
Cartesian product of packages


Connection of packages
Natural join

θ Connect

5.2 Extended operators of relational algebra
Eliminate duplicate δ ( R ) δ(R) δ(R)
Clear the duplicate elements in the packageAggregate operation ( SUM, AVG, MAX|MIN, COUNT \text{SUM, AVG, MAX|MIN, COUNT} SUM, AVG, MAX|MIN, COUNT)
Count some tuples independently of other tuples in the same relationship 、 For maximum 、 minimum value 、 Average value and other operations- SUM Used to generate a column sum , What you get is a numerical value ;
- AVG Used to generate a column of averages , The result is also a numerical value ;
- MAX and MIN When used for numeric value columns , The resulting values are the largest and smallest values in this column ; When applied to character columns, the result is The largest and smallest of the lexicographic order ;
- COUNT Generate “ value ” Number of ( It doesn't necessarily mean different values ). Again ,COUNT When applied to any attribute of a relationship , What is produced is the number of tuples of this relationship , Include duplicate tuples .
grouping γ L ( R ) \gamma_L(R) γL(R)
Group tuples on one or more attributes according to their valuesapplication γ γ γ An attribute of the relationship of operations ,R Use this attribute to group , It is called grouping attribute ;
for example : chart 5-4 γ s t u d i o n a m e ( M o v i e s ) γ_{studioname}(Movies) γstudioname(Movies)
studioName Disney Disney MGM MGM
Applied to an attribute of the relationship Aggregation operator , You can use arrows to name the clustered attribute column ;
for example : γ s t u d i o n a m e , S U M ( l e n g t h ) → s u m O f L e n g t h ( M o v i e s ) γ_{studioname,SUM(length)\rightarrow sumOfLength}(Movies) γstudioname,SUM(length)→sumOfLength(Movies)
studioName sumOfLength Disney 12345 MGM 54321
Examples of applications
example 5.23 To the relationship StarsIn( title, year, starName), Check the movie stars who have acted in at least three films , And the earliest years they appeared in the film .


Sort τ L ( R ) \tau_L(R) τL(R)
Sort tuples of relationships according to one or more attributes .- expression τ L ( R ) τ_L(R) τL(R), among R It's the relationship ,L yes R List of some attributes in . This expression represents the relationship R Of itself , Only all tuples in the result are pressed L To sort .
- set up L By A 1 , A 2 , … , A n A_1,A_2,…,A_n A1,A2,…,An form , that R First press A 1 A_1 A1 Value ordering for , about A 1 A_1 A1 Tuples with equal attributes press A 2 A_2 A2 Value ordering for , By analogy .
Extended projection π L ( R ) \pi_L(R) πL(R)
- Extended projection is to add some calculation operations as projection list on the basis of projection ;
- The projection list can be :
- Relationship R A property of ;
- Form like x→y The expression of : take R Properties in x Rename it to y;
- Form like E→z The expression of , among E It's about R Properties of 、 Constant 、 Arithmetic operator or string operator expression ,z Is an expression E The new name of the attribute of the calculation result .
External connection operation * , * , * *, *, * *,*,*
Variant of join operation , The emergence of floating tuplesDefinition of outer connection
First calculate two relationships R and S The natural connection of ,
And then put it from R or S Floating tuples of are added , use null The representation symbols of complement those attributes that do not have values in the result tuple .
Different variants of outer connection

example :

5.3 Relational logic
Datalog Language
Datalog Language use if-then Rules to represent queries , That is, you can infer the query result through the combination of some tuples in the known relationship .
Datalog Language consists of two basic atoms , namely Relational atoms and Arithmetic atom , Composed of atoms according to certain rules Datalog Query statement .
stay Datalog In language , The relationship passes through “ The predicate ” To express , Each predicate has a fixed number of parameters , Written as P ( x 1 , x 2 , … , x n ) P(x_1,x_2,…,x_n) P(x1,x2,…,xn), In essence, a predicate is a function that returns a Boolean value .
If R Is included n Attributes a 1 , a 2 , … , a n a_1,a_2,…,a_n a1,a2,…,an The relationship between ,
So when tuples ( t 1 , t 2 , … , t n ) (t_1,t_2,…,t_n) (t1,t2,…,tn) In relation R in , be R ( t 1 , t 2 , … , t n ) R(t_1,t_2,…,t_n) R(t1,t2,…,tn) It's true , Otherwise it is false .
Besides , You can use variables as arguments to predicates .
example :
A B 1 2 3 4 For example, if you define R(x,y), Then when x=1 And y=2, perhaps x=3 And y=4 when ,R It's true , Otherwise it is false .
If defined R(1,z), Then when z=2 when ,R It's true , Otherwise it is false .
The basic structure
Arithmetic atom
Comparison of two arithmetic expressions , The value is Boolean .
x < y x < y x<y
x + 1 > y + 5 × z x + 1\gt y + 5\times z x+1>y+5×z
Extended predicates and connotative predicates
Extended predicates (EDB):
If the relation referred to by the predicate Stored in a database in , Call this predicate Extended predicates (EDB), Corresponding to operands in relational algebraic expressions .
Connotative predicate (IDB):
When the relationship referred to by the predicate is Through one or more Datalog The rule calculates Of , Call the predicate Connotative predicate , Corresponds to the relationship calculated using relational algebraic expressions .
The rules
General rules
The rule is Datalog A specification in language that describes the association between atoms , It includes the following three components :
- A relational atom called the head ;
- Left arrow symbol ←, pronounce as if;
- Main part , It consists of one or more atoms called sub targets .
The sub target can be either a relational atom , It can also be arithmetic atom .
Logical operators are used between sub goals AND Connect , And there can be NOT Negation operator .
for example : L o n g M o v i e ( t , y ) ← M o v i e s ( t , y , l , g , s , p ) A N D l ≥ 100 LongMovie(t,y)←Movies(t,y,l,g,s,p)\space AND \space l ≥100 LongMovie(t,y)←Movies(t,y,l,g,s,p) AND l≥100The goal of the rules : Make the head relation atom true .
Safety rules
Because relationship instances are always limited , Therefore, the head relationship guaranteed by rules is also limited .
Each is in the rule Variables that appear anywhere Must appear in Some non negative relational sub goals of the subject in .
In especial Anything in the rule header 、 Negative relationship sub goal 、 Variables that appear in arithmetic sub goals , Must also appear in The non negative relationship sub goal of the subject in .
example :
Examples of safety violations : P(x,y)←Q(x,z) AND NOT R(w,x,z) AND x<y
5.4 Relational algebra and Datalog
Set operations -> Datalog The rules
Intersection operation
You can use one Datalog Rules to show
Because the intersection operation involves two relations , So in Datalog In the rules , Have sub goals corresponding to two relationships ;
In the rules , The corresponding parameters use the same variables .
example :
hypothesis R and S The pattern of the relationship is R(A,B,C) and S(A,B,C),
be R∩S Rules can be used :
I(a, b, c) ← R(a, b, c) AND S(a, b, c)
Union operation
Each rule corresponds to a relation in union operation , And the heads of the two rules have the same IDB The predicate ;
The parameters of the head are exactly the same as those in each sub target .
example :
hypothesis R and S The pattern of the relationship is R(A,B,C) and S(A,B,C),
be R ∪ S Rules can be used :
U(x, y, z) ← R(x, y, z)
U(x, y, z) ← S(x, y, z) { or U(a, b, c) ← S(a, b, c)}Be careful :
Variable names are local to rules , namely Each rule is calculated independently , And provide tuples to its header relationship , It has nothing to do with other rules , Therefore, variable names are rule independent .
Subtraction operation
You can use a rule with negate subgoals to calculate
- That is, if you calculate U-V, The non negative subgoal is a predicate U, The inverse subgoal is a predicate V;
- In this rule , The sub target and header have the same variables for the corresponding parameters .
example :
hypothesis R and S The pattern of the relationship is R(A,B,C) and S(A,B,C),
be R-S Rules can be used :
D(x, y, z) ← R(x, y, z) AND NOT S(x, y, z)
Projection operation
Use a rule with a single sub goal , The parameters of this sub goal are different variables , Each variable represents an attribute of the relationship ;
The head is an atom with parameters , Its parameters correspond to the attributes of the projection list in the desired order .
example :
example 5.25 The relationship Movies (title, year, length, genre, studioName, producerC#) Project to the first three attributes :
P(t, y, l) ← Movies(t, y, l, g, s, p)
Select operation
If the selection condition is one or more arithmetic comparison expressions AND operation , You can create a with the following sub goals Datalog The rules :
- A relationship sub goal , Corresponding to the relationship in which it is selected . Each component of the relationship sub goal has different variables , And each component corresponds to an attribute of the relationship ;
- Multiple arithmetic sub goals , Each arithmetic sub target corresponds to a comparison operation of the selection condition . Use the corresponding variables in the arithmetic sub goal , And keep consistent with the variables established in the relationship sub goal .
example :
S(t,y,l,g,s,p) ← Movies(t,y,l,g,s,p) AND l≥ 100 AND s=‘Fox’
The cartesian product
- R×S You can use a single Datalog Rules to show
- The rule contains two sub goals corresponding to R and S, The two sub goals set different variables corresponding to attributes .
- Head IDB Predicates contain parameters that appear in all sub targets ,R The variables in the sub goals are listed in S In front of the variable .
p(a,b,c,x,y,z) ← R(a,b,c) AND S(x,y,z)
Join operation
Natural join ( R ⋈ S {R}\bowtie{S} R⋈S)
Use an approximate Cartesian product Datalog Rules to show , The difference lies in R and S The public attribute column of uses the same variable , Otherwise, use different variables .
The rule header is a IDB The predicate , Every variable of it appears once .
example :
hypothesis R and S The pattern of the relationship is R(A,B) and S(B,C,D),
Then natural connection R ⋈ S {R}\bowtie{S} R⋈S You can use rules
J(a,b,c,d) ← R(a,b) AND S(b,c,d)
θ Connect
- First transform the Cartesian product , Then transform the selection conditions .
Multiple operation
The steps of simulating multiple operations are as follows :
Draw the expression tree of relational algebra ;
The leaf nodes of the expression tree are represented by the corresponding extended predicates EDB Express ;
Create a for each internal node of the expression tree IDB The predicate .IDB One or more rules of the predicate correspond to the operators to be used on the tree nodes .
The predicate relationship associated with the root of the expression tree corresponds to the query result .
example :

package -> Datalog The rules
Relational algebra and Datalog Rules can be applied to package operations .
take Datalog When rules are applied to packages , The operation steps are as follows :
First step , Perform packet connection operation on the corresponding relationship of different sub targets ;
The second step , According to the content contained in the arithmetic sub goal , Perform packet selection operation on the result of connection operation ;
The third step , Perform packet projection operation on the selection result according to the relationship of rule headers .
example :
seek π x , z ( R ( x , y ) ⋈ S ( y , z ) ) \pi_{x,z}( R(x,y) \bowtie S(y,z)) πx,z(R(x,y)⋈S(y,z))
H(x,z) ← R(x,y) AND S(y,z)
Datalog Comparison between rules and relational algebra
- Every operation of basic relational algebra can be used Datalog Query expression , But the operations of grouping and aggregation in extended relational algebra cannot be expressed by rules .
- Any single Datalog Rules can be expressed in relational algebra , but Datalog Recursion can be expressed , Relational algebra cannot .
- because IDB Predicates can also appear in the body , Then the tuple of the rule header can be fed back to the rule body , So as to generate more tuples for the header .
边栏推荐
- Write the list to txt and directly remove the comma in the middle
- Thymeleaf快速入门学习
- [FatFs] migrate FatFs manually and transfer SRAM virtual USB flash disk
- Common methods of array
- "Statistical learning methods (2nd Edition)" Li Hang Chapter 14 clustering method mind map notes and after-school exercise answers (detailed steps) K-means hierarchical clustering Chapter 14
- [activiti] gateway
- Yolov5 learning summary (continuously updated)
- 字符串方法以及实例
- [activiti] Introduction to activiti
- Commands for quickly opening management tools
猜你喜欢

Could not load library cudnn_ cnn_ infer64_ 8.dll. Error code 126Please make sure cudnn_ cnn_ infer64_ eight

用指针访问二维数组

tensorflow和pytorch框架的安装以及cuda踩坑记录

Statistical learning methods (2nd Edition) Li Hang Chapter 22 summary of unsupervised learning methods mind mapping notes

Use QT to connect to MySQL and create table numbers, write data, and delete data

Machine learning (Zhou Zhihua) Chapter 4 notes on learning experience of decision tree

STM32 DSP library MDK vc5\vc6 compilation error: 256, (const float64_t *) twiddlecoeff64_ 256, armBitRevIndexTableF64_ 256,

Loss after cosine annealing decay of learning rate

String methods and instances

AD1256
随机推荐
Synergy LAN realizes multi host shared keyboard and mouse (AMD, arm)
Jupyter notebook选择conda环境
原生js放大镜效果
"Statistical learning methods (2nd Edition)" Li Hang Chapter 16 principal component analysis PCA mind map notes and after-school exercise answers (detailed steps) PCA matrix singular value Chapter 16
【数据库系统原理】第五章 代数和逻辑查询语言:包、扩展操作符、关系逻辑、关系代数与Datalog
Common methods of array
jupyter notebook一直自动重启(The kernel appears to have died. It will restart automatically.)
Conversion of world coordinate system, camera coordinate system and image coordinate system
JVM系统学习
[activiti] gateway
Demo of UDP communication applied to various environments
Numpy cheatsheet
PDF Text merge
Search of two-dimensional array of "sword finger offer" C language version
通道注意力与空间注意力模块
"Statistical learning methods (2nd Edition)" Li Hang Chapter 17 latent semantic analysis LSA LSI mind mapping notes and after-school exercise answers (detailed steps) Chapter 17
JS star scoring effect
YOLOv5学习总结(持续更新)
Yolov5 learning summary (continuously updated)
day-7 jvm完结