当前位置：网站首页>Dig deep into MySQL - resolve the clustered index / secondary index / federated index of InnoDB storage engine

Dig deep into MySQL - resolve the clustered index / secondary index / federated index of InnoDB storage engine

2022-06-24 23:09:00 【Little Camellia girl】

List of articles

1. Cluster index

create table index_demo(
	c1 int,
	c2 int,
	c3 char(1),
	primary key(c1)
) ROW_FORMAT=COMPACT;

Insert picture description here

B+ The tree itself is a directory , Or is itself an index , It has the following two characteristics ：

1、 Use the size of the record primary key value to sort records and pages , It contains 3 The meaning of each level ：

(1) The records in the page are arranged into a one-way linked list according to the size of the primary key , The records in the page are divided into several groups , The offset of the largest record of the primary key value in each group in the page will be placed in the page directory as a slot , We can quickly locate the record whose primary key column is equal to a certain value in the page directory by using dichotomy .
(2) Each page storing user records is also arranged into a two-way linked list according to the primary key size of user records in the page .
(3) The pages for storing catalog item records are divided into different levels , The pages in the same level are also arranged into a two-way linked list according to the primary key size of the directory item records in the page .

2、B+ The leaf node of the tree stores complete user records , The so-called complete user record , It means that the values of all columns are stored in this record .

We put B+ Trees are called clustered indexes , All complete user records are stored in the leaf node of the cluster index ,InnoDB The storage engine will automatically create a clustered index for us , stay InnoDB In the storage engine , Cluster index is the storage method of data , That is, all user records are stored in the leaf node , It's called ” Index is data , Data is index “.

Query data according to the primary key value ：

select * from user where c1=10;

At this point, the clustered index will work , because InnoDB The cluster index will be created automatically , And this query statement is based on the primary key value .

2. Secondary indexes

Clustered indexes can only work when the search criteria are primary key values , as a result of B+ The data in the tree is sorted by primary key , So what if we use other columns as search criteria ？ Can you only traverse the linked list from beginning to end ？

It's not , We can build another one B+ Trees , And adopt different sorting rules , For example, we use c2 Column size as data page 、 The collation of the records in the page , Build another one B+ Trees ：

create table index_demo(
	c1 int,
	c2 int,
	c3 char(1),
	primary key(c1)
) ROW_FORMAT=COMPACT;

Insert picture description here

This B+ Tree and cluster indexes are different ：

1、 Use records c2 Sort records and pages according to the size of columns , Contains the following three meanings ：

(1) The records in the page are in accordance with c2 The size order of the columns is arranged into a one-way linked list , The records in the page are divided into several groups , In each group c2 The offset of the record with the largest column value in the page will be placed in the page directory in turn as a slot , We can quickly locate... In the page directory by dichotomy c2 A record whose column equals a certain value .
(2) Each page storing user records is also recorded according to the page c2 The column sizes are arranged in a two-way linked list .
(3) The pages for storing catalog item records are divided into different levels , Pages at the same level are also recorded according to the directory entries in the page c2 The columns are arranged in a two-way linked list .

2、B+ The leaf node of the tree does not store complete user records , But only c2 Column + The values of the two columns of the primary key .

3、 Directory entry records are no longer primary keys + Page number matching , And become c2 Column + Page number matching .

This sort rule is based on the size of non primary key columns B+ Trees , You need to perform a table return operation to locate the complete user record , So this kind of B+ Trees are also called secondary indexes or secondary indexes . Because we are based on c2 The size of the column is used as B+ Tree sorting rules , So we also call it B+ Tree is c2 Index created by column , hold c2 Columns are called index columns .

The records in the clustered index leaf nodes are called complete user records , The records in the secondary index leaf node are called incomplete user records .

3. Back to the table

If we want to find that the search conditions are met c2=4 The record of , You can use the non clustered index above .

because c2 Columns do not use uniqueness constraints , Satisfy c2=4 There may be more than one record , In fact, we just need to B+ The leaf node of the tree is located to the first one that meets the prime search condition c2=4 The record of , Then scan backward along the one-way linked list composed of records .

in addition , Each leaf node forms a two-way linked list , After searching the records on this page, you can skip to the first record in the next page in sequence , Then scan backward along the unidirectional linked list of records .

The query process is as follows ：

1、 Through page 44 Determine the first compliance c2=4 The page where the condition's catalog entry record is located .

According to page 44, because 2<4<9, You can quickly locate the first line that meets c2=4 The page where the directory entry record of the condition is located is page 42;

2、 Through page 42 Determine the first compliance c2=4 The page where the user record of the condition is located .

According to page 42, because 2<4<=4, You can locate the page where the first qualified user record is located as page 35 Or page 35;

3、 In the page where the user record is actually stored, locate the page that matches c2=4 Conditional user records .

Go to the page 34 Or page 35 Locate the specific user record in ;

4、 This B+ The leaf node value of the tree is stored c2 Column sum c1 Column （ Primary key ）, In this B+ After locating the first qualified user record at the leaf node of the tree , We need to find the complete user record in the cluster index according to the primary key information in the record , The process of relocating a complete user record by carrying the primary key information to the cluster index is called Back to the table .

Then return to this B+ At the leaf node of the tree , Find the qualified user record just located , And continue to search for other conditions along the one-way linked list composed of records c2=4 The record of , Every time you find one, continue to return to the table , Repeat the action , Know that the next record is not satisfied c2=4 Until this condition .

4. Joint index

We can also use multiple columns as sorting rules at the same time , That is, index multiple columns at the same time . such as , We want to make B+ The tree follows c2 Column sum c3 Sort column sizes , These two contain two meanings ：

(1) First, follow the records and pages c2 Sort columns ;

(2) On record c2 In the same case , Then use c3 Sort columns ;

by c2 Column sum c3 Column index , The schematic diagram is ：
Insert picture description here
In the picture above B+ The following two points need to be noticed in the tree ：

(1) Each directory entry record is created by c2 example 、c3 Column 、 Page number here 3 Part of it is made up of . Each record is based on c2 Sort the values of the columns , If it's recorded c2 The columns are the same , According to c3 Sort the values of the columns .
(2) B+ The user record at the tree leaf node is recorded by c2 Column 、c3 Columns and primary keys c1 Column composition .

Be careful ： With c2 Column sum c3 The size of the column is set up by the collation B+ The tree index is a union index , Pages are called composite indexes or multi column indexes . It is essentially a secondary index , Its index columns include c2 Column sum c3 Column .

With c2 and c3 The size of the column is to establish a joint index for the collation and c2 and c3 Columns are indexed differently , The differences are as follows ：

(1) Building a federated index will only create the top one B+ Trees ;
(2) by c2 and c3 When the columns are indexed separately , Will be c2 and c3 The size of the column creates two columns for the collation B+ Trees ;

5. Secondary index optimization

Earlier, we said that the contents of the directory entry records of the secondary index are ： Index columns + Page number , But only index columns + There is a problem with the page number , Suppose the data in this table is ：

insert into index_demo values(1,1,'u'),(3,1,'d'),(5,1,'y'),(7,1,'a')

This is the case c2 Column set up B+ The schematic diagram of the tree is ：
Insert picture description here
At this point, if index_demo Insert another record in ：

insert into index_demo values(9,1,'c')

Because of the page 3 Two catalog entry records in , Corresponding c2 The values of the columns are 1, In the newly inserted record ,c2 The value of the column is also 1, Then this record should be placed on page 4 Middle or page 5 What about China? ？ That's the problem .

therefore , In order for the newly inserted record to find its own page , We need to make sure that B+ Tree entries at the same level , It is unique except for the page number field . So the non leaf nodes of the secondary index are composed of 3 Part of the form ：

1、 Value of index column ;
2、 Primary key value ;
3、 Page number ;

We also add the primary key value to the directory entry record of the non leaf node of the secondary index , So you can make sure B+ Each directory entry record in each layer node of the tree is unique except for the page number field , Therefore c2 The schematic diagram of the column after establishing the secondary index is ：
Insert picture description here
here , The contents of the directory entry record of the secondary index are ： Index columns + Primary key + Page number

Take the new record first c2 Column values and pages 3 Of each catalog item record in c2 Compare the values of columns ; If c2 The values of the columns are the same , You can then compare the primary key values , Finally, you can locate a unique directory entry record . So the new record should be inserted into the page 5 in .

Be careful ： For secondary index records , First, sort by the value of the secondary index column , When the secondary index column values are the same , Then sort by primary key value , therefore , by c2 Indexing columns is actually equivalent to (c2,c1) Column to create a federated index .

For a unique secondary index , Its non leaf nodes also include the primary key value of the record .

原网站

版权声明
本文为[Little Camellia girl]所创，转载请带上原文链接，感谢
https://yzsam.com/2022/02/202202211145087995.html