当前位置:网站首页>Underlying principle of MySQL index

Underlying principle of MySQL index

2022-06-26 06:10:00 Laughing_ Xie

Indexes Help MySQL Efficient access to data Ordered data structure .

An index is a data structure , The data structure has : Binary tree ( Binary search tree Binary Search Tree)、 Red and black trees (Red Black Tree)、Hash surface 、B-Tree、B+Tree

Index details :

1. Suppose the index adopts binary tree data structure :

As shown in the figure : We built a table , Some personal data , At this time, use age Field to build an index , If the binary tree data structure is used as the index, the figure is as follows :

At this point, if you want to query :select * from user where age=29 ;

age For index fields , First, we will query the data pages stored in our index ,29 Greater than 24, Less than 31, Soon to find 29 This index value , And according to this key Value stored disk file address value , Do the disk again IO To find the target data stored in this address . If age Not an index field , You need to compare the data row by row in the database , comparison 6 The target data is not available until after times , Keep on disk IO, Efficiency is very low .

disadvantages : If the index field has been increasing regularly , The height of the tree will be very high , And the query efficiency is still very low , In this case , It almost becomes a linear search . Pictured :

2. Red and black trees  

At this time, red and black trees came into being , Can solve this problem , Red black trees not only have the characteristics of binary trees , It also has its own additional properties ;

1. The nodes are red or black .

2. The root node is black .

3. Each leaf node is black and empty (NIL node ).

4 The two child nodes of each red node are black .( There cannot be two consecutive red nodes on all paths from each leaf to the root )

5. All paths from any node to each leaf contain the same number of black nodes .

  There is no in-depth explanation here ( Because I am not an expert ); Here's the way to solve the problem , Is to use the spin properties of the red black tree , If there is a linear case , The nodes of the red black tree will rotate left or again , To achieve the self balance of the tree structure

A simple understanding is shown in the figure :

  Rotate to form a reasonable tree structure , So the height of the tree is reduced , Find more efficiently ! For example, a primary key :

    In the case of large amounts of data , Red and black trees still have performance problems , For example, millions or more of data , The height of red and black trees will still be very high , Because the basic feature is binary search . here B-Tree And that's what happened , When the vertical height cannot be broken through , Only lateral expansion can be considered , The index data that can be placed horizontally is increased , Let a node store more index elements , At the same time, it can be divided into more branches .

    After some transformation of red and black trees , It can support the search of large amounts of data , The height can be controlled within 5 within .

   B-Tree : Leaf nodes have the same depth , The pointer to the leaf node is null , The data indexes in the node are incrementally arranged from left to right .

MySQL The bottom layer of the index  B+Tree , It's right B-Tree A transformation of , It is actually a multi fork balanced tree .

B+Tree :  Non leaf nodes do not store data, Store index only , More indexes can be placed ; At the same time, the leaf node maintains the sequential access pointer ( bi-directional pointer ), Improve the performance of interval access .( All index elements are stored in leaf nodes , As shown in the figure below )

 

      The relevant index data is only stored in the leaf node , This allows non leaf nodes to store more indexes , But it does not mean that every node index can be stored indefinitely , Because in the end, it will be put into memory for operation and search , Need to consider memory and cost . at present MySQL The space allocated by each node of the can be viewed through the command :

SHOW GLOBAL STATUS LIKE 'INNODB_page_size';

  Each node is assigned 16384 Bytes , That is to say 16KB Space .

 

  When storing indexes , A node can store up to 16k/14B=1170 An index ; Leaf nodes store indexes and related data , Leaf node Suppose the size of each group is 1KB Then a leaf node can be saved 16 Data ; As shown in the figure ,B+Tree The height is 3 When , Theoretically, the amount of data that can be stored is :1170*1170*16=21902400 20million data volume . That is, in the case of tens of millions of levels of data , The height of the tree is only 3, It's very efficient .

       Hash surface :MySQL At present, we support BTREE and HASH Two indexing methods , That is, two index storage data structures , use Hash Hash algorithm ,key Is the index value ,value Disk pointer data for storing data , Same as hash The calculated values are mapped , In this way, the search for a single piece of data will be very fast , But the general database is rarely used , Although a single data search is fast , however hash Unable to support range lookup , The performance of range lookup is very low , The supported scenarios are single .

       B+Tree  Leaf nodes maintain pointers , The index values of the elements on the right are greater than those on the left , So when range lookup occurs , Just find the corresponding index to find the desired result set . Put all the following elements into the result set in turn along the pointer , Without this pointer , Each time, you will start from the root node to find the corresponding index and data .   


Different storage engines , The underlying use B+Tree When storing , There is a little difference between storing files

1.MyISAM The index file and data file of are separated ( Non aggregation )

      When files are stored on the disk , A table has at least three files ,*_myisam.frm Table structure file ,*_myisam.MYD Table data row file ,*_myisam.MYI An index that stores table data (B+Tree structure ). If this type of storage engine , Hit the index first MYI Locate this index in the file , Find the corresponding... Stored in this file Disk file pointer , here ( Leaf node data What is stored is the file pointer of the row on the disk where the data corresponding to this index is located ), Go again MYD Find the corresponding data in the file , Put it in memory .

2.InnoDB Index implementation of ( Gather ), Support transactions , Most used

     The table data itself is by B+Tree An index structure file of an organization ,  Clustered index - Leaf nodes contain complete data records

      One for use InnoDB The table that stores the engine , There are two corresponding files *_innodb_lock_frm ( Table structure file )、*_innodb_lock_ibd( Stored index + data ),( Inside the leaf node data The corresponding row data is stored ), Than MyISAM One less disk IO.

      Why? InnoDB Must have primary key , It is recommended to use an integer self incrementing primary key ?

     1. If you do not set the primary key, you can still create a table for storage , But the bottom layer will automatically identify a field or create a field as a primary key . 

    2. Non primary key index ,data It does not store specific row data , It's the primary key index

    3. Integer is used because there are many comparison operations when searching the index , In this case, other types of efficiency will not be very high , For example, use UUID When , String comparison , A bit by bit comparison is required , If English and numbers need to be transposed ASCII Code to compare sizes , Affect efficiency .

    4. If the primary key is not auto incremented , There will be subsequent new data that may be in the middle of the previous node , because data Stored row data , A node 16KB size , If this leaf node is already so large , Now we need to add a primary key index and data , The previous leaf node needs to perform a data page splitting operation , To add the new data , Affect performance . Use self increasing , You can ensure that each new data is added later , There will be no data page splitting .

      Why do leaf nodes of non primary key index structure store primary key values ?( Consistency and storage savings )

原网站

版权声明
本文为[Laughing_ Xie]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/177/202206260600572945.html