当前位置：网站首页>Multi table associated query -- 07 -- hash join

Multi table associated query -- 07 -- hash join

2022-06-27 07:24:00 【High high for loop】

Tips ： When the article is finished , Directories can be generated automatically , How to generate it, please refer to the help document on the right

List of articles

Hash join
Hash join Principle analysis

Hash join

1. brief introduction

website :Hash join in MySQL 8

mysql8.0 Start introducing Hash join
Insert picture description here

Insert picture description here

2. What is? hash join

So-called hash join Definition ： Use hash Table to match row data in multiple tables join Realization .
Usually ,hash join Efficient than nested loop join fast （ When join There is a small amount of data in one of the tables , When it can be fully cached in memory ,hash join Efficiency is the best ）.

3.Hash Join Treatment process

Let's use an example to illustrate .

SELECT
  given_name, country_name
FROM
  persons JOIN countries ON persons.country_id = countries.country_id;

country Country table , As a basic element table , The amount of data is relatively small
persons Personnel information sheet , The amount of data is relatively large

4.Hash join The process

HashJoin Generally, there are two processes ,hash Table building process and be based on hash Probe comparison section of the table .

establish hash Tabular build The process
Probe hash Tabular probe The process

5. How do you use it? hash join

By default ,hash join Is open .
Can be in explain Add “FORMAT = tree” View a sql

mysql> EXPLAIN FORMAT=tree
    -> SELECT
    ->   given_name, country_name
    -> FROM
    ->   persons JOIN countries ON persons.country_id = countries.country_id;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Inner hash join (countries.country_id = persons.country_id)  (cost=0.70 rows=1)
    -> Table scan on countries  (cost=0.35 rows=1)
    -> Hash
        -> Table scan on persons  (cost=0.35 rows=1)
 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

Usually , If join Used Equivalent conditions （ One or more ）、 also , No, Index available , Will use hash join .（ in other words , If there is an index , be mysql Will still give priority to All queries ）

We can also use the command to close hash join :

mysql> SET optimizer_switch="hash_join=off";
Query OK, 0 rows affected (0.00 sec)

mysql> EXPLAIN FORMAT=tree
    -> SELECT
    ->   given_name, country_name
    -> FROM
    ->   persons JOIN countries ON persons.country_id = countries.country_id;
+----------------------------------------+
| EXPLAIN                                |
+----------------------------------------+
| 
 |
+----------------------------------------+
1 row in set (0.00 sec)

Hash join Principle analysis

1.hash Table building (build) The process

Select a table with a small amount of data To build hash surface

In the build hash Table time ,mysql take join Data of a data table in The cache to this hash In the table . Usually , Select a table with a small amount of data To build hash surface （ The amount of data to be cached is relatively small ）.
hash Table use join The use of join This table in the uses the condition as hash key.
For example, in the example above , country Table as a Basic element table , The amount of data is relatively small , Then select cache country Table data . in addition ,join Condition is persons.country_id = countries.country_id , Then use countries Tabular country_id Field value as hash key.
When country All relevant data rows in the table are cached , The build process is over .

Insert picture description here

2.hash Table probe （probe） The process

Insert picture description here

In the detection phase , database Read the data row from the data table to be probed （ In this case is person surface ）.
For each row of data read , mysql Will use In a row country_id value Inquire about hash surface , Each row of data is matched , Find one reasonable join Results data .
On the whole ,mysql Just for each table , Just one scan . For probe table scanning , Every time a piece of data is scanned , Then use a constant time based on hash Watch coming in Data results match .

3. Data table splitting

When A data sheet （ As hash Table of source data ） When it can be cached in memory ,hash join Our efficiency is very fast .
that hash join Available hash How big is the cache ？
This is a adopt System Variable join_buffer_size The control of the . This variable can be modified at any time , Immediate effect .
So if as join Amount of data in the data sheet It's big , Unable to complete caching , How to deal with that ？
If you are building hash In the process of watch , If it reaches join_buffer_size value , be mysql Write the remaining data to a file block on disk .
When writing file blocks ,mysql Will try to Control the size of each block , So that the subsequent block can be just loaded into join_buffer_size The size of hash In cache .（ however mysql There is also a biggest limitation , For each join , Maximum 128 individual Disk data block ）.

If you do join Amount of data in the data sheet It's big , Unable to complete caching , Data splitting is also required for small tables

When the data is written to the disk file block , How do I know which line of data is written to that file block ？ Here is a new hash function , Used to locate data blocks .
Then why use a new hash Function? ？ This reason will be followed by .

4. Split ----- Data detection phase

In the data detection phase ,mysql The process of data matching and no writing The process of disk block file is the same （ It's like all the data is written to memory hash The table is the same ）： In the probe table , Each row of data scanned , Just arrive In memory hash In the table matching , Find eligible data .
But here's the difference , If the disk block is written , It's in Probe every row of data scanned in the table A , stay hash After the table is matched , You also need to write to the disk file block （ Because for data rows A , It is also possible to match Data previously written to the disk block ）.
Need to pay attention to when , Write the probe table to Disk file block , Locate the data row to Of a particular block of data hash function and take hash The algorithm for writing table source table data to data block is consistent . therefore Matching data Will be written to The same couple Data block .

for example ,country The table has a large amount of data , Can only be With A - D The first country is written in Memory hash In the table . in addition , take The rest of the country data Write to disk file block .

If the country HXX Write to hash Table block HA in . Scanning Person Table time , If a person's country is also HXX , The same is true , Will The calling data is written to Probe table HXX In the block number .

Insert picture description here
ad locum , There are two things that need to be explained ：

In the beginning, I wrote Disk file block , We need to pay attention to The size of each file block should not exceed join buffer size, all , One of them hash Disk file blocks can be loaded exactly into hash join In the table ;
Why use different hash Algorithm to allocate different data rows to different Disk data block ？ If the algorithm is the same , The data of a block is loaded into join hash In the table , Then a large amount of data will be in hash The same row of the table , A lot of conflicting data .

原网站

版权声明
本文为[High high for loop]所创，转载请带上原文链接，感谢
https://yzsam.com/2022/178/202206270650181576.html