当前位置:网站首页>Interview key points that must be mastered index and affairs (with B-tree and b+ tree)

Interview key points that must be mastered index and affairs (with B-tree and b+ tree)

2022-06-26 18:11:00 A goblin

One , Indexes

1, Concept

Index is a special kind of file , Contains a reference pointer to all records in the data table . You can index one or more columns in a table , And specify the type of index , Each index has its own data structure implementation .

2, effect

  • Tables in the database 、 data 、 The relationship between indexes , It's like a book on a shelf 、 The relationship between book content and book catalog .
  • The function of the index is similar to that of a book catalogue , It can be used for quick positioning 、 Retrieving data .
  • Index is very helpful to improve the performance of database .

3, What can be used as an index ( Be sure to watch B- Trees and B+ Trees )

1) If there is no index , At this point, the search method is to traverse sequentially

2) In data structure , What kind of data structure is fast to find ?

A. Hashtable      
B. Binary search tree 
 
  reflection : Is it possible to use hash as index ?
  Not very good 
  For example, to perform sql sentence     
 select name from student
 where id = 10;
  A search like this uses hash It's totally feasible 
  But when it comes to execution sql Statement for 
 select name from student
 where id = '1%'; 
  When fuzzy matching is encountered ,hash There's nothing we can do 
  because hash Finding needs to be clearly known key Value ,
  Can pass hash The function calculates the subscript , And then find the data 
 
  For the scenario of fuzzy query , Binary search tree is more appropriate ,
  Although I am not sure which one it is key,
  However, all relevant nodes can be obtained 

in fact ,mysql Index structure in , Neither use hash surface , Nor is it a binary search tree , But use B+ Trees (N Cross search tree ) Such a structure

3) Let's get to know each other B+ Trees

To know B+ Before the tree, let's have a brief look B- Trees ( Pay attention to reading and doing B Trees , No B Subtract )

As shown in the picture B- Trees :
 Insert picture description here
B-Tree Each node in the can store multiple data , Multiple data can be divided into certain intervals , Further data are placed in combination with such intervals
, Make the height of the tree lower , More efficient production and search

Here's the picture :B+ Trees :
 Insert picture description here

  • know B+ Trees :

    1.B- Non leaf nodes in the tree may store data ,B+ The tree data must be on the leaf node , Non leaf nodes are only used to assist in searching
    2. The sibling nodes of each layer are connected ( It's similar to a linked list )( Traversal is more convenient , Especially when the interval search is specified )

  • B+ Tree advantage :
    1. A single node stores more elements , Make the query IO Fewer times .
    2. All queries need to find leaf nodes , Query performance is stable .
    3. All leaf nodes form an ordered list , Easy range query .

4, Use scenarios

Consider creating an index on a column or columns of a database table , The following points need to be considered :

  • Large amount of data , And we often make conditional queries on these columns .
  • The insert operation of the database table , And modify these columns less frequently .
  • Indexes take up extra disk space .

When the above conditions are met , Consider indexing these fields in the table , To improve query efficiency .

conversely , If the non conditional query column , Or insert it often 、 Modify the operating , Or when there is not enough disk space , Don't think about creating indexes

5, Use

Create a primary key constraint (PRIMARY KEY)、 Unique constraint (UNIQUE)、 Foreign key constraints (FOREIGN KEY) when , Automatically created Index of corresponding column .

Be careful :

The primary key index is different from other column indexes

  • The leaf nodes in the primary key index store records one by one , Search with primary key , In order to complete the search
  • The primary key stored in other column indexes id value , First, find the primary key according to the index id, Holding the primary key id Find... In the primary key index

Therefore, the search rate through the primary key index is higher than that using other indexes

  • Look at the index
show index from  Table name ;
  • Create index
create index  Index name  on  Table name ( Field name );
  • Delete index
drop index  Index name  on  Table name ;

6, Interview related questions

1) What is the background of the index , What problem does the index solve ?
Avoid traversal , Optimize query speed

2) How indexing works
Underlying data structure ,B+ Trees ( It is better to describe B+ Tree related characteristics , and hash What advantage does it have ( Can handle fuzzy matching problems ), And binary search tree than what advantage ( Lower height , More efficient search ), and B What are the advantages of trees )

Two , Business

1, Why use transactions

First create a test table

drop table if exists accout;
create table accout(  
id int primary key auto_increment,  
name varchar(20) comment ' title of account ',  
money decimal(11,2) comment ' amount of money ' );
insert into accout(name, money) values 
(' Alibaba ', 5000),
(' The Forty Thieves ', 1000);

for instance , Forty thieves stole from Alibaba's account 2000 element

--  Alibaba accounts are down 2000 
update accout set money=money-2000 where name = ' Alibaba '; 
--  The accounts of the forty thieves have increased 2000 
update accout set money=money+2000 where name = ' The Forty Thieves '; 

If you are executing the first sentence above SQL when , There was a network error , Or the database hangs up , Alibaba's account will be reduced 2000, however There is no increase in the account of the forty thieves .

Solution : Use transactions to control , Make sure the above two sentences SQL Or it all works , Or all failed .

2, The concept of things

A transaction is a logical set of operations , The units that make up this set of operations , All or nothing , All or nothing .
In different environments , You can have business . In the database , It's database transactions .

3, The four basic characteristics of things

  • 1, Atomicity

A transaction is an indivisible unit of work , The operations included , Or you can just do it all , Or don't do any of them .

If during execution , The first step was successful , There is a problem in the second step , To avoid data errors , Often rollback occurs (rollback)

In the definition Not one of them , Not really did not do , Instead, the data is recovered through rollback

  • 2, Uniformity ( correctness )

Before and after the transaction , Data should be in a legal situation

for example :
A The amount in the card :5000
B The amount in the card :1000

here A—>B 2000,

Implied conditions :
1) Before and after transfer , You have to promise AB The total amount in the card is 6000
2) Amount cannot be negative

  • 3, persistence :

Once the transaction is committed ( performed ), Changes to the database should be permanent , Other operations to follow / fault , Will not affect the correctness of the transaction just now .

Business 1:A—B 1000
Business 2:A—B1000

When a transaction 1 After successful execution , No matter what the extreme situation is ( Business 2 Anyway ), Will not deal with affairs 1 Impact

  • 4, Isolation,

When multiple transactions are executed concurrently , The internal operation of things is isolated from the internal operation of other things , No impact

1) Concurrency can improve the efficiency of program execution , If the data correctness is not affected , We still want to make as many operations as possible ( Business ) Concurrent execution

2) Isolation is to describe the corresponding problems in the concurrent process

3) Possible problems caused by concurrent operation of database ( These problems do not just occur in the database , As long as it is concurrent programming , May involve )

  • Dirty reading

example : Yes AB Two students ,A Typing Code ,B Looking at A Knock on the code ,A Write a class Student, Defines an attribute String
name, then B And left ,A Continue to write, and finally put the name Deleted , here B Dirty reading happened , Read dirty data , bad data .

At this time, we do not have any requirements for isolation , So dirty reading occurs
To avoid dirty reading , Lock when writing ( Do not read while writing , Improved isolation and degradation , This reduces concurrency ).

  • It can't be read repeatedly
    The results of reading data twice in a transaction are inconsistent

On the basis of locking the write operation :A Wrote a Student class , Defined a Sting name attribute , Submit , Then the students saw , We watch ,A Revised Student class , Deleted name attribute , Submit again ,B After updating the code, we found , Why , What I saw just now is missing !!

After locking the write operation just now , Although it can solve the dirty reading problem , But the problem of non repeatable reading still exists , To avoid non repeatable reading , Read operations can also be locked ,( Further improve the isolation , Concurrency is further reduced )

  • Fantasy reading
    The result set returned by multiple queries in the same transaction is different .

Just now, when locking the reader , Just for " read Student" Lock it up . At this time I can't read Student When the Student. But I can change other classes, such as Score class .
The next time I submit the code , Students will find , Just now I haven't Score This class , Now why do you have it ?

In order to solve the problem of unreal reading, we must further improve the isolation ." Serialization "

B When reading code A Just take a break and don't make any changes ( It's not just Student, Don't move other classes ).A When writing code ,B Just go and have a rest , Don't read anything .

Parallelism is gone -( Completely serial execution ) Isolation is the highest . There is no impact between multiple transactions .

additional :
MySQL Transaction isolation level in ( Set this level manually , Adjust concurrency and isolation )

  1. read uncommitted Allow reading uncommitted data , Parallel max , Isolation will cause dirty reading problems at least

  2. read committed, Only the submitted data is allowed to be read , Equivalent to writing and locking , Parallelism reduces - - Point isolation is improved - - spot Can avoid dirty reading problems , But there are non repeatable

  3. repeatable read, Lock when reading and writing , At this point, parallelism is further reduced Isolated inlet - Step up , It can avoid the problem of non repeatability , There is the problem of unreal reading ( Default isolation level )4. serializable, Strict serial execution , The highest degree of isolation , Minimum parallelism , Can avoid phantom reading problems .

4, Use of transactions

(1) Open transaction :start transaction;
(2) Execute more than one SQL sentence
(3) Roll back or commit :rollback/commit;
explain :rollback It's all failure ,commit It's all success .

start transaction; 
--  Alibaba accounts are down 2000 
update accout set money=money-2000 where name = ' Alibaba '; 
--  The accounts of the forty thieves have increased 2000 
update accout set money=money+2000 where name = ' The Forty Thieves '; 
commit;

5, Interview questions about things

1, Do you know anything about affairs ????
answer :
1) The background of things ( There's nothing wrong with nothing / Questions about transaction resolution topic )
2) The nature of things (4 A feature ) You must express it according to your own understanding , Don't endorse
3) Focus on isolation

  a) The problem background : Parallel and concurrent 
  b) Problems in concurrent transactions :(3 A question )
  c) How to solve each problem 
  d)mysql The corresponding isolation level in the database 

The advanced problem of transaction :
1: How transactions should be implemented
2: Do you know how to implement transactions in distributed systems
3: How to realize inter-bank transfer

原网站

版权声明
本文为[A goblin]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/177/202206261801123184.html