Cause analysis and intelligent solution of information system row lock waiting
2022-06-23 14:13:00 【Minimalist intelligence】
“We all know that database lock waits are very harmful: the immediate consequence is that business operations fail or slow down. 'Cause analysis and intelligent solution of information system row lock waiting' is a professional article by Huangzhiyi, CTO of Minimalist intelligence (and chief scientist of the Zhongguancun Zhonglian enterprise financial investment innovation promotion association), published in the magazine 'Electronic finance'. It has not been shared in text form before; today it is presented here with text and figures. Taking the financial industry as an example, it systematically studies row lock wait events in the databases of financial institutions, traces them to their source, and looks for a reasonable solution, as a reference for peers.”
With the development of smart finance, systems are multiplying and business complexity is growing exponentially, so software problems such as lock waits will occur more and more often. Yet most financial institutions currently lack effective means of dealing with them, which makes handling lock wait events particularly urgent. The lock wait events that damage the business of financial institutions are mainly row lock waits. When a critical lock wait event occurs, the usual response is to kill business processes at random or to restart the entire business database system. Both approaches have serious drawbacks: first, they damage the business; second, they make it impossible to trace the responsible party afterwards, so the problem keeps recurring.
It is therefore necessary to study row lock wait events in the databases of financial institutions systematically, trace them to their source, and find a reasonable solution.
01
—
The harm of lock waiting to the business
The harm of lock waiting to the business can be explained from the following three dimensions :
1.1 Problems caused
Minor lock wait events show up as slow business, for example long queues in the service hall. In serious cases the affected business stops working altogether.
1.2 Time and target of occurrence
They often occur on core business systems during peak business hours.
1.3 Frequency of occurrence
Lock wait events no longer occur only in busy, medium-sized financial institutions; more and more small financial institutions in China are also experiencing frequent lock wait events of various kinds.
As the above shows, lock wait events are very harmful to the business. With the development of smart finance, systems are multiplying and business complexity is growing exponentially. Against this background, software problems such as lock waits will only become more frequent unless they are addressed in time. Yet most financial institutions currently lack effective means of dealing with them, which makes handling lock wait events particularly urgent.
The lock wait events that damage the business are mainly row lock waits, so the rest of this article focuses on row lock waits.
02
—
Manifestations and causes of row lock waiting
Locks are an indispensable mechanism for keeping access to the database orderly. The subject of this article, the row lock wait event, is a negative side effect of the lock mechanism during system operation.
A typical row lock wait scenario: transaction A wants to update (UPDATE) some rows of table 1, so it places row-level exclusive locks on those rows. The locks still allow other transactions to read the rows, but they do not allow any other transaction to modify the rows with DML (DELETE, UPDATE, or INSERT) until transaction A ends and releases the exclusive locks. If other transactions try to modify the locked rows in the meantime, they are forced to wait for transaction A to end before they can perform their own operations.
The lock wait events discussed here are those that have severely timed out. In normal operation lock waits happen all the time, and a normal lock wait is very short, mostly at the millisecond level. The lock wait events we discuss have a significant impact on the business: they last at least seconds, and sometimes several minutes or even dozens of hours. These harmful row lock wait events take two main forms.
2.1 The first type of row lock: DML-related row locks
DML statements (DELETE, UPDATE, INSERT) generate row locks. If another transaction tries to modify the same row at that time, a lock wait occurs. Generally, when a DML statement holds its locks for a long time, there are two usual causes:
2.1.1 The DML statement has to modify a huge amount of data, so it naturally takes a long time to execute.
2.1.2 The table accessed by the DML statement may be missing an index, which causes a full table scan and therefore a long execution time.
An example illustrates this situation:
UPDATE <table A>
SET FILE_CONTENT=:1, FILE_NAME=:2
WHERE ID=:3
If <table A> is large, with millions of rows, and the ID column used as the search condition is not indexed, this UPDATE statement performs a full table scan on <table A> and takes a long time. The row locks are therefore also held for a long time, and any other transaction that needs to modify the affected rows of <table A> in the meantime is forced into a lock wait. The following figure illustrates how this situation produces lock waits.
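The effect of the missing index can be checked on a small scale. The sketch below is only an illustration and uses SQLite through Python's standard `sqlite3` module rather than the database discussed in the article; the table name `table_a` and index name `idx_table_a_id` are placeholders chosen for this example. It asks SQLite for the query plan of the same UPDATE before and after an index on the ID column is created.

```python
import sqlite3


def update_plan(conn):
    """Return SQLite's query plan text for the article's UPDATE statement."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "UPDATE table_a SET file_content = ?, file_name = ? WHERE id = ?",
        (b"", "", 1),
    ).fetchall()
    # Each plan row is (id, parent, notused, detail); the detail column is readable text.
    return " | ".join(row[3] for row in rows)


# isolation_level=None keeps the connection in autocommit mode.
conn = sqlite3.connect(":memory:", isolation_level=None)
# id is deliberately NOT a primary key, so no index exists on it yet.
conn.execute("CREATE TABLE table_a (id INTEGER, file_name TEXT, file_content BLOB)")

plan_without_index = update_plan(conn)   # a full table scan
conn.execute("CREATE INDEX idx_table_a_id ON table_a (id)")
plan_with_index = update_plan(conn)      # an index search

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

The exact wording of the plan differs between SQLite versions, but the first plan reports a scan of the whole table and the second a search using the index. Other databases expose the same information through their own execution plan tools.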
2.2 The second type of row lock: long-transaction-related row locks
Surveys show that long transactions are the most common cause of large numbers of lock waits, accounting for about 70% of row lock waits. In financial institutions this scenario is more common than the first and harder to detect, so it does even more harm. The following figure analyzes a lock wait event caused by a typical long transaction.
Compared with the first type, the main reason the lock is held for a long time is not that the DML statement generating the lock runs for a long time, but that the whole transaction takes a long time to execute, because the lock is released only when the code commits (COMMIT) or rolls back (ROLLBACK).
This kind of lock is very common. For example, when the system slows down we often find lock wait events in which the head blocker, the session at the head of the lock chain that blocks the other sessions, is running a SELECT query that does not generate row-level locks, and the tables accessed by that SQL statement may not even be locked. The reason is that the lock was generated by an earlier statement in the same transaction as the query. Once a lock wait event has occurred, whatever real-time inspection tools we have can show only the statement currently executing in the transaction, so what we see is this SELECT query.
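A small experiment makes this behaviour concrete. The sketch below is an analogy under stated assumptions: it uses SQLite via Python's `sqlite3` module, and SQLite locks the whole database file rather than individual rows, so it only mirrors the pattern described above (the lock outlives the statement that created it and lasts until COMMIT). The `accounts` table and the session names are invented for the example.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "demo.db")


def connect():
    # isolation_level=None: transactions are controlled explicitly with BEGIN/COMMIT.
    # timeout=0: report a lock conflict immediately instead of waiting.
    return sqlite3.connect(db_path, timeout=0, isolation_level=None)


def try_update(session):
    """Return True if the session could modify the table, False if it was blocked."""
    try:
        session.execute("UPDATE accounts SET balance = balance + 1 WHERE id = 1")
        return True
    except sqlite3.OperationalError:  # "database is locked"
        return False


session_a, session_b = connect(), connect()
session_a.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
session_a.execute("INSERT INTO accounts VALUES (1, 100)")

# Session A: a long transaction. The UPDATE takes the lock...
session_a.execute("BEGIN IMMEDIATE")
session_a.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")
# ...and the statement "currently running" is then only a harmless SELECT.
session_a.execute("SELECT COUNT(*) FROM accounts").fetchall()

blocked_during_transaction = not try_update(session_b)  # B is blocked by A's earlier UPDATE
session_a.execute("COMMIT")                             # the lock is released here
succeeded_after_commit = try_update(session_b)

final_balance = session_b.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(blocked_during_transaction, succeeded_after_commit, final_balance)
```

Session B is refused while session A's transaction is open, even though session A is only running a query at that moment, and succeeds as soon as session A commits.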
03
—
Conventional handling of row lock waiting
3.1 Conventional handling of DML-related row locks
For DML statements that must run during business hours, add indexes to the table or create suitable data partitions to shorten the locking time, as described in the following figure:
3.2 Conventional handling of long-transaction-related row locks
When a long transaction causes long lock waits, the first choice, where the business logic allows, is to add a commit (COMMIT) right after the statement that generates the lock, so that the lock is released.
3.3 Limitations of conventional handling
The above are the two main schemes for handling row lock wait events. Both, however, are applied only after the problem has occurred and its specific cause has been determined; they do not resolve row lock wait events at the source.
04
—
A systematic response to row lock wait events
In dealing with lock wait events, smart finance has two main demands: how to eliminate the potential factors that cause serious lock wait events, and how to handle unavoidable lock wait events quickly and efficiently.
4.1 Responding systematically to row lock waits
Serious lock wait events are typically the result of several interacting factors. This makes it hard for financial institutions to identify the responsible party when handling such events, and hard to obtain the support of the parties involved. If we look closely at the causes, however, the factors behind lock wait events fall into three main groups: software performance quality, available system resources, and specific database access events. Software performance quality problems can often be traced back to the development and testing stages; what financial institutions need is a long-term governance mechanism of continuous monitoring, analysis, and optimization, which we will cover in other articles. This article focuses on recommendations for an emergency response mechanism for lock wait events. Our recommendation: early warning and rapid automatic diagnosis of lock wait events should be achieved mainly by intelligent means.
4.1.1 About early warning
On the one hand, early warning is a requirement for the healthy operation of financial institutions. Lock wait events are a typical case of “a minor illness left untreated becomes a serious one”: at first there may only be an anomaly in some performance indicators that does not yet affect the business, but by the time the problem affects the business, diagnosis and resolution have often become costly.
On the other hand, technology makes early warning possible. Lock wait events start with abnormal symptoms and develop gradually until they affect business operations. If those symptoms are noticed at an early stage and a warning is raised, the information department can find the problem before the business department does.
The current situation is that the information department usually learns of the problem only when the lock wait event has already seriously affected the business and the business department reports it; that is, the information department often knows later than the business department. By then the golden window for solving the lock wait problem has passed, and the financial institution has to absorb avoidable business damage while the problem is resolved.
4.1.2 About rapid automatic diagnosis
Once a lock wait event has occurred, what financial institutions want is a way to resolve the event itself as quickly as possible and restore normal business. The fastest way is to trace the lock wait event instantly by intelligent means rather than relying on purely manual analysis.
At present few financial institutions have early warning and rapid diagnosis capabilities, because they lack tools and software that can accurately detect lock waits and automatically trace them to the source lock. Events of this kind occur at the application level, which is a blind spot for traditional monitoring products.
4.2 Intelligent solutions
Through research, we have designed and developed a method for efficient early warning, instantaneous diagnosis, and rapid handling of lock wait events. The basic logic is shown in the figure below.
4.2.1 Monitoring and early warning
To discover lock wait events that are happening in the system, a common approach is to monitor the performance counters related to lock waiting and raise a warning promptly once they become abnormal. For example, an alert notification is triggered when a lock wait lasts longer than 60 seconds.
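As a minimal sketch of the threshold check described above, assume the monitoring layer already delivers samples of who is waiting on whom and for how long. How those samples are collected depends on the database and is not shown; the `LockWaitSample` type and all session numbers except 2531 (which appears in the article's example) are hypothetical.

```python
from dataclasses import dataclass

ALERT_THRESHOLD_SECONDS = 60.0  # the 60-second threshold mentioned in the text


@dataclass(frozen=True)
class LockWaitSample:
    waiting_session: int
    blocking_session: int
    wait_seconds: float


def lock_wait_alerts(samples, threshold=ALERT_THRESHOLD_SECONDS):
    """Return the samples whose wait time exceeds the threshold, longest wait first."""
    overdue = [s for s in samples if s.wait_seconds > threshold]
    return sorted(overdue, key=lambda s: s.wait_seconds, reverse=True)


samples = [
    LockWaitSample(3001, 2875, 0.004),  # a normal millisecond-level wait: ignored
    LockWaitSample(2875, 2531, 95.0),
    LockWaitSample(3140, 2531, 61.5),
]
alerts = lock_wait_alerts(samples)
for a in alerts:
    print(f"ALERT: session {a.waiting_session} has waited "
          f"{a.wait_seconds:.1f}s on session {a.blocking_session}")
```

Filtering on duration is what separates the harmful waits discussed in this article from the millisecond-level waits that occur constantly in normal operation.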
4.2.2 Tracing and viewing
Tracing locks can in fact be done with a simple recursive algorithm: first find the transactions that are forced to wait for a lock, then recursively check who is blocking each of them and whether that blocking transaction is itself blocked by another transaction. Recursing level by level in this way leads to the head blocker of the lock wait event.
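The recursion can be sketched in a few lines. This is an illustration under assumptions, not the authors' implementation: the blocking relationships are assumed to be available as a mapping from each waiting session to the session that blocks it, and the session numbers other than 2531 are made up. A guard against cycles is added because a deadlock would otherwise make the recursion endless.

```python
def find_head_blocker(session, blocked_by):
    """Follow the blocking chain from `session` back to the session that nobody blocks."""

    def walk(current, visited):
        if current in visited:
            raise ValueError(f"blocking cycle (deadlock) involving session {current}")
        blocker = blocked_by.get(current)
        if blocker is None:
            return current  # not waiting on anyone: this is the head blocker
        return walk(blocker, visited | {current})

    return walk(session, frozenset())


# waiter -> blocker
blocked_by = {3001: 2875, 2875: 2531, 3140: 2531}

heads = {waiter: find_head_blocker(waiter, blocked_by) for waiter in blocked_by}
print(heads)
```

Every waiting session in this example traces back to session 2531, even though session 3001 is only blocked by it indirectly through session 2875.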
Viewing locks is equally critical. We have found a very effective way to present lock wait events: a Sankey diagram that gives a panoramic view of the event. The Sankey diagram was originally a way of visualizing flows; applied to database lock waits, it makes the ins and outs of the event very clear and gives information managers a god's-eye view of it. At a glance we can see who created the original head lock (the leftmost process number in the diagram) and how the whole lock wait event developed.
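A Sankey diagram is drawn from a list of links, each with a source, a target, and a width. The sketch below shows one plausible way to prepare that data from blocking records, using the wait time as the link width; this choice and the record layout are assumptions for illustration, and the actual drawing is left to whichever charting tool is in use.

```python
def sankey_links(waits):
    """Turn (blocker, waiter, wait_seconds) records into Sankey link dictionaries."""
    return [
        {"source": str(blocker), "target": str(waiter), "value": seconds}
        for blocker, waiter, seconds in waits
    ]


def leftmost_nodes(links):
    """Nodes that appear only as a source: the head blockers drawn at the far left."""
    sources = {link["source"] for link in links}
    targets = {link["target"] for link in links}
    return sorted(sources - targets)


# Hypothetical blocking records; only process 2531 comes from the article's example.
waits = [(2531, 2875, 95.0), (2531, 3140, 61.5), (2875, 3001, 40.0)]
links = sankey_links(waits)
print(leftmost_nodes(links))
```

Because links run from blocker to waiter, the head blocker is the only node with no incoming link, which is why it ends up at the left edge of the diagram.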
4.2.3 Emergency response
Once problems can be detected early and lock wait events can be observed from a god's-eye view, handling them becomes a simple multiple-choice question. The following two examples illustrate this intelligent handling method.
The following two pictures are Sankey diagram screenshots taken when a serious lock wait event occurred at a financial institution. Example 1:
Explanation of example 1: this example shows how complex lock wait chains in a financial institution can be, and why intelligent means are needed. This lock wait event had plagued the institution for many years; because the blocking relationships were so complicated, manual analysis had never been able to determine the responsible party. The head blocker in this example, process 2531, is in fact a very inconspicuous small program, but through complex propagation across several different business processes, running 2531 always ended up slowing down, or even briefly bringing down, the entire core business, the credit card program.
Emergency response once the intelligent system is in place: quickly kill or roll back process 2531, which causes the blocking. If the business logic allows, intelligent means can go further in future: simple keywords and thresholds can be configured so that, whenever this small program affects the core business again, it is handled automatically.
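The keyword-and-threshold rule can be pictured as follows. This is a hypothetical sketch: the rule structure, the program names, and the action labels are invented for illustration, and the code only decides what to do. Actually killing or rolling back a session is database-specific and would need to respect the business logic mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DisposalRule:
    program_keyword: str         # matched against the head blocker's program name
    max_blocking_seconds: float  # how long it may block others before action is taken
    action: str                  # label such as "rollback" or "kill"


def choose_action(program_name, blocking_seconds, rules) -> Optional[str]:
    """Return the action of the first matching rule, or None to leave it to a human."""
    for rule in rules:
        if (rule.program_keyword in program_name
                and blocking_seconds > rule.max_blocking_seconds):
            return rule.action
    return None


rules = [DisposalRule("small_batch_job", 60.0, "rollback")]

decision = choose_action("small_batch_job_v2", 75.0, rules)
print(decision)
```

Programs that match no rule, or that have not yet exceeded their threshold, yield `None`, so anything outside the configured cases still goes to a person for a decision.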
05
—
Summary and prospect
The handling methods and technical means for row lock wait events described above have been put into practice at many financial institutions. The results so far: serious lock wait events are reported within one minute, and by the time the alert is raised the tracing analysis has already been completed with the recursive algorithm and the Sankey diagram, so the full context of the lock wait event is presented in real time. This reduces business downtime for financial institutions and provides successful practical cases of early detection and early resolution of lock wait events.
With the development of smart finance, dealing with software problems such as lock waits has become one of the major challenges in the informatization of financial institutions. Consider the software problems financial institutions commonly encounter today, such as lock waits, excessive statement parsing, and missing indexes. On the one hand, these problems share a feature that distinguishes them from hardware problems: software problems always show early signs and develop over time, whereas hardware problems often occur instantaneously. On the other hand, every kind of software problem has its own complexity, and the tracing algorithm differs for each kind. So when we use intelligent means to deal with software problems such as lock waits, we can exploit their early symptoms to achieve early warning, and use a library of algorithms to trace each kind of software problem quickly and automatically.
Being able to deal with software problems intelligently basically guarantees the health of the business and frees information management staff from constant fire-fighting. It is the foundation for financial information management to move from operation and maintenance to operation, and the first step in upgrading smart finance.