Introduction
Database performance tuning usually demands a high level of database expertise and a lot of preparatory work, such as collecting performance baselines, various performance metrics, and slow SQL logs. This is usually time-consuming and inefficient, and the total cost of ownership rises significantly when many databases are involved. Databases have now entered the cloud era. With CloudDBA, a free tool in Alibaba Cloud RDS for SQL Server, you can quickly and accurately lower both the cost of optimizing database load on RDS for SQL Server and the skill level required of operators, so that more energy goes into the business itself rather than into database implementation details.
This article covers the basic principles and usage of Performance Insight in Alibaba Cloud CloudDBA, and shows how to use the platform to diagnose and optimize common performance problems.
How to evaluate database load?
When asked how to evaluate database load, people in different roles may think of different approaches, for example:
- QPS/TPS
- Resource usage: IOPS, CPU, memory
- SQL execution time
- Concurrency
- Feedback from applications and the business
Each of these methods is one-sided and hard to use as a reference for actual optimization.
Evaluating database resource load is usually complicated and calls for a comprehensive understanding of relational databases. Most people who merely use a database have no need to learn it in depth, so we tend to simplify the indicators.
For example, we often watch only CPU, IO, memory, and similar indicators to decide whether the database has a problem. These indicators suit most applications, but for a database they may not correctly reflect what is happening inside it or what we should do about it. We also need to make a combined judgment from many database-specific indicators, such as SQL Server performance counters, DMVs, wait types, long-running transactions, network, and active connections. This information requires a deep understanding of the database itself, which sets a high bar for evaluating database load.
Now let us change perspective. A request to a relational database is a synchronous process: from the moment the application sends SQL until the database returns the result, the application waits, and it receives nothing until the database completes the request. During this period the session between the application and the database is in the "Active" state. So instead of evaluating load from the perspective of resource usage, we can reduce it to one simple indicator: AAS (Average Active Sessions), the average number of active sessions.
Why we use the AAS concept
Imagine you are driving to a destination. What do you care about most? The distance? Whether there is traffic on the way? Whether there is parking at the destination? And do you care about the state of the car? Perhaps, but do you need to understand how the engine works and the underlying principles of the car in order to judge whether it is in good condition? No: a few simple gauges and warning lights on the dashboard are enough for a simple judgment.
The same goes for databases. Most usage scenarios do not require understanding the internals of the database engine; the focus is on how to use the database. Enthusiasts, of course, are another matter :-)
The AAS concept gives us a simple, abstract way to evaluate load: the number of active sessions on the database measures its overall load as well as each SQL statement's contribution to that load, rolling all kinds of database metrics into one simple indicator, AAS. Users only need to compare AAS with the number of CPU cores to judge whether the current load exceeds the capacity of the instance. This greatly lowers the database skills required, so users can focus on business logic rather than on technical details such as the optimizer, execution plans, the execution engine, and the buffer pool.
A simple illustration of the AAS concept is shown in Figure 1:
Figure 1. A simple example of Performance Insight
The horizontal axis is time. Suppose there are three persistent connections (the Users in the figure above), each sending SQL requests to the database according to the application load. At time 1, User1 is executing SQL and using CPU, User2 is waiting for a lock, and User3 has no load, so the AAS value at time 1 is 2. The AAS value at time 2 is 3, and so on.
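The counting described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Performance Insight is implemented: the sample layout and the function names are hypothetical, and the per-user states at time 2 are assumed (the text only says the count there is 3).

```python
from typing import Dict, List, Optional

# Hypothetical samples: one dict per sampling instant, mapping a session
# name to what it is doing (None means idle). Time 1 follows the text;
# the individual states at time 2 are an assumption.
samples: List[Dict[str, Optional[str]]] = [
    {"User1": "CPU", "User2": "Lock", "User3": None},   # time 1
    {"User1": "CPU", "User2": "Lock", "User3": "CPU"},  # time 2
]


def active_sessions(sample: Dict[str, Optional[str]]) -> int:
    """Count sessions that are running or waiting, i.e. not idle."""
    return sum(1 for state in sample.values() if state is not None)


def average_active_sessions(window: List[Dict[str, Optional[str]]]) -> float:
    """AAS over a window: mean of the per-sample active session counts."""
    if not window:
        return 0.0
    return sum(active_sessions(s) for s in window) / len(window)


per_sample = [active_sessions(s) for s in samples]
aas = average_active_sessions(samples)
print(per_sample, aas)
```

A session that is waiting on a lock counts as active just like one that is on CPU, which is why AAS captures blocking as well as resource consumption.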
So is an AAS value of 2 or 3 high or low? That depends on how many CPU cores the database has. Each core maintains a complete SQL execution cycle, as shown in Figure 2:
Figure 2. Scheduling states of SQL execution on each CPU
When AAS is less than or equal to the number of CPU cores, the load on the database generally incurs no extra waiting: the current load does not have to wait to be scheduled on a CPU. This is the ideal state for AAS.
Imagine you are the database operations engineer, and a developer or business colleague comes to you and says: hi, something is wrong with the database. With AAS, this single indicator lets you narrow the scope of investigation up front and determine whether the problem is really in the database.
A simple comparison of AAS against the instance's core count reads as follows:
- AAS ≈ 0: the database has no noticeable load; the problem is on the application side
- AAS < 1: the database is not blocked
- AAS < Max CPUs: there are idle CPU cores, but a single session may still be maxed out or resource-bound (for example, OLAP)
- AAS > Max CPUs: there may be performance problems, though there are special cases
- AAS >> Max CPUs: there are serious performance problems, though there are special cases
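As a rough sketch, the comparison above could be written as a small Python function. The function name and the two cut-offs (`idle_below` for "AAS ≈ 0" and `severe_factor` for "AAS >> Max CPUs") are assumptions made for illustration; the article gives no numeric thresholds for them.

```python
def assess_load(aas: float, max_cpus: int,
                idle_below: float = 0.1, severe_factor: float = 2.0) -> str:
    """Map an AAS value to the rough categories listed above.

    idle_below and severe_factor are assumed cut-offs for "AAS ~ 0" and
    "AAS >> Max CPUs"; they are not values taken from the product.
    """
    if max_cpus <= 0:
        raise ValueError("max_cpus must be positive")
    if aas < idle_below:
        return "no obvious database load; look at the application side"
    if aas < 1:
        return "database is not blocked"
    if aas <= max_cpus:
        return "spare CPU cores; a single session may still be saturated"
    if aas < severe_factor * max_cpus:
        return "possible performance problem"
    return "serious performance problem"


# Values that appear later in the article: AAS around 10 on a 32-core
# instance, and AAS of 78 on an 8-core instance.
print(assess_load(10, 32))
print(assess_load(78, 8))
```

The result is a first-pass triage only; as the list notes, values above the core count can have special explanations.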
Introduction to Performance Insight
Figure 3 shows the UI of the Performance Insight feature; the entry point to the feature is shown in the figure.
Figure 3. A classic example of the Performance Insight UI
The UI is divided into an upper and a lower part. The upper part shows the AAS load for each time period as a time series. The lower part shows the load by different dimensions, ordered from high to low, with the SQL dimension as the default.
In the upper part you can see the load in each time period and the share of each resource; for example, blue in the figure represents CPU. The most important reference is the number of cores in the current instance specification (max Vcores: 32). If the AAS value exceeds the number of CPU cores the instance has, we know the instance is overloaded. The load shown in Figure 3 stays at around 10, below the Max Vcores value of 32, so the overall load of the database is at a healthy level.
How do you find where this load comes from? The lower part of Figure 3 lists the corresponding SQL statements and each statement's share of AAS. For example, the first statement in the figure is entirely orange with a value of 1.7056, meaning that over the given time period an average of about 1.7 sessions were active on this statement. It is mainly waiting on lock resources, which shows that the statement's bottleneck is locking.
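The per-statement breakdown can be pictured as a simple aggregation over sampled active sessions. The sketch below is a hypothetical illustration, not the product's implementation: the input layout (one `(sql_id, wait_type)` pair per active session per sample), the function name, and the sample data are all assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def aas_by_statement(active_rows: Iterable[Tuple[str, str]],
                     sample_count: int) -> Dict[str, dict]:
    """Return each statement's AAS and its split by wait type.

    active_rows holds one (sql_id, wait_type) pair for every active
    session seen in every sample; sample_count is the number of samples.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sql_id, wait_type in active_rows:
        counts[sql_id][wait_type] += 1

    result = {}
    for sql_id, waits in counts.items():
        result[sql_id] = {
            "aas": sum(waits.values()) / sample_count,
            "waits": {w: n / sample_count for w, n in waits.items()},
        }
    # Highest contributor first, like the lower part of the UI.
    return dict(sorted(result.items(),
                       key=lambda kv: kv[1]["aas"], reverse=True))


# Made-up data: 4 samples, two statements.
rows = [("q1", "Lock")] * 6 + [("q1", "CPU")] + [("q2", "CPU")] * 2
ranked = aas_by_statement(rows, sample_count=4)
top_sql = next(iter(ranked))
top_wait = max(ranked[top_sql]["waits"], key=ranked[top_sql]["waits"].get)
print(top_sql, ranked[top_sql]["aas"], top_wait)
```

Reading the top entry answers both questions at once: which statement contributes the most load, and which wait type dominates it.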
So we see that the first statement contributes the most AAS and that its bottleneck is lock waits. In terms of the abstract database tuning methodology in Figure 4, this covers the first two steps, "narrow the scope" and "locate the bottleneck":
Figure 4. The four steps of performance tuning
Put simply, the following two questions have been answered:
- Which SQL statements have the greatest impact on the instance's load at a specific time
- Why these SQL statements are slow
How to implement the optimization and how to verify its effect will be covered in later articles.
Use case 1: Quickly optimize the overall load
The 80/20 rule applies to databases too: 80% of the load is produced by 20% of the SQL, so optimizing that 20% completes 80% of the optimization. Taking it one step further, the top 20% of that 20%, that is 4% of the SQL, accounts for 80% * 80% = 64% of the work. So in many scenarios, optimizing the top few SQL statements completes most of the optimization work.
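To make the 80/20 reasoning concrete, here is a minimal Python sketch that picks the smallest set of top statements covering a target share of total AAS. The function name and the per-statement AAS values are made up for illustration.

```python
from typing import Dict, List


def top_contributors(aas_by_sql: Dict[str, float],
                     target_share: float = 0.8) -> List[str]:
    """Return the highest-AAS statements that together reach target_share
    of the total load."""
    total = sum(aas_by_sql.values())
    if total <= 0:
        return []
    picked: List[str] = []
    covered = 0
    ranked = sorted(aas_by_sql.items(), key=lambda kv: kv[1], reverse=True)
    for sql_id, aas in ranked:
        if covered / total >= target_share:
            break
        picked.append(sql_id)
        covered += aas
    return picked


# Hypothetical load: 10 statements, total AAS of 100.
load = {"q1": 50, "q2": 30, "q3": 6, "q4": 4, "q5": 3,
        "q6": 2, "q7": 2, "q8": 1, "q9": 1, "q10": 1}
print(top_contributors(load))
```

With this made-up distribution, 2 of the 10 statements (20%) carry 80 of the 100 units of load, so they are the ones worth tuning first.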
Figure 5. Locating a 100% CPU problem
In Figure 5 we can see that the instance's CPU usage stays at 100% and drops instantly to single digits when blocking occurs. Looking at one hour of AAS data, the single SELECT statement listed below has an average AAS of 78, far above the instance's 8-core specification, so optimizing this one statement basically solves the problem on this instance.
With the SQL "Analyze" function in Figure 5, we can use the execution plan to quickly find the common reasons why a statement is slow, including missing indexes, parameter type conversions, and inaccurate statistics.
Use case 2: Find out why database response time is slow in a specific time period
This is another classic scenario: the database as a whole stays at a healthy level most of the time, but at business peaks or at specific times the load is high and SQL on the business side slows down. Most databases only offer monitoring along a few metric dimensions, such as generic CPU, network, and IO, or engine-side metrics. These metrics usually let us guess an approximate range, but it is hard to pin down specific statements. With AAS, we can look at the high-load period and locate the statements causing problems at that specific time, as shown in Figure 6:
Figure 6. High load at a specific time
In Figure 6 we can see a sudden performance spike within a specific 2-minute window. Dragging with the mouse zooms into that time range and gives the result shown in Figure 7:
Figure 7. After zooming in, the two statements causing high AAS are obvious
From Figure 7 we can quickly locate the two statements that cause the spike and see that their wait types are Lock and Tran Log IO. Following the tuning methodology in Figure 4, we can make a preliminary judgment that the log IO load comes from a large number of update operations and that the lock waits come from these statements blocking one another. This significantly reduces tuning cost.
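The zoom-in workflow above amounts to two steps: find the window where AAS spikes, then rank statements inside that window only. A minimal Python sketch under assumed inputs follows; the function names, the threshold, and all data are hypothetical and only mirror the shape of the scenario.

```python
from collections import Counter
from typing import List, Tuple


def spike_windows(series: List[Tuple[int, float]],
                  threshold: float) -> List[Tuple[int, int]]:
    """Return (start, end) pairs of consecutive points with AAS > threshold."""
    windows, start, prev = [], None, None
    for t, aas in series:
        if aas > threshold:
            if start is None:
                start = t
            prev = t
        elif start is not None:
            windows.append((start, prev))
            start = None
    if start is not None:
        windows.append((start, prev))
    return windows


def top_statements(rows: List[Tuple[int, str, str]],
                   window: Tuple[int, int], n: int = 2):
    """Rank (sql_id, wait_type) pairs by active samples inside the window."""
    lo, hi = window
    hits = Counter((sql, wait) for t, sql, wait in rows if lo <= t <= hi)
    return hits.most_common(n)


# Made-up per-minute AAS with a 2-minute spike; threshold = 8 cores (assumed).
series = [(0, 3.0), (1, 4.0), (2, 30.0), (3, 28.0), (4, 3.0)]
rows = ([(2, "upd_a", "Lock")] * 20 + [(3, "upd_b", "Tran Log IO")] * 15
        + [(3, "sel_c", "CPU")] * 3 + [(0, "sel_c", "CPU")] * 5)

windows = spike_windows(series, threshold=8)
culprits = top_statements(rows, windows[0])
print(windows, culprits)
```

Restricting the ranking to the spike window is the important part: a statement that is cheap on average can still dominate a short burst.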
Summary
With Performance Insight, we can quickly view the overall load on the cloud at no extra cost, drill into the load details, and locate the SQL behind each part of the load. This helps us quickly solve database performance problems in the cloud and tune the overall load on a regular basis.
More importantly, Performance Insight is free!!! It is available across the full range of Alibaba Cloud RDS for SQL Server :-)