Solving OOM exceptions caused by improper use of multithreading in a production environment (Supreme Collector's Edition)
2022-07-25 20:32:00 【Dragon back ride Shi】

Description of the incident
On July 12, 2022, starting at 9:32, a small number of users began to see errors when accessing the App home page. By 10:20 the home page service was unavailable on a large scale, and the problem was resolved at 10:36.
Timeline
9:58 An alert fired. At the same time, the feedback group reported that the home page was showing a "network busy" message. Since the store list service had been released a few nights earlier, we decided to roll back the code as an emergency measure.
10:07 Started contacting XXX to investigate and resolve the problem.
10:36 Code rollback finished; service returned to normal.
Root cause of the incident
A simulation of the offending code:
public static void test() throws InterruptedException, ExecutionException {
    Executor executor = Executors.newFixedThreadPool(3);
    CompletionService<String> service = new ExecutorCompletionService<>(executor);
    service.submit(new Callable<String>() {
        @Override
        public String call() throws Exception {
            return "HelloWorld--" + Thread.currentThread().getName();
        }
    });
    // Bug: the completed task's result is never taken from the service.
}
The root cause is that the code never calls take() or poll() on the ExecutorCompletionService.
The correct version is as follows:
public static void test() throws InterruptedException, ExecutionException {
    Executor executor = Executors.newFixedThreadPool(3);
    CompletionService<String> service = new ExecutorCompletionService<>(executor);
    service.submit(new Callable<String>() {
        @Override
        public String call() throws Exception {
            return "HelloWorld--" + Thread.currentThread().getName();
        }
    });
    // Take the completed future out of the completion queue, then read its result.
    service.take().get();
}
A single missing line of code caused this disaster, and it is not easy to spot. An OOM is the result of slow memory growth, so a little carelessness is enough to overlook it. If this code path is called infrequently, the failure may only surface days or even months later.
Having operations roll back the release or restart the servers is indeed the fastest fix. But if you do not promptly analyze the code that caused the OOM afterwards, and the version you rolled back to unfortunately contains the same OOM code, you are in trouble. As noted above, when traffic is low, a rollback or restart releases the memory and buys time; when traffic is high, nothing short of rolling back to a version without the bug will save you.
Exploring the root cause of the problem
To better understand how ExecutorCompletionService works, we compare it with ExecutorService. The comparison also makes it clearer which scenarios call for ExecutorCompletionService.
First, the ExecutorService code (running it yourself is recommended):
public static void test1() throws Exception {
    ExecutorService executorService = Executors.newCachedThreadPool();
    ArrayList<Future<String>> futureArrayList = new ArrayList<>();
    System.out.println("The company asked you to notify everyone about the dinner; you drive around to pick people up");
    // Each second of sleep stands in for part of the real waiting time in the story.
    Future<String> future10 = executorService.submit(() -> {
        System.out.println("President: I'm in the bathroom at home and it's slow going lately. I need 1 hour. Come and pick me up later");
        TimeUnit.SECONDS.sleep(10);
        System.out.println("President: 1 hour later, I'm done. Come and pick me up");
        return "The president is done";
    });
    futureArrayList.add(future10);
    Future<String> future3 = executorService.submit(() -> {
        System.out.println("R&D: I'm in the bathroom at home, but I'm quick. I need 3 minutes. Come and pick me up later");
        TimeUnit.SECONDS.sleep(3);
        System.out.println("R&D: 3 minutes later, I'm done. Come and pick me up");
        return "R&D is done";
    });
    futureArrayList.add(future3);
    Future<String> future6 = executorService.submit(() -> {
        System.out.println("Middle management: I'm in the bathroom at home. I need 6 minutes. Come and pick me up later");
        TimeUnit.SECONDS.sleep(6);
        System.out.println("Middle management: 6 minutes later, I'm done. Come and pick me up");
        return "Middle management is done";
    });
    futureArrayList.add(future6);
    TimeUnit.SECONDS.sleep(1);
    System.out.println("Everyone has been notified; waiting for replies.");
    try {
        // Results are read in submission order, so the 10s task blocks the others.
        for (Future<String> future : futureArrayList) {
            String returnStr = future.get();
            System.out.println(returnStr + ", you go and pick them up");
        }
        // Let the pool threads exit once all results have been read.
        executorService.shutdown();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
There are three tasks, taking 10s, 3s, and 6s respectively. They are submitted as Callable tasks through the submit method of a JDK thread pool.
Step 1: The main thread submits the three tasks to the thread pool, stores the returned Future objects in a List, and then prints "Everyone has been notified; waiting for replies.";
Step 2: A loop calls future.get() on each Future and blocks until the result is available.
The final result is as follows:

The president is notified first and also has to be picked up first, which means waiting a full hour. Only after picking up the president can you go and get R&D and middle management, even though they finished long ago.
The most time-consuming task (10s) was added to the list first. When the loop asks for that task's result, get() blocks until the 10s task completes. The 3s and 6s tasks finished long before, but their results still have to wait behind the 10s task.
Readers who work on gateway services may find this especially familiar. Typically, a gateway's RPC layer calls many downstream interfaces, as in the following figure:

If every call is handled the ExecutorService way, and the interfaces called by the first few tasks happen to be slow, everything behind them blocks and waits, which is painful. ExecutorCompletionService was created for exactly this situation. It schedules the handling of task results sensibly and lives up to the name "task planner".
The same scenario with ExecutorCompletionService:
public static void test2() throws Exception {
    ExecutorService executorService = Executors.newCachedThreadPool();
    ExecutorCompletionService<String> completionService = new ExecutorCompletionService<>(executorService);
    System.out.println("The company asked you to notify everyone about the dinner; you drive around to pick people up");
    completionService.submit(() -> {
        System.out.println("President: I'm in the bathroom at home and it's slow going lately. I need 1 hour. Come and pick me up later");
        TimeUnit.SECONDS.sleep(10);
        System.out.println("President: 1 hour later, I'm done. Come and pick me up");
        return "The president is done";
    });
    completionService.submit(() -> {
        System.out.println("R&D: I'm in the bathroom at home, but I'm quick. I need 3 minutes. Come and pick me up later");
        TimeUnit.SECONDS.sleep(3);
        System.out.println("R&D: 3 minutes later, I'm done. Come and pick me up");
        return "R&D is done";
    });
    completionService.submit(() -> {
        System.out.println("Middle management: I'm in the bathroom at home. I need 6 minutes. Come and pick me up later");
        TimeUnit.SECONDS.sleep(6);
        System.out.println("Middle management: 6 minutes later, I'm done. Come and pick me up");
        return "Middle management is done";
    });
    TimeUnit.SECONDS.sleep(1);
    System.out.println("Everyone has been notified; waiting for replies.");
    // Three asynchronous tasks were submitted, so take exactly three results,
    // in the order in which the tasks complete.
    for (int i = 0; i < 3; i++) {
        String returnStr = completionService.take().get();
        System.out.println(returnStr + ", you go and pick them up");
    }
    // Let the pool threads exit once all results have been read.
    executorService.shutdown();
}
The results are as follows:
This time things are much more efficient. The president is still notified first, but people are picked up in the order in which they finish, so you do not have to wait for the slowest one, the president (in real life the first approach is still recommended; the consequences of not waiting for the president are, well... ha ha).
Put the two outputs side by side and compare them:
The two pieces of code differ very little. When fetching results, the ExecutorCompletionService version uses:
completionService.take().get();
Why call take() first and then get()?
Let's look at the source code:
The CompletionService interface and its implementation class
1. ExecutorCompletionService is the implementation class of the CompletionService interface.
2. Next, look at the constructors of ExecutorCompletionService.
A thread pool object must be passed in. The default completion queue is a LinkedBlockingQueue, but a second constructor lets you specify the queue. So there are two constructors. The one that uses the default LinkedBlockingQueue is ExecutorCompletionService(Executor executor).
The one that accepts a caller-supplied queue is ExecutorCompletionService(Executor executor, BlockingQueue<Future<V>> completionQueue).
3. There are two submit methods, submit(Callable<V> task) and submit(Runnable task, V result). Both return a Future. Our example uses the first one, which takes a Callable.
4. Comparing the submit methods of ExecutorService and ExecutorCompletionService reveals the difference.


5. The difference is QueueingFuture.
What does it do? Let's keep going:
QueueingFuture extends FutureTask and overrides the done() method so that the task is put into the completionQueue. When a task completes, it is added to the queue. At any moment, therefore, every task in completionQueue is one whose done() has already run, and these tasks are the futures we retrieve one by one. Calling take() on the completionQueue blocks until a completed task is available. Whatever we get back is a finished future, so calling .get() on it returns the result immediately.
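The mechanism described above can be sketched in a few lines. This is a simplified illustration and not the JDK source: the class names MiniCompletionService and QueueingTask are hypothetical, and the real QueueingFuture wraps the task rather than being the task itself. What it shares with the real thing is the idea of overriding FutureTask.done() to enqueue the finished future.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified stand-in for ExecutorCompletionService.
class MiniCompletionService<V> {
    private final ExecutorService executor;
    private final BlockingQueue<Future<V>> completionQueue = new LinkedBlockingQueue<>();

    MiniCompletionService(ExecutorService executor) {
        this.executor = executor;
    }

    // Plays the role of QueueingFuture: a FutureTask that enqueues itself when finished.
    private final class QueueingTask extends FutureTask<V> {
        QueueingTask(Callable<V> callable) {
            super(callable);
        }

        @Override
        protected void done() {
            // Called by FutureTask once the task has completed, failed, or been cancelled.
            completionQueue.add(this);
        }
    }

    Future<V> submit(Callable<V> callable) {
        QueueingTask task = new QueueingTask(callable);
        executor.execute(task);
        return task;
    }

    // Blocks until some task has finished, then removes it from the queue.
    Future<V> take() throws InterruptedException {
        return completionQueue.take();
    }

    static String firstFinished() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            MiniCompletionService<String> service = new MiniCompletionService<>(pool);
            service.submit(() -> { TimeUnit.MILLISECONDS.sleep(400); return "slow"; });
            service.submit(() -> { TimeUnit.MILLISECONDS.sleep(50); return "fast"; });
            Future<String> first = service.take();
            // The future came out of the completion queue, so it is already done
            // and get() does not block.
            return first.get();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstFinished());
    }
}
```

This is why the article's code calls take() and then get(): take() does the waiting, and get() only reads a result that already exists.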

By now, readers probably have a good idea of what is going on:
With ExecutorService.submit, you have to keep track of the future returned for each task yourself. CompletionService tracks these futures for you and overrides done(), so that whatever you wait for on the completionQueue is guaranteed to be a finished task. In a gateway RPC layer, where we do not want one slow interface to drag down all requests, CompletionService fits business scenarios that process the fastest responses first.
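As a rough sketch of the "fastest response" scenario, the helper below submits several calls, returns the first one that succeeds, and cancels the rest. The class name FastestResponse, the method firstSuccessful, the timeout parameter, and the sleeping tasks that stand in for downstream RPC calls are all assumptions made for illustration; they do not come from the incident code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class FastestResponse {

    // Hypothetical helper: returns the result of whichever call succeeds first.
    static <T> T firstSuccessful(ExecutorService pool, List<Callable<T>> calls, long timeoutMillis)
            throws InterruptedException, TimeoutException {
        // The service is local to this method, so its completion queue becomes
        // garbage once the method returns, even if some results were never taken.
        CompletionService<T> service = new ExecutorCompletionService<>(pool);
        List<Future<T>> futures = new ArrayList<>();
        for (Callable<T> call : calls) {
            futures.add(service.submit(call));
        }
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        try {
            for (int i = 0; i < calls.size(); i++) {
                long remaining = deadline - System.nanoTime();
                Future<T> done = service.poll(remaining, TimeUnit.NANOSECONDS);
                if (done == null) {
                    break; // deadline reached before another task finished
                }
                try {
                    return done.get();
                } catch (ExecutionException e) {
                    // This call failed; wait for the next one to finish.
                }
            }
            throw new TimeoutException("no call succeeded within " + timeoutMillis + " ms");
        } finally {
            // Stop the calls whose results are no longer needed.
            for (Future<T> f : futures) {
                f.cancel(true);
            }
        }
    }

    static String demo() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            List<Callable<String>> calls = new ArrayList<>();
            calls.add(() -> { TimeUnit.MILLISECONDS.sleep(600); return "slow"; });
            calls.add(() -> { TimeUnit.MILLISECONDS.sleep(50); return "fast"; });
            calls.add(() -> { TimeUnit.MILLISECONDS.sleep(300); return "medium"; });
            return firstSuccessful(pool, calls, 2000);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

Using poll with a timeout instead of take() keeps the gateway thread from waiting indefinitely when every downstream call is slow.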
But note the following, which is also the core problem of this incident.
Only when one of ExecutorCompletionService's three retrieval methods, take(), poll(), or poll(long timeout, TimeUnit unit), is called is a completed task's result removed from the blocking queue, so that its heap memory can be freed.
Because the business logic did not need the tasks' return values, take() and poll() were never called, so the heap memory was never released. Heap usage kept growing as the number of calls increased.
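The effect can be observed directly. The sketch below passes its own queue through the two-argument constructor purely so that the queue size can be inspected; the class name CompletionQueueLeakDemo and the task count are illustrative. It also assumes that in production the service is a long-lived shared object (for example, a field of a singleton), because a method-local service would become garbage together with its queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class CompletionQueueLeakDemo {

    // Returns {results retained before draining, results retained after draining}.
    static int[] run(int tasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        // Our own queue, supplied only so we can look at its size.
        BlockingQueue<Future<String>> completionQueue = new LinkedBlockingQueue<>();
        CompletionService<String> service = new ExecutorCompletionService<>(pool, completionQueue);

        for (int i = 0; i < tasks; i++) {
            final int id = i;
            service.submit(() -> "result-" + id); // the result is never consumed
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);

        // Every finished future is still referenced by the queue.
        int retained = completionQueue.size();

        // poll() removes one completed future per call and returns null when empty.
        while (service.poll() != null) {
            // discard the result
        }
        int afterDrain = completionQueue.size();
        return new int[] {retained, afterDrain};
    }

    public static void main(String[] args) throws InterruptedException {
        int[] sizes = run(1000);
        System.out.println("retained before drain: " + sizes[0]);
        System.out.println("retained after drain:  " + sizes[1]);
    }
}
```

Each retained future holds on to its result object, so with a long-lived service the retained count, and the heap behind it, grows with every request.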

Therefore, if the business scenario does not need the tasks' return values, do not use CompletionService. If you do use it, remember to remove the completed results from the blocking queue to avoid an OOM!
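When the return value is not needed, one possible alternative is to hand a Runnable straight to the pool with execute(), so that no Future and no completion queue are involved at all. This is a minimal sketch; the class name FireAndForgetDemo and the counter that stands in for real work are assumptions for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class FireAndForgetDemo {

    // Runs the given number of tasks without keeping any result objects around.
    static int run(int tasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        AtomicInteger processed = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    processed.incrementAndGet(); // stand-in for the real work
                } catch (RuntimeException e) {
                    // With execute() there is no Future to carry the failure,
                    // so handle or log it inside the task.
                    System.err.println("task failed: " + e);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("processed: " + run(100));
    }
}
```

The trade-off is that failures are no longer reported through a Future, which is why the task catches and logs its own exceptions.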
Summary
Now that we know the cause of the incident, let's summarize the lessons. After all, as Confucius said: examine yourself, reflect often on your mistakes, and cultivate yourself well!
Before going online
Keep a strict code review habit. Be sure to have a backup colleague review the code; it is hard to spot problems in your own code, and every programmer has that kind of confidence in their own work;
Release record: note the last package version that can be rolled back to (leave yourself a way out);
Before going online, confirm whether the business can be degraded after a rollback. If it cannot, be strict about extending the monitoring period for this release.
After the launch
Keep watching memory growth (this is easily overlooked; people pay less attention to memory than to CPU usage);
Keep watching CPU usage growth;
Watch GC behavior, whether the number of threads grows, whether Full GCs are frequent, and so on;
Watch service performance alerts, and whether TP99, TP999, and MAX latency increase significantly.
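For the memory, thread, and GC items above, the JVM already exposes the raw numbers through its standard management beans. The sketch below reads them in-process; the class name JvmHealthSnapshot and the choice of fields are assumptions, and in practice these values would normally be exported to whatever monitoring system the team already uses.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

class JvmHealthSnapshot {
    final long heapUsedBytes;
    final int liveThreads;
    final long gcCount;
    final long gcTimeMillis;

    private JvmHealthSnapshot(long heapUsedBytes, int liveThreads, long gcCount, long gcTimeMillis) {
        this.heapUsedBytes = heapUsedBytes;
        this.liveThreads = liveThreads;
        this.gcCount = gcCount;
        this.gcTimeMillis = gcTimeMillis;
    }

    // Reads current heap usage, live thread count, and cumulative GC totals.
    static JvmHealthSnapshot capture() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        int threads = ManagementFactory.getThreadMXBean().getThreadCount();
        long count = 0;
        long time = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the collector does not report the value.
            count += Math.max(0, gc.getCollectionCount());
            time += Math.max(0, gc.getCollectionTime());
        }
        return new JvmHealthSnapshot(heap.getUsed(), threads, count, time);
    }

    @Override
    public String toString() {
        return "heapUsed=" + (heapUsedBytes / (1024 * 1024)) + "MB"
                + " threads=" + liveThreads
                + " gcCount=" + gcCount
                + " gcTime=" + gcTimeMillis + "ms";
    }

    public static void main(String[] args) throws InterruptedException {
        // Print a few samples; a steadily rising heapUsed across many samples
        // after GC is the kind of trend worth investigating.
        for (int i = 0; i < 3; i++) {
            System.out.println(capture());
            Thread.sleep(1000);
        }
    }
}
```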