Introduction to async-profiler
2022-06-22 20:21:00 【Fenglibin】
1、Introduction
Async-profiler is a low-overhead sampling profiler for Java. It is built on HotSpot-specific APIs, which it uses to collect stack traces and to track memory allocations, so it works with OpenJDK, Oracle JDK and other Java runtimes based on the HotSpot JVM.
GitHub project: https://github.com/jvm-profiling-tools/async-profiler
Async-profiler can trace the following kinds of events:
- CPU cycles;
- hardware and software performance counters, such as cache misses, branch misses, page faults and context switches;
- allocations in the Java heap;
- contended lock attempts, including both Java object monitors and ReentrantLocks.
Supported platforms
- Linux / x64 / x86 / ARM / AArch64
- macOS / x64
Note: on macOS, profiling is limited to user-space code.
2、CPU profiling
In this mode, the profiler collects stack trace samples that include Java methods, native calls, JVM code and kernel functions.
To produce an accurate profile of both Java and native code, the general approach is to receive the call stacks generated by perf_events and match them up with the call stacks generated by AsyncGetCallTrace. In addition, async-profiler provides a workaround that recovers stack traces in some corner cases where AsyncGetCallTrace fails.
Compared with using perf_events directly together with a Java agent that translates addresses into Java method names, this approach has the following advantages:
- It works with older Java versions, because it does not require -XX:+PreserveFramePointer, which is only available in JDK 8u60 and later;
- It avoids the overhead of -XX:+PreserveFramePointer, which in rare cases can be as high as 10%;
- It does not require generating a map file to map Java code addresses to method names;
- It works with interpreter frames;
- It does not require writing out a perf.data file for further analysis.
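The splicing step described in this section can be pictured with a toy model. The sketch below is not async-profiler's code: it assumes a native call chain from perf_events and a Java stack from AsyncGetCallTrace, both listed leaf first, with Java code appearing in the native chain only as unresolved addresses. The `merge_stacks` function and the `is_java_frame` predicate are hypothetical; the real profiler works on raw addresses and JVM internals rather than strings.

```python
from typing import Callable, List


def merge_stacks(native: List[str], java: List[str],
                 is_java_frame: Callable[[str], bool]) -> List[str]:
    """Splice a native (perf_events) stack with a Java (AsyncGetCallTrace) stack.

    Both stacks are leaf first. Native frames are kept until the first frame
    that belongs to Java code; from there on, the symbolized Java frames
    replace the unresolved addresses.
    """
    merged: List[str] = []
    for frame in native:
        if is_java_frame(frame):
            # Java code reached: the native walker cannot name these frames,
            # so hand over to the Java stack and stop walking natively.
            merged.extend(java)
            return merged
        merged.append(frame)
    # No Java frame found: a purely native sample.
    return merged


# Hypothetical example: a Java method ends up in the write() syscall.
print(merge_stacks(["sys_write", "write", "0x7f01", "0x7f02"],
                   ["A.write", "A.main"],
                   lambda f: f.startswith("0x")))
```

In this toy model, native frames below the Java code (such as start_thread) are dropped, which matches the behaviour described in section 11.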
3、Heap allocation profiling
The technique async-profiler uses has little impact on system performance, unlike bytecode instrumentation or DTrace probes, which can slow an application down significantly. It also does not interfere with escape analysis or prevent JIT optimizations such as allocation elimination: async-profiler measures only actual heap allocations.
The profiler features TLAB-driven sampling (TLAB: Thread Local Allocation Buffer). It relies on HotSpot-specific callbacks to receive two kinds of notifications:
- when an object is allocated in a newly created TLAB;
- when an object is allocated on a slow path outside a TLAB.
This means the profiler does not count every allocation, only roughly one allocation per N kB, where N is the average TLAB size. This makes heap sampling very cheap and suitable for production. Although the collected data may be incomplete, in practice it usually reflects the top allocation sources.
The sampling interval can be adjusted with the -i option. For example, -i 500k takes one sample per 500 KB of allocated space on average. However, intervals smaller than the TLAB size have no effect.
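The effect of the sampling interval can be illustrated with a simplified model, which is an assumption and not the HotSpot mechanism itself: a sample is taken each time the cumulative allocated bytes cross the next multiple of the effective interval, and the effective interval is never smaller than the TLAB size, because notifications only arrive on TLAB refills or outside-TLAB allocations. The function name and the decimal byte counts in the example are made up for illustration.

```python
from typing import Iterable, List, Tuple


def sample_allocations(allocs: Iterable[Tuple[str, int]],
                       interval: int, tlab_size: int) -> List[str]:
    """Return the allocation sites that this toy model would sample.

    allocs: (site, size_in_bytes) pairs in allocation order.
    The effective interval is clamped to the TLAB size, mirroring the note
    that intervals smaller than a TLAB have no effect.
    """
    effective = max(interval, tlab_size)
    sampled: List[str] = []
    allocated = 0
    next_sample = effective
    for site, size in allocs:
        allocated += size
        if allocated >= next_sample:
            # A large allocation may cross several thresholds: record it
            # once, then move the threshold past the current total.
            sampled.append(site)
            while next_sample <= allocated:
                next_sample += effective
    return sampled


allocs = [("a", 300_000), ("b", 300_000), ("c", 300_000), ("d", 300_000)]
print(sample_allocations(allocs, interval=500_000, tlab_size=100_000))
```

Here only b and d are sampled; passing interval=50_000 with tlab_size=500_000 gives the same result, since the smaller interval is clamped to the TLAB size.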
Unlike Java Mission Control, which uses a similar approach, async-profiler does not require Java Flight Recorder or any other commercial JDK feature. It is based entirely on open-source technology and works with OpenJDK.
Note: collecting TLAB information requires JDK 7u40 or later, since the TLAB callbacks only exist in those versions.
The heap profiler requires HotSpot debug symbols. Oracle JDK already has them embedded in libjvm.so, but OpenJDK builds ship them in a separate package. To install OpenJDK debug symbols on Debian/Ubuntu, run:
apt install openjdk-8-dbg
On CentOS, RHEL and other RPM-based distributions, use debuginfo-install instead:
debuginfo-install java-1.8.0-openjdk
4、Wall-clock profiling
The -e wall option tells async-profiler to sample all threads equally at a given time interval, regardless of their state: running, sleeping or blocked. This is useful, for example, when profiling application startup time.
Wall-clock profiling is most useful in per-thread mode, which is enabled with the -t option. Example:
./profiler.sh -e wall -t -i 5ms -f result.svg 8983
5、Building
To build async-profiler, you need:
- the JAVA_HOME environment variable pointing to a JDK installation;
- GCC (installable with apt install gcc or similar).
Then run make. The agent binary is placed in the build subdirectory, and jattach, a small application that loads the agent into a target process, is also compiled into build.
6、Basic usage
Since Linux 4.6, capturing kernel call stacks with perf_events from a process started by a non-root user requires setting two runtime variables. You can set them with sysctl or as follows:
echo 1 > /proc/sys/kernel/perf_event_paranoid
echo 0 > /proc/sys/kernel/kptr_restrict
async-profiler is started through the profiler.sh script, which passes commands to the application being profiled. A typical workflow is:
- start the Java application;
- attach the agent and start profiling;
- run the performance scenario;
- stop profiling.
The agent's output, including the profiling results, is printed to the standard output of the Java application.
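The four-step workflow above can be scripted. The sketch below is a hypothetical Python wrapper, not part of async-profiler: it runs `profiler.sh start <pid>`, executes a caller-supplied scenario, and then runs `profiler.sh stop <pid>`, using only the start/stop commands shown in this article. The script path is an assumption, and the command runner is injectable so the sequence can be checked without a real JVM.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], object]


def _default_runner(cmd: Sequence[str]) -> object:
    # check=True raises CalledProcessError if profiler.sh fails.
    return subprocess.run(list(cmd), check=True)


def profile_scenario(pid: int, scenario: Callable[[], None],
                     script: str = "./profiler.sh",
                     runner: Runner = _default_runner) -> None:
    """Attach the profiler, run the scenario, and always stop afterwards."""
    runner([script, "start", str(pid)])
    try:
        scenario()
    finally:
        # The report goes to the target JVM's stdout, not to this process.
        runner([script, "stop", str(pid)])
```

Using try/finally ensures the profiler is stopped even if the scenario raises, so the session does not keep running in the target JVM.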
Example :
$ jps
9234 Jps
8983 Computey
$ ./profiler.sh start 8983
$ ./profiler.sh stop 8983
Alternatively, you can specify the profiling duration in seconds with the -d option:
$ ./profiler.sh -d 30 8983
By default, the sampling frequency is 100 Hz (one sample per 10 ms of CPU time). Here is an example of the output printed to the Java application's terminal:
--- Execution profile ---
Total samples: 687
Unknown (native): 1 (0.15%)
--- 6790000000 (98.84%) ns, 679 samples
[ 0] Primes.isPrime
[ 1] Primes.primesThread
[ 2] Primes.access$000
[ 3] Primes$1.run
[ 4] java.lang.Thread.run
... a lot of output omitted for brevity ...
ns percent samples top
---------- ------- ------- ---
6790000000 98.84% 679 Primes.isPrime
40000000 0.58% 4 __do_softirq
... more output omitted ...
This shows that the hottest method is Primes.isPrime, which is called from Primes.primesThread.
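The numbers in this report appear consistent with the default 10 ms interval: each sample stands for one interval, so the ns column is samples × interval, and the percentage is samples divided by the total sample count (687). The helper below, a sketch written for this article, converts -i style values (a bare number in nanoseconds, or a number with an ms/us/s suffix, as described in section 9) to nanoseconds and reproduces two rows of the flat profile. The function names are hypothetical.

```python
_UNITS_NS = {"s": 1_000_000_000, "ms": 1_000_000, "us": 1_000}


def interval_ns(value: str) -> int:
    """Convert an interval like '10ms', '500us', '1s' or '10000000' to ns."""
    # Check 'ms' and 'us' before the bare 's' suffix.
    for suffix in ("ms", "us", "s"):
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * _UNITS_NS[suffix]
    return int(value)  # no suffix: already nanoseconds


def flat_row(samples: int, total_samples: int, interval: str = "10ms"):
    """Return (ns, percent) for one row of the flat profile."""
    ns = samples * interval_ns(interval)
    percent = round(100 * samples / total_samples, 2)
    return ns, percent


print(flat_row(679, 687))  # the Primes.isPrime row
print(flat_row(4, 687))    # the __do_softirq row
```

These match the rows in the output above: (6790000000, 98.84) and (40000000, 0.58).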
7、Launching as an agent
If you need to profile code that runs immediately after the JVM starts, instead of waiting for startup and then using profiler.sh, you can attach async-profiler as an agent on the command line. For example:
$ java -agentpath:/path/to/libasyncProfiler.so=start,file=profile.svg ...
The agent library is configured through the JVMTI argument interface. The format of the argument string is described in the source code; profiler.sh actually converts its command-line arguments into this format.
For example:
-e alloc is converted to event=alloc;
-f profile.svg is converted to file=profile.svg, and so on.
However, some parameters are handled directly by profiler.sh. For example, -d 5 results in three actions:
- attach the profiler agent with the start command;
- sleep for 5 seconds;
- attach the agent again with the stop command.
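The translation and the -d handling described above can be sketched as follows. Only the mappings stated in this article (-e to event=, -f to file=) and the plain start/stop commands are implemented; the function names are hypothetical, and the real profiler.sh handles many more options. For simplicity, the sketch passes the same options to both start and stop, which may differ from what the real script sends.

```python
import time
from typing import Callable, Optional


def agent_args(command: str, event: Optional[str] = None,
               file: Optional[str] = None) -> str:
    """Build an agent argument string such as 'start,event=alloc,file=x.svg'."""
    parts = [command]
    if event is not None:
        parts.append(f"event={event}")  # from -e EVENT
    if file is not None:
        parts.append(f"file={file}")    # from -f FILE
    return ",".join(parts)


def run_for_duration(duration_s: float, attach: Callable[[str], None],
                     sleep: Callable[[float], None] = time.sleep,
                     **opts: Optional[str]) -> None:
    """Emulate -d N: attach with start, sleep N seconds, attach with stop."""
    attach(agent_args("start", **opts))
    sleep(duration_s)
    attach(agent_args("stop", **opts))


print(agent_args("start", event="alloc", file="profile.svg"))
```

The `attach` callback stands in for whatever actually delivers the string to the JVM (jattach, in the real script).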
8、Viewing flame graphs
async-profiler provides out-of-the-box flame graph support. Specify -o svg to dump the profiling results as an interactive SVG image that can be viewed in all major browsers. The SVG output format is also selected automatically if the target file name ends with .svg.
The following commands:
$ jps
9234 Jps
8983 Computey
$ ./profiler.sh -d 30 -f /tmp/flamegraph.svg 8983
may produce a flame graph like the following:
[Figure: example flame graph]

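Flame graphs like this are drawn from collapsed stacks, the collapsed output format listed in section 9: each line is a semicolon-separated list of frames (root first) followed by a space and a counter. The sketch below, written for this article, parses such lines and totals the counter per leaf frame, which gives the same ranking as a flat profile. The sample lines reuse frame names from the earlier Primes output; the second line and its count are invented for illustration.

```python
from collections import Counter
from typing import Iterable


def leaf_totals(lines: Iterable[str]) -> Counter:
    """Sum collapsed-stack counters by leaf (top-most) frame."""
    totals: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # The counter follows the last space; frame names may contain spaces.
        stack, _, count = line.rpartition(" ")
        frames = stack.split(";")
        totals[frames[-1]] += int(count)
    return totals


lines = [
    "java.lang.Thread.run;Primes$1.run;Primes.access$000;Primes.primesThread;Primes.isPrime 679",
    "java.lang.Thread.run;Primes$1.run;Primes.access$000;Primes.primesThread 3",
]
print(leaf_totals(lines).most_common())
```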
9、Profiling options
Here is the complete list of command-line options accepted by profiler.sh:
start - starts profiling in semi-automatic mode, i.e. the profiler runs until the stop command is explicitly called;
resume - starts or resumes an earlier profiling session that has been stopped. All data collected so far remains valid. Profiling options are not preserved between sessions and should be specified again;
stop - stops profiling and prints the report;
status - prints the profiling status: whether the profiler is active and for how long;
list - shows the list of available profiling events. This option still requires a PID, since the supported events may differ depending on the JVM version;
-d N - the profiling duration in seconds. If no start, resume, stop or status option is given, the profiler runs for the specified period of time and then stops automatically. Example: ./profiler.sh -d 30 8983
-e event - the event to profile, e.g. cpu, alloc, lock, cache-misses. Use list to see the complete list of available events.
In allocation profiling mode (alloc), the top frame of every call trace is the class of the allocated object, and the counter is the heap pressure (the total size of allocated TLABs or of objects allocated outside TLABs).
In lock profiling mode (lock), the top frame is the class of the lock/monitor, and the counter is the number of nanoseconds it took to enter this lock/monitor.
Two special event types are supported on Linux: hardware breakpoints and kernel tracepoints:
-e mem:<func>[:rwx] sets a read/write/execute breakpoint at function <func>. The format of mem events is the same as in perf-record. Execution breakpoints can also be specified by function name, e.g. -e malloc traces all calls to the native malloc function;
-e trace:<id> sets a kernel tracepoint. A symbolic tracepoint name can be given, e.g. -e syscalls:sys_enter_open traces all open syscalls;
-i N - sets the profiling interval in nanoseconds, or in other units if N is followed by ms (milliseconds), us (microseconds) or s (seconds). Only CPU active time is counted: no samples are collected while the CPU is idle. The default is 10000000 (10 ms).
Example: ./profiler.sh -i 500us 8983
-j N - sets the Java stack profiling depth. This option is ignored if N is greater than the default of 2048.
Example: ./profiler.sh -j 30 8983
-b N - sets the frame buffer size, in the number of Java method ids that should fit in the buffer. If you receive messages about an insufficient frame buffer size, increase this value from the default. Example: ./profiler.sh -b 5000000 8983
-t - profiles threads separately. Each stack trace ends with a frame that denotes a single thread. Example: ./profiler.sh -t 8983
-s - prints simple class names instead of FQNs (fully qualified names);
-g - prints method signatures;
-a - annotates Java method names by adding the _[j] suffix;
-o fmt - specifies what information to dump when profiling ends. fmt can be one of the following:
summary - dumps basic profiling statistics;
traces[=N] - dumps call traces (at most N samples);
flat[=N] - dumps a flat profile (the top N hottest methods);
jfr - dumps events in Java Flight Recorder format, readable by Java Mission Control. This does not require enabling commercial JDK features;
collapsed[=C] - dumps collapsed call traces in the format used by the FlameGraph script. This is a collection of call stacks, where each line is a semicolon-separated list of frames followed by a counter;
svg[=C] - produces a flame graph in SVG format;
tree[=C] - produces a call tree in HTML format;
--reverse - generates a backtrace view.
C is the counter type:
samples - the counter is the number of samples for the given trace;
total - the counter is the total value of the collected metric, e.g. the total allocation size.
summary, traces and flat can be combined.
The default format is summary,traces=200,flat=200.
--title TITLE, --width PX, --height PX, --minwidth PX, --reverse - FlameGraph parameters.
Example: ./profiler.sh -f profile.svg --title "Sample CPU profile" --minwidth 0.5 8983
-f FILENAME - the file name to dump the profile information to.
%p - expands to the PID of the target JVM;
%t - expands to the timestamp at the time of command invocation.
Example: ./profiler.sh -o collapsed -f /tmp/traces-%t.txt 8983
--all-user - includes only user-mode events. This option is useful when kernel profiling is restricted by perf_event_paranoid settings.
--all-kernel - includes only kernel-mode events.
--sync-walk - prefers the synchronous JVMTI stack walker instead of AsyncGetCallTrace. This option may improve the accuracy of Java stack traces when profiling JVM runtime functions, e.g. VMThread::execute or G1CollectedHeap::humongous_obj_allocate. Do not use it unless you are absolutely sure! When used incorrectly, this mode can crash the JVM!
-v, --version - prints the version of the profiler library. If a PID is specified, gets the version of the library loaded into the given process.
10、Profiling Java applications in containers
Java processes running in Docker or LXC containers can be profiled both from inside the container and from the host system.
When profiling from the host, pid should be the Java process ID in the host namespace. Use ps aux | grep java or docker top <container> to find the process ID.
async-profiler should be run from the host by a privileged user; it automatically switches to the proper pid/mount namespace and changes the user credentials to match the target process. Also make sure the target container can access libasyncProfiler.so by the same absolute path as on the host.
By default, Docker containers restrict access to the perf_event_open syscall. To allow profiling inside a container, you therefore need to modify the seccomp profile or disable it altogether with the --security-opt seccomp=unconfined option. You may also need --cap-add SYS_ADMIN.
Alternatively, if changing the Docker configuration is not possible, you can fall back to the -e itimer profiling mode; see Troubleshooting.
11、Limitations
- On most Linux systems, perf_events captures call stacks with a maximum depth of 127 frames. On recent Linux kernels this can be configured with sysctl kernel.perf_event_max_stack or by writing to the /proc/sys/kernel/perf_event_max_stack file;
- The profiler allocates an 8 kB perf_event buffer for each thread of the target process. When running as an unprivileged user, make sure /proc/sys/kernel/perf_event_mlock_kb is large enough (more than 8 * the number of threads); otherwise the message "perf_event mmap failed: Operation not permitted" is printed and native stack traces are not collected;
- There is no guarantee that the perf_events overflow signal is delivered to the Java thread in a way that ensures no other code has run, which means that in some rare cases the captured Java stack might not match the captured native (user + kernel) stack;
- You will not see non-Java frames preceding the Java frames on the stack. For example, if start_thread called JavaMain and then your Java code started running, you will not see the first two frames in the resulting stack. On the other hand, you will see non-Java frames (user and kernel) invoked by your Java code;
- No Java stacks are collected if -XX:MaxJavaStackTraceDepth is zero or negative;
- Too short a profiling interval may cause continuous interruption of heavy system calls like clone(), so that they never complete; see issue #97. The workaround is simply to increase the interval;
- When the agent is not loaded at JVM startup (with the -agentpath option), it is highly recommended to use the -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints JVM flags. Without them the profiler still works, but the results can be less accurate; for example, without -XX:+DebugNonSafepoints there is a high chance that simple inlined methods will not appear in the profile. When the agent is attached at runtime, the CompiledMethodLoad JVMTI event enables debug info, but only for methods compiled after the event is turned on.
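The perf_event_mlock_kb limit above can be checked before attaching. The sketch below uses hypothetical helper names: it reads the value from /proc and applies the article's rule of thumb that it should exceed 8 kB times the number of threads, taking the thread count of a Linux process from the number of entries in /proc/<pid>/task. The /proc root is a parameter so the logic can be exercised against a fake directory.

```python
import os
from pathlib import Path


def read_int(path: Path) -> int:
    """Read a single integer from a /proc-style file."""
    return int(path.read_text().strip())


def thread_count(pid: int, proc: Path = Path("/proc")) -> int:
    # Each thread of a Linux process has an entry under /proc/<pid>/task.
    return len(os.listdir(proc / str(pid) / "task"))


def mlock_ok(pid: int, proc: Path = Path("/proc")) -> bool:
    """True if perf_event_mlock_kb exceeds 8 kB per thread of the target."""
    limit_kb = read_int(proc / "sys/kernel/perf_event_mlock_kb")
    return limit_kb > 8 * thread_count(pid, proc)
```

If this returns False for the target JVM, raising perf_event_mlock_kb (as root) or profiling as a privileged user avoids the "perf_event mmap failed" message.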
12、Troubleshooting
1)Failed to change credentials to match the target process: Operation not permitted
Due to limitations of the HotSpot dynamic attach mechanism, the profiler must be run by exactly the same user (and group) as the owner of the target JVM process. If the profiler is run by a different user, it tries to change the current user and group automatically. This may succeed for root, but not for other users, which results in the error above.
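Before attaching, you can check whether you run as the same user and group as the target JVM. The sketch below is a hypothetical Linux helper, not something async-profiler provides. It assumes the owner of the /proc/<pid> directory reflects the process's effective user and group, which holds for ordinary processes; the /proc root is a parameter for testing.

```python
import os
from pathlib import Path


def same_owner(pid: int, proc: Path = Path("/proc")) -> bool:
    """True if our effective uid/gid match the owner of /proc/<pid>."""
    st = os.stat(proc / str(pid))
    return (st.st_uid, st.st_gid) == (os.geteuid(), os.getegid())
```

If it returns False, run profiler.sh as the JVM's owner (for example with sudo -u) rather than relying on the automatic credential switch.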
2)Could not start attach mechanism: No such file or directory
The profiler cannot establish communication with the target JVM through a UNIX domain socket. This usually happens in one of the following cases:
- The attach socket /tmp/.java_pidNNN has been deleted, typically because a system cleanup job periodically removes files under /tmp. You can check for the socket with:
lsof -p PID | grep java_pid
If this lists a socket file, but the file does not exist, then this is exactly the described problem;
- The JVM was started with the -XX:+DisableAttachMechanism option;
- The /tmp directory of the Java process is physically different from the /tmp directory of your shell, because Java is running in a container or in a chroot environment. jattach tries to solve this automatically, but it may lack the permissions required to do so. Check with:
strace build/jattach PID properties
- The JVM is busy and cannot reach a safepoint, e.g. it is in the middle of a long garbage collection. To check whether the JVM is busy, run:
kill -3 PID
A healthy JVM process should print a thread dump and heap info in its console.
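The first check in the list above can also be scripted. The sketch below is a hypothetical helper that assumes the socket name /tmp/.java_pid<PID> given above. It reports whether that path exists and is a UNIX domain socket; for a containerized JVM, the relevant /tmp is the one seen by the target process, so the directory is a parameter. Note that HotSpot normally creates this socket lazily, on the first attach request, so a missing file before any attach attempt is not an error by itself.

```python
import stat
from pathlib import Path


def attach_socket_present(pid: int, tmp_dir: Path = Path("/tmp")) -> bool:
    """True if <tmp_dir>/.java_pid<pid> exists and is a UNIX domain socket."""
    path = tmp_dir / f".java_pid{pid}"
    try:
        mode = path.stat().st_mode
    except FileNotFoundError:
        return False
    return stat.S_ISSOCK(mode)
```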
3)Failed to inject profiler into <pid>
A connection with the target JVM has been established, but the JVM is unable to load the profiler shared library. Make sure the user of the JVM process has permission to access libasyncProfiler.so by exactly the same absolute path. For more information, see issue #78.
4)Perf events unavailable. See stderr of the target process.
The perf_event_open() syscall has failed, and the error message is printed to the error stream of the target JVM.
Typical reasons include:
- /proc/sys/kernel/perf_event_paranoid is set to restricted mode (>=2);
- seccomp disables the perf_event_open API in a container;
- the OS runs under a hypervisor that does not virtualize performance counters;
- the perf_event_open API is not supported on this system, e.g. WSL.
If changing the configuration is not possible, you can fall back to the -e itimer profiling mode. It is similar to cpu mode but does not require perf_events support; the drawback is that kernel stack traces are not collected.
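The fallback can be automated. The sketch below is a hypothetical helper: it reads /proc/sys/kernel/perf_event_paranoid and picks cpu when the value is below 2, the restricted threshold given above. It picks itimer otherwise, or when the file is missing or unreadable as an integer. This check alone does not detect seccomp or hypervisor restrictions, and it ignores the fact that root is not subject to this setting; the target JVM's stderr remains the authoritative signal.

```python
from pathlib import Path

PARANOID = Path("/proc/sys/kernel/perf_event_paranoid")


def choose_event(paranoid_file: Path = PARANOID) -> str:
    """Return 'cpu' if perf_events looks usable, else the 'itimer' fallback."""
    try:
        level = int(paranoid_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return "itimer"
    # Following the troubleshooting note: >= 2 counts as restricted mode.
    return "cpu" if level < 2 else "itimer"
```

The result can then be passed to the script, e.g. building the command ./profiler.sh -e <event> -d 30 8983.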
5)No AllocTracer symbols found. Are JDK debug symbols installed?
You may need to install the package with OpenJDK debug symbols; see Heap allocation profiling for details.
Note that allocation profiling is not supported on JVMs other than HotSpot, e.g. Zing.
6)VMStructs unavailable. Unsupported JVM?
The JVM shared library does not export gHotSpotVMStructs* symbols, so apparently this is not a HotSpot JVM. Sometimes an incorrectly built JDK can cause the same message (see issue #218); in these cases, installing the JDK debug symbols may solve the problem.
7)Could not parse symbols due to the OS bug
Async-profiler was unable to parse non-Java function names because the contents of /proc/[pid]/maps were corrupted. The problem is known to occur in a container when running Ubuntu with a Linux 5.x kernel. This is an OS bug; see https://bugs.launchpad.net/Ubuntu/+source/Linux/+bug/1843018.
8)[frame_buffer_overflow]
This message in the output means there was not enough space to store all call traces. Consider increasing the frame buffer size with the -b option.