
Introduction to async-profiler

2022-06-22 20:21:00 Fenglibin

1、 Introduction

Async-profiler is a low-overhead sampling profiler for Java. It is built on HotSpot-specific APIs, which it uses to collect stack traces and track memory allocations, so it works with OpenJDK, Oracle JDK, and other Java runtimes based on the HotSpot JVM.

GitHub project link: https://github.com/jvm-profiling-tools/async-profiler

Async-profiler can trace the following kinds of events:

  • CPU cycles;
  • Hardware and software performance counters, such as cache misses, branch misses, page faults, and context switches;
  • Allocations in the Java heap;
  • Contended lock attempts, including both Java object monitors and ReentrantLocks;

Supported platforms

  • Linux / x64 / x86 / ARM / AArch64
  • macOS / x64

Note: macOS profiling is limited to user-space code.

2、 CPU profiling

In this mode, the profiler collects stack trace samples that include Java methods, native calls, JVM code, and kernel functions.

The general approach is to receive the call stacks generated by perf_events and match them with the call stacks generated by AsyncGetCallTrace, in order to produce an accurate profile of both Java and native code. In addition, async-profiler provides a workaround to recover stack traces in some cases where AsyncGetCallTrace fails.

Compared with using perf_events directly together with a Java agent that converts addresses to Java method names, this approach has the following advantages:

  • It works on older Java versions, because it does not require -XX:+PreserveFramePointer, which is only available in JDK 8u60 and later;
  • It avoids the performance overhead of -XX:+PreserveFramePointer, which in rare cases can be as high as 10%;
  • It does not require generating a map file to map Java code addresses to method names;
  • It works with interpreter frames;
  • It does not require generating a perf.data file for further analysis;

3、 Heap allocation profiling

async-profiler uses a technique with little impact on system performance. It does not rely on intrusive approaches such as bytecode instrumentation or expensive DTrace probes, which can slow the application down considerably. It also does not affect escape analysis or prevent JIT optimizations such as allocation elimination; only actual heap allocations are measured.

The profiler features TLAB-driven sampling (TLAB stands for Thread Local Allocation Buffer). It relies on HotSpot-specific callbacks to receive two kinds of notifications:

  • when an object is allocated in a newly created TLAB;
  • when an object is allocated on a slow path outside a TLAB.

This means that the profiler does not count every allocation, but only one allocation per N kB, where N is the average TLAB size. This makes heap sampling very lightweight and suitable for production. The collected data may be incomplete, but in practice it usually reflects the top allocation sources.

The sampling interval can be adjusted with the -i option. For example, -i 500k takes one sample per 500 KB of allocated space on average. However, intervals smaller than the TLAB size have no effect.
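As a rough back-of-the-envelope check of what an interval means in practice, the expected number of samples is approximately the number of bytes allocated divided by the sampling interval. The helper below is a hypothetical sketch (its name is not part of async-profiler). It assumes the interval is at least the TLAB size and takes plain byte counts, so it makes no claim about how the profiler interprets the k suffix:

```shell
# Hypothetical helper, not part of async-profiler: estimate how many
# allocation samples to expect for a given sampling interval.
# Usage: expected_alloc_samples TOTAL_ALLOCATED_BYTES INTERVAL_BYTES
expected_alloc_samples() {
    total_bytes=$1
    interval_bytes=$2
    # Integer division: roughly one sample per INTERVAL_BYTES allocated.
    echo $(( total_bytes / interval_bytes ))
}

# 1,000,000,000 bytes allocated, one sample per 500,000 bytes
expected_alloc_samples 1000000000 500000
```

With these example numbers the estimate is 2000 samples, which shows how quickly a larger interval reduces the amount of data collected.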

Unlike Java Mission Control, which uses a similar approach, async-profiler does not require Java Flight Recorder or any other commercial JDK feature. It is based entirely on open-source technology and works with OpenJDK.

Note: allocation profiling requires JDK 7u40 or later, the first version in which the TLAB callbacks are available.

The heap profiler requires HotSpot debug symbols. Oracle JDK already has them embedded in libjvm.so, but OpenJDK builds ship them in a separate package. To install the OpenJDK debug symbols on Debian/Ubuntu, run:

apt install openjdk-8-dbg

On CentOS, RHEL, and other RPM-based distributions, use debuginfo-install instead:

debuginfo-install java-1.8.0-openjdk

4、 Wall-clock profiling

The -e wall option tells async-profiler to sample all threads equally every given period of time, regardless of whether a thread is running, sleeping, or blocked. This is useful, for example, when profiling application start-up time.

Wall-clock profiling is most useful in per-thread mode, which is enabled with the -t option. Example:

./profiler.sh -e wall -t -i 5ms -f result.svg 8983

5、 Building

Building async-profiler requires the following:

  • The JAVA_HOME environment variable, pointing to the JDK installation path;
  • GCC (it can be installed with, for example, apt install gcc).

Then run make to build. The compiled agent binaries are placed in the build subdirectory, together with jattach, a small utility that loads the agent into the target process.
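The two prerequisites can be verified before invoking make. The function below is a hypothetical sketch, not something shipped with async-profiler; it only checks that JAVA_HOME names an existing directory and that gcc is on the PATH:

```shell
# Hypothetical pre-build check, not part of the async-profiler build:
# verifies the two prerequisites listed above before running make.
check_build_prereqs() {
    status=0
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set"
        status=1
    elif [ ! -d "$JAVA_HOME" ]; then
        echo "JAVA_HOME does not point to a directory: $JAVA_HOME"
        status=1
    fi
    if ! command -v gcc >/dev/null 2>&1; then
        echo "gcc not found on PATH"
        status=1
    fi
    return $status
}
```

A typical use would be check_build_prereqs && make, run from the root of the source tree.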

6、 Basic usage

Starting with Linux 4.6, capturing kernel call stacks through perf_events from a process started by a non-root user requires two runtime variables to be set. You can set them with sysctl or as follows:

echo 1 > /proc/sys/kernel/perf_event_paranoid
echo 0 > /proc/sys/kernel/kptr_restrict
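Before changing anything, it can help to read the current values. The following is a minimal sketch with a hypothetical function name; it assumes that any perf_event_paranoid value of 1 or lower, combined with kptr_restrict set to 0, is sufficient, which matches the two commands above:

```shell
# Hypothetical check, not part of async-profiler: reports whether the two
# kernel settings above have values compatible with the echo commands.
# The optional argument overrides the directory to read from (for testing).
check_perf_settings() {
    dir=${1:-/proc/sys/kernel}
    paranoid=$(cat "$dir/perf_event_paranoid")
    kptr=$(cat "$dir/kptr_restrict")
    if [ "$paranoid" -le 1 ] && [ "$kptr" -eq 0 ]; then
        echo "ok: kernel call stacks should be available to non-root users"
    else
        echo "restricted: perf_event_paranoid=$paranoid kptr_restrict=$kptr"
        return 1
    fi
}
```

Calling check_perf_settings with no argument reads the live values from /proc/sys/kernel.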

async-profiler is driven by the profiler.sh script, which sends commands to the application being profiled. A typical workflow is:

  1. Start the Java application;
  2. Attach the agent and start profiling;
  3. Run the performance scenario;
  4. Stop profiling.

The agent's output, including the profiling results, is printed to the standard output of the Java application.

Example :

$ jps
9234 Jps
8983 Computey
$ ./profiler.sh start 8983
$ ./profiler.sh stop 8983

Alternatively, use the -d (duration) option to profile for a fixed period, given in seconds:

$ ./profiler.sh -d 30 8983

By default, the sampling frequency is 100 Hz (one sample per 10 ms of CPU time). Here is an example of the output printed to the Java application's terminal:

--- Execution profile ---
Total samples:           687
Unknown (native):        1 (0.15%)

--- 6790000000 (98.84%) ns, 679 samples
  [ 0] Primes.isPrime
  [ 1] Primes.primesThread
  [ 2] Primes.access$000
  [ 3] Primes$1.run
  [ 4] java.lang.Thread.run

... a lot of output omitted for brevity ...

          ns  percent  samples  top
  ----------  -------  -------  ---
  6790000000   98.84%      679  Primes.isPrime
    40000000    0.58%        4  __do_softirq

... more output omitted ...

This shows that the hottest method is Primes.isPrime, and that the hottest call stack leading to it comes from Primes.primesThread.
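The numbers are consistent with the default interval: 679 samples at 10 ms each give 6790 ms, that is 6790000000 ns. To pull the hottest entry out of a saved report, something like the following awk-based sketch could be used. The function name is hypothetical, and it assumes the report has exactly the flat table layout shown above and that method names contain no spaces:

```shell
# Hypothetical helper: print the hottest method from the flat profile table
# of a report in the format shown above, read from standard input.
top_method() {
    awk '
        # A line made only of dashes and spaces with four columns is the
        # separator under the "ns  percent  samples  top" header.
        $0 ~ /^[ -]+$/ && NF == 4 { in_table = 1; next }
        # The first data row after it is the method with the most samples.
        in_table && NF >= 4 { print $4 " (" $3 " samples, " $2 ")"; exit }
    '
}
```

For the report above, top_method < report.txt (with report.txt being a saved copy of the output) would name Primes.isPrime together with its sample count and percentage.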

7、 Launching as an agent

If you need to profile code as soon as the JVM starts, rather than waiting for the application to come up and then attaching with profiler.sh, you can load async-profiler as an agent on the command line. For example:

$ java -agentpath:/path/to/libasyncProfiler.so=start,file=profile.svg ...

The agent library is configured through the JVMTI argument interface. The format of the argument string is described in the source code; the profiler.sh script actually converts its command-line arguments into this format.

For example:

-e alloc is converted to event=alloc;
-f profile.svg is converted to file=profile.svg, and so on.
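The conversion can be pictured with a small sketch. This is an illustration only, not the code from profiler.sh: the function name is hypothetical, it covers just the two mappings named above, and it assumes arguments are joined with commas as in the -agentpath example:

```shell
# Illustrative sketch only: the real conversion lives in profiler.sh and
# handles many more options. This covers just the two mappings named above.
to_agent_args() {
    args=""
    while [ $# -gt 0 ]; do
        case $1 in
            -e) args="${args:+$args,}event=$2"; shift 2 ;;
            -f) args="${args:+$args,}file=$2"; shift 2 ;;
            *)  echo "unsupported option in this sketch: $1" >&2; return 1 ;;
        esac
    done
    echo "$args"
}

to_agent_args -e alloc -f profile.svg
```

The call at the end prints event=alloc,file=profile.svg, which is the kind of string that would follow the = sign in -agentpath.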

Some arguments, however, are handled directly by the profiler.sh script. For example, -d 5 results in three actions:

  1. Attach the profiler agent with the start command;
  2. Sleep for 5 seconds;
  3. Attach the agent again with the stop command.
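Those three steps can be written out by hand, which is useful when something else needs to happen between start and stop. The sketch below is not the script's actual code; profile_for and the PROFILER variable are assumptions of this example, with PROFILER defaulting to ./profiler.sh:

```shell
# Sketch of what "-d N" amounts to. PROFILER is an assumption of this
# example: it defaults to ./profiler.sh but can be overridden.
profile_for() {
    seconds=$1
    pid=$2
    "${PROFILER:-./profiler.sh}" start "$pid" || return 1   # 1. attach and start
    sleep "$seconds"                                         # 2. wait
    "${PROFILER:-./profiler.sh}" stop "$pid"                 # 3. attach again and stop
}
```

For instance, profile_for 5 8983 would behave like ./profiler.sh -d 5 8983 under these assumptions.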

8、 Viewing flame graphs

async-profiler supports flame graphs out of the box. Specify -o svg to dump the profiling results as an interactive SVG image that can be viewed in all major browsers. The SVG output format is also selected automatically if the target file name ends with .svg.

For example:

$ jps
9234 Jps
8983 Computey
$ ./profiler.sh -d 30 -f /tmp/flamegraph.svg 8983

These commands profile the process for 30 seconds and write the flame graph to /tmp/flamegraph.svg.

9、 Profiler options

Here is the complete list of command-line options accepted by the profiler.sh script:

start - starts profiling in semi-automatic mode, i.e. the profiler runs until the stop command is called explicitly;
resume - starts or resumes an earlier profiling session that has been stopped. All previously collected data remains valid. Profiling options are not preserved between sessions and should be specified again;
stop - stops profiling and prints the report;
status - prints the profiling status: whether the profiler is active and for how long;
list - shows the list of available profiling events. This option still requires a PID, because the supported events may differ depending on the JVM version;
-d N - the profiling duration, in seconds. If none of start, resume, stop, or status is given, the profiler runs for the specified period and then stops automatically. Example: ./profiler.sh -d 30 8983

-e event - the event to profile, e.g. cpu, alloc, lock, cache-misses, etc. Use list to see the complete list of available events.
 In allocation (alloc) profiling mode, the top frame of every call trace is the class of the allocated object, and the counter is the number of bytes recorded in the heap (the total size of allocated TLABs or of objects allocated outside a TLAB).
 In lock profiling mode, the top frame is the class of the lock/monitor, and the counter is the number of nanoseconds it took to enter this lock/monitor.
 Two special event types are supported on Linux: hardware breakpoints and kernel tracepoints:
-e mem:<func>[:rwx] sets a read/write/exec breakpoint at function <func>. The format of the mem event is the same as in perf-record. Execution breakpoints can also be specified by function name, e.g. -e malloc traces all calls of the native malloc function;
-e trace:<id> sets a kernel tracepoint. A tracepoint can also be specified by its symbolic name, e.g. -e syscalls:sys_enter_open traces all open syscalls;

-i N - sets the profiling interval in nanoseconds, or in other units if N is followed by ms (milliseconds), us (microseconds), or s (seconds). Only CPU active time is counted; no samples are collected while the CPU is idle. The default is 10000000 (10 ms).
 Example: ./profiler.sh -i 500us 8983

-j N - sets the Java stack profiling depth. This option is ignored if N is greater than the default of 2048.
 Example: ./profiler.sh -j 30 8983

-b N - sets the frame buffer size, expressed as the number of Java method ids that should fit in the buffer. If you receive messages about insufficient frame buffer size, increase this value from the default. Example: ./profiler.sh -b 5000000 8983

-t - profiles each thread separately; each stack trace ends with a frame that denotes a single thread. Example: ./profiler.sh -t 8983
-s - prints simple class names instead of FQNs (fully qualified names);
-g - prints method signatures;
-a - annotates Java method names by adding the _[j] suffix;
-o fmt - specifies what information to dump when profiling ends. fmt can be one of the following options:
summary - dumps basic profiling statistics;
traces[=N] - dumps call traces (at most N samples);
flat[=N] - dumps a flat profile (the top N hot methods);

jfr - dumps events in Java Flight Recorder format, readable by Java Mission Control. This does not require JDK commercial features to be enabled;
collapsed[=C] - dumps collapsed call traces in the format used by the FlameGraph script. This is a collection of call stacks, where each line is a semicolon-separated list of frames followed by a counter.
svg[=C] - produces a flame graph in SVG format.
tree[=C] - produces a call tree in HTML format.
--reverse - generates a backtrace view.
C is the counter type:
samples - the counter is the number of samples for the given trace;
total - the counter is the total value of the collected metric, e.g. total allocation size.
 summary, traces, and flat can be combined.
 The default format is summary,traces=200,flat=200.
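Because the collapsed format is line-oriented, it is easy to post-process with standard tools. As a sketch, the hypothetical function below sums the counters per leaf frame, taking the last frame on each line as the leaf. It assumes frame names contain no spaces, which may not hold when method signatures are printed:

```shell
# Hypothetical helper: sum the counters of collapsed stacks per leaf frame
# (the last frame on each line) and print the totals, largest first.
# Assumes frame names contain no spaces.
leaf_totals() {
    awk '
        NF >= 2 {
            n = split($1, frames, ";")      # $1 is the semicolon-separated stack
            total[frames[n]] += $NF         # $NF is the counter ending the line
        }
        END { for (f in total) print total[f], f }
    ' | sort -rn
}
```

It could be fed a file produced with -o collapsed, for example leaf_totals < /tmp/traces.txt, where the file name is only a placeholder.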

--title TITLE, --width PX, --height PX, --minwidth PX, --reverse - FlameGraph parameters;
 Example: ./profiler.sh -f profile.svg --title "Sample CPU profile" --minwidth 0.5 8983

-f FILENAME - the file name to dump the profile information to.
%p - expands to the PID of the target JVM;
%t - expands to the timestamp at the time of command invocation.
 Example: ./profiler.sh -o collapsed -f /tmp/traces-%t.txt 8983
--all-user - includes only user-mode events. This option is helpful when kernel profiling is restricted by the perf_event_paranoid setting.
--all-kernel - includes only kernel-mode events.
--sync-walk - prefers the synchronous JVMTI stack walker over AsyncGetCallTrace. This option may improve the accuracy of Java stack traces when profiling JVM runtime functions (such as VMThread::execute, G1CollectedHeap::humongous_obj_allocate, etc.). Do not use it unless you are absolutely sure! If not used properly, this mode will crash the JVM!
-v, --version - prints the version of the profiler library. If a PID is specified, gets the version of the library loaded into the given process.

10、 Profiling Java applications in a container

Java processes running in a Docker or LXC container can be profiled both from within the container and from the host system.

When profiling from the host, pid should be the Java process ID in the host namespace. Use ps aux | grep java or docker top <container> to find the process ID.

async-profiler should be run from the host by a privileged user; it automatically switches to the proper pid/mount namespace and changes user credentials to match the target process. Also make sure that the target container can access libasyncProfiler.so by the same absolute path as on the host.

By default, a Docker container restricts access to the perf_event_open syscall. To allow profiling inside a container, you need to modify the seccomp profile or disable it altogether with the --security-opt seccomp=unconfined option. In addition, --cap-add SYS_ADMIN may be required.

Alternatively, if the Docker configuration cannot be changed, you can fall back to the -e itimer profiling mode; see the troubleshooting section (section 12).

11、 Limitations

  • On most Linux systems, perf_events captures call stacks with a maximum depth of 127 frames. On recent Linux kernels, this can be configured with sysctl kernel.perf_event_max_stack or by writing to the /proc/sys/kernel/perf_event_max_stack file;
  • The profiler allocates an 8 kB perf_event buffer for each thread of the target process. When running as an unprivileged user, make sure /proc/sys/kernel/perf_event_mlock_kb is large enough (more than 8 * the total number of threads); otherwise the message "perf_event mmap failed: Operation not permitted" is printed and no native stack traces are collected;
  • There is no guarantee that the perf_events overflow signal is delivered to the Java thread in a way that ensures no other code has run, which means that in some rare cases the captured Java stack may not match the captured native (user + kernel) stack;
  • Non-Java frames that precede the Java frames on the stack are not shown. For example, if start_thread calls JavaMain and then the Java code starts running, the first two frames will not appear in the resulting stack. On the other hand, non-Java frames (user and kernel) invoked by Java code are shown;
  • No Java stacks are collected if -XX:MaxJavaStackTraceDepth is set to zero or a negative value;
  • Too short a profiling interval may cause continuous interruption of heavy system calls such as clone(), so that they never complete; see issue #97. The workaround is simply to increase the interval;
  • If the agent is not loaded at JVM startup (with the -agentpath option), it is strongly recommended to use the -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints JVM flags. Without them the profiler still works, but the results may be less accurate; for example, without -XX:+DebugNonSafepoints it is very likely that simple inlined methods will not appear in the profile. When the agent is attached at runtime, the CompiledMethodLoad JVMTI event enables debug information, but only for methods compiled after the event is turned on;
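The buffer-size limit in the second item is simple arithmetic: a target with 100 threads needs perf_event_mlock_kb to be greater than 8 * 100 = 800. A minimal sketch of that comparison follows; the function name is hypothetical and the thread count has to be supplied by the caller:

```shell
# Hypothetical helper: compare perf_event_mlock_kb with the 8 kB-per-thread
# requirement described above. The first argument is the thread count of the
# target; the optional second argument overrides the file to read (for testing).
check_mlock_kb() {
    threads=$1
    file=${2:-/proc/sys/kernel/perf_event_mlock_kb}
    needed=$(( 8 * threads ))
    actual=$(cat "$file")
    if [ "$actual" -gt "$needed" ]; then
        echo "ok: $actual kB > $needed kB"
    else
        echo "too small: $actual kB, need more than $needed kB"
        return 1
    fi
}
```

On Linux, the thread count of a running process can be obtained with ls /proc/PID/task | wc -l.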

12、 Troubleshooting

1)Failed to change credentials to match the target process: Operation not permitted

Due to a limitation of the HotSpot dynamic attach mechanism, the profiler must be run by exactly the same user (and group) as the owner of the target JVM process. If the profiler is run by a different user, it tries to change the current user and group automatically. This will probably succeed for root, but not for other users, which results in the error above.
    
2)Could not start attach mechanism: No such file or directory

The profiler cannot establish communication with the target JVM through a UNIX domain socket. This usually happens in one of the following cases:

  • The attach socket /tmp/.java_pidNNN has been deleted, probably by a system cleanup job that purges /tmp. You can check with the following command:
lsof -p PID | grep java_pid

If it lists a socket file, but the file does not exist, then this is exactly the problem described;

  • The JVM was started with the -XX:+DisableAttachMechanism option;
  • The /tmp directory of the Java process is not physically the same directory as the /tmp of your shell, because Java is running in a container or a chroot environment. jattach tries to solve this automatically, but it may lack the permissions required to do so, which can be checked with the following command:
strace build/jattach PID properties
  • The JVM is busy and cannot reach a safepoint, for example because a long garbage collection is in progress. To check whether the JVM is busy, run:
kill -3 PID

A healthy JVM process should print a thread dump and heap information to its console;
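For the first case in the list above, the second half of the check (does the file still exist?) can be scripted. This is a hypothetical sketch; the socket path follows the /tmp/.java_pidNNN pattern given above. A missing file only points to this problem when lsof shows that the JVM still holds the socket, since a JVM that has never been attached to may simply not have created it yet:

```shell
# Hypothetical helper: check whether the attach socket file of a JVM exists.
# The optional second argument overrides the directory (default /tmp).
attach_socket_present() {
    pid=$1
    dir=${2:-/tmp}
    if [ -e "$dir/.java_pid$pid" ]; then
        echo "present: $dir/.java_pid$pid"
    else
        echo "missing: $dir/.java_pid$pid"
        return 1
    fi
}
```

For example, attach_socket_present 8983 looks for /tmp/.java_pid8983.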

3)Failed to inject profiler into <pid>

The connection with the target JVM has been established, but the JVM is unable to load the profiler shared library. Make sure the user of the JVM process has permission to access libasyncProfiler.so by exactly the same absolute path. For more information, see issue #78.
    
4)Perf events unavailble. See stderr of the target process.

The perf_event_open() syscall has failed. The error message is printed to the error stream of the target JVM.
Typical reasons include:

  • /proc/sys/kernel/perf_event_paranoid is set to restricted mode (>=2);
  • seccomp disables the perf_event_open API in a container;
  • The OS runs under a hypervisor that does not virtualize performance counters;
  • The perf_event_open API is not supported on this system, e.g. WSL.

If the configuration cannot be changed, you can fall back to the -e itimer profiling mode. It is similar to cpu mode but does not require perf_events support. As a drawback, it cannot collect kernel stack traces;
    
5)No AllocTracer symbols found. Are JDK debug symbols installed?

The OpenJDK debug symbols package probably needs to be installed. For more information, see the heap allocation section (section 3).

Note that allocation profiling is not supported on JVMs other than HotSpot, for example Zing.
    
6)VMStructs unavailable. Unsupported JVM?

The JVM shared library does not export gHotSpotVMStructs* symbols, so apparently this is not a HotSpot JVM. Sometimes an incorrectly built JDK can also cause the same message (see issue #218). In these cases, installing the JDK debug symbols may solve the problem;
    
7)Could not parse symbols due to the OS bug

Async-profiler cannot parse non-Java function names because the content of /proc/[pid]/maps is corrupted. The problem is known to occur in a container when running Ubuntu with Linux kernel 5.x. This is an OS bug; see https://bugs.launchpad.net/Ubuntu/+source/Linux/+bug/1843018.

8)[frame_buffer_overflow]

This message in the output means that there was not enough space to store all call traces. Consider increasing the frame buffer size with the -b option.
