Talking About GC in the JVM

2022-06-24 13:24:00 Yiyan

Link to the original text: https://www.changxuan.top/?p=1457


Introduction

GC in the JVM is something of a cliché in technology blogs, and the Internet is full of articles on it of uneven quality, most of them clearly written in a “copy and paste” style. While writing this article I asked myself: “Am I just creating more data garbage?”

So why write it? The main purpose of this post is not to show it to others but to record my own understanding of GC in the JVM. I think there is a kind of article whose main job is to record the process of getting to know and thinking about something. If you can turn your understanding of a thing into systematic, clearly structured writing, then you can say you really understand it. So this article is a record; if my thinking process happens to help you understand GC, so much the better.

Main text

JVM and GC

First, for readers who haven't done their homework, let's introduce the two terms JVM and GC.

JVM stands for Java Virtual Machine. Its main job is to execute bytecode files (class files). One thing to be clear about: although it is called the Java virtual machine, it does not only run bytecode compiled from Java source code. As just said, its main job is to execute bytecode files, so whatever the language, as long as the file produced by compiling your code conforms to the JVM specification, it can run on the JVM. Kotlin, Groovy, Scala and other languages, for example, all run on the Java virtual machine after compilation. There is also a document called the 《Java Virtual Machine Specification》, and anyone can implement their own Java virtual machine according to it. So over the long history of the technology there has been more than one Java virtual machine, such as Classic VM, Exact VM and HotSpot VM. HotSpot VM is the one we know best and use most, and whenever the Java virtual machine is mentioned below, HotSpot VM is what is meant by default.

So what is GC? GC stands for Garbage Collection, and no, not the kind done out on the street! Before understanding garbage collection, you have to work out how GC and the JVM are related. According to the 《Java Virtual Machine Specification》, the memory the JVM manages is divided into several different areas (see the article 《Java Memory Areas and Memory Overflow Exceptions》 for details). Two of these areas matter here: one is called the heap, the other the method area (before JDK 8). Because memory allocation and usage in these two areas are not known in advance, and because the heap takes up a lot of memory, the 《Java Virtual Machine Specification》 requires virtual machines to implement garbage collection. The method area is a special case (given what it stores, collecting it pays off poorly), so the specification does not require garbage collection there, but garbage collection on the heap is mandatory. What follows is therefore about garbage collection on the heap.

The new keyword

Anyone who has learned Java knows that objects are created with the new keyword. Because the high-level language handles the low-level work so well, most people never learn what actually happens behind new. So let's be clear about what the JVM does once it “sees” your new keyword. First comes the class loading check, followed in order by allocating memory, initializing the memory to zero values, setting the object header, and executing the init method. This article is not meant to be a complete tour of the JVM, so I won't go through each step in detail. The second step, allocating memory, is the one we care about: a piece of the heap is set aside for the newly created object. (Note: there are two allocation schemes, the free list and bump-the-pointer.)

By now it should be clear that creating a new object takes up part of the heap. However small, that space is occupied, and the machine's memory is limited, so it cannot simply be consumed forever. As the saying goes, “return what you borrow, and borrowing again is easy.” For example, in a typical Web project, handling one user's HTTP request may create many objects, but once the user has received the expected HTTP response, some of the objects created along the way will never be used again. Java does not make programmers responsible for reclaiming the objects they create, so that work is left to the JVM. The objects to be reclaimed are the “garbage” in “garbage collection”.

How do we decide whether an object needs to be reclaimed?

“Return what you borrow, and borrowing again is easy.” But how do we decide when the memory allocated to an object should be returned? In effect, we have to decide whether the object still has any reason to exist; if not, its space should be freed up quickly. If an object is no longer referenced anywhere, you could say its “life” should end. The reference counting algorithm is based on this idea. Put simply, a reference counter is placed in the object: whenever something refers to it, the counter is incremented; when a reference becomes invalid, the counter is decremented. Once an object's counter reaches zero, it can never be used again, and the heap memory it occupies can be returned. Sharp readers will spot the weakness at once: when two objects refer to each other, neither counter ever reaches zero, and they become “immortal” objects. For this reason, the JVM does not use this algorithm.
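The circular-reference weakness can be reproduced with a toy model. The sketch below is only an illustration under stated assumptions: the `Node` class, its `refCount` field and the `collectable` helper are hypothetical names invented here, and the counters are maintained by hand rather than by any real runtime.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of reference counting; this is not how HotSpot manages memory. */
class RefCountDemo {
    /** A simulated heap object that carries its own reference counter. */
    static final class Node {
        final String name;
        int refCount = 0;   // number of references currently pointing at this node
        Node field;         // a single outgoing reference

        Node(String name) { this.name = name; }

        /** Points this node's field at target and keeps both counters up to date. */
        void setField(Node target) {
            if (field != null) field.refCount--;
            field = target;
            if (target != null) target.refCount++;
        }
    }

    /** Names of the nodes a pure reference-counting collector would free. */
    static List<String> collectable(List<Node> heap) {
        List<String> dead = new ArrayList<>();
        for (Node n : heap) {
            if (n.refCount == 0) dead.add(n.name);
        }
        return dead;
    }

    /** a and b refer to each other, c is referenced by nobody; then all locals are dropped. */
    static List<String> run() {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c");
        a.refCount++; b.refCount++; c.refCount++;   // the three local variables count as references
        a.setField(b);
        b.setField(a);
        a.refCount--; b.refCount--; c.refCount--;   // simulate the locals going out of scope
        return collectable(List.of(a, b, c));
    }

    public static void main(String[] args) {
        System.out.println(run());   // prints [c]
    }
}
```

Only `c` is reported as collectable: `a` and `b` are unreachable from the program, yet each keeps the other's counter at 1.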

Besides reference counting there is the reachability analysis algorithm. Its basic idea is to take a set of objects called “GC Roots” as the starting nodes and search downward from them along reference relationships. The path traced by the search is called a “reference chain” (Reference Chain). If no reference chain connects an object to GC Roots, the object is unreachable and has no reason to exist. The algorithm itself is not complicated; the part worth attention is GC Roots. In the Java technology stack, the following commonly serve as GC Roots: objects referenced from the virtual machine stack; objects referenced by static class fields and by constants in the method area; references held internally by the Java virtual machine; and so on. The set is not limited to these: the JVM adjusts the GC Roots set dynamically depending on the garbage collector in use and the region being collected.
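Reachability analysis is essentially a graph traversal. The following is a minimal sketch, assuming the object graph is given as a map from object names to the names they reference; `reachable`, `demo` and the sample names are hypothetical and chosen only for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy reachability analysis over an object graph described by names. */
class ReachabilityDemo {
    /** Returns every object reachable from the roots by following reference edges. */
    static Set<String> reachable(Set<String> roots, Map<String, List<String>> refs) {
        Set<String> marked = new HashSet<>(roots);
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            for (String next : refs.getOrDefault(current, List.of())) {
                // Set.add returns true only the first time, so each object is visited once
                if (marked.add(next)) pending.push(next);
            }
        }
        return marked;
    }

    /** stackVar -> A -> B is a reference chain; C and D only refer to each other. */
    static Set<String> demo() {
        Map<String, List<String>> refs = Map.of(
                "stackVar", List.of("A"),
                "A", List.of("B"),
                "C", List.of("D"),
                "D", List.of("C"));
        return reachable(Set.of("stackVar"), refs);
    }

    public static void main(String[] args) {
        Set<String> live = demo();
        System.out.println("B reachable: " + live.contains("B"));   // prints B reachable: true
        System.out.println("C reachable: " + live.contains("C"));   // prints C reachable: false
    }
}
```

Unlike reference counting, the cycle between `C` and `D` causes no trouble here: neither is on a reference chain from a root, so neither is marked.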

At this point we should know: why is GC needed? Which area of JVM-managed memory does GC mainly work on? And how do we decide whether an object can be reclaimed? (Think about it; if you can't answer, go back and reread the sections above.)

Generational collection

What comes next? Now that we know how to find the objects that need to be reclaimed, the next question is how to reclaim them.

Don't rush to say “reclaiming just means taking back the allocated memory”. The computing world has its rules: one is to work efficiently; the other is to want the horse to run fast while eating little grass. So the garbage collector in the JVM should use as few resources as possible and not put the cart before the horse. The collection process should also disturb user threads as little as possible, so that they can keep working efficiently and users get a good experience.

With these requirements in mind, clever people began to think about how to do GC faster and better. First, the theory of generational collection. It may seem to come out of nowhere, but it is not hard to understand: it was proposed to make collectors easier to implement well. Generational collection rests on two hypotheses. The weak generational hypothesis says that the vast majority of objects die young. The strong generational hypothesis says that the more garbage collections an object has survived, the less likely it is to die. On reflection, both are reasonable. Several commonly used classic garbage collectors divide the Java heap into regions and place objects in them according to age.

A curious reader may ask: “What if I don't divide the heap into regions?”

Think about it. Without regions, all objects live in one block of memory. When a collection starts, the objects to be reclaimed must first be marked (collectors differ in their marking strategies, but every current classic collector suspends all user threads during some marking phase), and during marking all user threads that access this memory have to be suspended. This “one size fits all” strategy is simple, but it is not the best. Suppose instead that objects are stored in different regions by age: short-lived objects in one region, “older” objects in another. Each region can then be collected with its own strategy. The more “active” region can be collected more often, a GC in one region need not stop user threads from accessing the others, and so on.

Commercial Java virtual machines generally divide the Java heap into at least two regions: the young generation (Young Generation) and the old generation (Old Generation). The young generation is the region for the short-lived objects mentioned above. After each collection of the young generation, objects that have reached a certain “age” are promoted to the old generation. Now we can look at the characteristics of the two regions.

GC in the young generation is called Minor GC or Young GC. Objects there rarely survive, so a single collection should free a large amount of memory. GC in the old generation is called Major GC or Old GC, and each collection may reclaim only a few objects. Besides collecting the young and old generations separately, a whole-heap collection (Full GC) may be triggered when memory is severely short; Full GC should be kept to a minimum.
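To see which young-generation and old-generation collectors a running JVM actually uses, and how often each has run, the standard `java.lang.management` API can be queried. This is a small sketch; the class name `GcBeansDemo` is made up here, and the collector names it prints depend on the JVM and on the collector selected at startup.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Lists the garbage collectors registered in the current JVM. */
class GcBeansDemo {
    /** Names of the collectors as reported by the platform MXBeans. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount: collections so far; getCollectionTime: accumulated milliseconds
            System.out.println(bean.getName()
                    + " count=" + bean.getCollectionCount()
                    + " timeMs=" + bean.getCollectionTime());
        }
    }
}
```

In a typical generational setup there is usually one bean for the young generation and one for the old generation, so the two counters give a rough picture of how Minor GC and old-generation collections compare in frequency.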

Garbage collection algorithms

Now that we know why the Java heap is divided into regions and what each region is like, we can look at three garbage collection algorithms. You can also think of them as the methodology of garbage collection.

The mark-sweep algorithm

In the mark-sweep algorithm, “mark” means marking objects (either the ones to reclaim or the ones to keep, as long as the two kinds can be told apart), and “sweep” means removing the useless objects from memory. It is a fairly simple strategy: find first, then clear. However, it produces a lot of memory fragmentation; with too many fragments, a large object may fail to be allocated, which triggers another garbage collection. In addition, the time spent marking and sweeping grows with the number of objects.
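The fragmentation problem is easy to see in a toy model. In the sketch below, which is an illustration and not any real collector, the heap is an array of slots, the `live` array stands in for the result of the mark phase, and `largestFreeRun` is a hypothetical helper that measures the biggest contiguous free block.

```java
/** Toy mark-sweep over a heap modeled as an array of slots. */
class MarkSweepDemo {
    /** Sweep phase: every slot that was not marked live becomes free (null). */
    static void sweep(String[] heap, boolean[] live) {
        for (int i = 0; i < heap.length; i++) {
            if (!live[i]) heap[i] = null;
        }
    }

    /** Total number of free slots. */
    static int freeSlots(String[] heap) {
        int free = 0;
        for (String slot : heap) {
            if (slot == null) free++;
        }
        return free;
    }

    /** Length of the longest run of adjacent free slots. */
    static int largestFreeRun(String[] heap) {
        int best = 0, run = 0;
        for (String slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    /** Every second object is dead; returns {free slots, largest contiguous free run}. */
    static int[] demo() {
        String[] heap = {"a", "b", "c", "d", "e", "f", "g", "h"};
        boolean[] live = {true, false, true, false, true, false, true, false};
        sweep(heap, live);
        return new int[] {freeSlots(heap), largestFreeRun(heap)};
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("free=" + r[0] + " largestRun=" + r[1]);   // prints free=4 largestRun=1
    }
}
```

Half of the heap is free after the sweep, yet an object needing two adjacent slots still cannot be allocated.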

The mark-copy algorithm

I won't say much about the mark-copy algorithm beyond the copying strategy. The algorithm splits the memory space into two halves. Only one half is used by the JVM at a time; the other is kept empty. When is the other half used? When a GC happens on the half in use, all surviving objects are copied to the empty half and packed together, the half used before is wiped clean, and the two halves swap roles. In effect, space is sacrificed to solve the memory fragmentation problem. This scheme also makes memory allocation easy; if you don't mind the wasted space, it looks like a particularly good approach. But when every inch of memory is worth an inch of gold, how could you not mind! So an optimized version was proposed: Appel-style collection. Instead of simply splitting the space into two equal halves, the improved version splits it into three parts. Appel-style collection divides the young generation into one larger Eden space and two smaller Survivor spaces, and only Eden and one of the Survivor spaces are in use at any time (in HotSpot VM the default Eden:Survivor ratio is 8:1). When a GC happens, the surviving objects in Eden and the Survivor space in use are copied to the other, empty Survivor space. This greatly reduces the size of the reserved space and so addresses the wasted-memory problem. The division works because each young-generation GC reclaims most objects. However, the reserved Survivor space may be too small to hold all the objects that survive a GC, so the old generation is needed as an allocation guarantee. This algorithm is best suited to the young generation.
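Two points from the paragraph above can be checked with a little arithmetic: how much of the young generation is usable under an 8:1 ratio, and what happens when the empty Survivor space is too small. The sketch below is a simplified model; `usableFraction` and `copySurvivors` are hypothetical helpers, objects are represented only by their sizes, and the first-fit overflow rule is an assumption made for illustration rather than HotSpot's actual promotion policy.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of an Eden + two Survivor layout. */
class CopyingDemo {
    /** Fraction of the young generation usable at any time: Eden plus one Survivor. */
    static double usableFraction(int edenParts, int survivorParts) {
        return (double) (edenParts + survivorParts) / (edenParts + 2 * survivorParts);
    }

    /**
     * Copies surviving objects (given as sizes) into the empty Survivor space (toSpace).
     * Objects that do not fit are returned as promoted to the old generation,
     * which plays the role of the allocation guarantee.
     */
    static List<Integer> copySurvivors(List<Integer> survivorSizes, int survivorCapacity,
                                       List<Integer> toSpace) {
        List<Integer> promoted = new ArrayList<>();
        int used = 0;
        for (int size : survivorSizes) {
            if (used + size <= survivorCapacity) {
                toSpace.add(size);
                used += size;
            } else {
                promoted.add(size);
            }
        }
        return promoted;
    }

    public static void main(String[] args) {
        System.out.println(usableFraction(8, 1));   // prints 0.9
        List<Integer> toSpace = new ArrayList<>();
        List<Integer> promoted = copySurvivors(List.of(3, 4, 5), 8, toSpace);
        System.out.println(toSpace + " " + promoted);   // prints [3, 4] [5]
    }
}
```

With Eden:Survivor at 8:1, only one tenth of the young generation sits idle, compared with half in the plain two-halves scheme.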

The mark-compact algorithm

In the mark-compact algorithm, as the name suggests, once marking is done all surviving objects are first moved to one end of the memory space, and then the memory beyond the boundary is cleared. This “compacting” step after marking avoids the heavy fragmentation that mark-sweep produces. The algorithm is illustrated below:

Figure: the mark-compact algorithm
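The move-then-clear step can also be sketched in a few lines. As with the other toy models here, the heap is just an array of slots and the `live` array stands in for the mark phase; `compact` is a hypothetical helper, not an API of any JVM.

```java
import java.util.Arrays;

/** Toy mark-compact over a heap modeled as an array of slots. */
class MarkCompactDemo {
    /**
     * Slides live objects toward index 0, preserving their order,
     * then clears everything beyond the boundary.
     * Returns the boundary, i.e. the index of the first free slot.
     */
    static int compact(String[] heap, boolean[] live) {
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            if (live[i]) {
                heap[next++] = heap[i];   // next <= i, so no live object is overwritten
            }
        }
        for (int i = next; i < heap.length; i++) {
            heap[i] = null;
        }
        return next;
    }

    public static void main(String[] args) {
        String[] heap = {"a", "b", "c", "d", "e", "f"};
        boolean[] live = {true, false, true, false, false, true};
        int boundary = compact(heap, live);
        // prints 3 [a, c, f, null, null, null]
        System.out.println(boundary + " " + Arrays.toString(heap));
    }
}
```

After compaction all free space forms one contiguous block, at the cost of moving every surviving object.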

Classic garbage collectors

As the saying goes, “all talk and no practice is a fake skill; all practice and no talk is a foolish one”, so we can't stop at theory. Next come some actual garbage collectors. Everything above ultimately serves the implementation of garbage collectors; after all, the collector is what does the real work.

Figure: the classic garbage collectors

The figure above alone may not help you get to know these collectors, so the following figure is there to help you understand and remember them.

Figure: garbage collectors of the HotSpot virtual machine

The Serial collector

The Serial collector (mark-copy) works in the young generation, as the figure shows. Being an early design, its strategy is simple: only one GC thread does the work (single-threaded), and all other worker threads must be paused while it runs (Stop The World). But we shouldn't look only at its shortcomings. It is still the default young-generation collector for HotSpot VM in client mode, and thanks to its simple strategy it is the collector with the smallest memory overhead. On a server with a single-core processor, where there is no thread-interaction overhead, it achieves the highest single-threaded collection efficiency.

The ParNew collector

The ParNew collector (mark-copy) is the multithreaded version of Serial and brings little further innovation; it is usually paired with CMS. By default it starts as many collection threads as there are processor cores, and the number of GC threads can also be set with the -XX:ParallelGCThreads parameter.

The Parallel Scavenge collector

The Parallel Scavenge collector (mark-copy) is also a young-generation, multithreaded collector that collects in parallel. Its goal is to let the JVM reach a controllable throughput (the ratio of the time spent running user code to the sum of that time and the time spent on GC).

Throughput = time running user code / (time running user code + time running garbage collection)

To control throughput precisely, Parallel Scavenge provides two parameters: -XX:MaxGCPauseMillis, which controls the maximum garbage collection pause time, and -XX:GCTimeRatio, which sets the throughput directly.

-XX:MaxGCPauseMillis takes a number of milliseconds greater than 0, and the collector tries to keep every GC within that time. If the value is set too small, however, the collector will end up running GC more often in order to stay under it.

-XX:GCTimeRatio takes an integer greater than 0 and less than 100. If it is set to n, then 1/(n+1) is the ratio of garbage collection time to total time; that is, the system spends no more than 1/(n+1) of the total time on garbage collection. (The GC reference documentation describes it as “-XX:GCTimeRatio=nnn, The ratio of garbage collection time to application time is 1/(1+nnn)”.) A short derivation from the throughput formula shows that 1/(n+1) equals 1 minus the throughput, i.e. the throughput is n/(n+1).
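The relationship between n, the GC time share and the throughput is plain arithmetic, worked through below. The class and method names are made up for this illustration, and the sample values 19 and 99 are arbitrary choices within the allowed range.

```java
/** Arithmetic behind -XX:GCTimeRatio=n. */
class GcTimeRatioDemo {
    /** Upper bound on the share of total time spent in GC: 1 / (1 + n). */
    static double gcFraction(int n) {
        return 1.0 / (1 + n);
    }

    /** Corresponding throughput target: n / (1 + n), i.e. 1 - gcFraction(n). */
    static double throughput(int n) {
        return n / (1.0 + n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {19, 99}) {
            System.out.printf("n=%d gc<=%.1f%% throughput>=%.1f%%%n",
                    n, 100 * gcFraction(n), 100 * throughput(n));
        }
    }
}
```

For example, n = 19 allows at most 1/20 = 5% of the time for GC, which corresponds to a throughput target of 95%.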

Another advantage of this collector is that you only need to set basic parameters such as heap size, maximum pause time and throughput; it tunes the remaining detailed parameters by itself. This is Parallel Scavenge's adaptive tuning strategy.

The Serial Old collector

The Serial Old collector (mark-compact), as the name suggests, is the old-generation version of Serial, and it is also single-threaded.

The Parallel Old collector

The Parallel Old collector (mark-compact) works together with Parallel Scavenge to form the “throughput first” collector combination. Where throughput matters most or processor resources are scarce, this combination deserves first consideration.

The CMS collector

The CMS collector (mark-sweep), whose full name is Concurrent Mark Sweep, aims for the shortest possible collection pause. For server-side systems that serve users through websites, this collector can give users a good interactive experience.

A CMS GC has four steps: 1. initial mark; 2. concurrent mark; 3. remark; 4. concurrent sweep. The initial mark and remark phases “Stop The World”, but both are short. Although CMS offers concurrent collection and low pauses, it is very sensitive to processor resources. In addition, because user threads keep running during concurrent marking and sweeping, new garbage is produced in the meantime; this is called “floating garbage”. And because it is based on mark-sweep, it also suffers from memory fragmentation.

The Garbage First collector

The Garbage First collector (mark-compact, mark-copy), also known as the G1 collector, divides memory differently from all the collectors before it: it pioneered the design idea of collectors oriented toward partial collection and the Region-based memory layout. As a garbage collector aimed mainly at server applications, it was expected by its development team to replace CMS.

G1 still follows generational collection theory, but it does not split the Java heap directly into a young generation and an old generation. Instead, it divides the contiguous Java heap into multiple independent regions (Region) of equal size, and each Region can play the role of Eden space, Survivor space, or old-generation space as needed. G1 treats Regions in different roles with different strategies. Since there will always be some large objects in the JVM, there is also a special kind of Region, the Humongous region, for storing them. An object counts as large if it is bigger than half the capacity of one Region. By refining the granularity of heap management to the Region, G1 ensures that the memory collected each time is an integral multiple of the Region size. With this finer granularity and the supporting data structures, G1 can plan its collections to keep pause times under control.

A G1 GC has four steps: 1. initial mark; 2. concurrent mark; 3. final mark; 4. selection and evacuation. If the machine has plenty of memory available, G1 is still the recommended choice for GC.

Other collectors

There are also some low-latency collectors, including the Shenandoah collector and the ZGC collector.

Summary

This article has mainly covered GC in the JVM. Space is limited, so some topics were not treated in detail; if you want to study the subject seriously, you still need to read dedicated books. The appendix adds a few points that did not fit into the main text.

Appendix

About references

Because of how the article is structured, the main text did not say much about references, so the topic is covered here. The word “reference” appears many times above. In the JVM, references are divided into four kinds for easier management and distinction. From strongest to weakest they are strong references, soft references, weak references and phantom references.

What is a strong reference? An example: Object obj = new Object(); is a strong reference. As long as the reference exists, the garbage collector will never reclaim the referenced object.

Soft references describe objects that are useful but not essential. The garbage collector reclaims softly referenced objects only when the system is about to run out of memory.

Weak references are weaker still: an object that is only weakly referenced is reclaimed as soon as a GC occurs.

A phantom reference is the weakest kind. It has no effect at all on an object's lifetime; it exists only so that a system notification is received when the object is reclaimed by the collector.
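In Java code the three weaker kinds correspond to classes in the `java.lang.ref` package. The sketch below only shows how each is created and what `get()` returns while the object is still strongly reachable; it deliberately avoids calling `System.gc()`, since what happens after a collection is not deterministic. The class name and the `observe` method are made up for this example.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

/** Creates one reference of each kind to the same object. */
class ReferenceKindsDemo {
    /** Returns {soft sees target, weak sees target, phantom get() is null}. */
    static boolean[] observe() {
        Object target = new Object();                              // strong reference
        SoftReference<Object> soft = new SoftReference<>(target);
        WeakReference<Object> weak = new WeakReference<>(target);
        ReferenceQueue<Object> queue = new ReferenceQueue<>();     // where the notification would arrive
        PhantomReference<Object> phantom = new PhantomReference<>(target, queue);

        boolean softSeesTarget = soft.get() == target;
        boolean weakSeesTarget = weak.get() == target;
        boolean phantomGetIsNull = phantom.get() == null;          // a phantom reference never hands out its referent

        Reference.reachabilityFence(target);                       // keep target strongly reachable up to here
        return new boolean[] {softSeesTarget, weakSeesTarget, phantomGetIsNull};
    }

    public static void main(String[] args) {
        boolean[] r = observe();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);   // prints true true true
    }
}
```

While the strong reference `target` is alive, the soft and weak references both still return the object; only after it is dropped do the differences described above come into play.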

Reclaiming objects

When an object is marked for the first time, it is not necessarily going to be reclaimed; it still has a chance to “come back to life”. After the first marking there is a screening step, whose condition is whether the object needs to have its finalize() method executed. If it does, the object can avoid being reclaimed in the second, smaller-scale marking by re-associating itself, during finalize(), with an object in the GC Roots set. Of course, if the object does not override finalize(), or its finalize() has already been called, it cannot come back to life.

References

[1] 《Understanding the Java Virtual Machine in Depth》

Copyright notice
This article was written by [Yiyan]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2021/05/20210522180817471f.html