当前位置:网站首页>Go deep: the evolution of garbage collection

Go deep: the evolution of garbage collection

2022-06-23 22:50:00 wenxing

Stop the world It's about garbage collection (Garbage Collection,GC) A topic that can't be avoided , once Go Linguistic GC Mechanisms also threaten the response time of services ——Discord Technical team articles Why Discord is switching from Go to Rust it has been reviewed that Go Language GC The problems brought about by .Go It has been greatly improved through version iteration GC The problem of , Average time STW The time from 100+ms Down to 0.5ms—— What magic is it that makes the world hardly need to pause ? In this paper , We ask questions 、 The solution attempts to sort out the main process of this evolution .

What are the mechanisms for garbage collection , What are the pros and cons of each ?

Garbage collection mainly includes 「 Reference count 」 And 「 track 」 The mechanism of :

  • Reference count (Reference Counting): Add a counter to each object , Whenever the object is referenced / When dereferencing , The counter increases automatically / Self reduction 1; When the counter is reset to zero , The object should be recycled .
  • track (Tracing): Periodically traverse memory space , Start with several root storage objects and find the storage objects related to them , Then mark the remaining storage objects that are not associated , Finally, reclaim the memory space occupied by these unrelated storage objects .

These two methods have their own advantages and disadvantages :

Reference count

track

The whole program needs to be suspended

no

yes

The extra overhead of the runtime

Concurrency control of counters Memory space of the counter

nothing

Non recyclable conditions

Object circular reference *

nothing

* notes : Similar to the following code ,person And apartment Circular references between objects of , As a result, it can not be recycled by the garbage collector based on circular reference .

person = Person()
apartment = Apartment()
person.apartment = apartment
apartment.owner = person
person = None
apartment = None

among , The method of tracking is Java Multiple garbage collection algorithms 、Go Language, etc ; The method of reference counting is swiftObjective-CPython Etc ,C++ Smart pointers to can also be considered implementation of reference counting —— among Python Provides circular reference detection , and swiftObjective-C Then use ARC(Automatic Reference Counting), The coder needs to judge the reference type , Distinguish between strong and weak references , for example

class Apartment {
    weak var owner: Person?
}

Go What is the specific use of the tracking garbage collection mechanism ?

Go 1.5 after , Use the three color marking method for garbage collection .

Traditional marking - The idea of sweeping algorithm is 70 The era brought forward . Hang the user program at some point (STW,stop the world), Implement the garbage collection mechanism —— There are two stages :

  • Mark : Start marking objects from some root nodes , Step by step through the survival unit , Until no traversable units remain ; This step requires STW, Otherwise, it may be wrong Recycle objects newly created by users .
  • Clean : Reclaim the remaining unlabeled units .

Three color marking method is an improvement of the traditional marking stage , Divided into white ( Objects not scanned )、 gray ( Object to be traversed ) And black ( Object traversed ):

mark-with-colors.png
// 1.  Mark all objects on the stack as gray 
// 2.  Take out all gray objects and mark them black , Mark the objects referenced by this object as gray , Until there are no gray objects 
func mark() {
  for object in range getStackObjects() {
    markGray(object)
  }
  for {
    object, ok := getNextGrayObject()
    if !ok {
      break
    }
    markBlack(object)
    markGray(object.GetReferences())
  }
}

Tricolor marking algorithm can make STW Time is greatly reduced , Make the user program and garbage collection algorithm can be carried out concurrently . actually ,Go 1.10 After the version , Average time STW The time is reduced to 0.5ms

What makes user programs and tags - Sweeping algorithms cannot be executed concurrently ?

Because the user program interferes with the marking process . In order to avoid wrong cleaning objects , So we need STW.

that ,“ interfere ” What are there ?

  • The user created a new object —— The object remains white , May end up being incorrectly recycled ;
  • You dereference a white object from a gray object , And make a black object reference it —— The white object will not be scanned , Because the black object means that the relevant reference object has been scanned , Thus, the white object is incorrectly recycled .

Garbage collection algorithm when running in parallel with user program , How to ensure correctness ?

Depending on the write barrier (Write Barrier). The write barrier is intended to be a mechanism within the operating system , It ensures that the process of writing to the storage system is in a specific order ; In the garbage collection algorithm , The writing barrier is The specific code executed at each write .

We open the write barrier during marking , So as to try to avoid the interference of the user program to the marking process .Go 1.8 The previous method was to use Dijkstra stay 78 Method proposed in —— Insert write barrier (Insertion Write Barrier):

Every time a reference occurs , If the referenced object is white , Set it to gray , Be similar to :

func writePointer(slot, ptr) {
  if isWhite(ptr) {
    markGray(ptr)
  }
    *slot = ptr //  Set references 
}

This write barrier just solves the interference of the user program mentioned above with the marking process :

  • The object newly created by the user , Directly marked as gray , Avoid error recycling ;
  • When the parent node of a white object changes from a gray object to a black object , The object is marked gray , It also avoids error recycling .
insertion-write-barrier.png

This is because the insert write barrier satisfies Strong trichromatic invariance

Black objects don't point to white objects , It only points to gray objects or black objects .

What about the insert write barrier cost?

After the object on the stack is set to black , To avoid error recycling , Or you need to add a write barrier to the objects on the stack ( The overhead of frequent operations increases ), Or you need to rescan the stack (STW).

therefore ,Go stay 1.5 Version to 1.7 edition , After the insert write barrier is turned on , Only the pointer changes on the heap are grayed out , Do not change the pointer on the stack ; Mark the completed STW, The white objects on the stack will be marked again .

Go from 1.5 Every time before STW Time consuming from ~100ms Reduce the cost of this stage ~10ms.

How to further optimize the insert write barrier , Reduce STW Time consuming ?

Introduce here 1990 Year by year Yuasa The proposed delete write barrier (Deletion Write Barrier):

When a reference to an object is deleted , If the object is white , The object is grayed out . The code example is as follows :

The reliability of deleting the write barrier comes from its satisfaction Weak trichromatic invariance

The white object pointed by the black object must contain an accessible path from the gray object to multiple white objects

This ensures that the white object will delete the reference , Its own and child nodes can always be marked black during the marking phase , So as to avoid the dangling pointer caused by error recycling .

Final ,Go 1.8 Version introduces a hybrid write barrier that combines insert and delete write barriers (Hybrid Write Barrier):

func writePointer(slot, ptr) {
  if isWhite(*slot) {
    markGray(*slot) // [ Delete the write barrier ] When deleting a reference , Dereferenced objects are marked gray 
  }
  if isCurrentStackGray() {
    markGray(ptr) //  At present goroutine If the stack has not been scanned , Then the pointer points to the object marked gray 
    //  This is because Go 1.8 It has been set before , When the write barrier is turned on , All new objects are marked black 
    //  So where the pointer is goroutine The stack has not been scanned yet , The pointer is grayed out for further scanning 
    //  If the current pointer is goroutine When it is already black ,
    // *  The pointer has either been scanned ( gray / black )
    // * Or it is a new allocation object ( black )
  }
  *slot = ptr
}
Whether mixed write barriers solve the problem of inserting write barriers ?

The real problem is : Whether mixed write barriers are avoided 1. At the end of the tag STW Then rescan the stack ; 2. Open the write barrier to the objects on the stack ?

Inserting the write barrier requires rescanning the stack , The white object is referenced by the pointer of the black object on the stack ; Now because the write barrier is removed , Such white objects are grayed out . So there is no need to rescan the stack . And notice , The write barrier is to gray the white object without changing the color of the black object on the stack , Therefore, the performance loss of opening the write barrier to the objects on the stack is avoided .

therefore ,Go 1.8 The introduction of hybrid write barriers guarantees performance , It also reduces the cost of rescanning the stack STW expenses . Every time GC Of STW Time from insert write barrier 1.5 Version of ~10ms Down to 1.8 Of ~0.5ms.

Go 1.8 Why do you need to STW?

There are two more stages to be STW:

  • GC Preparation before you start , For example, set the write barrier ;
  • At the end of the mark , Rescan global variables 、 Scan the system stack 、 End the marking process, etc .
原网站

版权声明
本文为[wenxing]所创,转载请带上原文链接,感谢
https://yzsam.com/2021/12/202112101650212001.html