当前位置:网站首页>Yiwen teaches you to understand the stack operation in go

Yiwen teaches you to understand the stack operation in go

2022-06-24 16:54:00 luozhiyun

Please state the source of reprint ~, This article was published at luozhiyun The blog of :https://www.luozhiyun.com/archives/513

This article uses go Source code 15.7

Knowledge point

LInux Process layout in memory

Linux_stack

Each process in a multitasking operating system runs in its own memory sandbox . stay 32 In bit mode , It's always 4GB Memory address space , Memory allocation is the allocation of virtual memory to processes , When a process actually accesses a virtual memory address , The operating system triggers page missing interrupt , Allocate a corresponding space on the physical memory, and then establish a mapping relationship with it , So the virtual memory address that the process accesses , Will be automatically converted to a valid physical memory address , Then you can store and access the data .

Kernel space: Operating system kernel address space ;

Stack: Stack space , It is a local variable created temporarily by user stored program , The growth direction of stack is from high address to status address . In modern mainstream machine architecture ( for example x86) in , Stacks are growing down . However , There are also some processors ( for example B5000) Stacks grow up , There are also some architectures ( for example System Z) Allow custom stack growth direction , There are even some processors ( for example SPARC) It's the loop stack ;

Heap: Heap space , The heap is used to store the memory segments that are allocated dynamically during the process running , It's not fixed in size , Can expand or shrink dynamically ;

BBS segment:BSS paragraph , It stores global or static data , But it's the whole picture / Static uninitialized data ;

Data segment: Data segment , Usually refers to a block of memory used to store the initialized global variables in the program ;

Text segment: Code segment , An area of memory used to store the execution code of a program . The size of this area has been determined before the program runs , And the memory area is read-only .

Stack related concepts

In computer science, a call stack is a stack data structure that stores information about the active subroutines of a computer program.

In computer programming, a subroutine is a sequence of program instructions that performs a specific task, packaged as a unit.

The call stack call stack, Stack for short , It's a stack data structure , Used to store activities about computer programs subroutines Information . In computer programming ,subroutines Is a series of program instructions to perform a specific task , Packaged as a unit .

A stack frame is a frame of data that gets pushed onto the stack. In the case of a call stack, a stack frame would represent a function call and its argument data.

Stack frame stack frame It is also called frame frame Is the call relationship between functions stored in the call stack , Each frame corresponds to the function call and its parameter data .

With a function call, there must be a caller caller And the callee callee , If in function A in call function B,A yes caller,B yes callee.

The stack frame structure of the caller and callee is shown in the figure below :

Stack layout

Go The stack register in the assembly code is very vague , We only need to know about two registers BP and SP It's all right :

BP: Reference pointer register , Maintain the reference address of the current stack frame , To index variables and parameters , It's like an anchor , In other architectures, it's equivalent to a frame pointer FP, It's just x86 Under the architecture , Variables and parameters can be accessed through SP To index ;

SP: Stack pointer register , Always point to the top of the stack ;

Goroutine Stack operation

G Stack

stay Goroutine There is one of them. stack data structure , There are two attributes in it lo And hi, Describes the actual stack memory address :

  • stack.lo: The low address of the stack space ;
  • stack.hi: The high address of the stack space ;

stay Goroutine China will pass stackguard0 To determine whether stack growth is needed :

  • stackguard0:stack.lo + StackGuard, be used for stack overlow Detection of ;
  • StackGuard: Reserve size , Constant Linux Up for 928 byte ;
  • StackSmall: Constant size is 128 byte , Optimization for small function calls ;
  • StackBig: Constant size is 4096 byte ;

According to the size of the called function stack frame to determine whether the need for expansion :

  1. When stack frame size (FramSzie) Less than or equal to StackSmall(128) when , If SP Less than stackguard0 Then perform stack expansion ;
  2. When stack frame size (FramSzie) Greater than StackSmall(128) when , According to the formula SP - FramSzie + StackSmall and stackguard0 Compare , If Less than stackguard0 Then the expansion will be carried out ;
  3. When stack frame size (FramSzie) Greater than StackBig(4096) when , First of all, I will check stackguard0 Has it been transformed into StackPreempt Status quo ; And then according to the formula SP-stackguard0+StackGuard <= framesize + (StackGuard-StackSmall) Judge , If it is true Then the expansion will be carried out ;

It should be noted that , Because the stack is growing from high address to low address , So when comparing , They are all smaller than before expansion , We need your products here .

When performing stack expansion , Will allocate more stack memory space in memory space , Then copy everything from the old stack to the new stack , And modify the pointer to the corresponding variable of the old stack to point to the new stack again , Finally, destroy and reclaim the memory space of the old stack , So as to realize the dynamic expansion of the stack .

assembly

Here is a brief explanation of some of the problems that will be used in the later analysis Go Used in language Plan 9 assembly , In case you don't understand .

Assembly function

Let's take a look first plan9 The definition of assembly function of :

function

stack frame size: Parameter space containing local variables and extra calls to functions ;

arguments size: Contains parameters and the size of the return value , For example, the input parameter is 3 individual int64 type , The return value is 1 individual int64 type , So the return value is sizeof(int64) * 4;

Stack adjustment

The stack is adjusted by the hardware SP Register operation to achieve , for example :

SUBQ    $24, SP  //  Yes  sp  subtracting , Assign function stack frames to functions  
...
ADDQ    $24, SP  //  Yes  sp  add  , Clear function stack frame 

Because the stack is growing down , therefore SUBQ Yes SP Subtraction actually allocates stack frames to functions ,ADDQ Clear stack frame .

Common instructions

Addition and subtraction operation

ADDQ  AX, BX   // BX += AX
SUBQ  AX, BX   // BX -= AX

Data handling

Constant in plan9 For assembly $num Express , Can be negative , Decimal by default . The length of handling is determined by MOV Suffix determination .

MOVB $1, DI      // 1 byte
MOVW $0x10, BX   // 2 bytes
MOVD $1, DX      // 4 bytes
MOVQ $-10, AX     // 8 bytes

Another difference is in the use of MOVQ You can see the difference between brackets and no brackets .

//  Parentheses are references to pointers 
MOVQ (AX), BX   // => BX = *AX  take AX Point to memory area 8byte Assign a value to BX
MOVQ 16(AX), BX // => BX = *(AX + 16)

// Without brackets is a reference to a value 
MOVQ AX, BX     // => BX = AX  take AX The content stored in is assigned to BX, Pay attention to differences 

Jump

//  Jump unconditionally 
JMP addr   //  Jump to address , The address can be the address in the code 
JMP label  //  Jump to the tag , You can jump to the label position in the same function 
JMP 2(PC)  //  Based on the current instruction , forward / After jump  x  That's ok 

//  Jump conditionally 
JLS addr

Address operation :

LEAQ (AX)(AX*2), CX // => CX = AX + (AX * 2) = AX * 3

In the above code 2 representative scale,scale Can only be 0、2、4、8.

analysis

G The creation of

Because it's all in the stack Goroutine Upper , So first from G How to create and initialize the stack space is the first step . Because I am 《 Detailed explanation Go Language scheduling loop source code implementation https://www.luozhiyun.com/archives/448 》 We've already talked about that G The creation of , So here we will only explain the code of initialization part of stack .

G The creation of the runtime·newproc Create :

runtime.newproc

func newproc(siz int32, fn *funcval) {
	argp := add(unsafe.Pointer(&fn), sys.PtrSize)
	gp := getg()
	//  obtain  caller  Of  PC  register 
	pc := getcallerpc()
    //  Switch to  G0  Create 
	systemstack(func() {
		newg := newproc1(fn, argp, siz, gp, pc)
		...
	})
}

newproc The method switches to G0 On the call newproc1 Function G The creation of .

runtime.newproc1

const _StackMin = 2048
func newproc1(fn *funcval, argp unsafe.Pointer, narg int32, callergp *g, callerpc uintptr) *g {
	_g_ := getg()
	...
	_p_ := _g_.m.p.ptr()
	//  from  P  Get a new  G
	newg := gfget(_p_)
	//  If not, call  malg  Create 
	if newg == nil {
		newg = malg(_StackMin)
		casgstatus(newg, _Gidle, _Gdead)
		allgadd(newg) // publishes with a g->status of Gdead so GC scanner doesn't look at uninitialized stack.
	}
	...
	return newg
}

newproc1 The method is very long. , It's mainly about getting G , And then to get G Do some initialization work . We only look here malg Function call .

Calling malg Function will pass in a value of the minimum stack size :_StackMin(2048).

runtime.malg

func malg(stacksize int32) *g {
	//  establish  G  Structure 
	newg := new(g)
	if stacksize >= 0 {
		//  There will be  stacksize  The memory size required for system calls is reserved for each stack  _StackSystem
        //  stay  Linux/Darwin  On ( _StackSystem == 0 ) The bank does not change  stacksize  Size 
		stacksize = round2(_StackSystem + stacksize)
		//  Switch to  G0  by  newg  Initialize stack memory 
		systemstack(func() {
			newg.stack = stackalloc(uint32(stacksize))
		})
		//  Set up  stackguard0 , Used to determine whether to expand the stack 
		newg.stackguard0 = newg.stack.lo + _StackGuard
		newg.stackguard1 = ^uintptr(0) 
		*(*uintptr)(unsafe.Pointer(newg.stack.lo)) = 0
	}
	return newg
}

Calling malg The size of the incoming memory will be added with a _StackSystem Value is reserved for system calls ,round2 The function rounds the value passed in to 2 The index of . Then it will switch to G0 perform stackalloc Function for stack memory allocation .

It will be set after the assignment stackguard0 by stack.lo + _StackGuard, To judge whether stack expansion is needed , We'll talk about .

Initialization of stack

file location :src/runtime/stack.go

//  Global stack cache , Distribute  32KB The following memory 
var stackpool [_NumStackOrders]struct {
	item stackpoolItem
	_    [cpu.CacheLinePadSize - unsafe.Sizeof(stackpoolItem{})%cpu.CacheLinePadSize]byte
}

//go:notinheap
type stackpoolItem struct {
	mu   mutex
	span mSpanList
} 

//  Global stack cache , Distribute  32KB  Above memory 
var stackLarge struct {
	lock mutex
	free [heapAddrBits - pageShift]mSpanList // free lists by log_2(s.npages)
}

//  initialization stackpool/stackLarge Global variables 
func stackinit() {
	if _StackCacheSize&_PageMask != 0 {
		throw("cache size must be a multiple of page size")
	}
	for i := range stackpool {
		stackpool[i].item.span.init()
		lockInit(&stackpool[i].item.mu, lockRankStackpool)
	}
	for i := range stackLarge.free {
		stackLarge.free[i].init()
		lockInit(&stackLarge.lock, lockRankStackLarge)
	}
}

Before we do stack allocation, let's see what we do when we initialize the stack , It should be noted that ,stackinit Is in the call runtime·schedinit The initialization of the , Is in the call runtime·newproc What happened before .

Two global variables are initialized during stack initialization stackpool and stackLarge.stackpool Can assign less than 32KB Of memory ,stackLarge Used to allocate more than 32KB Stack space of .

Stack allocation

From the initialization of two global variables, we can also know , The stack will be allocated from different locations according to its size .

Stack memory allocation

file location :src/runtime/stack.go

func stackalloc(n uint32) stack { 
    //  there  G  yes  G0
	thisg := getg()
	...
	var v unsafe.Pointer
	//  stay  Linux  On ,_FixedStack = 2048、_NumStackOrders = 4、_StackCacheSize = 32768
	//  If the stack space requested is less than  32KB
	if n < _FixedStack<<_NumStackOrders && n < _StackCacheSize {
		order := uint8(0)
		n2 := n
		//  Greater than  2048 , that  for  loop   take  n2  except  2, until  n  Less than or equal to  2048
		for n2 > _FixedStack {
			// order  Except how many times 
			order++
			n2 >>= 1
		}
		var x gclinkptr
		//preemptoff != "",  stay  GC  It'll be set up when it's done , If the  GC  So from  stackpool  Distribute 
		// thisg.m.p = 0  In the system call and   change  P  Call when the number of , In case of , Then from  stackpool  Distribute 
		if stackNoCache != 0 || thisg.m.p == 0 || thisg.m.preemptoff != "" { 
			lock(&stackpool[order].item.mu)
			//  from  stackpool  Distribute 
			x = stackpoolalloc(order)
			unlock(&stackpool[order].item.mu)
		} else {
			//  from  P  Of  mcache  Allocate memory 
			c := thisg.m.p.ptr().mcache
			x = c.stackcache[order].list
			if x.ptr() == nil {
				//  Apply a piece of memory from the heap to fill in stackcache in 
				stackcacherefill(c, order)
				x = c.stackcache[order].list
			}
			//  Remove the head node of the linked list 
			c.stackcache[order].list = x.ptr().next
			c.stackcache[order].size -= uintptr(n)
		}
		//  Get the assigned span Memory block 
		v = unsafe.Pointer(x)
	} else {
		...
	}
    ...
	return stack{uintptr(v), uintptr(v) + uintptr(n)}
}

stackalloc According to the parameters passed in n The size of the distribution , stay Linux On if n Less than 32768 bytes, That is to say 32KB , Then it goes into the allocation logic of the stack .

The stack finger size is 2K/4K/8K/16K The stack , At the time of distribution , Will be calculated according to the size of different order value , If the stack size is 2K, that order Namely 0,4K Corresponding order Namely 1, And so on . On the one hand, it can reduce differences Goroutine Get lock conflicts of different stack sizes , On the other hand, you can cache the corresponding size in advance span , For quick access to .

thisg.m.p == 0 It can happen in a system call exitsyscall Or change P The number of procresize when ,thisg.m.preemptoff != "" It will happen in GC When . That is to say, in system call exitsyscall Or change P The number of students is changing , Or in GC When , From stackpool Allocate stack space , Otherwise, from mcache In order to get .

If mcache Corresponding stackcache Can't get , So called stackcacherefill Apply a piece of memory from the heap to fill in stackcache in .

The main thing is ,stackalloc Due to switching to G0 To call , therefore thisg yes G0, We can also pass 《 How to compile and debug Go runtime Source code https://www.luozhiyun.com/archives/506 》 This article is about how to debug :

func stackalloc(n uint32) stack { 
	thisg := getg()
	//  Add a line to print 
	if debug.schedtrace > 0 {
		print("stackalloc runtime: gp: gp=", thisg, ", goid=", thisg.goid, ", gp->atomicstatus=", readgstatus(thisg), "\n")
	}
	...
}

Let's take a look at stackpoolalloc And stackcacherefill function .

runtime.stackpoolalloc

func stackpoolalloc(order uint8) gclinkptr {
	list := &stackpool[order].item.span
	s := list.first
	lockWithRankMayAcquire(&mheap_.lock, lockRankMheap)
	if s == nil {
		// no free stacks. Allocate another span worth.
		//  Distribute... From the heap  mspan
        // _StackCacheSize = 32 * 1024
		s = mheap_.allocManual(_StackCacheSize>>_PageShift, &memstats.stacks_inuse)
		if s == nil {
			throw("out of memory")
		}
		//  Just assigned  span  The number of objects allocated must be  0
		if s.allocCount != 0 {
			throw("bad allocCount")
		}
		if s.manualFreeList.ptr() != nil {
			throw("bad manualFreeList")
		}
		//OpenBSD 6.4+  The system needs to do extra processing 
		osStackAlloc(s)
		// Linux  in  _FixedStack = 2048
		s.elemsize = _FixedStack << order
		//_StackCacheSize =  32 * 1024
		//  Here is the general  32KB  The size of the memory block is divided into elemsize Size block , Connect with a one-way list 
		//  Last  s.manualFreeList  It points to the end of the memory 
		for i := uintptr(0); i < _StackCacheSize; i += s.elemsize {
			x := gclinkptr(s.base() + i)
			x.ptr().next = s.manualFreeList
			s.manualFreeList = x
		}
		//  Insert into  list  List the head 
		list.insert(s)
	}
	x := s.manualFreeList
	//  Representatives have been assigned 
	if x.ptr() == nil {
		throw("span has no free stacks")
	}
	//  take  manualFreeList  Move back one unit 
	s.manualFreeList = x.ptr().next
	//  Count the memory blocks allocated 
	s.allocCount++
	//  Because the first memory block allocated is  nil
	//  So when the pointer is nil  The delegates were assigned when they arrived 
	//  Then you need to remove the object from  list  Remove the head node of 
	if s.manualFreeList.ptr() == nil {
		// all stacks in s are allocated.
		list.remove(s)
	}
	return x
}

stay stackpoolalloc We'll find... In the function stackpool Corresponding order Subscript span The head node of the list , If it's not empty , Then, the attribute of the head node is directly manualFreeList The node pointed to is removed from the linked list , And back to ;

If list.first It's empty , So called mheap_ Of allocManual Functions are allocated from the heap mspan, Specific memory allocation related articles can see my article :《 Detailed explanation Go Memory allocation source code implementation https://www.luozhiyun.com/archives/434 》.

from allocManual The function will be assigned 32KB Memory block size , Assign new ones span After that, according to elemsize Size will 32KB Memory for cutting , And then through one-way linked list string together and the last memory address assignment to manualFreeList .

For example, the current elemsize The memory size represented is 8KB size :

stackpool

runtime.stackcacherefill

func stackcacherefill(c *mcache, order uint8) { 
	var list gclinkptr
	var size uintptr
	lock(&stackpool[order].item.mu)
	//_StackCacheSize = 32 * 1024
	//  take  stackpool  The allocated memory forms a one-way linked list  list
	for size < _StackCacheSize/2 {
		x := stackpoolalloc(order)
		x.ptr().next = list
		list = x
		// _FixedStack = 2048
		size += _FixedStack << order
	}
	unlock(&stackpool[order].item.mu)
	c.stackcache[order].list = list
	c.stackcache[order].size = size
}

stackcacherefill Function will call stackpoolalloc from stackpool Get half of the space in and assemble it into list Linked list , Then put it into the stackcache Array .

Large stack memory allocation

func stackalloc(n uint32) stack { 
	thisg := getg() 
	var v unsafe.Pointer
	 
	if n < _FixedStack<<_NumStackOrders && n < _StackCacheSize {
		...
	} else {
		//  The requested memory space is too large , from  runtime.stackLarge  Check to see if there is any space left 
		var s *mspan
		//  How many will be allocated  span  page , 8KB  For a page 
		npage := uintptr(n) >> _PageShift
		//  Calculation  npage  It can be 2 How many times , Used as an index of blocks of different sizes of memory 
		log2npage := stacklog2(npage)
 
		lock(&stackLarge.lock)
		//  If  stackLarge  The corresponding linked list is not empty 
		if !stackLarge.free[log2npage].isEmpty() {
			// Get the head node of the linked list , And remove it from the list 
			s = stackLarge.free[log2npage].first
			stackLarge.free[log2npage].remove(s)
		}
		unlock(&stackLarge.lock)

		lockWithRankMayAcquire(&mheap_.lock, lockRankMheap)
		// Here is stackLarge Empty situation 
		if s == nil {
			//  Request new memory from the heap  span
			s = mheap_.allocManual(npage, &memstats.stacks_inuse)
			if s == nil {
				throw("out of memory")
			}
			// OpenBSD 6.4+  The system needs to do extra processing 
			osStackAlloc(s)
			s.elemsize = uintptr(n)
		}
		v = unsafe.Pointer(s.base())
	}
	...
	return stack{uintptr(v), uintptr(v) + uintptr(n)}
}

For large stack memory allocation , It will be viewed at runtime stackLarge Is there any space left in the game , If there is no space left , It will also call mheap_.allocManual Request new memory from the heap .

Stack expansion

Stack overflow detection

The compiler will execute when the object code is generated :src/cmd/internal/obj/x86/obj6.go:stacksplit Insert the corresponding instruction according to the frame size of the function stack , Check current goroutine Is there enough stack space for .

  1. When stack frame size (FramSzie) Less than or equal to StackSmall(128) when , If SP Less than stackguard0 Then perform stack expansion ;
  2. When stack frame size (FramSzie) Greater than StackSmall(128) when , According to the formula SP - FramSzie + StackSmall and stackguard0 Compare , If Less than stackguard0 Then the expansion will be carried out ;
  3. When stack frame size (FramSzie) Greater than StackBig(4096) when , First of all, I will check stackguard0 Has it been transformed into StackPreempt Status quo ; And then according to the formula SP-stackguard0+StackGuard <= framesize + (StackGuard-StackSmall) Judge , If it is true Then the expansion will be carried out ;

Let's take a look first Pseudo code It will be clearer :

When stack frame size (FramSzie) Less than or equal to StackSmall(128) when

CMPQ SP, stackguard
JEQ	label-of-call-to-morestack

When stack frame size (FramSzie) Greater than StackSmall(128) when :

LEAQ -xxx(SP), AX 
CMPQ AX, stackguard
JEQ	label-of-call-to-morestack

here AX = SP - framesize + StackSmall, And then execute CMPQ Give orders AX And stackguard Compare ;

When stack frame size (FramSzie) Greater than StackBig(4096) when

MOVQ	stackguard, SI // SI = stackguard
CMPQ	SI, $StackPreempt // compare SI ,StackPreempt
JEQ	label-of-call-to-morestack
LEAQ	StackGuard(SP), AX // AX = SP + StackGuard
SUBQ	SI, AX // AX = AX - SI =  SP + StackGuard -stackguard
CMPQ	AX, $(framesize+(StackGuard-StackSmall))

The pseudo code here will be relatively complicated , because G Inside stackguard0 It may be assigned to StackPreempt, So it's clear if they've been preempted , So you need to stackguard0 and StackPreempt Compare . Then we will perform the comparison : SP-stackguard+StackGuard <= framesize + (StackGuard-StackSmall), Add on both sides StackGuard To make sure that the value on the left is positive .

I hope you don't read on until you understand the above code .

The main thing is , In the execution code of some functions , The compiler intelligently adds NOSPLIT Mark , Stack overflow detection will be disabled after this flag is marked , You can find this tag in the following code :

Code location :cmd/internal/obj/x86/obj6.go

...
	if ctxt.Arch.Family == sys.AMD64 && autoffset < objabi.StackSmall && !p.From.Sym.NoSplit() {
		leaf := true
	LeafSearch:
		for q := p; q != nil; q = q.Link {
			...
		}

		if leaf {
			p.From.Sym.Set(obj.AttrNoSplit, true)
		}
	}
...

The general code logic should be : When the function is in the leaf node of the call chain , And the stack frame is less than StackSmall Byte time , Is automatically marked as NOSPLIT. alike , When we write code, we can also add //go:nosplit Force the designation of NOSPLIT attribute .

Stack overflow instance

Now let's write a simple example :

func main() {
	a, b := 1, 2
	_ = add1(a, b)
	_ = add2(a, b)
	_ = add3(a, b)
}

func add1(x, y int) int {
	_ = make([]byte, 20)
	return x + y
}

func add2(x, y int) int {
	_ = make([]byte, 200)
	return x + y
}

func add3(x, y int) int {
	_ = make([]byte, 5000)
	return x + y
}

And print out a compilation of it :

$ GOOS=linux GOARCH=amd64 go tool compile -S -N -l main.go

The above example uses three method calls to explain the three situations mentioned above :

main function

        0x0000 00000 (main.go:3)        TEXT    "".main(SB), ABIInternal, $48-0
        0x0000 00000 (main.go:3)        MOVQ    (TLS), CX 
        0x0009 00009 (main.go:3)        CMPQ    SP, 16(CX) // SP < stackguard  Then jump to  129 perform 
        0x0009 00009 (main.go:3)        CMPQ    SP, 16(CX)
        0x000d 00013 (main.go:3)        PCDATA  $0, $-2
        0x000d 00013 (main.go:3)        JLS     129
    	... 
        0x0081 00129 (main.go:3)        CALL    runtime.morestack_noctxt(SB)
         

First , We from TLS ( thread local storage) Load a value in the variable to CX register , And then SP and 16(CX) Compare , What is TLS?16(CX) What does it stand for ?

Actually TLS It's a pseudo register , It means thread-local storage, It stores G Structure . Let's have a look at runtime Source code for G The definition of :

type g struct { 
	stack       stack   // offset known to runtime/cgo
	stackguard0 uintptr // offset known to liblink
	...
}
type stack struct {
	lo uintptr
	hi uintptr
}

You can see stack Account for the 16bytes, therefore 16(CX) The corresponding is g.stackguard0. therefore CMPQ SP, 16(CX) This line of code is actually a comparison SP and stackguard size . If SP Less than stackguard , So it means that we have reached the threshold of growth , Will execute JLS Jump to the 129 That's ok , call runtime.morestack_noctxt Perform the next stack expansion operation .

add1

        0x0000 00000 (main.go:10)       TEXT    "".add1(SB), NOSPLIT|ABIInternal, $32-24

We see add1 Assembly function of , You can see that its stack size is only 32 , Didn't arrive StackSmall 128 bytes Size , And it's another callee Callees , So you can send it and add NOSPLIT Mark , This confirms my above conclusion .

add2

"".add2 STEXT size=148 args=0x18 locals=0xd0
        0x0000 00000 (main.go:15)       TEXT    "".add2(SB), ABIInternal, $208-24
        0x0000 00000 (main.go:15)       MOVQ    (TLS), CX
		// AX = SP  - 208 + 128 = SP -80
        0x0009 00009 (main.go:15)       LEAQ    -80(SP), AX //  Stack size greater than StackSmall =128,  Calculation  SP - FramSzie + StackSmall  And put in AX register 							
        0x000e 00014 (main.go:15)       CMPQ    AX, 16(CX) // AX < stackguard  Then jump to  138  perform 
        0x0012 00018 (main.go:15)       PCDATA  $0, $-2
        0x0012 00018 (main.go:15)       JLS     138
		...
        0x008a 00138 (main.go:15)       CALL    runtime.morestack_noctxt(SB)

add2 The stack frame size of the function is 208, Greater than StackSmall 128 bytes , So you can see that first of all from TLS Load a value in the variable to CX register .

Then execute the command LEAQ -80(SP), AX, But why is this -80 In fact, I was quite confused at that time , But it should be noted that the calculation formula here is : SP - FramSzie + StackSmall, When you directly insert it, you will find that it is -80, Then load this number into AX In the register .

Last call CMPQ AX, 16(CX),16(CX) As we have said above, it is equal to stackguard0 , So here's the comparison AX And stackguard0 Big and small , If less than, jump to 138 Do it runtime.morestack_noctxt.

add3

"".add3 STEXT size=157 args=0x18 locals=0x1390
        0x0000 00000 (main.go:20)       TEXT    "".add3(SB), ABIInternal, $5008-24
        0x0000 00000 (main.go:20)       MOVQ    (TLS), CX 
        0x0009 00009 (main.go:20)       MOVQ    16(CX), SI //  take  stackguard  Assign a value to   SI
        0x000d 00013 (main.go:20)       PCDATA  $0, $-2 
        0x000d 00013 (main.go:20)       CMPQ    SI, $-1314 //  take  stackguard < stackPreempt  The jump to  147  perform 
        0x0014 00020 (main.go:20)       JEQ     147 
        0x0016 00022 (main.go:20)       LEAQ    928(SP), AX // AX = SP +928
        0x001e 00030 (main.go:20)       SUBQ    SI, AX // AX -= stackguard
        0x0021 00033 (main.go:20)       CMPQ    AX, $5808 // framesize + 928 -128  = 5808, Compare  AX < 5808, execute 147
        0x0027 00039 (main.go:20)       JLS     147
        ...
        0x0093 00147 (main.go:20)       CALL    runtime.morestack_noctxt(SB)

add3 Function is directly assigned a 5000 bytes The array is on the stack , So the beginning is the same , Will be taken from TLS Load a value in the variable to CX register , And then stackguard0 Assign a value to SI register ;

The next step is to execute the command CMPQ SI, $-1314, It's actually more stackguard0 and StackPreempt Size , As for why -1314 In fact, it will be called directly when the assembly code is inserted StackPreempt Variable , This variable is written dead in the code :

Code location :cmd/internal/objabi/stack.go

const (
	StackPreempt = -1314 // 0xfff...fade
)

If it's not preempted , So go straight down LEAQ 928(SP), AX, This command is equivalent to AX = SP +_StackGuard, stay Linux in _StackGuard be equal to 928;

Next SUBQ SI, AX, This command is equal to AX -= stackguard0 ;

Finally, execute CMPQ AX, $5808, This 5808 It's actually framesize + _StackGuard - _StackSmall, If AX Less than 5808 So jump to 147 Do it runtime.morestack_noctxt function .

This is the end of stack overflow detection , I read other articles , It should not be as comprehensive as I explained , In particular, the stack frame size is larger than _StackBig Overflow detection in real time .

Stack expansion

runtime.morestack_noctxt It's implemented in assembly , It will call runtime·morestack, Now let's look at its implementation :

Code location :src/runtime/asm_amd64.s

TEXT runtime·morestack(SB),NOSPLIT,$0-0
	// Cannot grow scheduler stack (m->g0).
	//  Unable to grow the scheduler stack (m->g0)
	get_tls(CX)
	MOVQ	g(CX), BX
	MOVQ	g_m(BX), BX
	MOVQ	m_g0(BX), SI
	CMPQ	g(CX), SI
	JNE	3(PC)
	CALL	runtime·badmorestackg0(SB)
	CALL	runtime·abort(SB)
	//  Omit signal stack、morebuf and sched To deal with 
	...
	// Call newstack on m->g0's stack.
	//  stay  m->g0  Call on the stack  newstack.
	MOVQ	m_g0(BX), BX
	MOVQ	BX, g(CX)
	MOVQ	(g_sched+gobuf_sp)(BX), SP
	CALL	runtime·newstack(SB)
	CALL	runtime·abort(SB)	//  If  newstack  Return and crash  crash if newstack returns
	RET

runtime·morestack After the check sum assignment operation, it will switch to G0 call runtime·newstack To complete the expansion operation .

runtime·newstack

func newstack() {
	thisg := getg() 

	gp := thisg.m.curg
	 
	//  Initialize register related variables 
	morebuf := thisg.m.morebuf
	thisg.m.morebuf.pc = 0
	thisg.m.morebuf.lr = 0
	thisg.m.morebuf.sp = 0
	thisg.m.morebuf.g = 0
	...
	//  Check if it is preempted 
	preempt := atomic.Loaduintptr(&gp.stackguard0) == stackPreempt
 
	//  If it's preempted 
	if preempt {
		//  Check whether it can be preempted safely 
		//  If  M  There's a lock on it 
		//  If memory allocation is in progress 
		//  If preemption is explicitly prohibited 
		//  If  P  The status of is not  running
		//  Then there will be no preemption 
		if !canPreemptM(thisg.m) {
			//  To be here means not to be preempted ?
			// Let the goroutine keep running for now.
			// gp->preempt is set, so it will be preempted next time.
			gp.stackguard0 = gp.stack.lo + _StackGuard
			//  Trigger scheduler scheduling 
			gogo(&gp.sched) // never return
		}
	}

	if gp.stack.lo == 0 {
		throw("missing stack in newstack")
	}
	//  register  sp
	sp := gp.sched.sp
	if sys.ArchFamily == sys.AMD64 || sys.ArchFamily == sys.I386 || sys.ArchFamily == sys.WASM {
		// The call to morestack cost a word.
		sp -= sys.PtrSize
	} 
	...
	if preempt {
		// Need to shrink stack 
		if gp.preemptShrink { 
			gp.preemptShrink = false
			shrinkstack(gp)
		}
		//  By  runtime.suspendG  Function suspend 
		if gp.preemptStop {
			//  Passively cede control of the current processor 
			preemptPark(gp) // never returns
		}
 
		// Actively cede control of the current processor 
		gopreempt_m(gp) // never return
	}
 
	//  Calculate the new stack space is twice the original 
	oldsize := gp.stack.hi - gp.stack.lo
	newsize := oldsize * 2 
	... 
	// take  Goroutine  Switch to  _Gcopystack  state 
	casgstatus(gp, _Grunning, _Gcopystack)
 
	// Start stack copy 
	copystack(gp, newsize) 
	casgstatus(gp, _Gcopystack, _Grunning)
	gogo(&gp.sched)
}

newstack The first half of the function assumes responsibility for Goroutine The task of preemption , For those who are not clear about task preemption, please read my article :《 Analysis from source code Go Language based signal preemptive scheduling https://www.luozhiyun.com/archives/485 》.

Before starting stack copy, we calculate that the size of the new stack is twice that of the original stack , And then Goroutine State switch to _Gcopystack state .

Stack copy

func copystack(gp *g, newsize uintptr) { 
	old := gp.stack 
	//  The size of stack space currently used 
	used := old.hi - gp.sched.sp
 
	// Allocate new stack space 
	new := stackalloc(uint32(newsize))
	...
 
	//  Calculate the magnitude of the adjustment 
	var adjinfo adjustinfo
	adjinfo.old = old
	//  The range of the new stack and the old stack controls the movement of the pointer 
	adjinfo.delta = new.hi - old.hi
 
	//  adjustment  sudogs,  Communicate with  channel  Operation synchronization 
	ncopy := used
	if !gp.activeStackChans {
		...
		adjustsudogs(gp, &adjinfo)
	} else {
		//  Coming here means there are blocked  G  At present  G  Of channel  in , So to prevent concurrent operations , Need to get  channel  Lock of 
		 
		//  In all  sudog  Find the pointer with the largest address in 
		adjinfo.sghi = findsghi(gp, old) 
		//  For all  sudog  The associated  channel  locked , Then adjust the pointer. , And copy  sudog  Point to the data of the old stack to the new stack 
		ncopy -= syncadjustsudogs(gp, used, &adjinfo)
	} 
	//  Copy the whole memory from the source stack to the new stack 
	memmove(unsafe.Pointer(new.hi-ncopy), unsafe.Pointer(old.hi-ncopy), ncopy)
	//  Continue to adjust the stack  txt、defer、panic  Pointer to position 
	adjustctxt(gp, &adjinfo)
	adjustdefers(gp, &adjinfo)
	adjustpanics(gp, &adjinfo)
	if adjinfo.sghi != 0 {
		adjinfo.sghi += adjinfo.delta
	} 
	//  take  G  Switch the stack reference on to the new stack 
	gp.stack = new
	gp.stackguard0 = new.lo + _StackGuard // NOTE: might clobber a preempt request
	gp.sched.sp = new.hi - used
	gp.stktopsp += adjinfo.delta
 
	//  Reset the pointer on the new stack 
	gentraceback(^uintptr(0), ^uintptr(0), 0, gp, 0, nil, 0x7fffffff, adjustframe, noescape(unsafe.Pointer(&adjinfo)), 0)
 
	if stackPoisonCopy != 0 {
		fillstack(old, 0xfc)
	}
	// Free the memory space of the original stack 
	stackfree(old)
}
  1. copystack First of all, we will calculate the size of stack space , So when you copy the stack, you only need to copy the used space ;
  2. And then call stackalloc Function to allocate a block of memory from the heap ;
  3. Then compare the new stack with the old one hi Calculates the difference between the two pieces of memory delta, This delta Will be in the call adjustsudogs、adjustctxt When waiting for the function, judge the memory pointer position of the old stack , And then add delta Then we get the pointer position of the new stack , So you can adjust the pointer to the new stack as well ;
new_stack
  1. call memmove Copy the whole memory from the source stack to the new stack ;
  2. Then continue to call the function to adjust the pointer, continue to adjust the stack txt、defer、panic Pointer to position ;
  3. The following will G Switch the stack reference on to the new stack ;
  4. Last call stackfree Free the memory space of the original stack ;

Shrinkage of stack

The shrinkage of the stack occurs in GC The stage of scanning stack at :

func scanstack(gp *g, gcw *gcWork) {
	... 
	//  Do stack shrink 
	shrinkstack(gp)
	...
}

If it's not clear GC If you want to read my article :《Go Language GC Implementation principle and source code analysis https://www.luozhiyun.com/archives/475 》.

runtime.shrinkstack

shrinkstack I've masked some check functions for this function , Just leave the following core logic :

func shrinkstack(gp *g) {
	...
	oldsize := gp.stack.hi - gp.stack.lo
	newsize := oldsize / 2 
	//  When the shrunk size is smaller than the size of the smallest stack , No more contractions 
	if newsize < _FixedStack {
		return
	}
	avail := gp.stack.hi - gp.stack.lo
	//  Count the number of stacks currently in use , If  gp  Less than a quarter of the current stack is used , Then shrink the stack 
	//  The stack currently used includes  SP  All the contents of the stack and the stack protection space , To make sure there is  nosplit  Functional space 
	if used := gp.stack.hi - gp.sched.sp + _StackLimit; used >= avail/4 {
		return
	}
	//  Copy the old stack to the new shrunk stack 
	copystack(gp, newsize)
}

The size of the new stack will be reduced by half , If it is less than _FixedStack (2KB) So no more contractions . In addition, we will also calculate whether the current stack usage is insufficient 1/4 , If you use more than 1/4 Then there will be no contraction .

Finally, if you decide to shrink, call copystack Function stack copy logic .

summary

If you don't know the memory layout of the students , It may be difficult to understand , Because when we look at the heap, memory growth is from small to large , And the stack is growing in the opposite direction , When doing stack instruction operation, it will SP Instead of decreasing, the stack frame increases .

Besides that is Go It uses plan9 This compilation , There is less information , Looks like trouble , For a deeper understanding of this compilation, please see the following Reference Information .

Reference

Talk about it goroutine stack https://kirk91.github.io/posts/2d571d09/

Anatomy of a Program in Memory https://manybutfinite.com/post/anatomy-of-a-program-in-memory/

stack-frame-layout-on-x86-64 https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64

Further study of goroutine Stack http://www.huamo.online/2019/06/25/%E6%B7%B1%E5%85%A5%E7%A0%94%E7%A9%B6goroutine%E6%A0%88/

x86-64 Next function call and stack frame principle https://zhuanlan.zhihu.com/p/27339191

Call stack https://en.wikipedia.org/wiki/Call_stack

plan9 assembly Complete resolution https://github.com/cch123/golang-notes/blob/master/assembly.md

Go Inside the language (5): The runtime starts the process https://studygolang.com/articles/7211

Go Introduction to assembly https://github.com/go-internals-cn/go-internals/blob/master/chapter1_assembly_primer/README.md

Go Assembly by Example https://davidwong.fr/goasm/

https://golang.org/doc/asm

luozhiyun cool
原网站

版权声明
本文为[luozhiyun]所创,转载请带上原文链接,感谢
https://yzsam.com/2021/04/20210405234620768b.html