当前位置:网站首页>Yiwen teaches you to understand the stack operation in go
Yiwen teaches you to understand the stack operation in go
2022-06-24 16:54:00 【luozhiyun】
Please state the source of reprint ~, This article was published at luozhiyun The blog of :https://www.luozhiyun.com/archives/513
This article uses go Source code 15.7
Knowledge point
LInux Process layout in memory
Each process in a multitasking operating system runs in its own memory sandbox . stay 32 In bit mode , It's always 4GB Memory address space , Memory allocation is the allocation of virtual memory to processes , When a process actually accesses a virtual memory address , The operating system triggers page missing interrupt , Allocate a corresponding space on the physical memory, and then establish a mapping relationship with it , So the virtual memory address that the process accesses , Will be automatically converted to a valid physical memory address , Then you can store and access the data .
Kernel space: Operating system kernel address space ;
Stack: Stack space , It is a local variable created temporarily by user stored program , The growth direction of stack is from high address to status address . In modern mainstream machine architecture ( for example x86) in , Stacks are growing down . However , There are also some processors ( for example B5000) Stacks grow up , There are also some architectures ( for example System Z) Allow custom stack growth direction , There are even some processors ( for example SPARC) It's the loop stack ;
Heap: Heap space , The heap is used to store the memory segments that are allocated dynamically during the process running , It's not fixed in size , Can expand or shrink dynamically ;
BBS segment:BSS paragraph , It stores global or static data , But it's the whole picture / Static uninitialized data ;
Data segment: Data segment , Usually refers to a block of memory used to store the initialized global variables in the program ;
Text segment: Code segment , An area of memory used to store the execution code of a program . The size of this area has been determined before the program runs , And the memory area is read-only .
Stack related concepts
In computer science, a call stack is a stack data structure that stores information about the active subroutines of a computer program.
In computer programming, a subroutine is a sequence of program instructions that performs a specific task, packaged as a unit.
The call stack call stack, Stack for short , It's a stack data structure , Used to store activities about computer programs subroutines Information . In computer programming ,subroutines Is a series of program instructions to perform a specific task , Packaged as a unit .
A stack frame is a frame of data that gets pushed onto the stack. In the case of a call stack, a stack frame would represent a function call and its argument data.
Stack frame stack frame It is also called frame frame Is the call relationship between functions stored in the call stack , Each frame corresponds to the function call and its parameter data .
With a function call, there must be a caller caller And the callee callee , If in function A in call function B,A yes caller,B yes callee.
The stack frame structure of the caller and callee is shown in the figure below :
Go The stack register in the assembly code is very vague , We only need to know about two registers BP and SP It's all right :
BP: Reference pointer register , Maintain the reference address of the current stack frame , To index variables and parameters , It's like an anchor , In other architectures, it's equivalent to a frame pointer FP, It's just x86 Under the architecture , Variables and parameters can be accessed through SP To index ;
SP: Stack pointer register , Always point to the top of the stack ;
Goroutine Stack operation
stay Goroutine There is one of them. stack data structure , There are two attributes in it lo And hi, Describes the actual stack memory address :
- stack.lo: The low address of the stack space ;
- stack.hi: The high address of the stack space ;
stay Goroutine China will pass stackguard0 To determine whether stack growth is needed :
- stackguard0:
stack.lo + StackGuard, be used for stack overlow Detection of ; - StackGuard: Reserve size , Constant Linux Up for 928 byte ;
- StackSmall: Constant size is 128 byte , Optimization for small function calls ;
- StackBig: Constant size is 4096 byte ;
According to the size of the called function stack frame to determine whether the need for expansion :
- When stack frame size (FramSzie) Less than or equal to StackSmall(128) when , If SP Less than stackguard0 Then perform stack expansion ;
- When stack frame size (FramSzie) Greater than StackSmall(128) when , According to the formula
SP - FramSzie + StackSmalland stackguard0 Compare , If Less than stackguard0 Then the expansion will be carried out ; - When stack frame size (FramSzie) Greater than StackBig(4096) when , First of all, I will check stackguard0 Has it been transformed into StackPreempt Status quo ; And then according to the formula
SP-stackguard0+StackGuard <= framesize + (StackGuard-StackSmall)Judge , If it is true Then the expansion will be carried out ;
It should be noted that , Because the stack is growing from high address to low address , So when comparing , They are all smaller than before expansion , We need your products here .
When performing stack expansion , Will allocate more stack memory space in memory space , Then copy everything from the old stack to the new stack , And modify the pointer to the corresponding variable of the old stack to point to the new stack again , Finally, destroy and reclaim the memory space of the old stack , So as to realize the dynamic expansion of the stack .
assembly
Here is a brief explanation of some of the problems that will be used in the later analysis Go Used in language Plan 9 assembly , In case you don't understand .
Assembly function
Let's take a look first plan9 The definition of assembly function of :
stack frame size: Parameter space containing local variables and extra calls to functions ;
arguments size: Contains parameters and the size of the return value , For example, the input parameter is 3 individual int64 type , The return value is 1 individual int64 type , So the return value is sizeof(int64) * 4;
Stack adjustment
The stack is adjusted by the hardware SP Register operation to achieve , for example :
SUBQ $24, SP // Yes sp subtracting , Assign function stack frames to functions ... ADDQ $24, SP // Yes sp add , Clear function stack frame
Because the stack is growing down , therefore SUBQ Yes SP Subtraction actually allocates stack frames to functions ,ADDQ Clear stack frame .
Common instructions
Addition and subtraction operation :
ADDQ AX, BX // BX += AX SUBQ AX, BX // BX -= AX
Data handling :
Constant in plan9 For assembly $num Express , Can be negative , Decimal by default . The length of handling is determined by MOV Suffix determination .
MOVB $1, DI // 1 byte MOVW $0x10, BX // 2 bytes MOVD $1, DX // 4 bytes MOVQ $-10, AX // 8 bytes
Another difference is in the use of MOVQ You can see the difference between brackets and no brackets .
// Parentheses are references to pointers MOVQ (AX), BX // => BX = *AX take AX Point to memory area 8byte Assign a value to BX MOVQ 16(AX), BX // => BX = *(AX + 16) // Without brackets is a reference to a value MOVQ AX, BX // => BX = AX take AX The content stored in is assigned to BX, Pay attention to differences
Jump :
// Jump unconditionally JMP addr // Jump to address , The address can be the address in the code JMP label // Jump to the tag , You can jump to the label position in the same function JMP 2(PC) // Based on the current instruction , forward / After jump x That's ok // Jump conditionally JLS addr
Address operation :
LEAQ (AX)(AX*2), CX // => CX = AX + (AX * 2) = AX * 3
In the above code 2 representative scale,scale Can only be 0、2、4、8.
analysis
G The creation of
Because it's all in the stack Goroutine Upper , So first from G How to create and initialize the stack space is the first step . Because I am 《 Detailed explanation Go Language scheduling loop source code implementation https://www.luozhiyun.com/archives/448 》 We've already talked about that G The creation of , So here we will only explain the code of initialization part of stack .
G The creation of the runtime·newproc Create :
runtime.newproc
func newproc(siz int32, fn *funcval) {
argp := add(unsafe.Pointer(&fn), sys.PtrSize)
gp := getg()
// obtain caller Of PC register
pc := getcallerpc()
// Switch to G0 Create
systemstack(func() {
newg := newproc1(fn, argp, siz, gp, pc)
...
})
}newproc The method switches to G0 On the call newproc1 Function G The creation of .
runtime.newproc1
const _StackMin = 2048
func newproc1(fn *funcval, argp unsafe.Pointer, narg int32, callergp *g, callerpc uintptr) *g {
_g_ := getg()
...
_p_ := _g_.m.p.ptr()
// from P Get a new G
newg := gfget(_p_)
// If not, call malg Create
if newg == nil {
newg = malg(_StackMin)
casgstatus(newg, _Gidle, _Gdead)
allgadd(newg) // publishes with a g->status of Gdead so GC scanner doesn't look at uninitialized stack.
}
...
return newg
}newproc1 The method is very long. , It's mainly about getting G , And then to get G Do some initialization work . We only look here malg Function call .
Calling malg Function will pass in a value of the minimum stack size :_StackMin(2048).
runtime.malg
func malg(stacksize int32) *g {
// establish G Structure
newg := new(g)
if stacksize >= 0 {
// There will be stacksize The memory size required for system calls is reserved for each stack _StackSystem
// stay Linux/Darwin On ( _StackSystem == 0 ) The bank does not change stacksize Size
stacksize = round2(_StackSystem + stacksize)
// Switch to G0 by newg Initialize stack memory
systemstack(func() {
newg.stack = stackalloc(uint32(stacksize))
})
// Set up stackguard0 , Used to determine whether to expand the stack
newg.stackguard0 = newg.stack.lo + _StackGuard
newg.stackguard1 = ^uintptr(0)
*(*uintptr)(unsafe.Pointer(newg.stack.lo)) = 0
}
return newg
}Calling malg The size of the incoming memory will be added with a _StackSystem Value is reserved for system calls ,round2 The function rounds the value passed in to 2 The index of . Then it will switch to G0 perform stackalloc Function for stack memory allocation .
It will be set after the assignment stackguard0 by stack.lo + _StackGuard, To judge whether stack expansion is needed , We'll talk about .
Initialization of stack
file location :src/runtime/stack.go
// Global stack cache , Distribute 32KB The following memory
var stackpool [_NumStackOrders]struct {
item stackpoolItem
_ [cpu.CacheLinePadSize - unsafe.Sizeof(stackpoolItem{})%cpu.CacheLinePadSize]byte
}
//go:notinheap
type stackpoolItem struct {
mu mutex
span mSpanList
}
// Global stack cache , Distribute 32KB Above memory
var stackLarge struct {
lock mutex
free [heapAddrBits - pageShift]mSpanList // free lists by log_2(s.npages)
}
// initialization stackpool/stackLarge Global variables
func stackinit() {
if _StackCacheSize&_PageMask != 0 {
throw("cache size must be a multiple of page size")
}
for i := range stackpool {
stackpool[i].item.span.init()
lockInit(&stackpool[i].item.mu, lockRankStackpool)
}
for i := range stackLarge.free {
stackLarge.free[i].init()
lockInit(&stackLarge.lock, lockRankStackLarge)
}
} Before we do stack allocation, let's see what we do when we initialize the stack , It should be noted that ,stackinit Is in the call runtime·schedinit The initialization of the , Is in the call runtime·newproc What happened before .
Two global variables are initialized during stack initialization stackpool and stackLarge.stackpool Can assign less than 32KB Of memory ,stackLarge Used to allocate more than 32KB Stack space of .
Stack allocation
From the initialization of two global variables, we can also know , The stack will be allocated from different locations according to its size .
Stack memory allocation
file location :src/runtime/stack.go
func stackalloc(n uint32) stack {
// there G yes G0
thisg := getg()
...
var v unsafe.Pointer
// stay Linux On ,_FixedStack = 2048、_NumStackOrders = 4、_StackCacheSize = 32768
// If the stack space requested is less than 32KB
if n < _FixedStack<<_NumStackOrders && n < _StackCacheSize {
order := uint8(0)
n2 := n
// Greater than 2048 , that for loop take n2 except 2, until n Less than or equal to 2048
for n2 > _FixedStack {
// order Except how many times
order++
n2 >>= 1
}
var x gclinkptr
//preemptoff != "", stay GC It'll be set up when it's done , If the GC So from stackpool Distribute
// thisg.m.p = 0 In the system call and change P Call when the number of , In case of , Then from stackpool Distribute
if stackNoCache != 0 || thisg.m.p == 0 || thisg.m.preemptoff != "" {
lock(&stackpool[order].item.mu)
// from stackpool Distribute
x = stackpoolalloc(order)
unlock(&stackpool[order].item.mu)
} else {
// from P Of mcache Allocate memory
c := thisg.m.p.ptr().mcache
x = c.stackcache[order].list
if x.ptr() == nil {
// Apply a piece of memory from the heap to fill in stackcache in
stackcacherefill(c, order)
x = c.stackcache[order].list
}
// Remove the head node of the linked list
c.stackcache[order].list = x.ptr().next
c.stackcache[order].size -= uintptr(n)
}
// Get the assigned span Memory block
v = unsafe.Pointer(x)
} else {
...
}
...
return stack{uintptr(v), uintptr(v) + uintptr(n)}
}stackalloc According to the parameters passed in n The size of the distribution , stay Linux On if n Less than 32768 bytes, That is to say 32KB , Then it goes into the allocation logic of the stack .
The stack finger size is 2K/4K/8K/16K The stack , At the time of distribution , Will be calculated according to the size of different order value , If the stack size is 2K, that order Namely 0,4K Corresponding order Namely 1, And so on . On the one hand, it can reduce differences Goroutine Get lock conflicts of different stack sizes , On the other hand, you can cache the corresponding size in advance span , For quick access to .
thisg.m.p == 0 It can happen in a system call exitsyscall Or change P The number of procresize when ,thisg.m.preemptoff != "" It will happen in GC When . That is to say, in system call exitsyscall Or change P The number of students is changing , Or in GC When , From stackpool Allocate stack space , Otherwise, from mcache In order to get .
If mcache Corresponding stackcache Can't get , So called stackcacherefill Apply a piece of memory from the heap to fill in stackcache in .
The main thing is ,stackalloc Due to switching to G0 To call , therefore thisg yes G0, We can also pass 《 How to compile and debug Go runtime Source code https://www.luozhiyun.com/archives/506 》 This article is about how to debug :
func stackalloc(n uint32) stack {
thisg := getg()
// Add a line to print
if debug.schedtrace > 0 {
print("stackalloc runtime: gp: gp=", thisg, ", goid=", thisg.goid, ", gp->atomicstatus=", readgstatus(thisg), "\n")
}
...
}Let's take a look at stackpoolalloc And stackcacherefill function .
runtime.stackpoolalloc
func stackpoolalloc(order uint8) gclinkptr {
list := &stackpool[order].item.span
s := list.first
lockWithRankMayAcquire(&mheap_.lock, lockRankMheap)
if s == nil {
// no free stacks. Allocate another span worth.
// Distribute... From the heap mspan
// _StackCacheSize = 32 * 1024
s = mheap_.allocManual(_StackCacheSize>>_PageShift, &memstats.stacks_inuse)
if s == nil {
throw("out of memory")
}
// Just assigned span The number of objects allocated must be 0
if s.allocCount != 0 {
throw("bad allocCount")
}
if s.manualFreeList.ptr() != nil {
throw("bad manualFreeList")
}
//OpenBSD 6.4+ The system needs to do extra processing
osStackAlloc(s)
// Linux in _FixedStack = 2048
s.elemsize = _FixedStack << order
//_StackCacheSize = 32 * 1024
// Here is the general 32KB The size of the memory block is divided into elemsize Size block , Connect with a one-way list
// Last s.manualFreeList It points to the end of the memory
for i := uintptr(0); i < _StackCacheSize; i += s.elemsize {
x := gclinkptr(s.base() + i)
x.ptr().next = s.manualFreeList
s.manualFreeList = x
}
// Insert into list List the head
list.insert(s)
}
x := s.manualFreeList
// Representatives have been assigned
if x.ptr() == nil {
throw("span has no free stacks")
}
// take manualFreeList Move back one unit
s.manualFreeList = x.ptr().next
// Count the memory blocks allocated
s.allocCount++
// Because the first memory block allocated is nil
// So when the pointer is nil The delegates were assigned when they arrived
// Then you need to remove the object from list Remove the head node of
if s.manualFreeList.ptr() == nil {
// all stacks in s are allocated.
list.remove(s)
}
return x
}stay stackpoolalloc We'll find... In the function stackpool Corresponding order Subscript span The head node of the list , If it's not empty , Then, the attribute of the head node is directly manualFreeList The node pointed to is removed from the linked list , And back to ;
If list.first It's empty , So called mheap_ Of allocManual Functions are allocated from the heap mspan, Specific memory allocation related articles can see my article :《 Detailed explanation Go Memory allocation source code implementation https://www.luozhiyun.com/archives/434 》.
from allocManual The function will be assigned 32KB Memory block size , Assign new ones span After that, according to elemsize Size will 32KB Memory for cutting , And then through one-way linked list string together and the last memory address assignment to manualFreeList .
For example, the current elemsize The memory size represented is 8KB size :
runtime.stackcacherefill
func stackcacherefill(c *mcache, order uint8) {
var list gclinkptr
var size uintptr
lock(&stackpool[order].item.mu)
//_StackCacheSize = 32 * 1024
// take stackpool The allocated memory forms a one-way linked list list
for size < _StackCacheSize/2 {
x := stackpoolalloc(order)
x.ptr().next = list
list = x
// _FixedStack = 2048
size += _FixedStack << order
}
unlock(&stackpool[order].item.mu)
c.stackcache[order].list = list
c.stackcache[order].size = size
}stackcacherefill Function will call stackpoolalloc from stackpool Get half of the space in and assemble it into list Linked list , Then put it into the stackcache Array .
Large stack memory allocation
func stackalloc(n uint32) stack {
thisg := getg()
var v unsafe.Pointer
if n < _FixedStack<<_NumStackOrders && n < _StackCacheSize {
...
} else {
// The requested memory space is too large , from runtime.stackLarge Check to see if there is any space left
var s *mspan
// How many will be allocated span page , 8KB For a page
npage := uintptr(n) >> _PageShift
// Calculation npage It can be 2 How many times , Used as an index of blocks of different sizes of memory
log2npage := stacklog2(npage)
lock(&stackLarge.lock)
// If stackLarge The corresponding linked list is not empty
if !stackLarge.free[log2npage].isEmpty() {
// Get the head node of the linked list , And remove it from the list
s = stackLarge.free[log2npage].first
stackLarge.free[log2npage].remove(s)
}
unlock(&stackLarge.lock)
lockWithRankMayAcquire(&mheap_.lock, lockRankMheap)
// Here is stackLarge Empty situation
if s == nil {
// Request new memory from the heap span
s = mheap_.allocManual(npage, &memstats.stacks_inuse)
if s == nil {
throw("out of memory")
}
// OpenBSD 6.4+ The system needs to do extra processing
osStackAlloc(s)
s.elemsize = uintptr(n)
}
v = unsafe.Pointer(s.base())
}
...
return stack{uintptr(v), uintptr(v) + uintptr(n)}
} For large stack memory allocation , It will be viewed at runtime stackLarge Is there any space left in the game , If there is no space left , It will also call mheap_.allocManual Request new memory from the heap .
Stack expansion
Stack overflow detection
The compiler will execute when the object code is generated :src/cmd/internal/obj/x86/obj6.go:stacksplit Insert the corresponding instruction according to the frame size of the function stack , Check current goroutine Is there enough stack space for .
- When stack frame size (FramSzie) Less than or equal to StackSmall(128) when , If SP Less than stackguard0 Then perform stack expansion ;
- When stack frame size (FramSzie) Greater than StackSmall(128) when , According to the formula
SP - FramSzie + StackSmalland stackguard0 Compare , If Less than stackguard0 Then the expansion will be carried out ; - When stack frame size (FramSzie) Greater than StackBig(4096) when , First of all, I will check stackguard0 Has it been transformed into StackPreempt Status quo ; And then according to the formula
SP-stackguard0+StackGuard <= framesize + (StackGuard-StackSmall)Judge , If it is true Then the expansion will be carried out ;
Let's take a look first Pseudo code It will be clearer :
When stack frame size (FramSzie) Less than or equal to StackSmall(128) when :
CMPQ SP, stackguard JEQ label-of-call-to-morestack
When stack frame size (FramSzie) Greater than StackSmall(128) when :
LEAQ -xxx(SP), AX CMPQ AX, stackguard JEQ label-of-call-to-morestack
here AX = SP - framesize + StackSmall, And then execute CMPQ Give orders AX And stackguard Compare ;
When stack frame size (FramSzie) Greater than StackBig(4096) when :
MOVQ stackguard, SI // SI = stackguard CMPQ SI, $StackPreempt // compare SI ,StackPreempt JEQ label-of-call-to-morestack LEAQ StackGuard(SP), AX // AX = SP + StackGuard SUBQ SI, AX // AX = AX - SI = SP + StackGuard -stackguard CMPQ AX, $(framesize+(StackGuard-StackSmall))
The pseudo code here will be relatively complicated , because G Inside stackguard0 It may be assigned to StackPreempt, So it's clear if they've been preempted , So you need to stackguard0 and StackPreempt Compare . Then we will perform the comparison : SP-stackguard+StackGuard <= framesize + (StackGuard-StackSmall), Add on both sides StackGuard To make sure that the value on the left is positive .
I hope you don't read on until you understand the above code .
The main thing is , In the execution code of some functions , The compiler intelligently adds NOSPLIT Mark , Stack overflow detection will be disabled after this flag is marked , You can find this tag in the following code :
Code location :cmd/internal/obj/x86/obj6.go
...
if ctxt.Arch.Family == sys.AMD64 && autoffset < objabi.StackSmall && !p.From.Sym.NoSplit() {
leaf := true
LeafSearch:
for q := p; q != nil; q = q.Link {
...
}
if leaf {
p.From.Sym.Set(obj.AttrNoSplit, true)
}
}
... The general code logic should be : When the function is in the leaf node of the call chain , And the stack frame is less than StackSmall Byte time , Is automatically marked as NOSPLIT. alike , When we write code, we can also add //go:nosplit Force the designation of NOSPLIT attribute .
Stack overflow instance
Now let's write a simple example :
func main() {
a, b := 1, 2
_ = add1(a, b)
_ = add2(a, b)
_ = add3(a, b)
}
func add1(x, y int) int {
_ = make([]byte, 20)
return x + y
}
func add2(x, y int) int {
_ = make([]byte, 200)
return x + y
}
func add3(x, y int) int {
_ = make([]byte, 5000)
return x + y
}And print out a compilation of it :
$ GOOS=linux GOARCH=amd64 go tool compile -S -N -l main.go
The above example uses three method calls to explain the three situations mentioned above :
main function
0x0000 00000 (main.go:3) TEXT "".main(SB), ABIInternal, $48-0
0x0000 00000 (main.go:3) MOVQ (TLS), CX
0x0009 00009 (main.go:3) CMPQ SP, 16(CX) // SP < stackguard Then jump to 129 perform
0x0009 00009 (main.go:3) CMPQ SP, 16(CX)
0x000d 00013 (main.go:3) PCDATA $0, $-2
0x000d 00013 (main.go:3) JLS 129
...
0x0081 00129 (main.go:3) CALL runtime.morestack_noctxt(SB)
First , We from TLS ( thread local storage) Load a value in the variable to CX register , And then SP and 16(CX) Compare , What is TLS?16(CX) What does it stand for ?
Actually TLS It's a pseudo register , It means thread-local storage, It stores G Structure . Let's have a look at runtime Source code for G The definition of :
type g struct {
stack stack // offset known to runtime/cgo
stackguard0 uintptr // offset known to liblink
...
}
type stack struct {
lo uintptr
hi uintptr
} You can see stack Account for the 16bytes, therefore 16(CX) The corresponding is g.stackguard0. therefore CMPQ SP, 16(CX) This line of code is actually a comparison SP and stackguard size . If SP Less than stackguard , So it means that we have reached the threshold of growth , Will execute JLS Jump to the 129 That's ok , call runtime.morestack_noctxt Perform the next stack expansion operation .
add1
0x0000 00000 (main.go:10) TEXT "".add1(SB), NOSPLIT|ABIInternal, $32-24
We see add1 Assembly function of , You can see that its stack size is only 32 , Didn't arrive StackSmall 128 bytes Size , And it's another callee Callees , So you can send it and add NOSPLIT Mark , This confirms my above conclusion .
add2
"".add2 STEXT size=148 args=0x18 locals=0xd0
0x0000 00000 (main.go:15) TEXT "".add2(SB), ABIInternal, $208-24
0x0000 00000 (main.go:15) MOVQ (TLS), CX
// AX = SP - 208 + 128 = SP -80
0x0009 00009 (main.go:15) LEAQ -80(SP), AX // Stack size greater than StackSmall =128, Calculation SP - FramSzie + StackSmall And put in AX register
0x000e 00014 (main.go:15) CMPQ AX, 16(CX) // AX < stackguard Then jump to 138 perform
0x0012 00018 (main.go:15) PCDATA $0, $-2
0x0012 00018 (main.go:15) JLS 138
...
0x008a 00138 (main.go:15) CALL runtime.morestack_noctxt(SB)add2 The stack frame size of the function is 208, Greater than StackSmall 128 bytes , So you can see that first of all from TLS Load a value in the variable to CX register .
Then execute the command LEAQ -80(SP), AX, But why is this -80 In fact, I was quite confused at that time , But it should be noted that the calculation formula here is : SP - FramSzie + StackSmall, When you directly insert it, you will find that it is -80, Then load this number into AX In the register .
Last call CMPQ AX, 16(CX),16(CX) As we have said above, it is equal to stackguard0 , So here's the comparison AX And stackguard0 Big and small , If less than, jump to 138 Do it runtime.morestack_noctxt.
add3
"".add3 STEXT size=157 args=0x18 locals=0x1390
0x0000 00000 (main.go:20) TEXT "".add3(SB), ABIInternal, $5008-24
0x0000 00000 (main.go:20) MOVQ (TLS), CX
0x0009 00009 (main.go:20) MOVQ 16(CX), SI // take stackguard Assign a value to SI
0x000d 00013 (main.go:20) PCDATA $0, $-2
0x000d 00013 (main.go:20) CMPQ SI, $-1314 // take stackguard < stackPreempt The jump to 147 perform
0x0014 00020 (main.go:20) JEQ 147
0x0016 00022 (main.go:20) LEAQ 928(SP), AX // AX = SP +928
0x001e 00030 (main.go:20) SUBQ SI, AX // AX -= stackguard
0x0021 00033 (main.go:20) CMPQ AX, $5808 // framesize + 928 -128 = 5808, Compare AX < 5808, execute 147
0x0027 00039 (main.go:20) JLS 147
...
0x0093 00147 (main.go:20) CALL runtime.morestack_noctxt(SB)add3 Function is directly assigned a 5000 bytes The array is on the stack , So the beginning is the same , Will be taken from TLS Load a value in the variable to CX register , And then stackguard0 Assign a value to SI register ;
The next step is to execute the command CMPQ SI, $-1314, It's actually more stackguard0 and StackPreempt Size , As for why -1314 In fact, it will be called directly when the assembly code is inserted StackPreempt Variable , This variable is written dead in the code :
Code location :cmd/internal/objabi/stack.go
const ( StackPreempt = -1314 // 0xfff...fade )
If it's not preempted , So go straight down LEAQ 928(SP), AX, This command is equivalent to AX = SP +_StackGuard, stay Linux in _StackGuard be equal to 928;
Next SUBQ SI, AX, This command is equal to AX -= stackguard0 ;
Finally, execute CMPQ AX, $5808, This 5808 It's actually framesize + _StackGuard - _StackSmall, If AX Less than 5808 So jump to 147 Do it runtime.morestack_noctxt function .
This is the end of stack overflow detection , I read other articles , It should not be as comprehensive as I explained , In particular, the stack frame size is larger than _StackBig Overflow detection in real time .
Stack expansion
runtime.morestack_noctxt It's implemented in assembly , It will call runtime·morestack, Now let's look at its implementation :
Code location :src/runtime/asm_amd64.s
TEXT runtime·morestack(SB),NOSPLIT,$0-0 // Cannot grow scheduler stack (m->g0). // Unable to grow the scheduler stack (m->g0) get_tls(CX) MOVQ g(CX), BX MOVQ g_m(BX), BX MOVQ m_g0(BX), SI CMPQ g(CX), SI JNE 3(PC) CALL runtime·badmorestackg0(SB) CALL runtime·abort(SB) // Omit signal stack、morebuf and sched To deal with ... // Call newstack on m->g0's stack. // stay m->g0 Call on the stack newstack. MOVQ m_g0(BX), BX MOVQ BX, g(CX) MOVQ (g_sched+gobuf_sp)(BX), SP CALL runtime·newstack(SB) CALL runtime·abort(SB) // If newstack Return and crash crash if newstack returns RET
runtime·morestack After the check sum assignment operation, it will switch to G0 call runtime·newstack To complete the expansion operation .
runtime·newstack
func newstack() {
thisg := getg()
gp := thisg.m.curg
// Initialize register related variables
morebuf := thisg.m.morebuf
thisg.m.morebuf.pc = 0
thisg.m.morebuf.lr = 0
thisg.m.morebuf.sp = 0
thisg.m.morebuf.g = 0
...
// Check if it is preempted
preempt := atomic.Loaduintptr(&gp.stackguard0) == stackPreempt
// If it's preempted
if preempt {
// Check whether it can be preempted safely
// If M There's a lock on it
// If memory allocation is in progress
// If preemption is explicitly prohibited
// If P The status of is not running
// Then there will be no preemption
if !canPreemptM(thisg.m) {
// To be here means not to be preempted ?
// Let the goroutine keep running for now.
// gp->preempt is set, so it will be preempted next time.
gp.stackguard0 = gp.stack.lo + _StackGuard
// Trigger scheduler scheduling
gogo(&gp.sched) // never return
}
}
if gp.stack.lo == 0 {
throw("missing stack in newstack")
}
// register sp
sp := gp.sched.sp
if sys.ArchFamily == sys.AMD64 || sys.ArchFamily == sys.I386 || sys.ArchFamily == sys.WASM {
// The call to morestack cost a word.
sp -= sys.PtrSize
}
...
if preempt {
// Need to shrink stack
if gp.preemptShrink {
gp.preemptShrink = false
shrinkstack(gp)
}
// By runtime.suspendG Function suspend
if gp.preemptStop {
// Passively cede control of the current processor
preemptPark(gp) // never returns
}
// Actively cede control of the current processor
gopreempt_m(gp) // never return
}
// Calculate the new stack space is twice the original
oldsize := gp.stack.hi - gp.stack.lo
newsize := oldsize * 2
...
// take Goroutine Switch to _Gcopystack state
casgstatus(gp, _Grunning, _Gcopystack)
// Start stack copy
copystack(gp, newsize)
casgstatus(gp, _Gcopystack, _Grunning)
gogo(&gp.sched)
}newstack The first half of the function assumes responsibility for Goroutine The task of preemption , For those who are not clear about task preemption, please read my article :《 Analysis from source code Go Language based signal preemptive scheduling https://www.luozhiyun.com/archives/485 》.
Before starting stack copy, we calculate that the size of the new stack is twice that of the original stack , And then Goroutine State switch to _Gcopystack state .
Stack copy
func copystack(gp *g, newsize uintptr) {
old := gp.stack
// The size of stack space currently used
used := old.hi - gp.sched.sp
// Allocate new stack space
new := stackalloc(uint32(newsize))
...
// Calculate the magnitude of the adjustment
var adjinfo adjustinfo
adjinfo.old = old
// The range of the new stack and the old stack controls the movement of the pointer
adjinfo.delta = new.hi - old.hi
// adjustment sudogs, Communicate with channel Operation synchronization
ncopy := used
if !gp.activeStackChans {
...
adjustsudogs(gp, &adjinfo)
} else {
// Coming here means there are blocked G At present G Of channel in , So to prevent concurrent operations , Need to get channel Lock of
// In all sudog Find the pointer with the largest address in
adjinfo.sghi = findsghi(gp, old)
// For all sudog The associated channel locked , Then adjust the pointer. , And copy sudog Point to the data of the old stack to the new stack
ncopy -= syncadjustsudogs(gp, used, &adjinfo)
}
// Copy the whole memory from the source stack to the new stack
memmove(unsafe.Pointer(new.hi-ncopy), unsafe.Pointer(old.hi-ncopy), ncopy)
// Continue to adjust the stack txt、defer、panic Pointer to position
adjustctxt(gp, &adjinfo)
adjustdefers(gp, &adjinfo)
adjustpanics(gp, &adjinfo)
if adjinfo.sghi != 0 {
adjinfo.sghi += adjinfo.delta
}
// take G Switch the stack reference on to the new stack
gp.stack = new
gp.stackguard0 = new.lo + _StackGuard // NOTE: might clobber a preempt request
gp.sched.sp = new.hi - used
gp.stktopsp += adjinfo.delta
// Reset the pointer on the new stack
gentraceback(^uintptr(0), ^uintptr(0), 0, gp, 0, nil, 0x7fffffff, adjustframe, noescape(unsafe.Pointer(&adjinfo)), 0)
if stackPoisonCopy != 0 {
fillstack(old, 0xfc)
}
// Free the memory space of the original stack
stackfree(old)
}- copystack First of all, we will calculate the size of stack space , So when you copy the stack, you only need to copy the used space ;
- And then call stackalloc Function to allocate a block of memory from the heap ;
- Then compare the new stack with the old one hi Calculates the difference between the two pieces of memory delta, This delta Will be in the call adjustsudogs、adjustctxt When waiting for the function, judge the memory pointer position of the old stack , And then add delta Then we get the pointer position of the new stack , So you can adjust the pointer to the new stack as well ;
- call memmove Copy the whole memory from the source stack to the new stack ;
- Then continue to call the function to adjust the pointer, continue to adjust the stack txt、defer、panic Pointer to position ;
- The following will G Switch the stack reference on to the new stack ;
- Last call stackfree Free the memory space of the original stack ;
Shrinkage of stack
The shrinkage of the stack occurs in GC The stage of scanning stack at :
func scanstack(gp *g, gcw *gcWork) {
...
// Do stack shrink
shrinkstack(gp)
...
}If it's not clear GC If you want to read my article :《Go Language GC Implementation principle and source code analysis https://www.luozhiyun.com/archives/475 》.
runtime.shrinkstack
shrinkstack I've masked some check functions for this function , Just leave the following core logic :
func shrinkstack(gp *g) {
...
oldsize := gp.stack.hi - gp.stack.lo
newsize := oldsize / 2
// When the shrunk size is smaller than the size of the smallest stack , No more contractions
if newsize < _FixedStack {
return
}
avail := gp.stack.hi - gp.stack.lo
// Count the number of stacks currently in use , If gp Less than a quarter of the current stack is used , Then shrink the stack
// The stack currently used includes SP All the contents of the stack and the stack protection space , To make sure there is nosplit Functional space
if used := gp.stack.hi - gp.sched.sp + _StackLimit; used >= avail/4 {
return
}
// Copy the old stack to the new shrunk stack
copystack(gp, newsize)
}The size of the new stack will be reduced by half , If it is less than _FixedStack (2KB) So no more contractions . In addition, we will also calculate whether the current stack usage is insufficient 1/4 , If you use more than 1/4 Then there will be no contraction .
Finally, if you decide to shrink, call copystack Function stack copy logic .
summary
If you don't know the memory layout of the students , It may be difficult to understand , Because when we look at the heap, memory growth is from small to large , And the stack is growing in the opposite direction , When doing stack instruction operation, it will SP Instead of decreasing, the stack frame increases .
Besides that is Go It uses plan9 This compilation , There is less information , Looks like trouble , For a deeper understanding of this compilation, please see the following Reference Information .
Reference
Talk about it goroutine stack https://kirk91.github.io/posts/2d571d09/
Anatomy of a Program in Memory https://manybutfinite.com/post/anatomy-of-a-program-in-memory/
stack-frame-layout-on-x86-64 https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64
Further study of goroutine Stack http://www.huamo.online/2019/06/25/%E6%B7%B1%E5%85%A5%E7%A0%94%E7%A9%B6goroutine%E6%A0%88/
x86-64 Next function call and stack frame principle https://zhuanlan.zhihu.com/p/27339191
Call stack https://en.wikipedia.org/wiki/Call_stack
plan9 assembly Complete resolution https://github.com/cch123/golang-notes/blob/master/assembly.md
Go Inside the language (5): The runtime starts the process https://studygolang.com/articles/7211
Go Introduction to assembly https://github.com/go-internals-cn/go-internals/blob/master/chapter1_assembly_primer/README.md
Go Assembly by Example https://davidwong.fr/goasm/
边栏推荐
- Prometheus deployment
- Memo list: useful commands for ffmpeg command line tools
- Popular explanation [redirection] and its practice
- Principle analysis of robot hardware in the loop system
- 区块哈希游戏竞猜系统开发(成熟代码)
- The problem is as big as the middle stage
- FPGA project development: experience sharing of lmk04821 chip project development based on jesd204b (I)
- The mystery of redis data migration capacity
- A survey on dynamic neural networks for natural language processing, University of California
- Tencent cloud database mysql:sql flow restriction
猜你喜欢

Problems encountered in the work of product manager
![[go] concurrent programming channel](/img/6a/d62678467bbc6dfb6a50ae42bacc96.jpg)
[go] concurrent programming channel

Ui- first lesson
![[leetcode108] convert an ordered array into a binary search tree (medium order traversal)](/img/e1/0fac59a531040d74fd7531e2840eb5.jpg)
[leetcode108] convert an ordered array into a binary search tree (medium order traversal)

A survey on model compression for natural language processing (NLP model compression overview)

Cognition and difference of service number, subscription number, applet and enterprise number (enterprise wechat)

Ps\ai and other design software pondering notes

Applet wxss

A survey on dynamic neural networks for natural language processing, University of California

A survey of training on graphs: taxonomy, methods, and Applications
随机推荐
[tke] whether to configure SNAT when the container accesses services outside the node
How to access tke cluster API interface with certificate or token
If only 2 people are recruited, can the enterprise do a good job in content risk control?
What does the router pin mean?
实现TypeScript运行时类型检查
What is the difference between a network card and a port
Virtual machine virtual disk recovery case tutorial
Building a cross public chain platform to solve DAPP development problems
Teach you to write a classic dodge game
05. Tencent cloud IOT device side learning -- mqtt protocol client implementation
TVP experts talk about geese factory middleware: innovating forward and meeting the future
Video intelligent analysis platform easycvr derivative video management platform menu bar small screen adaptive optimization
Memo list: useful commands for ffmpeg command line tools
Regular expression learning artifact!
Tencent blue whale Zhiyun community version v6.0.3 was officially released together with the container management platform!
Robot toolbox matlab robotics toolbox
Go path customized project path package dependency
Try catch finally implementation mechanism
NFT元宇宙源码搭建解析与介绍
Kubernetes popular series: getting started with container Foundation