
What is memory out of order access?

2022-06-24 00:59:00 【Embedded Island】


Digging into how computers work under the hood gets more and more interesting. Today, let's talk about out-of-order memory access.

First, a question: will the program we write be executed in the order we wrote it?

The answer seems obvious. But anyone who knows the low-level details of compiling and linking will hesitate to say yes. The problem surfaces in particular when multiple threads share memory without locking.

Unfortunately, in some cases the execution order of program instructions does change, which gives rise to what we call the memory reordering problem.

Out-of-order execution is a technique in which the original order of the code is rearranged to make the program run faster.

Fortunately, we can step in by hand and turn the "disorder" back into "order".

Out-of-order memory access generally falls into two types: compile-time reordering and run-time reordering. Below we illustrate each with an example and introduce ways to prevent it.

1. Compile-time reordering

The fundamental reason compilers reorder code is that the processor can only analyze a small window of instructions at a time, whereas the compiler can analyze code over a much larger range and therefore choose a better strategy.

Let's write a two-line program to reproduce compile-time reordering.

int x, y, z;
void fun(){
    x = y;
    z = 1;
}

Use gcc to view the generated assembly, here at the O3 optimization level:

gcc -S demo.c -O3

Here is the part of the output we care about:

fun:
.LFB0:
	.cfi_startproc
	endbr64
	movl	$1, z(%rip)	# z = 1
	movl	y(%rip), %eax
	movl	%eax, x(%rip)	# x = y
	ret
	.cfi_endproc

Clearly, the compiler has swapped the order of the two statements x = y; and z = 1;: the store to z is emitted first.

So how do we deal with the trouble caused by compile-time reordering? There are several approaches:

Compile optimization level
volatile
Compiler barrier
Lock

1.1 Compile optimization level

We lower the optimization level to O0 and observe the result.

gcc -S demo.c -O0

fun:
.LFB0:
	.cfi_startproc
	endbr64
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	movl	y(%rip), %eax	# x = y
	movl	%eax, x(%rip)
	movl	$1, z(%rip)	# z = 1
	nop
	popq	%rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc

Firmware for hardware devices is generally built at the -Os optimization level, which is based on -O2. The difference between it and -O3:

-Os enables the -O2 optimizations except those that tend to increase code size, keeping the object code as small as possible;
-O3 tries hard to improve running speed, even if that increases the size of the object code.

1.2 Use volatile

We are all familiar with the volatile keyword. Every access to a volatile-qualified variable is forced to go to memory instead of reusing a value held in a register. Declaring a variable volatile tells the compiler that its value may change at any time, so the compiler must not optimize away operations on that variable or reorder them relative to other volatile accesses.


Therefore, once the variables are declared volatile, even O3 optimization does not change the order of the statements.

volatile int x, y, z;
void fun(){
    x = y;
    z = 1;
}

Compilation result ：

fun:
.LFB0:
	.cfi_startproc
	endbr64
	movl	y(%rip), %eax
	movl	%eax, x(%rip)
	movl	$1, z(%rip)
	ret
	.cfi_endproc
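Declaring every shared variable volatile affects all accesses to it. A narrower option, in the spirit of the Linux kernel's READ_ONCE() and WRITE_ONCE(), is to go through a volatile lvalue only at the accesses whose order matters. The following is a minimal sketch that assumes GCC or Clang; the macro names LOAD_ONCE and STORE_ONCE are made up for illustration and are not from the original listing. Keep in mind that the compiler orders volatile accesses only relative to other volatile accesses, and that volatile puts no constraint on the CPU.

```c
/* Hypothetical helper macros (illustration only): force one real access
 * to an int object by going through a volatile-qualified lvalue. */
#define LOAD_ONCE(var)        (*(volatile int *)&(var))
#define STORE_ONCE(var, val)  (*(volatile int *)&(var) = (val))

int x, y, z;   /* plain, non-volatile globals as in the first example */

void fun(void)
{
    /* These volatile accesses are emitted in program order
     * relative to one another, even at -O3. */
    STORE_ONCE(x, LOAD_ONCE(y));
    STORE_ONCE(z, 1);
}
```

Compiling this with gcc -S -O3 and comparing the output with the listings above is a quick way to check the effect on a given toolchain.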

1.3 Compiler barrier

The Linux kernel provides the barrier() macro, which prevents the compiler from moving memory accesses across it: accesses written before the barrier are emitted before accesses written after it. This rules out compile-time reordering between the code before the barrier and the code after it.

#define barrier() __asm__ __volatile__("": : :"memory")

Modify the source program once more:

int x, y, z;
void fun(){
    x = y;
    __asm__ __volatile__("": : :"memory");
    z = 1;
}

Compilation result ：

fun:
.LFB0:
	.cfi_startproc
	endbr64
	movl	y(%rip), %eax
	movl	%eax, x(%rip)
	movl	$1, z(%rip)
	ret
	.cfi_endproc
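A common place for a compiler barrier is between writing some data and setting the flag that announces it. The sketch below assumes GCC or Clang inline-assembly syntax; payload, ready, publish and try_consume are hypothetical names chosen for illustration. The barrier constrains the compiler only: on CPUs that reorder stores or loads at run time, a hardware barrier is needed as well.

```c
/* Compiler-only barrier: emits no instruction, but the "memory" clobber
 * stops the compiler from moving memory accesses across it. */
#define barrier() __asm__ __volatile__("" ::: "memory")

static int payload;   /* data being handed over */
static int ready;     /* set to 1 once payload is valid */

void publish(int value)
{
    payload = value;
    barrier();        /* the store to payload is emitted before the store to ready */
    ready = 1;
}

int try_consume(int *out)
{
    if (!ready)
        return 0;     /* nothing published yet */
    barrier();        /* the load of payload is emitted after the load of ready */
    *out = payload;
    return 1;
}
```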

1.4 Lock

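Locking accesses to shared memory is necessary and saves a lot of trouble. The calls to pthread_mutex_lock() and pthread_mutex_unlock() are opaque to the compiler, so it cannot move the accesses to the global variables across them.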

#include <pthread.h>
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

int x, y, z;
void fun(){
    pthread_mutex_lock(&m);
    x = y;
    pthread_mutex_unlock(&m);
    z = 1;
}

Compilation result ：

fun:
.LFB1:
	.cfi_startproc
	endbr64
	subq	$8, %rsp
	.cfi_def_cfa_offset 16
	leaq	m(%rip), %rdi
	call	pthread_mutex_lock@PLT
	movl	y(%rip), %eax
	leaq	m(%rip), %rdi
	movl	%eax, x(%rip)
	call	pthread_mutex_unlock@PLT
	movl	$1, z(%rip)
	addq	$8, %rsp
	.cfi_def_cfa_offset 8
	ret
	.cfi_endproc
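A lock protects only the accesses made while holding it, so every thread that touches the shared data has to take the same lock. As a minimal sketch, assuming two values that must always be seen together (pair_lock, first, second, set_pair and get_pair are hypothetical names, not part of the original example):

```c
#include <pthread.h>

static pthread_mutex_t pair_lock = PTHREAD_MUTEX_INITIALIZER;
static int first, second;   /* shared data, accessed only under pair_lock */

/* Writer: update both values inside one critical section. */
void set_pair(int a, int b)
{
    pthread_mutex_lock(&pair_lock);
    first = a;
    second = b;
    pthread_mutex_unlock(&pair_lock);
}

/* Reader: take the same lock, so it never sees a half-updated pair. */
void get_pair(int *a, int *b)
{
    pthread_mutex_lock(&pair_lock);
    *a = first;
    *b = second;
    pthread_mutex_unlock(&pair_lock);
}
```

Because both sides go through the mutex, neither compile-time nor run-time reordering of the two stores is visible to a reader.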

2. Run-time reordering

At run time, the CPU may execute instructions out of order.

Early processors were in-order processors: they always executed instructions in the order written by the developer. If an instruction's input operands were unavailable (usually because they had to be fetched from memory), the processor did not switch to other instructions whose operands were ready; it simply waited until the operands became available.

In contrast, out-of-order processors first execute whichever instructions already have their input operands available (instead of strictly following program order), which avoids waiting and improves efficiency. On modern computers the processor is far faster than memory: in the time an in-order processor spends waiting for data, it could have executed a large number of instructions. Even though modern processors execute out of order, on a single CPU instructions are fetched in order through the instruction queue and their results are written back to the register file in queue order (see http://en.wikipedia.org/wiki/Out-of-order_execution). This makes all memory accesses appear to execute in program order, so on a single CPU no memory barrier is needed (provided compiler optimization is left out of the picture).

Here is an example that demonstrates the phenomenon and the fix.

/*=============================================================================
*
* Author: Terrance[[email protected]]
*
* WeChat official account: Embedded Island
*
* Last modified: 2021-11-13 23:02
*
* Filename: cpuchaos.c
*
* Description: out-of-order memory access at run time and how to prevent it
*
=============================================================================*/
#define _GNU_SOURCE
#include  <stdio.h>
#include  <stdlib.h>
#include  <pthread.h>
#include  <string.h>

int x, y, p, q;
int runtime = 0;

static pthread_barrier_t barrier_start;
static pthread_barrier_t barrier_end;

static void *thread1(void *args)
{
    for (; ;){
        pthread_barrier_wait(&barrier_start);
        x = 1;
#ifdef CPU_MEM_FENCE
        __asm__ __volatile__("mfence":::"memory"); // CPU Memory barrier 
#endif
        p = y;
        pthread_barrier_wait(&barrier_end);
    }
    return NULL;
}

static void *thread2(void *args)
{
    for (; ;){
        pthread_barrier_wait(&barrier_start);
        y = 1;
#ifdef CPU_MEM_FENCE
        __asm__ __volatile__("mfence":::"memory");
#endif
        q = x;
        pthread_barrier_wait(&barrier_end);
    }
    return NULL;
}
void start(void)
{
    x = y = p = q = 0;
}

void end(void)
{
    ++runtime;
    printf("[%d] %d %d\n", runtime, p, q);

    /* Reordering observed: terminate the program */
    if (p == 0 && q == 0){ 
        puts("chaos coming!");
        exit(-1);
    }

}

int main(int argc, char *argv[])
{
    int err;
    pthread_t t1, t2;

    err = pthread_barrier_init(&barrier_start, NULL, 3);
    if (err != 0){
        fprintf(stderr, "pthread_barrier_init: %s\n", strerror(err)); /* pthread calls return the error code, they do not set errno */
        exit(-1);
    }

    err = pthread_barrier_init(&barrier_end, NULL, 3);
    if (err != 0){
        fprintf(stderr, "pthread_barrier_init: %s\n", strerror(err)); /* pthread calls return the error code, they do not set errno */
        exit(-1);
    }

    /* create thread */
    err = pthread_create(&t1, NULL, thread1, NULL);
    if (err != 0){
        fprintf(stderr, "pthread_create: %s\n", strerror(err)); /* pthread calls return the error code, they do not set errno */
        exit(-1);
    }

    err = pthread_create(&t2, NULL, thread2, NULL);
    if (err != 0){
        fprintf(stderr, "pthread_create: %s\n", strerror(err)); /* pthread calls return the error code, they do not set errno */
        exit(-1);
    }
    /* Pin thread 1 to CPU0 */
    cpu_set_t cst;
    CPU_ZERO(&cst);
    CPU_SET(0, &cst);
    err = pthread_setaffinity_np(t1, sizeof(cst), &cst);
    if (err != 0){
        fprintf(stderr, "pthread_setaffinity_np: %s\n", strerror(err)); /* pthread calls return the error code, they do not set errno */
        exit(-1);
    }
	
    /* Pin thread 2 to CPU1 */
    CPU_ZERO(&cst);
    CPU_SET(1, &cst);
    err = pthread_setaffinity_np(t2, sizeof(cst), &cst);
    if (err != 0){
        fprintf(stderr, "pthread_setaffinity_np: %s\n", strerror(err)); /* pthread calls return the error code, they do not set errno */
        exit(-1);
    }

    for (;;){
        start();
        pthread_barrier_wait(&barrier_start);
        pthread_barrier_wait(&barrier_end);
        end();
    }

    return 0;
}

# linux @ ubuntu in ~/codelab/c/Nov [21:35:52] 
$ gcc cpuchaos.c  -o chaos -lpthread

# linux @ ubuntu in ~/codelab/c/Nov [21:35:53] 
$ ./chaos 
[1] 1 0
[2] 1 0
[3] 1 0
[4] 1 0
[5] 1 0
[6] 1 0
[7] 0 1
......
[6000] 0 1
[6001] 1 0
[6002] 1 0
[6003] 1 0
[6004] 1 0
[6005] 0 0
chaos coming!

Reordering occurred (both p and q were 0), and the program terminated.

# linux @ ubuntu in ~/codelab/c/Nov [21:35:58] C:255
$ gcc cpuchaos.c  -o chaos -lpthread -DCPU_MEM_FENCE

# linux @ ubuntu in ~/codelab/c/Nov [21:37:54] 
$ ./chaos 
[1] 1 0
[2] 1 0
[3] 1 0
[4] 1 0
[5] 1 0
[6] 1 0
[7] 0 1
......
[405185] 0 1
[405186] 0 1
[405187] 0 1
[405188] 0 1
[405189] 0 1
[405190] 0 1
[405191] 0 1
[405192] 0 1
[405193] 0 1
^C

After more than 400,000 iterations no reordering has occurred: the memory barrier is working.
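The mfence instruction is specific to x86. As a portable alternative (a swapped-in technique, not what the program above uses), the same store-then-load pattern can be written with C11 atomics and atomic_thread_fence(). The sketch below assumes a C11 compiler with <stdatomic.h>; side_a, side_b and reset_flags are hypothetical names for the two thread bodies and the per-round reset.

```c
#include <stdatomic.h>

static atomic_int x, y;

/* Reset both flags before each round (the role of start() above). */
void reset_flags(void)
{
    atomic_store(&x, 0);
    atomic_store(&y, 0);
}

/* Body for thread 1: store x, full fence, then load y. */
int side_a(void)
{
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* orders the store before the load */
    return atomic_load_explicit(&y, memory_order_relaxed);
}

/* Body for thread 2: store y, full fence, then load x. */
int side_b(void)
{
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&x, memory_order_relaxed);
}
```

With a sequentially consistent fence on both sides, the C11 memory model rules out the outcome where both functions return 0, which is the "chaos" case detected by end() above.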

On a single-core product, however, there is no need to worry about run-time reordering.

3. Summary

This article discussed memory reordering, both at compile time and at run time. For shared data, locking basically avoids the problems these optimizations cause.

