What is memory out of order access?
2022-06-24 00:59:00 【Embedded Island】
Digging into how computers work underneath keeps getting more interesting. Today, let's talk about out-of-order memory access.
First, a question: does the program we write execute in the order we wrote it?
The answer seems obvious. But once you know a little about how compiling and linking work at the low level, the conclusion is not so easy to draw. The problem shows up especially when multiple threads share memory without locking.
Unfortunately, in some cases the execution order of a program's instructions does change, and this gives rise to what we call the memory reordering problem.
Out-of-order execution is a technique in which the processor rearranges the original order of the code to make it run faster.
Fortunately, we can also turn "out of order" back into "in order" by hand.
Out-of-order memory access generally falls into two types: compile-time reordering and run-time reordering. Below we reproduce each with an example and introduce ways to prevent it.
1. Compile-time reordering
The fundamental reason the compiler reorders code is that the processor can only analyze a small window of instructions at a time, whereas the compiler can analyze code over a much larger range and so choose a better strategy.
Let's write a simple two-statement program to reproduce compile-time reordering.
int x, y, z;
void fun(){
    x = y;
    z = 1;
}
View the assembly that gcc generates, here at the -O3 optimization level:
gcc -S demo.c -O3
Here is the part of the output we care about:
fun:
.LFB0:
    .cfi_startproc
    endbr64
    movl    $1, z(%rip)       # z = 1
    movl    y(%rip), %eax
    movl    %eax, x(%rip)     # x = y
    ret
    .cfi_endproc
Clearly, the compiler has swapped the order of the two statements x = y; and z = 1;.
So how do we deal with the trouble that compile-time reordering causes? There are several options:
- Lower the optimization level
- volatile
- Compiler barrier
- Lock
1.1 Optimization level
Lower the optimization level to -O0 and observe the effect.
gcc -S demo.c -O0
fun:
.LFB0:
    .cfi_startproc
    endbr64
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    y(%rip), %eax     # x = y
    movl    %eax, x(%rip)
    movl    $1, z(%rip)       # z = 1
    nop
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
At -O0 the two stores appear in source order. In practice, though, firmware for hardware products is usually built with -Os, which is based on -O2, so turning optimization off is rarely an option. The two levels differ as follows:
- -Os enables the -O2 optimizations except those that tend to increase the size of the object code;
- -O3 optimizes for speed as far as it can, even at the cost of larger object code.
1.2 Use volatile
The volatile keyword is familiar to most of us. Every access to a volatile-qualified variable must actually read or write memory; the compiler may not substitute a copy it has kept in a register. Declaring a variable volatile tells the compiler that its value may change at any time, so operations on it must not be optimized away or reordered relative to other volatile accesses.
Therefore, with the variables declared volatile, even -O3 does not change the order of the statements.
volatile int x, y, z;
void fun(){
    x = y;
    z = 1;
}
Compilation result:
fun:
.LFB0:
    .cfi_startproc
    endbr64
    movl    y(%rip), %eax
    movl    %eax, x(%rip)
    movl    $1, z(%rip)
    ret
    .cfi_endproc
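Declaring the variables volatile affects every access to them. As a minimal sketch, assuming GCC or Clang (which treat an access through a volatile-qualified lvalue as a volatile access), the qualifier can instead be applied only at the accesses whose order matters. The helpers read_once_int and write_once_int are hypothetical names, loosely modeled on the idea behind the Linux kernel's READ_ONCE/WRITE_ONCE; they are not copies of those macros.

```c
int x, y, z;

/* Hypothetical helper: read *p through a volatile lvalue so the
 * compiler has to emit the load exactly here. */
static inline int read_once_int(const int *p)
{
    return *(const volatile int *)p;
}

/* Hypothetical helper: write *p through a volatile lvalue so the
 * compiler has to emit the store exactly here. */
static inline void write_once_int(int *p, int v)
{
    *(volatile int *)p = v;
}

void fun(void)
{
    /* Volatile accesses are not reordered relative to each other,
     * so the store to x is emitted before the store to z. */
    write_once_int(&x, read_once_int(&y));
    write_once_int(&z, 1);
}
```

Like volatile itself, this only constrains the compiler; it says nothing about what the CPU does at run time.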
1.3 Compiler barrier
The Linux kernel provides the macro barrier(), which makes the compiler keep the memory accesses written before it ahead of the memory accesses written after it in the generated code. This prevents the compiler from reordering code across the barrier.
#define barrier() __asm__ __volatile__("": : :"memory")
Rewrite the source program with the barrier in place:
int x, y, z;
void fun(){
    x = y;
    __asm__ __volatile__("": : :"memory");
    z = 1;
}
Compilation result:
fun:
.LFB0:
    .cfi_startproc
    endbr64
    movl    y(%rip), %eax
    movl    %eax, x(%rip)
    movl    $1, z(%rip)
    ret
    .cfi_endproc
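The inline-assembly barrier is a GCC extension. As a hedged sketch, assuming a C11 compiler, atomic_signal_fence from <stdatomic.h> can play a similar role: it emits no CPU fence instruction and only restricts how the compiler may move memory accesses across it. The wrapper name compiler_barrier is hypothetical.

```c
#include <stdatomic.h>

int x, y, z;

/* Hypothetical wrapper: a compiler-only barrier written with C11
 * facilities instead of GCC inline assembly. */
static inline void compiler_barrier(void)
{
    atomic_signal_fence(memory_order_seq_cst);
}

void fun(void)
{
    x = y;
    compiler_barrier();   /* keep the store to x ahead of the store to z */
    z = 1;
}
```

As with barrier(), this does nothing about reordering performed by the processor at run time.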
1.4 Lock
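Locking shared memory is necessary, and it saves a lot of trouble. The compiler does not move accesses to the shared variables across the lock and unlock calls.
#include <pthread.h>
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
int x, y, z;
void fun(){
    pthread_mutex_lock(&m);
    x = y;
    pthread_mutex_unlock(&m);
    z = 1;
}
Compilation result: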
fun:
.LFB1:
    .cfi_startproc
    endbr64
    subq    $8, %rsp
    .cfi_def_cfa_offset 16
    leaq    m(%rip), %rdi
    call    pthread_mutex_lock@PLT
    movl    y(%rip), %eax
    leaq    m(%rip), %rdi
    movl    %eax, x(%rip)
    call    pthread_mutex_unlock@PLT
    movl    $1, z(%rip)
    addq    $8, %rsp
    .cfi_def_cfa_offset 8
    ret
    .cfi_endproc
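To show the lock doing its job on data that is really shared, here is a small sketch in which two threads update one counter. Everything in it (counter, counter_lock, worker, run_counter_demo, ITERATIONS) is a hypothetical name chosen for illustration, not part of the original program.

```c
#include <pthread.h>

#define ITERATIONS 100000

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;

/* Each worker increments the shared counter under the mutex. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;                      /* protected read-modify-write */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs two workers and returns the final counter value,
 * or -1 if a thread could not be created. */
long run_counter_demo(void)
{
    pthread_t t1, t2;

    counter = 0;
    if (pthread_create(&t1, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, worker, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;
}
```

Because every access to counter happens with the mutex held, run_counter_demo() returns 2 * ITERATIONS regardless of how the threads interleave.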
2. Run-time reordering
At run time, the CPU can execute instructions out of order.
Early processors were in-order processors: they always executed instructions in the order the developer wrote them. If an instruction's input operands were unavailable (usually because they had to be fetched from memory), the processor did not move on to other instructions whose operands were ready; it simply waited for the current operands to become available.
By contrast, out-of-order processors first execute those instructions whose input operands are available instead of going strictly in sequence, which avoids the wait and improves efficiency. On modern computers the processor runs much faster than memory, so in the time an in-order processor spends waiting for data it could have executed a large number of instructions. Even though modern processors execute out of order, on a single CPU instructions are fetched in order through the instruction queue and their results are written back to the register file in queue order (see http://en.wikipedia.org/wiki/Out-of-order_execution). This makes all memory accesses appear to execute in program order, so on a single CPU no memory barrier is needed (provided compiler optimization is left out of consideration).
The following example demonstrates the phenomenon and the fix.
/*=============================================================================
 *
 * Author: Terrance[[email protected]]
 *
 * Official account: Embedded Island
 *
 * Last modified: 2021-11-13 23:02
 *
 * Filename: cpuchaos.c
 *
 * Description: Out-of-order memory access and how to prevent it
 *
 =============================================================================*/
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <string.h>

int x, y, p, q;
int runtime = 0;

static pthread_barrier_t barrier_start;
static pthread_barrier_t barrier_end;

static void *thread1(void *args)
{
    for (; ;){
        pthread_barrier_wait(&barrier_start);
        x = 1;
#ifdef CPU_MEM_FENCE
        __asm__ __volatile__("mfence":::"memory");  /* CPU memory barrier */
#endif
        p = y;
        pthread_barrier_wait(&barrier_end);
    }
    return NULL;
}

static void *thread2(void *args)
{
    for (; ;){
        pthread_barrier_wait(&barrier_start);
        y = 1;
#ifdef CPU_MEM_FENCE
        __asm__ __volatile__("mfence":::"memory");
#endif
        q = x;
        pthread_barrier_wait(&barrier_end);
    }
    return NULL;
}

void start(void)
{
    x = y = p = q = 0;
}

void end(void)
{
    ++runtime;
    printf("[%d] %d %d\n", runtime, p, q);
    /* Reordering occurred: terminate the program */
    if (p == 0 && q == 0){
        puts("chaos coming!");
        exit(-1);
    }
}

int main(int argc, char *argv[])
{
    int err;
    pthread_t t1, t2;

    err = pthread_barrier_init(&barrier_start, NULL, 3);
    if (err != 0){
        perror("pthread_barrier_init");
        exit(-1);
    }
    err = pthread_barrier_init(&barrier_end, NULL, 3);
    if (err != 0){
        perror("pthread_barrier_init");
        exit(-1);
    }

    /* Create the threads */
    err = pthread_create(&t1, NULL, thread1, NULL);
    if (err != 0){
        perror("pthread_create");
        exit(-1);
    }
    err = pthread_create(&t2, NULL, thread2, NULL);
    if (err != 0){
        perror("pthread_create");
        exit(-1);
    }

    /* Bind thread 1 to CPU0 */
    cpu_set_t cst;
    CPU_ZERO(&cst);
    CPU_SET(0, &cst);
    err = pthread_setaffinity_np(t1, sizeof(cst), &cst);
    if (err != 0){
        perror("pthread_setaffinity_np");
        exit(-1);
    }

    /* Bind thread 2 to CPU1 */
    CPU_ZERO(&cst);
    CPU_SET(1, &cst);
    err = pthread_setaffinity_np(t2, sizeof(cst), &cst);
    if (err != 0){
        perror("pthread_setaffinity_np");
        exit(-1);
    }

    for (;;){
        start();
        pthread_barrier_wait(&barrier_start);
        pthread_barrier_wait(&barrier_end);
        end();
    }
    return 0;
}
# linux @ ubuntu in ~/codelab/c/Nov [21:35:52]
$ gcc cpuchaos.c -o chaos -lpthread
# linux @ ubuntu in ~/codelab/c/Nov [21:35:53]
$ ./chaos
[1] 1 0
[2] 1 0
[3] 1 0
[4] 1 0
[5] 1 0
[6] 1 0
[7] 0 1
......
[6000] 0 1
[6001] 1 0
[6002] 1 0
[6003] 1 0
[6004] 1 0
[6005] 0 0
chaos coming!
Reordering occurred (both p and q read 0), and the program terminated.
# linux @ ubuntu in ~/codelab/c/Nov [21:35:58] C:255
$ gcc cpuchaos.c -o chaos -lpthread -DCPU_MEM_FENCE
# linux @ ubuntu in ~/codelab/c/Nov [21:37:54]
$ ./chaos
[1] 1 0
[2] 1 0
[3] 1 0
[4] 1 0
[5] 1 0
[6] 1 0
[7] 0 1
......
[405185] 0 1
[405186] 0 1
[405187] 0 1
[405188] 0 1
[405189] 0 1
[405190] 0 1
[405191] 0 1
[405192] 0 1
[405193] 0 1
^C
The test ran more than 400,000 iterations without any reordering, so the memory barrier is taking effect.
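The mfence instruction used in the test program is specific to x86. As a hedged sketch, assuming a C11 compiler, the same store-then-load pattern can be written portably with <stdatomic.h>: atomic_thread_fence(memory_order_seq_cst) asks the compiler to emit whatever full fence the target needs (on x86-64 this is typically mfence or a locked instruction). For brevity this version starts fresh threads for every round instead of using pthread barriers and CPU affinity, and count_reordered is a hypothetical helper name.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int x, y;
static int p, q;

static void *thread1(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* portable full fence */
    p = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

static void *thread2(void *arg)
{
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* portable full fence */
    q = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

/* Runs `trials` rounds and returns how many of them ended with
 * p == 0 && q == 0, or -1 if a thread could not be created. */
int count_reordered(int trials)
{
    int reordered = 0;

    for (int i = 0; i < trials; i++) {
        pthread_t t1, t2;

        atomic_store(&x, 0);
        atomic_store(&y, 0);
        p = q = 0;
        if (pthread_create(&t1, NULL, thread1, NULL) != 0)
            return -1;
        if (pthread_create(&t2, NULL, thread2, NULL) != 0) {
            pthread_join(t1, NULL);
            return -1;
        }
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        if (p == 0 && q == 0)
            reordered++;
    }
    return reordered;
}
```

With a sequentially consistent fence between the store and the load in both threads, at least one thread must observe the other's store, so count_reordered() should return 0 however many rounds are run.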
However, if the hardware product has a single core, there is no need to worry about run-time reordering.
3. Summary
This article discussed memory reordering, covering both compile-time and run-time reordering. For shared data, taking a lock basically avoids the problems that these optimizations cause.