What is out-of-order memory access?
2022-06-24 00:59:00 【Embedded Island】
Digging into the underlying principles of computers keeps getting more interesting. Today, let's talk about out-of-order memory execution.
First, a question: will the program we write be executed in the order we wrote it?
At first glance there seems to be no doubt about it. But anyone who knows the low-level details of compiling and linking will hesitate to answer. The problem surfaces especially when multiple threads share memory without locks.
Unfortunately, in some cases the execution order of program instructions does change, which gives rise to what is called memory reordering.
Out-of-order execution is an optimization in which the processor changes the original order of the code in order to run faster.
Fortunately, we can turn this "disorder" back into "order".
Out-of-order memory access generally comes in two kinds: compiler reordering and runtime (execution) reordering. Below, examples illustrate each one and introduce ways to avoid it.
1. Compiler reordering
Compilers reorder code because the processor can only analyze a small window of instructions at a time, whereas the compiler can analyze code over a much larger scope and therefore make better optimization decisions.
Let's write two simple lines of code to reproduce compiler reordering.
int x, y, z;
void fun(){
    x = y;
    z = 1;
}
Compile it with gcc and look at the generated assembly, using the -O3 optimization level:
gcc -S demo.c -O3
Here is the part of the output we care about:
fun:
.LFB0:
    .cfi_startproc
    endbr64
    movl    $1, z(%rip)         # z = 1
    movl    y(%rip), %eax
    movl    %eax, x(%rip)       # x = y
    ret
    .cfi_endproc
Clearly, the compiler has swapped the execution order of the two statements x = y; and z = 1;.
So how do we deal with the trouble caused by compiler reordering? There are several options:
- Compiler optimization level
- volatile
- Compiler barrier
- Locks
1.1 Compiler optimization level
Change the optimization level to -O0 and observe the effect:
gcc -S demo.c -O0
fun:
.LFB0:
    .cfi_startproc
    endbr64
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    y(%rip), %eax       # x = y
    movl    %eax, x(%rip)
    movl    $1, z(%rip)         # z = 1
    nop
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
Firmware for embedded devices is commonly built with -Os. Here is how it compares with -O2 and -O3:
- -Os builds on -O2 but aims to minimize the size of the object code;
- -O3 tries as hard as possible to improve running speed, even if the object code grows.
1.2 Use volatile
Most of us are familiar with the volatile keyword. Every access to a volatile-qualified variable goes to memory instead of reusing a copy held in a register. Declaring a variable volatile tells the compiler that its value may change at any time, so operations on it must not be optimized away, which avoids errors.
Therefore, with the variables declared volatile, even -O3 optimization does not change the order of the statements. Note that the compiler only keeps volatile accesses in order relative to one another, which is why all three variables are declared volatile here.
volatile int x, y, z;
void fun(){
    x = y;
    z = 1;
}
Compilation result:
fun:
.LFB0:
    .cfi_startproc
    endbr64
    movl    y(%rip), %eax
    movl    %eax, x(%rip)
    movl    $1, z(%rip)
    ret
    .cfi_endproc
1.3 Compiler barrier
The Linux kernel provides the barrier() macro, which forces the compiler to emit the memory accesses that come before it ahead of the memory accesses that come after it. This prevents the compiler from reordering code across the barrier. barrier() itself emits no instruction, so it does not stop the CPU from reordering.
#define barrier() __asm__ __volatile__("": : :"memory")
Applying it to the source program (with the macro expanded inline):
int x, y, z;
void fun(){
    x = y;
    __asm__ __volatile__("": : :"memory");
    z = 1;
}
Compilation result:
fun:
.LFB0:
    .cfi_startproc
    endbr64
    movl    y(%rip), %eax
    movl    %eax, x(%rip)
    movl    $1, z(%rip)
    ret
    .cfi_endproc
1.4 Lock
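Protecting shared memory with a lock is necessary anyway, and it saves a lot of trouble. The compiler cannot move accesses to global variables across calls to pthread_mutex_lock() and pthread_mutex_unlock(), because it must assume those external functions may read or write them.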
#include <pthread.h>
pthread_mutex_t m;
int x, y, z;
void fun(){
    pthread_mutex_lock(&m);
    x = y;
    pthread_mutex_unlock(&m);
    z = 1;
}
Compilation result:
fun:
.LFB1:
    .cfi_startproc
    endbr64
    subq    $8, %rsp
    .cfi_def_cfa_offset 16
    leaq    m(%rip), %rdi
    call    [email protected]
    movl    y(%rip), %eax
    leaq    m(%rip), %rdi
    movl    %eax, x(%rip)
    call    [email protected]
    movl    $1, z(%rip)
    addq    $8, %rsp
    .cfi_def_cfa_offset 8
    ret
    .cfi_endproc
2. Runtime reordering
At run time, the CPU may execute instructions out of order.
Early processors were in-order processors: they always executed instructions in the order the developer wrote them. If an instruction's input operands were unavailable (usually because they had to be fetched from memory), the processor did not switch to other instructions whose operands were ready; it waited until the current operands became available.
By contrast, out-of-order processors first process the instructions whose input operands are already available (rather than strictly following program order), which avoids waiting and improves efficiency. On modern computers the processor runs much faster than memory, so the time an in-order processor spends waiting for data could have been used to execute a large number of instructions. Although modern processors execute out of order, on a single CPU instructions are fetched in order through the instruction queue and their results are written back to the register file in queue order (see http://en.wikipedia.org/wiki/Out-of-order_execution). As a result, all memory accesses on a single CPU appear to execute in program order, so memory barriers are unnecessary there (setting compiler optimization aside).
The following example illustrates the phenomenon and the fix.
/*=============================================================================
 *
 * Author: Terrance[[email protected]]
 *
 * Official account: Embedded Island
 *
 * Last modified: 2021-11-13 23:02
 *
 * Filename: cpuchaos.c
 *
 * Description: demonstrates out-of-order memory access at run time and how to prevent it
 *
 =============================================================================*/
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <string.h>

int x, y, p, q;
int runtime = 0;

static pthread_barrier_t barrier_start;
static pthread_barrier_t barrier_end;

static void *thread1(void *args)
{
    for (;;) {
        pthread_barrier_wait(&barrier_start);
        x = 1;
#ifdef CPU_MEM_FENCE
        __asm__ __volatile__("mfence":::"memory"); // CPU memory barrier
#endif
        p = y;
        pthread_barrier_wait(&barrier_end);
    }
    return NULL;
}

static void *thread2(void *args)
{
    for (;;) {
        pthread_barrier_wait(&barrier_start);
        y = 1;
#ifdef CPU_MEM_FENCE
        __asm__ __volatile__("mfence":::"memory");
#endif
        q = x;
        pthread_barrier_wait(&barrier_end);
    }
    return NULL;
}

void start(void)
{
    x = y = p = q = 0;
}

void end(void)
{
    ++runtime;
    printf("[%d] %d %d\n", runtime, p, q);

    /* Reordering observed: terminate the program */
    if (p == 0 && q == 0) {
        puts("chaos coming!");
        exit(-1);
    }
}

int main(int argc, char *argv[])
{
    int err;
    pthread_t t1, t2;

    /* pthread functions return an error code instead of setting errno */
    err = pthread_barrier_init(&barrier_start, NULL, 3);
    if (err != 0) {
        fprintf(stderr, "pthread_barrier_init: %s\n", strerror(err));
        exit(-1);
    }
    err = pthread_barrier_init(&barrier_end, NULL, 3);
    if (err != 0) {
        fprintf(stderr, "pthread_barrier_init: %s\n", strerror(err));
        exit(-1);
    }

    /* Create the threads */
    err = pthread_create(&t1, NULL, thread1, NULL);
    if (err != 0) {
        fprintf(stderr, "pthread_create: %s\n", strerror(err));
        exit(-1);
    }
    err = pthread_create(&t2, NULL, thread2, NULL);
    if (err != 0) {
        fprintf(stderr, "pthread_create: %s\n", strerror(err));
        exit(-1);
    }

    /* Bind thread 1 to CPU0 */
    cpu_set_t cst;
    CPU_ZERO(&cst);
    CPU_SET(0, &cst);
    err = pthread_setaffinity_np(t1, sizeof(cst), &cst);
    if (err != 0) {
        fprintf(stderr, "pthread_setaffinity_np: %s\n", strerror(err));
        exit(-1);
    }

    /* Bind thread 2 to CPU1 */
    CPU_ZERO(&cst);
    CPU_SET(1, &cst);
    err = pthread_setaffinity_np(t2, sizeof(cst), &cst);
    if (err != 0) {
        fprintf(stderr, "pthread_setaffinity_np: %s\n", strerror(err));
        exit(-1);
    }

    for (;;) {
        start();
        pthread_barrier_wait(&barrier_start);
        pthread_barrier_wait(&barrier_end);
        end();
    }
    return 0;
}
# linux @ ubuntu in ~/codelab/c/Nov [21:35:52]
$ gcc cpuchaos.c -o chaos -lpthread
# linux @ ubuntu in ~/codelab/c/Nov [21:35:53]
$ ./chaos
[1] 1 0
[2] 1 0
[3] 1 0
[4] 1 0
[5] 1 0
[6] 1 0
[7] 0 1
......
[6000] 0 1
[6001] 1 0
[6002] 1 0
[6003] 1 0
[6004] 1 0
[6005] 0 0
chaos coming!
Reordering occurred, and the program terminated.
Now rebuild with -DCPU_MEM_FENCE so that the mfence instructions are compiled in:
# linux @ ubuntu in ~/codelab/c/Nov [21:35:58] C:255
$ gcc cpuchaos.c -o chaos -lpthread -DCPU_MEM_FENCE
# linux @ ubuntu in ~/codelab/c/Nov [21:37:54]
$ ./chaos
[1] 1 0
[2] 1 0
[3] 1 0
[4] 1 0
[5] 1 0
[6] 1 0
[7] 0 1
......
[405185] 0 1
[405186] 0 1
[405187] 0 1
[405188] 0 1
[405189] 0 1
[405190] 0 1
[405191] 0 1
[405192] 0 1
[405193] 0 1
^C
After more than 400,000 iterations there was still no reordering, so the memory barrier is working.
However, if the hardware product is single-core, there is no need to worry about CPU reordering (compiler reordering can still occur).
3. Summary
This article discussed memory reordering, covering both compiler reordering and CPU reordering. For shared data, protecting it with a lock basically avoids these memory-optimization problems.
You are welcome to follow the WeChat official account: Embedded Island!