Different implementations of the CAS operation on ARM and x86
2022-06-23 15:48:00 【User 4415180】
cmpxchg is the x86 compare-and-exchange instruction. It is widely used in atomic operations and in the synchronization primitives of major low-level systems such as the Linux kernel, the JVM, and the GCC compiler. To understand cmpxchg, first understand atomic operations.
Intel P6 and newer processor families guarantee that the following operations are atomic: 1. Reading or writing a byte. 2. Reading or writing a word aligned on a 16-bit boundary. 3. Reading or writing a doubleword aligned on a 32-bit boundary. 4. Reading or writing a quadword aligned on a 64-bit boundary. 5. Unaligned 16-bit, 32-bit, and 64-bit accesses to cached memory that fit within a single cache line. So ordinary aligned load and store instructions are atomic. The cache coherence protocol guarantees that two CPUs cannot write the same memory location at the same time.
A compare-and-exchange instruction such as cmpxchg, however, is not atomic by itself. Intel is a CISC (complex instruction set) architecture, and when the internal pipeline executes cmpxchg it almost certainly decodes the instruction into several micro-operations (in contrast to ARM's reduced instruction set). Intel therefore provides the LOCK prefix for certain instructions to guarantee their atomicity.
Intel 64 and IA-32 processors provide a LOCK# signal that is asserted automatically during certain critical memory operations to lock the system bus or an equivalent link. While this output signal is asserted, requests from other processors or bus agents for control of the bus are blocked. On Intel386, Intel486, and Pentium processors, explicitly locked instructions cause the LOCK# signal to be asserted, and it is the hardware designer's responsibility to make the LOCK# signal available in the system hardware to control memory access among processors. On Intel486 and Pentium processors, LOCK# is always asserted on the bus during a LOCK operation, even if the locked memory area is cached in the processor. This costs a lot of performance, because other CPUs cannot access memory while the bus is locked. On P6 and newer processor families, if the accessed memory area is cached inside the processor, LOCK# is usually not asserted; instead, the lock is applied only to the processor's cache.
More precisely, on P6 and newer processor families, if the memory area being locked during a LOCK operation is cached as write-back memory in the processor performing the LOCK operation and is completely contained in one cache line, the processor may not assert the LOCK# signal on the bus. Instead, it modifies the memory location internally and relies on its cache coherence mechanism to ensure that the operation is carried out atomically. This is called "cache locking". The cache coherence mechanism automatically prevents two or more processors that have cached the same memory area from modifying the data in that area at the same time.
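To make the difference between a plain read-compare-write sequence and an atomic one concrete, here is a minimal sketch in C11. `cmpxchg_model` is a hypothetical, non-atomic reference model that spells out the three steps a compare-and-exchange performs; `cmpxchg_atomic` expresses the same contract with the standard `<stdatomic.h>` call `atomic_compare_exchange_strong`. Both function names are illustrative and do not come from the kernel.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical reference model: the three steps of a compare-and-exchange
 * written as plain C. Another CPU could write *ptr between the load and
 * the store, so this version is NOT atomic. */
static int cmpxchg_model(int *ptr, int expected, int desired)
{
    assert(ptr);
    int prev = *ptr;          /* 1. load the current value     */
    if (prev == expected)     /* 2. compare with expected      */
        *ptr = desired;       /* 3. store only on a match      */
    return prev;              /* success iff prev == expected  */
}

/* Same contract, performed as one indivisible operation. */
static int cmpxchg_atomic(atomic_int *ptr, int expected, int desired)
{
    assert(ptr);
    /* On failure the call writes the observed value into expected;
     * on success expected already equals the previous value. */
    atomic_compare_exchange_strong(ptr, &expected, desired);
    return expected;
}
```

In both versions the caller detects success by checking whether the returned value equals the expected one, which is the same convention the kernel code in this article uses.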
To understand cmpxchg more clearly, it helps to look at the implementations on both ARM and x86: one is RISC, the other CISC, and the Linux kernel provides an implementation for each. The kernel's atomic variable is defined as follows:
// Atomic variable
typedef struct {
    volatile int counter; // volatile keeps the compiler from caching the variable in a register
} atomic_t;
First, the ARM architecture. ARM is a reduced instruction set and does not provide a complex instruction like cmpxchg. Instead, like other RISC architectures, it provides LL/SC (load-linked / store-conditional) operations, which are the basis of many atomic operations. The ARMv8 instructions are LDXR/STXR and the ARMv7 instructions are LDREX/STREX; they differ only in minor details. Both are exclusive accesses and rely on a local monitor and a global monitor working together. The two instructions normally appear in pairs. ldrex loads a value from memory into a register, and the monitor marks the address as exclusive for the current CPU. strex first checks whether the address is still marked exclusive for the current CPU: if so, the store succeeds and returns 0; if not, the store fails and returns 1. For example, cpu0 marks address m as exclusive, and its thread is scheduled out before strex executes. cpu1 then executes ldrex on m, marking it exclusive for itself, and completes a successful strex, which clears cpu0's exclusive mark. When cpu0's thread is rescheduled, its strex fails because its exclusive mark is gone. This also means that a thread that executed ldrex later can complete its update before a thread that executed ldrex earlier. Once strex has executed on an address marked exclusive, the exclusive mark is cleared.
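The interleaving in that example can be replayed with a small single-threaded model in C. This is only a sketch: `struct monitor`, `ldrex_model`, `strex_model`, and `interleaving_demo` are hypothetical names, the model tracks a single address, and it assumes that a successful exclusive store is what clears every CPU's exclusive mark. Real hardware monitors are more involved.

```c
#include <assert.h>

#define NCPU 2

/* Hypothetical software model of an exclusive monitor for ONE address. */
struct monitor {
    int value;            /* the memory word being guarded               */
    int exclusive[NCPU];  /* exclusive[c] != 0: cpu c holds an exclusive mark */
};

/* Models ldrex: load the value and mark the address exclusive for cpu. */
static int ldrex_model(struct monitor *m, int cpu)
{
    assert(cpu >= 0 && cpu < NCPU);
    m->exclusive[cpu] = 1;
    return m->value;
}

/* Models strex: store only if cpu still holds its mark.
 * Returns 0 on success and 1 on failure, like the real instruction. */
static int strex_model(struct monitor *m, int cpu, int value)
{
    assert(cpu >= 0 && cpu < NCPU);
    if (!m->exclusive[cpu])
        return 1;                   /* mark lost: the store is not performed */
    m->value = value;
    for (int c = 0; c < NCPU; c++)  /* assumption: a successful store clears all marks */
        m->exclusive[c] = 0;
    return 0;
}

/* Replays the cpu0/cpu1 interleaving and returns the final memory value. */
static int interleaving_demo(void)
{
    struct monitor m = { .value = 10 };
    int v0 = ldrex_model(&m, 0);          /* cpu0 marks m, then is scheduled out */
    int v1 = ldrex_model(&m, 1);          /* cpu1 marks m as well                */
    int r1 = strex_model(&m, 1, v1 + 1);  /* cpu1 stores first: succeeds         */
    int r0 = strex_model(&m, 0, v0 + 1);  /* cpu0's mark is gone: fails          */
    assert(r1 == 0 && r0 == 1);
    return m.value;                       /* only cpu1's increment was applied   */
}
```

Because cpu0's store fails instead of overwriting cpu1's result, no update is lost; cpu0 simply has to reload and try again, which is why these instructions are normally wrapped in a retry loop.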
/**
 * Compare ptr->counter with old. If they are equal, set ptr->counter = new;
 * otherwise leave ptr->counter unchanged.
 * Returns the value ptr->counter held before the operation, so the exchange
 * succeeded if and only if the return value equals old.
 */
static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
{
    unsigned long oldval, res;
    smp_mb(); // Memory barrier: the cmpxchg cannot be reordered before earlier memory accesses
    do {
        __asm__ __volatile__("@ atomic_cmpxchg\n"
        "ldrex %1, [%2]\n"  // Exclusive load: the monitor marks the address exclusive and oldval = ptr->counter
        "mov %0, #0\n"      // res = 0
        "teq %1, %3\n"      // Test whether oldval equals old, i.e. whether ptr->counter == old
        // If they are equal, exclusively store new to ptr->counter and put the store status in res;
        // otherwise this instruction is not executed and res stays 0
        "strexeq %0, %4, [%2]\n"
        : "=&r" (res), "=&r" (oldval)
        : "r" (&ptr->counter), "Ir" (old), "r" (new)
        : "cc");
    } while (res); // strexeq is an exclusive store; if the exclusive mark was lost, it fails with res = 1 and the loop retries
    smp_mb(); // Memory barrier: the cmpxchg cannot be reordered after later memory accesses
    return oldval;
}
The x86 implementation is similar:
/*
 * Compare-and-exchange a byte, word, or doubleword, depending on size.
 * If the return value equals old, the exchange succeeded; otherwise it failed.
 */
static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
                                      unsigned long new, int size)
{
    unsigned long prev;
    switch (size) {
    case 1:
        __asm__ __volatile__(LOCK_PREFIX "cmpxchgb %b1,%2"
                             : "=a"(prev)
                             : "q"(new), "m"(*__xg(ptr)), "0"(old)
                             : "memory");
        return prev;
    case 2:
        __asm__ __volatile__(LOCK_PREFIX "cmpxchgw %w1,%2"
                             : "=a"(prev)
                             : "r"(new), "m"(*__xg(ptr)), "0"(old)
                             : "memory");
        return prev;
    // eax = old. Compare %2 (*ptr) with eax. If they are equal, ZF is set and
    // %1 (new) is stored to *ptr, so eax still holds the old value. Otherwise
    // ZF is cleared and *ptr is loaded into eax.
    case 4:
        __asm__ __volatile__(LOCK_PREFIX "cmpxchgl %1,%2"
                             : "=a"(prev)
                             : "r"(new), "m"(*__xg(ptr)), "0"(old) // "0" ties this input to operand 0, i.e. eax = old
                             : "memory");
        return prev;
    }
    return old;
}
The lock prefix in front of the cmpxchg instruction ensures that no other CPU can operate on the same memory location while the instruction executes, which keeps the whole operation atomic. By comparison, although x86 needs only one instruction, the processor presumably decodes it internally into RISC-like micro-operations.
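Both kernel routines expose the same contract, so code built on top of them looks the same on either architecture. The sketch below uses the standard C11 `<stdatomic.h>` interface rather than the kernel API; `atomic_fetch_max_sketch` is a hypothetical helper written for illustration. Compilers commonly lower `atomic_compare_exchange_weak` to a LOCK-prefixed cmpxchg on x86 and to an ldrex/strex (or ldxr/stxr) sequence on ARM, although the exact code depends on the compiler and target.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helper: atomically raise *ptr to at least candidate.
 * Returns the value observed before any update was made. */
static int atomic_fetch_max_sketch(atomic_int *ptr, int candidate)
{
    assert(ptr);
    int seen = atomic_load(ptr);
    while (seen < candidate) {
        /* On failure (another thread changed *ptr, or the store-conditional
         * failed spuriously) seen is refreshed with the current value and
         * the loop re-checks whether an update is still needed. */
        if (atomic_compare_exchange_weak(ptr, &seen, candidate))
            break;
    }
    return seen;
}
```

The weak variant is allowed to fail spuriously, which matches the behaviour of a store-conditional that lost its exclusive mark, so it must always sit inside a retry loop like the `do { ... } while (res)` loop in the ARM code.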