
Different implementations of the CAS operation on ARM and x86

2022-06-23 15:48:00 User 4415180

cmpxchg is the x86 compare-and-exchange instruction. It is widely used in the atomic operations and synchronization primitives of major low-level systems such as the Linux kernel, the JVM, and the GCC compiler. To understand cmpxchg, we first need to understand atomic operations.

Intel P6 and later processors guarantee that the following memory operations are always atomic: (1) reading or writing a byte; (2) reading or writing a 16-bit aligned word; (3) reading or writing a 32-bit aligned doubleword; (4) reading or writing a 64-bit aligned quadword; (5) unaligned 16-, 32-, and 64-bit reads and writes to cached memory that fit within a single cache line. Ordinary aligned load and store instructions are therefore atomic, and the cache coherence protocol ensures that two CPUs cannot write the same memory location at the same time.

A compare-and-exchange such as cmpxchg is a different matter: it reads memory, compares, and conditionally writes, and on its own that read-modify-write is not atomic with respect to other processors. Intel is a CISC (complex instruction set) architecture, and the pipeline very likely decodes cmpxchg into several micro-ops internally (in contrast to ARM's reduced instruction set). Intel therefore provides a LOCK prefix that makes certain instructions atomic.

Intel 64 and IA-32 processors provide a LOCK# signal that is asserted automatically during certain critical memory operations to lock the system bus or an equivalent link. While this output signal is asserted, requests from other processors or bus agents for control of the bus are blocked. On Intel386, Intel486, and Pentium processors, explicitly locked instructions result in the assertion of LOCK#, and it is the hardware designer's responsibility to use the LOCK# signal in the system hardware to control memory accesses among processors. On the Intel486 and Pentium, LOCK# is always asserted on the bus during a LOCK operation, even if the locked memory area is cached in the processor. This costs a lot of performance, because no other CPU can access memory while the bus is locked. For the P6 and newer processor families, if the memory area being accessed is cached inside the processor, LOCK# is generally not asserted; instead, locking is applied only to the processor's cache.
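To make the LOCK prefix concrete, here is a minimal sketch of an atomic increment written with GCC-style inline assembly. The function name locked_inc is hypothetical. On x86 it emits a lock-prefixed incl, a read-modify-write that would not be atomic across CPUs without the prefix; on other architectures it falls back to the GCC/Clang __atomic_fetch_add builtin.

```c
#include <assert.h>

/* Hypothetical helper: atomically add 1 to *p.
 * On x86 the "lock" prefix makes the load-increment-store a single atomic
 * operation; elsewhere the compiler's __atomic builtin is used instead. */
static inline void locked_inc(volatile int *p)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__("lock; incl %0"
			     : "+m"(*p)      /* *p is both read and written */
			     :
			     : "memory", "cc");
#else
	(void)__atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST);
#endif
}
```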
For the P6 and newer processor families, if the memory area locked during a LOCK operation is cached as write-back memory in the processor performing the operation and is completely contained in one cache line, the processor may not assert LOCK# on the bus. Instead, it modifies the memory location internally and relies on its cache coherence mechanism to ensure that the operation is carried out atomically. This is called "cache locking". The cache coherence mechanism automatically prevents two or more processors that have cached the same memory area from modifying data in that area at the same time.
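Both the unaligned-access guarantee (item 5 above) and cache locking depend on whether an access stays inside a single cache line. The hypothetical helper below checks that condition, assuming 64-byte cache lines; that size is common on current x86 processors but is not architecturally guaranteed (on glibc, sysconf(_SC_LEVEL1_DCACHE_LINESIZE) can report the real value). A locked operation that fails this check cannot rely on cache locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. */
#define CACHE_LINE_SIZE 64u

/* True if the access [addr, addr + size) lies entirely inside one cache line. */
static bool fits_in_one_cache_line(uintptr_t addr, size_t size)
{
	if (size == 0)
		return false;
	uintptr_t first_line = addr / CACHE_LINE_SIZE;
	uintptr_t last_line = (addr + size - 1) / CACHE_LINE_SIZE;
	return first_line == last_line;
}
```

For example, a 4-byte access at offset 60 fits (bytes 60..63), while the same access at offset 62 spans two lines (bytes 62..65).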

To understand cmpxchg more clearly, it helps to look at its implementation on both ARM and x86, one a RISC and the other a CISC architecture; the Linux kernel provides implementations for both. The kernel's atomic variable type is defined as follows:

// Atomic variable 
typedef struct {
	volatile int counter; // volatile makes the compiler reload the value from memory on every access instead of keeping it in a register
} atomic_t;
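Because aligned int loads and stores are already atomic, reading or setting an atomic_t needs neither a lock prefix nor exclusive access. The sketch below is modeled on the kernel's classic ATOMIC_INIT, atomic_read, and atomic_set, but it is a simplified stand-alone version (it repeats the atomic_t definition so it compiles on its own), not the kernel's actual code.

```c
#include <assert.h>

typedef struct {
	volatile int counter;
} atomic_t;

/* Static initializer for an atomic_t. */
#define ATOMIC_INIT(i) { (i) }

/* A plain volatile load: atomic because the int is naturally aligned. */
static inline int atomic_read(const atomic_t *v)
{
	return v->counter;
}

/* A plain volatile store: atomic for the same reason. */
static inline void atomic_set(atomic_t *v, int i)
{
	v->counter = i;
}
```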

First, ARM. ARM is a reduced instruction set architecture and (before the ARMv8.1 Large System Extensions added CAS instructions) provides no complex instruction like cmpxchg. Instead, like other RISC architectures, it provides LL/SC (load-linked/store-conditional) operations, which are the basis of many atomic operations. On ARMv8 the instructions are LDXR/STXR; on ARMv7 they are LDREX/STREX. The two pairs differ only in details: both perform exclusive access, rely on a local monitor and a global monitor working together, and normally appear as a load/store pair. ldrex loads data from memory into a register, and the monitor marks the address as exclusive. strex first checks whether the current CPU still holds exclusive access: if so, the store succeeds and strex returns 0; otherwise the store fails and it returns 1. Once strex executes on an address marked exclusive, the exclusive mark is cleared.

For example, cpu0 marks address m as exclusive, but its thread is switched out before it executes strex. cpu1 then executes ldrex on m, which clears cpu0's exclusive mark and marks the address exclusive for cpu1, and cpu1 executes strex. When cpu0's thread is rescheduled, its strex fails, because its exclusive mark has been cleared. This also means that a thread that executes ldrex later may complete before one that executed it earlier.
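The failure scenario above can be replayed deterministically with a toy model of the exclusive monitor. This is a single-threaded simulation of the simplified behavior described in this article, not how the hardware monitors are actually built; struct monitor, sim_ldrex, and sim_strex are hypothetical names. Like strex, sim_strex returns 0 on success and 1 on failure.

```c
#include <assert.h>
#include <stddef.h>

#define NO_CPU (-1)

/* Toy monitor: records which CPU last did an exclusive load, and on which address. */
struct monitor {
	int owner_cpu;          /* CPU holding the exclusive mark, or NO_CPU */
	const int *tagged_addr; /* address the mark refers to */
};

/* Exclusive load: read *addr and move the exclusive mark to this CPU. */
static int sim_ldrex(struct monitor *m, int cpu, const int *addr)
{
	m->owner_cpu = cpu;     /* a newer ldrex takes the mark from any other CPU */
	m->tagged_addr = addr;
	return *addr;
}

/* Exclusive store: succeeds only if this CPU still holds the mark for addr. */
static int sim_strex(struct monitor *m, int cpu, int *addr, int value)
{
	if (m->owner_cpu != cpu || m->tagged_addr != addr)
		return 1;           /* mark lost: nothing is written */
	*addr = value;
	m->owner_cpu = NO_CPU;  /* a completed strex clears the mark */
	m->tagged_addr = NULL;
	return 0;
}
```

Replaying the example: cpu0 and cpu1 both load m, cpu1's store succeeds, cpu0's store fails, and cpu0 must reload and retry.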

/**
 *  Compare ptr->counter with old. If they are equal, set ptr->counter = new;
 *  otherwise leave ptr->counter unchanged.
 *  Returns the value ptr->counter held before the call (equal to old on success).
 */
static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
{
	unsigned long oldval, res;

	smp_mb(); // memory barrier: earlier memory accesses are not reordered past the cmpxchg

	do {
		__asm__ __volatile__("@ atomic_cmpxchg\n"
		"ldrex	%1, [%2]\n" // exclusive load: oldval = ptr->counter; the monitor marks the address exclusive
		"mov	%0, #0\n"   // res = 0
		"teq	%1, %3\n"   // compare oldval (ptr->counter) with old

		// only if equal: try to store new into ptr->counter; res = 0 if the exclusive store
		// succeeded, 1 if the exclusive mark was lost. If not equal, this is skipped and res stays 0
		"strexeq %0, %4, [%2]\n" 
		    : "=&r" (res), "=&r" (oldval)
		    : "r" (&ptr->counter), "Ir" (old), "r" (new)
		    : "cc");
	} while (res);  // retry while strexeq failed (res == 1) because the exclusive mark was lost

	smp_mb(); // memory barrier: later memory accesses are not reordered before the cmpxchg

	return oldval;
}
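Outside the kernel, the same contract can be sketched portably with the GCC/Clang builtin __sync_val_compare_and_swap, which returns the previous value and is documented as a full barrier, standing in for the two smp_mb() calls. This is an assumed stand-in for illustration, not the kernel's code; on ARMv7, compilers typically expand the builtin into an ldrex/strex retry loop much like the one above.

```c
#include <assert.h>

typedef struct {
	volatile int counter;
} atomic_t;

/* Same contract as the ARM atomic_cmpxchg above: returns the value
 * ptr->counter held before the call; the swap happened iff it equals old. */
static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new_val)
{
	return __sync_val_compare_and_swap(&ptr->counter, old, new_val);
}
```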

The x86 implementation follows the same pattern:

/*
 *  Compare and exchange a byte, word, or doubleword, depending on size.
 *  Returns the previous value of *ptr; the exchange succeeded iff it equals old.
 */
static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
				      unsigned long new, int size)
{
	unsigned long prev;
	switch (size) {
	case 1:
		__asm__ __volatile__(LOCK_PREFIX "cmpxchgb %b1,%2"
				     : "=a"(prev)
				     : "q"(new), "m"(*__xg(ptr)), "0"(old)
				     : "memory");
		return prev;
	case 2:
		__asm__ __volatile__(LOCK_PREFIX "cmpxchgw %w1,%2"
				     : "=a"(prev)
				     : "r"(new), "m"(*__xg(ptr)), "0"(old)
				     : "memory");
		return prev;
	case 4:
		// eax = old; compare %2 (*ptr) with eax. If equal, set ZF and store
		// %1 (new) into *ptr; otherwise clear ZF and load *ptr into eax.
		// Either way, prev (eax) ends up holding the previous value of *ptr.
		__asm__ __volatile__(LOCK_PREFIX "cmpxchgl %1,%2"
				     : "=a"(prev)
				     : "r"(new), "m"(*__xg(ptr)), "0"(old)  // "0": old is placed in operand 0's register, eax
				     : "memory");
		return prev;
	}
	return old;
}
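A minimal stand-alone sketch of the 4-byte case, plus the usual retry loop built on top of it, might look like this. cmpxchg32 and cas_add_return are hypothetical names. Unlike the kernel snippet, the sketch marks *ptr as a read-write operand ("+m") so the compiler knows the instruction may modify it; on non-x86 targets it falls back to __sync_val_compare_and_swap.

```c
#include <assert.h>

/* 32-bit compare-and-exchange. Returns the previous value of *ptr. */
static inline int cmpxchg32(volatile int *ptr, int old, int new_val)
{
#if defined(__x86_64__) || defined(__i386__)
	int prev;
	__asm__ __volatile__("lock; cmpxchgl %2, %1"
			     : "=a"(prev), "+m"(*ptr)  /* eax receives the old *ptr */
			     : "r"(new_val), "0"(old)  /* old is loaded into eax first */
			     : "memory", "cc");
	return prev;
#else
	return __sync_val_compare_and_swap(ptr, old, new_val);
#endif
}

/* Typical CAS loop: read, compute, try to publish, and retry if another
 * CPU changed the value in between. Returns the new value. */
static inline int cas_add_return(volatile int *ptr, int delta)
{
	for (;;) {
		int old = *ptr;
		if (cmpxchg32(ptr, old, old + delta) == old)
			return old + delta;
	}
}
```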

The lock prefix in front of cmpxchg ensures that no other CPU can operate on the same memory while the instruction executes, which keeps the whole compare-and-exchange atomic. So although x86 needs only a single instruction where ARM needs an ldrex/strex loop, the processor almost certainly breaks that instruction into RISC-like micro-ops internally.
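In portable user-space code the same operation is usually written with C11 <stdatomic.h>, and the compiler chooses the instruction sequence: typically lock cmpxchg on x86, and an exclusive load/store loop (or a CAS instruction when ARMv8.1 atomics are available) on ARM. The sketch below is illustrative; cas_int and atomic_max are hypothetical helpers. atomic_compare_exchange_weak may fail spuriously, just as a store-conditional can, so it belongs inside a loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Strong CAS: true iff *obj was equal to expected and is now desired. */
static bool cas_int(atomic_int *obj, int expected, int desired)
{
	/* On failure, the local copy of expected receives the current value. */
	return atomic_compare_exchange_strong(obj, &expected, desired);
}

/* Atomically raise *obj to at least candidate using a weak CAS loop. */
static void atomic_max(atomic_int *obj, int candidate)
{
	int cur = atomic_load(obj);
	while (cur < candidate &&
	       !atomic_compare_exchange_weak(obj, &cur, candidate))
		;  /* cur now holds the freshly observed value; re-check */
}
```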

Original site

Copyright notice
This article was written by [User 4415180]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/174/202206231502315272.html