Show you how to distinguish several kinds of parallelism
2022-06-22 02:10:00 [Huawei Cloud Developer Alliance]
Abstract: in practice, the main factors affecting parallel speedup are serial computation, parallel computation, and parallel overhead.
This article is shared from the Huawei Cloud community post "High-Performance Computing (2): Tall Buildings Rise from the Ground", by "I'm a big watermelon".
Storage
By physical organization, shared memory and distributed memory are the two basic storage models for parallel computers. In addition, distributed shared memory is an increasingly important storage model.

Instructions and data
- [Fine grain] By the number of instructions and data streams a parallel computer can execute simultaneously, parallel computers are divided into SIMD (Single-Instruction Multiple-Data) and MIMD (Multiple-Instruction Multiple-Data) machines.
- [Coarse grain] By the programs and data executed simultaneously, the notions of SPMD (Single-Program Multiple-Data) and MPMD (Multiple-Program Multiple-Data) parallel computers were proposed.
Based on how instructions and data are executed simultaneously, computer systems fall into four categories:
- Single instruction, single data (SISD)
- Single instruction, multiple data (SIMD)
- Multiple instruction, single data (MISD)
- Multiple instruction, multiple data (MIMD)

SISD
A single-instruction, single-data machine is a "single-CPU machine": it executes instructions on a single data stream, and in SISD the instructions execute sequentially.
On each CPU clock cycle, the CPU proceeds in the following order:
- Fetch: the CPU fetches data and instructions from a memory area (registers)
- Decode: the CPU decodes the instruction
- Execute: the operation is performed on the data, and the result is saved in another register
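The fetch-decode-execute cycle above can be sketched in a few lines of Python. This is a toy simulation, not any real ISA: the two-register machine and the `ADD`/`SUB` opcodes are hypothetical, chosen only to make the three steps visible.

```python
# Minimal sketch of a SISD fetch-decode-execute loop.
# The "ISA" here is hypothetical: named registers and ADD/SUB only.

def run(program, registers):
    """Execute instructions sequentially, one per simulated clock cycle."""
    pc = 0  # program counter
    while pc < len(program):
        instr = program[pc]       # Fetch: read the next instruction
        op, dst, src = instr      # Decode: split into opcode and operands
        if op == "ADD":           # Execute: apply the operation and
            registers[dst] += registers[src]  # store the result in a register
        elif op == "SUB":
            registers[dst] -= registers[src]
        pc += 1                   # strictly sequential: one instruction at a time
    return registers

regs = run([("ADD", "r0", "r1"), ("SUB", "r0", "r1")], {"r0": 5, "r1": 3})
```

Note there is exactly one instruction stream and one data stream: no two instructions are ever in flight at once, which is precisely what SISD means.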

The main elements of this architecture (the von Neumann architecture) are:
- Central memory unit: stores instructions and data
- CPU: fetches instructions/data from the memory unit, decodes the instructions, and executes them sequentially
- I/O system: the input and output streams of the program
Traditional single-processor computers are classic SISD systems. The figure below shows which units the CPU uses during the Fetch, Decode, and Execute steps:

MISD
In this model there are n processors, each with its own control unit, sharing the same memory unit. On every CPU clock cycle, the data fetched from memory is processed by all processors simultaneously, each following the instruction issued by its own control unit. The parallelism here is instruction-level parallelism: multiple instructions operate on the same data. Problem models that can make good use of this architecture are quite special, for example data encryption. MISD therefore sees little practical use and serves mostly as an abstract model.

SIMD
A SIMD computer consists of several independent processors, each with its own local memory for storing data. All processors work under a single instruction stream; specifically, there are n data streams, one per processor. All processors execute each step simultaneously, applying the same instruction to different data.
Many problems can be solved with a SIMD architecture. Another appealing feature is that algorithms for this architecture are easy to design, analyze, and implement. The limitation is that only problems decomposable into many small subproblems (which must be independent and executable by the same instructions in any order) can be solved this way. Many supercomputers were designed with this architecture, for example the Connection Machine (Thinking Machines, 1985) and the MPP (NASA, 1983). As we will see in Chapter 6 on GPU Python programming, advanced modern graphics processors (GPUs) contain many built-in SIMD processing units, so this architecture is widely used today.
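The key SIMD property, one instruction applied in lockstep across n data streams, can be illustrated in plain Python. This is only a conceptual sketch: real SIMD hardware performs the whole element-wise operation as a single instruction, whereas the loop here merely mimics the idea.

```python
# SIMD-style sketch: the SAME operation ("one instruction") is applied
# to every element of the data ("n data streams"). On SIMD hardware this
# would happen in lockstep in a single instruction; plain Python only
# models the semantics, not the parallel execution.

def simd_add(xs, ys):
    """One 'instruction' (addition) applied across n paired data streams."""
    return [x + y for x, y in zip(xs, ys)]

result = simd_add([1, 2, 3, 4], [10, 20, 30, 40])
```

The elements are independent and could be processed in any order by the same instruction, which is exactly the decomposability condition stated above.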
MIMD
In Flynn's taxonomy, this class of computer is the most widely used and also the most powerful. The architecture has n processors, n instruction streams, and n data streams. Each processor has its own control unit and local memory, which makes MIMD computationally more powerful than SIMD. Each processor works under the instruction stream issued by its independent control unit; the processors can therefore run different programs on different data, solving completely different subproblems or even a single large problem. In MIMD, parallelism is realized at the thread or process level, which also means the processors generally work asynchronously. Computers of this type are typically used for problems that lack a uniform structure and are unsuitable for SIMD. Many machines today use this architecture, for example supercomputers and computer networks. However, one issue must be kept in mind: asynchronous algorithms are very difficult to design, analyze, and implement.

Concurrency and parallelism

Types of parallelism

Distinguishing several kinds of parallelism


Programs, threads, processes, and hyper-threading
- Program: an ordered set of instructions. By itself it has no runtime meaning; it is just a static file on the computer's hard disk or other storage, such as a binary executable on Linux or an exe on Windows.
- Process: a resource-management entity maintained by the operating system at run time. A process has its own life cycle and reflects the entire dynamic course of a program running on a particular data set. It must be loaded into memory; double-clicking an exe starts a process.
- Thread: an entity within a process, a basic unit of execution smaller than a process that can run independently, and the basic unit the system schedules and dispatches. A thread owns essentially no system resources of its own, only the few resources essential to running (such as a program counter, a set of registers, and a call stack), but it shares all the resources of its process with the other threads of that process. Multiple threads of the same process can execute concurrently, which improves the utilization of system resources.
- Hyper-threading: hyper-threading uses special hardware support to present one physical core as two logical cores, letting a single CPU perform thread-level parallelism and work with multithreaded operating systems and software. Normally one core corresponds to one hardware thread; with hyper-threading you get, for example, 8 cores with 16 threads.
A perennial topic, the differences and relationship between threads and processes:
- Running a program involves at least one process, and a process contains at least one thread (the main thread).
- Threads are a finer-grained division than processes, so multithreaded programs have higher concurrency.
- A process is the system's independent unit of resource allocation and scheduling, while a thread is the basic unit of CPU dispatch. Multiple threads within the same process share its resources.
- Processes have separate memory units, so processes are independent of one another; multiple threads in the same process share memory. Threads can therefore communicate by reading and writing memory visible to them, whereas communication between processes requires message passing.
- Each thread has an entry point for execution, a sequential execution sequence, and an exit point, but a thread cannot execute alone: it must belong to a process, and the process controls the execution of its threads.
- Processes carry more state than threads, so creating or destroying a process costs much more than creating or destroying a thread. Processes therefore tend to be long-lived, while threads are dynamically spawned and merged as the computation proceeds.
- One thread can create and destroy another. Multiple threads within the same process share all the resources the process owns; at the same time, processes themselves can execute in parallel, further improving the utilization of system resources.
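The shared-memory point above can be demonstrated directly: in Python's standard `threading` module, threads of one process read and write the same objects, so they only need a lock for synchronization, whereas separate processes would have to exchange messages. A minimal sketch:

```python
# Sketch: threads in one process share memory directly, so they can
# communicate by mutating the same object (guarded by a lock).
# Separate processes would instead need message passing (e.g. a queue).
import threading

counter = {"value": 0}        # shared state, visible to every thread
lock = threading.Lock()

def worker(n):
    for _ in range(n):
        with lock:            # synchronize access to the shared counter
            counter["value"] += 1

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()                  # the main thread waits for its child threads
# all four threads updated the same dict: counter["value"] == 4000
```

Without the lock, the read-modify-write on the shared counter could interleave and lose updates, which is exactly the synchronization overhead discussed later.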
Thread binding
A computer system consists of one or more physical processors and memory. A running program divides its memory into two parts: a storage area for shared variables and a storage area for each thread's private variables. Thread binding ties a thread to a fixed processor, establishing a one-to-one mapping between threads and processors. Without binding, a thread may run on different processors in different time slices. Since each processor has its own multi-level cache, a thread that bounces between processors will see a low cache hit rate, and program performance suffers. By binding threads, the program achieves higher cache utilization and thus better performance. For how to bind threads in C++, see https://www.cnblogs.com/wenqiang/p/6049978.html
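The article links a C++ approach; as a language-neutral illustration of the same idea, Python's standard library exposes process-level CPU affinity on Linux. This is a sketch under the assumption of a Linux host (`os.sched_setaffinity` does not exist on macOS or Windows, so the code guards for it):

```python
# Sketch of CPU affinity (binding) via the Python standard library.
# os.sched_setaffinity is Linux-only, so we guard with hasattr; on
# platforms without it this sketch simply reports "unsupported" (None).
import os

def pin_to_cpu0():
    """Pin the current process to CPU 0 and return the resulting CPU set."""
    if not hasattr(os, "sched_setaffinity"):
        return None                # e.g. macOS/Windows: no affinity API here
    os.sched_setaffinity(0, {0})   # pid 0 means "the calling process"
    return os.sched_getaffinity(0) # the set of CPUs we may now run on

cpus = pin_to_cpu0()
```

After pinning, the scheduler will keep the process on CPU 0, so its working set stays warm in that core's caches, which is the cache-hit-rate argument made above.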

Parallel algorithm evaluation
In theory, n identical CPUs can provide n times the computing power.
In practice, parallel overhead keeps the total execution time from shrinking linearly. This overhead comes from:
- Thread creation and destruction, inter-thread communication, and synchronization between threads.
- Code that cannot be parallelized, so that part of the computation is completed by a single thread while the other threads sit idle.
- Contention for shared resources.
- Imbalanced workload distribution across CPUs and limited memory bandwidth, leaving one or more threads idle for lack of work or blocked waiting for a specific event.
Parallel speedup (speedup ratio)
Speedup is defined as the execution time of the sequential program divided by the execution time of the parallel program computing the same result:

$S = t_s / t_p$

Here $t_s$ is the serial execution time needed to complete the task on one CPU, and $t_p$ is the time needed to complete the task in parallel on n CPUs. Because both the serial execution time $t_s$ and the parallel execution time $t_p$ on n CPUs can be defined in several ways, five different speedups are distinguished: relative, actual, absolute, asymptotic actual, and asymptotic relative speedup.
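The definition is a single division; in code, with the serial time of 100 s and the four-CPU parallel time of 25 s used here being illustrative numbers, not measurements from the article:

```python
# Speedup as defined above: S = t_s / t_p.
def speedup(t_serial, t_parallel):
    """Serial execution time divided by parallel execution time."""
    return t_serial / t_parallel

s = speedup(100.0, 25.0)  # e.g. a 100 s serial run finishing in 25 s on 4 CPUs
```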
Parallel efficiency (efficiency)
In applications, the main factors affecting parallel speedup are serial computation, parallel computation, and parallel overhead. In general, the speedup is less than the number of CPUs. Occasionally, though, a curious phenomenon appears: the parallel program runs more than n times faster than the serial program. This is called superlinear speedup. It occurs when the data accessed by each CPU fits in its own cache; caches are smaller than main memory but much faster to read and write.
Another major criterion for evaluating a parallel algorithm is parallel efficiency, which represents the average speedup contributed per CPU when multiple CPUs compute in parallel:

$E = S / n$

The ideal parallel efficiency is 1, meaning all CPUs are working at full capacity. Usually the efficiency is less than 1 and decreases as the number of CPUs grows.
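Efficiency just divides the speedup by the CPU count; the 100 s / 25 s / 8-CPU figures below are illustrative, not from the article:

```python
# Parallel efficiency E = S / n: the average speedup contributed per CPU.
def efficiency(t_serial, t_parallel, n_cpus):
    return (t_serial / t_parallel) / n_cpus

e = efficiency(100.0, 25.0, 8)  # speedup 4 on 8 CPUs -> efficiency 0.5
```

An efficiency well below 1, as here, signals that overhead or serial code is leaving CPUs partly idle.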
Scalability
Scalability measures a parallel machine's ability to keep running efficiently; it expresses computing power (execution speed) proportional to the number of processors. If the problem size and the number of processors grow together, performance does not degrade.
Amdahl's law (Amdahl's law)
Amdahl's law is widely used in processor design and parallel algorithm design. It states that the maximum speedup a program can achieve is limited by its serial portion: $S = 1/(1-p)$, where $1-p$ is the serial fraction of the program. For example, if 90% of a program's code is parallel but 10% remains serial, then even with an unlimited number of processors the maximum achievable speedup is still only 10.
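The finite-processor form of Amdahl's law, $S(n) = 1/((1-p) + p/n)$, makes the ceiling concrete: as n grows, $p/n$ vanishes and the limit $1/(1-p)$ remains.

```python
# Amdahl's law with parallel fraction p on n processors:
#   S(n) = 1 / ((1 - p) + p / n)
# As n -> infinity, S approaches 1 / (1 - p): the serial ceiling.
def amdahl(p, n):
    return 1.0 / ((1.0 - p) + p / n)

limit = 1.0 / (1.0 - 0.9)   # 90% parallel code: at most a 10x speedup, ever
s_1000 = amdahl(0.9, 1000)  # even 1000 processors only approach that ceiling
```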
Gustafson's law (Gustafson's law)
Gustafson's law was derived by considering the following situations:
- As the problem size grows, the serial part of the program stays fixed.
- As the number of processors grows, each processor still performs the same amount of work.
Gustafson's law states the speedup as $S(P) = P - \alpha(P-1)$, where $P$ is the number of processors, $S$ the speedup, and $\alpha$ the non-parallelizable fraction. By contrast, Amdahl's law compares single-processor execution time with parallel execution time, so it assumes a fixed problem size: the overall workload does not change with the size of the machine (i.e. the number of processors). Gustafson's law complements Amdahl's law, which does not consider the total resources available for solving the problem; it shows that the time allowed for a parallel solution is best set by taking all computing resources into account and planning on that basis.
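Gustafson's formula is a one-liner; note how, unlike Amdahl's law, the achievable speedup keeps growing with P because the problem is assumed to scale with the machine:

```python
# Gustafson's law: S(P) = P - alpha * (P - 1), where alpha is the
# serial (non-parallelizable) fraction and P the number of processors.
def gustafson(P, alpha):
    return P - alpha * (P - 1)

s = gustafson(10, 0.1)  # 10 processors, 10% serial work -> speedup 9.1
```

With the same 10% serial fraction, Amdahl's fixed-size view caps the speedup at 10 no matter how many processors are added, while Gustafson's scaled-size view yields a speedup that grows roughly linearly in P.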