Getting started with OpenMP
2022-06-25 07:39:00 【AI little white dragon】
OpenMP is short for Open Multi-Processing. It can be used in Visual Studio or with gcc.
Hello World
Save the following code as omp.cc:

#include <iostream>
#include <omp.h>

int main() {
    // Iterations run on different threads, so the output
    // order (and interleaving) is not deterministic.
    #pragma omp parallel for
    for (char i = 'a'; i <= 'z'; i++)
        std::cout << i << std::endl;
    return 0;
}
Then compiling with g++ omp.cc -fopenmp is all it takes.
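If it works, the program prints the letters a through z, though not necessarily in alphabetical order, since the iterations are spread across threads. The thread count can be controlled with the standard OMP_NUM_THREADS environment variable, e.g. OMP_NUM_THREADS=4 ./a.out.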
Parallelization of loops
OpenMP's designers wanted to give programmers a simple way to write multithreaded programs without having to know how to create and destroy threads. So they designed pragmas, directives, and functions that let the compiler insert the threading in the right places. Most loops can be parallelized simply by inserting a pragma before the for statement. Moreover, by leaving those tedious details to the compiler, you can spend your time deciding where multithreading pays off and optimizing your data structures.
The following example converts 32-bit RGB color into 8-bit grayscale data. Adding a single pragma before the for loop is all it takes to parallelize it:
#pragma omp parallel for
for (int i = 0; i < pixelCount; i++) {
    // ITU-R BT.601 luma weights
    grayBitmap[i] = (uint8_t)(rgbBitmap[i].r * 0.299 +
                              rgbBitmap[i].g * 0.587 +
                              rgbBitmap[i].b * 0.114);
}
Amazing. First, this example uses "work sharing": when work sharing is applied to a for loop, each iteration is handed to one of the threads and is guaranteed to execute exactly once. OpenMP decides how many threads to open and takes care of creating and destroying them; all you have to do is tell OpenMP where the threading should go.
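To make the work sharing visible, here is a minimal sketch (it assumes nothing beyond the standard omp.h API) that reports which thread ran each iteration:

#include <cstdio>
#include <omp.h>

int main() {
    #pragma omp parallel for
    for (int i = 0; i < 8; i++)
        std::printf("iteration %d ran on thread %d of %d\n",
                    i, omp_get_thread_num(), omp_get_num_threads());
    return 0;
}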
OpenMP has five requirements for a loop before it will multithread it:
The loop variable (that is, i) must be a signed integer; nothing else will do.
The loop's comparison must be one of <, <=, > or >=.
The loop's increment must add or subtract a loop-invariant value (the same amount on every iteration).
If the comparison is < or <=, then i must increase each iteration; otherwise it must decrease.
The loop body must contain no strange control flow: you cannot jump from inside the loop to outside it, goto and break may only jump within the loop, and exceptions must be caught inside the loop.
If your loop does not meet these conditions, you will have to rewrite it, as in the sketch below.
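For instance, a while loop does not satisfy the canonical form, but when the trip count is computable up front it can be recast as a for loop. A sketch (process() and n are placeholder names, not from the original):

// Not parallelizable as written: OpenMP sees no canonical for header.
// int i = 0;
// while (i < n) { process(i); i++; }

// The same work, rewritten so OpenMP can split it:
#pragma omp parallel for
for (int i = 0; i < n; i++)
    process(i);  // assumes iterations are independent of one another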
Checking whether OpenMP is supported
#ifndef _OPENMP
fprintf(stderr, "OpenMP not supported");
#endif
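A slightly fuller variant: the standard _OPENMP macro expands to the release date (yyyymm) of the OpenMP version the compiler implements, so a positive check can also report the version:

#include <stdio.h>

int main() {
#ifdef _OPENMP
    printf("OpenMP supported, _OPENMP = %d\n", _OPENMP);
#else
    fprintf(stderr, "OpenMP not supported\n");
#endif
    return 0;
}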
Avoiding data dependencies and race conditions
Even when a loop satisfies the five conditions above, a data dependency may still prevent it from being parallelized correctly. That happens when one iteration depends on the result of a different iteration.
// Assume the array has been initialized to 1
#pragma omp parallel for
for (int i = 2; i < 10; i++) {
    factorial[i] = i * factorial[i-1];
}
The compiler will happily multithread this loop, but it cannot deliver the speedup we want, and the resulting array will contain wrong values. Because each iteration depends on another iteration, running them concurrently creates a race condition. The only fixes are to rewrite the loop or to choose a different algorithm.
Race conditions are hard to detect, because the program may happen to execute in exactly the order you wanted anyway.
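For the factorial loop above, one possible rewrite (a sketch, not the only option) has each iteration compute its value from scratch; this does more total work, but no iteration reads another iteration's output:

#pragma omp parallel for
for (int i = 2; i < 10; i++) {
    long long f = 1;
    for (int k = 2; k <= i; k++)  // recompute i! independently in each iteration
        f *= k;
    factorial[i] = f;
}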
Managing shared and private data
Essentially every loop reads and writes data, and it is the programmer's responsibility to decide which data is shared between the threads and which is private to each thread. When data is shared, all threads access the same memory address; when data is private, each thread gets its own copy. By default, all data except the loop variable is shared. There are two ways to make a variable private:
Declare the variable inside the loop, taking care that it is not static.
Declare it private with an OpenMP clause.
// The following example is wrong
int temp;  // declared outside the loop, so it is shared by all threads
#pragma omp parallel for
for (int i = 0; i < 100; i++) {
    temp = array[i];
    array[i] = doSomething(temp);
}
It can be corrected in the following two ways
// 1. Declare the variable inside the loop
#pragma omp parallel for
for (int i = 0; i < 100; i++) {
    int temp = array[i];
    array[i] = doSomething(temp);
}
// 2. Declare the variable private with an OpenMP clause
int temp;
#pragma omp parallel for private(temp)
for (int i = 0; i < 100; i++) {
    temp = array[i];
    array[i] = doSomething(temp);
}
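One caveat worth knowing: with private(temp), each thread's copy starts out uninitialized. If a thread needs the variable's value from before the loop, OpenMP's firstprivate clause copies it in. A brief sketch (offset is a made-up name for illustration):

int offset = 10;
// firstprivate initializes each thread's private copy with the outer value;
// plain private(offset) would leave the copies uninitialized.
#pragma omp parallel for firstprivate(offset)
for (int i = 0; i < 100; i++)
    array[i] = i + offset;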
Reductions
A common loop pattern accumulates values into a single variable, and OpenMP has a dedicated clause for it.
For example, take the following program:
int sum = 0;
for (int i = 0; i < 100; i++) {
    // sum would need to be private to parallelize safely,
    // but it must be shared to produce the correct result
    sum += array[i];
}
In the program above, sum works correctly neither as shared nor as private. To solve this, OpenMP provides the reduction clause:
int sum = 0;
#pragma omp parallel for reduction(+:sum)
for (int i = 0; i < 100; i++) {
    sum += array[i];
}
Internally, OpenMP gives each thread its own private sum variable; as the threads finish, OpenMP adds the per-thread partial results together to produce the final value.
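The effect is roughly what you would otherwise write by hand. A sketch of the equivalent manual version, with a per-thread partial sum combined under a critical section:

int sum = 0;
#pragma omp parallel
{
    int local = 0;            // each thread's private partial sum
    #pragma omp for
    for (int i = 0; i < 100; i++)
        local += array[i];
    #pragma omp critical      // combine the partial sums one thread at a time
    sum += local;
}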
Of course, OpenMP can do more than add things up; other accumulation operations work too. In C/C++ the reduction clause accepts the operators +, *, -, &, |, ^, && and || (OpenMP 3.1 also added min and max).
Loop scheduling
Load balancing is one of the most important factors affecting performance in multithreaded programs; only a well-balanced load keeps every core busy with no idle time. Without load balancing, some threads finish much earlier than others, leaving processors idle and optimization potential wasted.
In loops, load balance is usually destroyed by large timing differences between iterations. You can often inspect the source code to see how the cost varies. In most cases every iteration takes roughly the same time; when that does not hold, you may still find subsets of iterations that do. For example, sometimes all the even iterations take about as long as all the odd ones, and sometimes the first half of the loop costs about as much as the second half. Other times you may find no grouping of iterations with similar cost at all. Either way, you should give this information to OpenMP, so that it has a better chance of optimizing the loop.
By default, OpenMP assumes all loop iterations take the same time. That leads it to divide the iterations evenly across the cores, in contiguous blocks arranged to minimize memory access conflicts; because loops typically access memory linearly, handing out the first half and the second half of the iteration range minimizes conflicts. This is probably the best arrangement for memory access, but it may not be the best for load balance, and conversely the best load balance may ruin the memory access pattern. A trade-off has to be made.
OpenMP expresses the load-balancing choice with the following syntax:
#pragma omp parallel for schedule(kind [, chunk_size])
where kind is one of the following, and chunk_size must be a loop-invariant positive integer:
static — chunks of chunk_size iterations are assigned to the threads round-robin before the loop starts; with no chunk size given, each thread receives one roughly equal contiguous block.
dynamic — each thread grabs the next chunk of chunk_size iterations as soon as it finishes its current one (the default chunk size is 1).
guided — like dynamic, but the chunks start large and shrink as the loop progresses, never below chunk_size.
runtime — the choice is deferred until run time, read from the OMP_SCHEDULE environment variable.
auto — the compiler and runtime decide.
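For example, if later iterations are known to be more expensive than earlier ones, a dynamic schedule with a small chunk keeps the threads evenly busy. A sketch (expensiveComputation() and n are placeholder names):

#pragma omp parallel for schedule(dynamic, 4)
for (int i = 0; i < n; i++)
    result[i] = expensiveComputation(i);  // per-iteration cost assumed to vary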
Example
#pragma omp parallel for
for (int i = 0; i < numElements; i++) {
    array[i] = initValue;
    initValue++;
}
This loop obviously has a race condition: every iteration depends on, and modifies, the shared variable initValue. We need to get rid of that.
#pragma omp parallel for
for (int i = 0; i < numElements; i++) {
    array[i] = initValue + i;
}
That does it, because initValue is no longer carried from one iteration to the next.
So, for a loop, we should try wherever possible to express loop-variant values as functions of i.