
Neon optimization 1: how to optimize software performance and reduce power consumption?

2022-06-27 05:26:00 [To know]


Background

Cutting-edge techniques are increasingly deployed on mobile terminals, embedded devices and similar platforms, and these products often contain complex algorithm models. Because the algorithms are computationally expensive, they suffer from poor real-time performance, high power consumption and similar problems, so performance optimization on the device side is required.

How to reduce the computational cost of the algorithm code without changing the algorithm's output has become a problem many engineers have to face.

The basis of performance optimization: MIPS and MCPS

Before optimizing anything, you first need a concrete performance metric: MIPS or MCPS.

MIPS: million instructions per second, the number of instructions (in millions) the program executes per second while running
MCPS: million cycles per second, the number of clock cycles (in millions) the program consumes per second while running

The difference between MIPS and MCPS

MIPS counts instructions, so it varies little between software simulation and hardware testing, or from one platform to another.
MCPS counts cycles. Because of hardware optimizations, MCPS can differ between platforms and can even be smaller than MIPS.

In software simulation results, MIPS is generally smaller than MCPS, because the simulation tool RVDS can do no better than a CPI (cycles per instruction) of 1. Hardware testing measures MCPS directly, and on real hardware a good CPU can reach a CPI below 1, i.e. it executes more than one instruction per cycle. For details, see: link.

With a single-execution-unit processor, the best CPI attainable is 1. However, with a multiple-execution-unit processor, one may achieve even better CPI values (CPI < 1).
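The two metrics are linked through CPI. Written out with the standard definitions (the 30-million-instruction module below is a hypothetical example, not a measurement from this article):

```latex
\mathrm{CPI} = \frac{N_{\text{cycles}}}{N_{\text{instructions}}},
\qquad
\mathrm{MCPS} = \mathrm{MIPS} \times \mathrm{CPI}

% Hypothetical module executing 30 million instructions per second:
\mathrm{CPI} = 1   \;(\text{simulator floor}):       \quad \mathrm{MCPS} = 30 \times 1   = 30
\mathrm{CPI} = 0.8 \;(\text{multi-issue core}):       \quad \mathrm{MCPS} = 30 \times 0.8 = 24
```

So a simulator that never goes below CPI = 1 always reports MCPS at least as large as MIPS, while a core that sustains CPI < 1 can report MCPS below MIPS.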

MIPS calculation demo

Directory structure:

src
- main.c
  - void test(int* arr, int len);
- vpu.h
- vpu.s

Measurement code:

```c
#include <stdio.h>

#define MIPS_COUNT_ARM_CORTEX

#ifdef MIPS_COUNT_ARM_CORTEX
#include "v7_pmu.h"   // ARMv7 PMU helpers: enable_pmu(), read_ccnt(), ...
#endif

void test(int *arr, int len);   // function under test, defined elsewhere

#ifdef MIPS_COUNT_ARM_CORTEX
#define MILLION_UNIT (1000000.f)
#define KILO_UNIT    (1000.f)
#define FRAME_LEN_MS (10.f)     // one call to test() processes a 10 ms frame
#define COUNT_NUM    1000       // number of frames to measure
unsigned int counter0;          // value of event counter 0
unsigned int cycle_count1;      // CCNT before the call
unsigned int cycle_count2;      // CCNT after the call
unsigned int cur_time = 0;      // cycles spent on one frame
double cur_time_tmp = 0.0;      // load of one frame, in million cycles per second
double avg_time = 0.0;
double avg_time_tmp = 0.0;      // running sum of per-frame load (not truncated)
unsigned int peak_time = 0;     // worst-case cycles for one frame
// cycles per frame -> millions per second: cycles / 10^6 / (frame length in s)
float cycle2mips_coef = (1 / MILLION_UNIT) / (FRAME_LEN_MS / KILO_UNIT);
#endif

int main(void) {
    int cnt = COUNT_NUM;   // number of iterations, set manually

    while (cnt--) {
#ifdef MIPS_COUNT_ARM_CORTEX
        enable_pmu();                // Enable the PMU
        reset_ccnt();                // Reset the CCNT (cycle counter)
        reset_pmn();                 // Reset the configurable counters
        pmn_config(0, 0x03);         // Counter 0 counts event 0x03 (ARMv7: L1 data cache refill)
        enable_ccnt();               // Enable CCNT
        enable_pmn(0);               // Enable counter 0
        counter0 = read_pmn(0);      // Read counter 0
        cycle_count1 = read_ccnt();  // Read core cycle count before the call
#endif

        // test(arr, len);           // place the code under measurement here

#ifdef MIPS_COUNT_ARM_CORTEX
        cycle_count2 = read_ccnt();
        cur_time = cycle_count2 - cycle_count1;   // unsigned: tolerates one counter wrap
        // CCNT counts cycles, so strictly this value is MCPS; it is labelled "mips" here
        cur_time_tmp = (double)cur_time * cycle2mips_coef;
        avg_time_tmp += cur_time_tmp;
        if (cur_time > peak_time) {
            peak_time = cur_time;
        }
        printf("%.2f mips\n", cur_time_tmp);
#endif
    }

#ifdef MIPS_COUNT_ARM_CORTEX
    avg_time = avg_time_tmp / COUNT_NUM;
    printf("max %.2f mips\n", (double)peak_time * cycle2mips_coef);
    printf("avg %.2f mips\n", avg_time);
#endif
    return 0;
}
```
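The cycle-to-load conversion inside the loop is easy to get wrong, so here it is pulled out as pure functions that can be checked on a host PC. This is a sketch that assumes, like the demo, that each call processes one frame of `frame_ms` milliseconds; the names `cycles_to_mcps` and `summarize_load` are illustrative and not part of any PMU library. It sums in floating point so per-frame fractions are not truncated.

```c
#include <stddef.h>

/* Convert the cycles spent on one frame into million cycles per second:
 * cycles / 10^6 gives million cycles, and dividing by the frame length
 * in seconds (frame_ms / 1000) scales that to one second of real time. */
static double cycles_to_mcps(unsigned int cycles, double frame_ms) {
    return ((double)cycles / 1e6) / (frame_ms / 1000.0);
}

/* Peak and average load over n measured frames (n must be > 0). */
static void summarize_load(const unsigned int *cycles, size_t n, double frame_ms,
                           double *peak_mcps, double *avg_mcps) {
    unsigned int peak = 0;
    double sum = 0.0;   /* floating-point sum: no per-frame truncation */
    for (size_t i = 0; i < n; i++) {
        if (cycles[i] > peak) {
            peak = cycles[i];
        }
        sum += cycles_to_mcps(cycles[i], frame_ms);
    }
    *peak_mcps = cycles_to_mcps(peak, frame_ms);
    *avg_mcps = sum / (double)n;
}
```

For example, 250,000 cycles per 10 ms frame is 0.25 million cycles every 0.01 s, i.e. 25 MCPS.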

The overhead measurement calls are usually placed immediately before and after the function being tested, such as test(), which gives that module's MIPS on its own. Alternatively, the module's cost can be estimated by multiplying the overall program's overhead by the module's share of it, but that calculation is inconvenient and is not recommended here.
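The wrap-around approach can be packaged so the same code runs on the target and on a host PC. A minimal sketch, assuming the cycle source is passed in as a function pointer (on the target this would be something like the demo's `read_ccnt()`, assuming it takes no arguments and returns `unsigned int`); `measure_call` and `call_stats` are hypothetical helper names. It also shows why the subtraction is unsigned: the delta stays correct even if the 32-bit counter wraps once during the call.

```c
/* Signature of the function under test, matching test(int *arr, int len). */
typedef void (*test_fn)(int *arr, int len);
/* Any 32-bit cycle source, e.g. read_ccnt() on an ARMv7 target. */
typedef unsigned int (*cycle_reader)(void);

typedef struct {
    unsigned int peak;         /* worst-case cycles of a single call */
    unsigned long long total;  /* cycles summed over all calls */
    unsigned int calls;        /* number of measured calls */
} call_stats;

/* Read the counter right before and right after the call so only fn
 * is charged. Unsigned subtraction gives the correct delta even if the
 * counter wrapped around once in between. Returns the call's cycles. */
static unsigned int measure_call(call_stats *s, cycle_reader read_cycles,
                                 test_fn fn, int *arr, int len) {
    unsigned int start = read_cycles();
    fn(arr, len);
    unsigned int delta = read_cycles() - start;
    if (delta > s->peak) {
        s->peak = delta;
    }
    s->total += delta;
    s->calls++;
    return delta;
}
```

On the target, after the PMU setup shown in the demo, a call would look like `measure_call(&stats, read_ccnt, test, arr, len)`.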

Test tools and workflow

Required tools

Software simulation usually uses ARM's RVDS (RealView Development Suite), which simulates execution on various ARM cores to obtain overhead data.

Hardware testing usually uses simpleperf, the profiler that ships with the Android platform: push the executable to the phone, run it there, sample CPU data in real time to get the actual overhead, and plot the result, commonly as a flame graph.

Optimization workflow: software simulation and hardware testing

Software simulation process
- Install RVDS
- Configure the code project environment
- Get the code building and running
- Write the overhead measurement code
- Run the simulation profile
- Identify the hotspot functions and record an overhead baseline
- Optimize the code
- Measure the hotspot functions' overhead again
Hardware testing process
- Similar to the software simulation process
- It is recommended to run software simulation first, then hardware testing
- For overhead involving I/O reads and writes and similar operations, software simulation cannot reproduce the real behavior, so the hardware results take precedence
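Once a profile provides per-function cycle totals, picking hotspots comes down to ranking functions by their share of the total. A minimal sketch, assuming the totals have already been collected (from the simulator profile or from measurements like the demo's); the `hotspot` struct and `rank_hotspots` are hypothetical names. The computed share is the same "proportion of the overall overhead" mentioned earlier for estimating a single module's cost.

```c
#include <stdlib.h>

typedef struct {
    const char *name;           /* function name from the profile */
    unsigned long long cycles;  /* cycles attributed to the function */
    double share;               /* filled in: fraction of total cycles */
} hotspot;

/* qsort comparator: larger cycle counts first. */
static int by_cycles_desc(const void *a, const void *b) {
    const hotspot *x = (const hotspot *)a;
    const hotspot *y = (const hotspot *)b;
    if (x->cycles < y->cycles) return 1;
    if (x->cycles > y->cycles) return -1;
    return 0;
}

/* Compute each function's share of the total, then sort hottest first. */
static void rank_hotspots(hotspot *items, size_t n) {
    unsigned long long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += items[i].cycles;
    }
    for (size_t i = 0; i < n; i++) {
        items[i].share = total ? (double)items[i].cycles / (double)total : 0.0;
    }
    qsort(items, n, sizeof items[0], by_cycles_desc);
}
```

The top entries are the optimization candidates, and their cycle counts serve as the baseline that later measurements are compared against.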

Once the hotspot functions and their overhead are known, you can optimize them at the instruction-set level (e.g. with NEON) and at the code level.

Summary

This article covered the background and basic concepts of performance optimization, the MIPS and MCPS metrics used to measure computational cost, and the test tools together with the software-simulation and hardware-testing workflow. The next article will share NEON optimization cases and experience.


Copyright notice
This article was written by [To know]. When reposting, please include a link to the original. Thanks.
https://yzsam.com/2022/178/202206270524032155.html