Halide::Generator instructions
2022-07-24 02:02:00 【Kangzai】
We have already been able to use Halide to call the OpenCL backend, but our code always ran by calling realize after building the computation graph and the schedule.
This has a clear cost when we optimize kernel functions for AI models and run inference: Halide is a huge dependency library that cannot be stripped out, so the final inference interface becomes overly complex because of the many dependencies.
1、 Preface
The main work this time is to try using Halide::Generator to generate kernel functions, and to experiment with how different backends are used.
The key points are as follows:
- Multiple computation graphs
- GPU scheduling constraints
- Input and output definitions
- Auto-scheduling
- Generator invocation example
2、 Core code overview
#include <stdio.h>
#include <iostream>  // std::cerr, used in main()
#include "Halide.h"
#include "HalideBuffer.h"
#include "clock.h"
using namespace Halide;
using namespace Halide::Tools;

// Try kernel generation.
class image_f4 : public Halide::Generator<image_f4> {
public:
    // Input buffer: image data, three-dimensional.
    Input<Buffer<uint8_t>> input{"input", 3};
    // Output buffer: image data, three-dimensional, with exactly the same shape as the input.
    Output<Buffer<uint8_t>> output{"output", 3};

    void generate()
    {
        Var x, y, c, i, ii, xo, yo, xi, yi;
        Func lut;
        Func curved;
        Func padded;
        // Gamma curve lookup table.
        lut(i) = cast<uint8_t>(clamp(pow(i / 255.0f, 1.2f) * 255.0f, 0, 255));
        // Augment the input with a boundary condition.
        padded(x, y, c) = input(clamp(x, 0, input.width() - 1),
                                clamp(y, 0, input.height() - 1), c);
        // Cast it to 16-bit to do the math.
        Func padded16;
        padded16(x, y, c) = cast<uint16_t>(padded(x, y, c));
        // Next we sharpen it with a five-tap filter.
        Func sharpen;
        sharpen(x, y, c) = (padded16(x, y, c) * 2 -
                            (padded16(x - 1, y, c) +
                             padded16(x, y - 1, c) +
                             padded16(x + 1, y, c) +
                             padded16(x, y + 1, c)) /
                                4);
        curved(x, y, c) = lut(sharpen(x, y, c));
        // Compute the LUT up front on the GPU, 16 threads per block.
        lut.compute_root();
        Var block, thread;
        lut.split(i, block, thread, 16);
        lut.gpu_blocks(block)
            .gpu_threads(thread);
        output(x, y, c) = curved(x, y, c);
    }

    void schedule()
    {
        /* THE SCHEDULE */
        // input.set_estimates({{0, 1024}, {0, 1024}, {1, 3}});
        // output.set_estimates({{0, 1024}, {0, 1024}, {1, 3}});
    }
};

HALIDE_REGISTER_GENERATOR(image_f4, image_f4)

int main(int argc, char **argv) {
    return Halide::Internal::generate_filter_main(argc, argv, std::cerr);
}
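To make the computation graph above easier to check by hand, here is a minimal plain C++ sketch of the same arithmetic without Halide. The names `Image`, `gamma_lut` and `sharpen_and_curve` are hypothetical and introduced only for illustration, the planar memory layout is an assumption, and the code is meant as a reference for small inputs, not as what the generator emits.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// One gamma-curve entry, mirroring
// lut(i) = cast<uint8_t>(clamp(pow(i / 255.0f, 1.2f) * 255.0f, 0, 255)).
inline uint8_t gamma_lut(uint16_t i) {
    float v = std::pow(i / 255.0f, 1.2f) * 255.0f;
    return static_cast<uint8_t>(std::min(std::max(v, 0.0f), 255.0f));
}

// Hypothetical planar image: index = (c * height + y) * width + x.
struct Image {
    int width, height, channels;
    std::vector<uint8_t> data;

    // Clamp-to-edge read, mirroring the `padded` Func.
    uint8_t at(int x, int y, int c) const {
        x = std::min(std::max(x, 0), width - 1);
        y = std::min(std::max(y, 0), height - 1);
        return data[(static_cast<size_t>(c) * height + y) * width + x];
    }
};

// Five-tap sharpen in 16-bit arithmetic followed by the LUT lookup.
inline Image sharpen_and_curve(const Image &in) {
    assert(in.data.size() ==
           static_cast<size_t>(in.width) * in.height * in.channels);
    Image out{in.width, in.height, in.channels,
              std::vector<uint8_t>(in.data.size())};
    for (int c = 0; c < in.channels; ++c) {
        for (int y = 0; y < in.height; ++y) {
            for (int x = 0; x < in.width; ++x) {
                // padded16: widen each tap to 16 bits.
                auto p = [&](int dx, int dy) {
                    return static_cast<uint16_t>(in.at(x + dx, y + dy, c));
                };
                uint16_t neighbours = static_cast<uint16_t>(
                    p(-1, 0) + p(0, -1) + p(1, 0) + p(0, 1));
                // The casts emulate uint16 wrap-around on negative results.
                uint16_t s = static_cast<uint16_t>(p(0, 0) * 2 - neighbours / 4);
                out.data[(static_cast<size_t>(c) * in.height + y) * in.width + x] =
                    gamma_lut(s);
            }
        }
    }
    return out;
}
```

For a constant image the sharpen stage returns the input unchanged, so the output is just the gamma curve. Where a pixel is darker than its neighbours, the 16-bit subtraction in this sketch wraps to a large index, which the clamp in `gamma_lut` maps to 255.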
3、 Detailed instructions and notes
3.1、 Generator workflow
- 1、 Define the core generator class, derived from the base class Halide::Generator
class image_f4 : public Halide::Generator<image_f4> {};
- 2、 Declare the input and output buffers
// Input buffer: image data, three-dimensional.
Input<Buffer<uint8_t>> input{"input", 3};
// Output buffer: image data, three-dimensional, with exactly the same shape as the input.
Output<Buffer<uint8_t>> output{"output", 3};
- 3、 Implement the Halide computation graph of the kernel function
void generate()
{
    Var x, y, c, i, ii, xo, yo, xi, yi;
    Func lut;
    Func curved;
    Func padded;
    lut(i) = cast<uint8_t>(clamp(pow(i / 255.0f, 1.2f) * 255.0f, 0, 255));
    // Augment the input with a boundary condition.
    padded(x, y, c) = input(clamp(x, 0, input.width() - 1),
                            clamp(y, 0, input.height() - 1), c);
    // Cast it to 16-bit to do the math.
    Func padded16;
    padded16(x, y, c) = cast<uint16_t>(padded(x, y, c));
    // Next we sharpen it with a five-tap filter.
    Func sharpen;
    sharpen(x, y, c) = (padded16(x, y, c) * 2 -
                        (padded16(x - 1, y, c) +
                         padded16(x, y - 1, c) +
                         padded16(x + 1, y, c) +
                         padded16(x, y + 1, c)) /
                            4);
    curved(x, y, c) = lut(sharpen(x, y, c));
    /*----------- Custom scheduling: begin -----------*/
    lut.compute_root();
    Var block, thread;
    lut.split(i, block, thread, 16);
    lut.gpu_blocks(block)
        .gpu_threads(thread);
    /*----------- Custom scheduling: end -----------*/
    output(x, y, c) = curved(x, y, c);
}
- 4、 Auto-scheduling settings (optional)
void schedule()
{
    /* THE SCHEDULE */
    // input.set_estimates({{0, 1024}, {0, 1024}, {1, 3}});
    // output.set_estimates({{0, 1024}, {0, 1024}, {1, 3}});
}
- 5、 Register the generator
HALIDE_REGISTER_GENERATOR(image_f4, image_f4)
// The command-line arguments in argv are then passed through to generate the image_f4 kernel implementation.
int main(int argc, char **argv) {
    return Halide::Internal::generate_filter_main(argc, argv, std::cerr);
}
- 6、 Command-line operations
if [ ! -d "./halide_generate_file" ]; then
    mkdir halide_generate_file
else
    rm -rf halide_generate_file/*
fi
# Assume the code from the overview has been compiled into an executable named test.
# target=x86-64-linux-opencl is what makes the target platform use the OpenCL implementation.
# As far as the Halide generator command line goes, -r GPU additionally requests a standalone runtime named GPU.
./test -g image_f4 -e c_header,c_source -o halide_generate_file target=x86-64-linux-opencl -r GPU
# The generated code then appears in the halide_generate_file folder.
3.2、 Overview of the generated code
- The OpenCL kernel code is as follows:
/* OpenCL C x86-64-linux-opencl*/
#pragma OPENCL FP_CONTRACT ON
inline float float_from_bits(unsigned int x) { return as_float(x); }
inline float nan_f32() { return NAN; }
inline float neg_inf_f32() { return -INFINITY; }
inline float inf_f32() { return INFINITY; }
inline bool is_nan_f32(float x) { return isnan(x); }
inline bool is_inf_f32(float x) { return isinf(x); }
inline bool is_finite_f32(float x) { return isfinite(x); }
#define sqrt_f32 sqrt
#define sin_f32 sin
#define cos_f32 cos
#define exp_f32 exp
#define log_f32 log
#define abs_f32 fabs
#define floor_f32 floor
#define ceil_f32 ceil
#define round_f32 round
#define trunc_f32 trunc
#define pow_f32 pow
#define asin_f32 asin
#define acos_f32 acos
#define tan_f32 tan
#define atan_f32 atan
#define atan2_f32 atan2
#define sinh_f32 sinh
#define asinh_f32 asinh
#define cosh_f32 cosh
#define acosh_f32 acosh
#define tanh_f32 tanh
#define atanh_f32 atanh
#define fast_inverse_f32 native_recip
#define fast_inverse_sqrt_f32 native_rsqrt
#define halide_unused(x)
__kernel void _at_least_one_kernel(int x) {
}
// Address spaces for _kernel_f0_s0_v3_v9___block_id_x
#define __address_space__f0 __global
__kernel void _kernel_f0_s0_v3_v9___block_id_x(
    __address_space__f0 uchar *restrict _f0,
    __local int16* __shared)
{
    int _f0_s0_v3_v9___block_id_x = get_group_id(0);
    int ___thread_id_x = get_local_id(0);
    int _0 = _f0_s0_v3_v9___block_id_x * 16;
    int _1 = _0 + ___thread_id_x;
    float _2 = (float)(_1);
    float _3 = float_from_bits(998277249 /* 0.00392157 */);
    float _4 = _2 * _3;
    float _5 = float_from_bits(1067030938 /* 1.2 */);
    float _6 = pow_f32(_4, _5);
    float _7 = float_from_bits(1065353216 /* 1 */);
    float _8 = min(_6, _7);
    float _9 = float_from_bits(0 /* 0 */);
    float _10 = max(_8, _9);
    float _11 = float_from_bits(1132396544 /* 255 */);
    float _12 = _10 * _11;
    uchar _13 = (uchar)(_12);
    _f0[_1] = _13;
} // kernel _kernel_f0_s0_v3_v9___block_id_x
#undef __address_space__f0
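The index arithmetic in the kernel above corresponds to `lut.split(i, block, thread, 16)`: each work-item handles `i = block * 16 + thread`. The following C++ sketch emulates that kernel on the CPU so that the bit-pattern constants and the block/thread mapping can be inspected. `lut_work_item` and `run_lut_kernel` are hypothetical names, and the sequential loops only stand in for an OpenCL launch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <vector>

// Reinterpret a 32-bit pattern as a float, like float_from_bits / as_float
// in the generated OpenCL source.
inline float float_from_bits(uint32_t bits) {
    static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

// Emulates one work-item of _kernel_f0_s0_v3_v9___block_id_x.
// block_id plays the role of get_group_id(0), thread_id of get_local_id(0).
inline void lut_work_item(int block_id, int thread_id, std::vector<uint8_t> &f0) {
    int i = block_id * 16 + thread_id;                          // _1
    float scaled = static_cast<float>(i) *
                   float_from_bits(998277249u);                 // i * (1 / 255)
    float curved = std::pow(scaled, float_from_bits(1067030938u));  // ^ 1.2
    float clamped = std::max(std::min(curved, float_from_bits(1065353216u)),
                             float_from_bits(0u));              // clamp to [0, 1]
    f0[i] = static_cast<uint8_t>(clamped * float_from_bits(1132396544u));  // * 255
}

// Hypothetical driver: runs extent / 16 blocks of 16 threads sequentially.
inline std::vector<uint8_t> run_lut_kernel(int extent) {
    assert(extent > 0 && extent % 16 == 0);
    std::vector<uint8_t> f0(extent);
    for (int block = 0; block < extent / 16; ++block) {
        for (int thread = 0; thread < 16; ++thread) {
            lut_work_item(block, thread, f0);
        }
    }
    return f0;
}
```

Calling `run_lut_kernel(256)` fills one table entry per emulated work-item, using 16 blocks of 16 threads. The kernel clamps the `pow` result to [0, 1] before multiplying by 255, whereas the generator source clamps after the multiplication; both orderings describe the same curve.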
4、 Follow-up plan
At this point we can use Halide to generate kernel functions, but we still need to work through how to actually use the generated kernels.