Adaptive batch job scheduler: automatically derive parallelism for Flink batch jobs
2022-07-16 05:12:00 【Robert's house of Technology】
01 Introduction
For most users, configuring an appropriate parallelism for Flink operators is not easy. For batch jobs, too low a parallelism leads to long run times and slow failure recovery, while an unnecessarily high parallelism wastes resources and increases the cost of task deployment and data shuffling.
To keep the execution time of a batch job under control, the parallelism of an operator should be proportional to the amount of data it has to process. Users therefore have to configure parallelism by estimating how much data each operator will process. Estimating this accurately is very hard, however: the data volume may change from day to day, and a job may contain many UDFs and complex operators whose output volume is difficult to predict.
To solve this problem, we introduced a new scheduler in Flink 1.15: the adaptive batch job scheduler (Adaptive Batch Scheduler). While the job is running, it automatically derives the parallelism of each operator from the actual amount of data that operator needs to process. This brings the following benefits:
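The proportionality described above can be illustrated with a small sketch. It is not Flink's actual derivation logic; the helper `deriveParallelism`, its parameters, and the bounds used in `main` are hypothetical names and values chosen only to show the idea of "one subtask per fixed slice of data, kept within sensible limits".

```java
public class ParallelismSketch {

    /**
     * Hypothetical helper (not a Flink API): derive a parallelism that is
     * proportional to the input size, clamped to [minParallelism, maxParallelism].
     */
    static int deriveParallelism(long inputBytes, long bytesPerTask,
                                 int minParallelism, int maxParallelism) {
        if (bytesPerTask <= 0 || minParallelism < 1 || maxParallelism < minParallelism) {
            throw new IllegalArgumentException("invalid per-task volume or bounds");
        }
        // Ceiling division: enough subtasks so that none handles more than bytesPerTask.
        long needed = (inputBytes + bytesPerTask - 1) / bytesPerTask;
        // Clamp so tiny inputs still get minParallelism and huge inputs stay bounded.
        return (int) Math.max(minParallelism, Math.min(maxParallelism, needed));
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;
        System.out.println(deriveParallelism(10 * gib, gib, 1, 128));  // prints 10
        System.out.println(deriveParallelism(500 * gib, gib, 1, 128)); // prints 128
        System.out.println(deriveParallelism(0, gib, 1, 128));         // prints 1
    }
}
```

The difficulty the article points out is that `inputBytes` is usually unknown before the job runs, which is why a value chosen by hand is often wrong.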
It greatly reduces the effort of tuning the parallelism of batch jobs;
Different operators can get different parallelism according to the amount of data they process, which is especially helpful for SQL jobs, where only a global parallelism can be configured;
It adapts better to data volumes that change from day to day.
02 Usage
To have Flink derive the parallelism of operators automatically, the following configuration is required:
Enable the adaptive batch job scheduler;
Set the parallelism of the operators to -1.
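As a rough illustration of the two steps, the sketch below collects the corresponding settings and prints them in `flink-conf.yaml` style. The key names `jobmanager.scheduler` and `parallelism.default` and the value `AdaptiveBatch` are not taken from the text above; they are the names I recall from the Flink 1.15 documentation and should be checked against the version in use. The class and method names are hypothetical, and plain Java collections are used so the example runs without Flink on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AdaptiveBatchConfigSketch {

    /** Settings for the two steps; key names are assumed from the Flink 1.15 docs. */
    static Map<String, String> adaptiveBatchSettings() {
        Map<String, String> conf = new LinkedHashMap<>();
        // Step 1: select the adaptive batch job scheduler (assumed key and value).
        conf.put("jobmanager.scheduler", "AdaptiveBatch");
        // Step 2: a default parallelism of -1 leaves the decision to the scheduler (assumed key).
        conf.put("parallelism.default", "-1");
        return conf;
    }

    /** Render the settings as "key: value" lines, one per entry, in insertion order. */
    static String toYaml(Map<String, String> conf) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            sb.append(e.getKey()).append(": ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toYaml(adaptiveBatchSettings()));
    }
}
```

Operators whose parallelism is set explicitly in the job code would presumably keep that value, so the -1 default only helps for operators that are left unset.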
2.1 Enable adaptive batch job scheduler
Enable adaptive batch job scheduler