
Flink production problems (1.10)

Posted 2022-06-27 04:58:00 by "I had a good shampoo today"

1、TaskManager OOM
Where it happened: in the two-phase commit phase of the Flink-to-MySQL sink.
Cause ①: the checkpoint interval was 5 seconds, a large volume of data accumulated per checkpoint, and the invalid fields in the JSON data had not been stripped out.
Cause ②: under Flink's default memory layout, a portion of TaskManager memory is reserved as managed memory. The job does not use the RocksDB state backend, so this memory is never touched; adjust the configuration to give it back to the heap: taskmanager.memory.managed.fraction=0
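A minimal sketch of the two adjustments from this section, assuming a local environment only for illustration (on a cluster the memory key would normally go into flink-conf.yaml); the class and method names are placeholders, not the actual job code:

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MemoryTuningSketch {
    public static StreamExecutionEnvironment buildEnv() {
        Configuration conf = new Configuration();
        // The job keeps all state on the heap (no RocksDB), so the managed
        // off-heap memory is never used; reclaim it for the heap.
        conf.setString("taskmanager.memory.managed.fraction", "0");

        // Local environment for illustration only.
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.createLocalEnvironment(1, conf);

        // The 5-second checkpoint interval from cause ①. With an exactly-once
        // two-phase-commit sink, everything written between two checkpoints is
        // held back until the commit, so interval x throughput determines how
        // much data each transaction carries.
        env.enableCheckpointing(5000L);
        return env;
    }
}
```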
2、TaskManager restart caused by OOM / long GC
Long GC pauses caused the heartbeat between the JobManager and the TaskManager to time out, so another TaskManager was started, which wastes startup time. Increase the heartbeat timeout: heartbeat.timeout=300000
3、Communication problems
With too few resources, the JVM could not respond in time, which affected communication between the TaskManager and the JobManager. Increase the timeouts: akka.ask.timeout=500 s and web.timeout=500000
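For reference, a minimal sketch collecting the timeout keys from sections 2 and 3 in one place; the class name is a placeholder, and in a real deployment these values would normally be set in flink-conf.yaml rather than in code:

```java
import org.apache.flink.configuration.Configuration;

public class TimeoutTuningSketch {
    public static Configuration timeoutSettings() {
        Configuration conf = new Configuration();
        // Section 2: tolerate long GC pauses before the JobManager declares
        // the TaskManager dead and spins up a replacement (milliseconds).
        conf.setString("heartbeat.timeout", "300000");
        // Section 3: give the Akka RPC layer and the web/REST layer more time
        // to answer when the JVM is busy.
        conf.setString("akka.ask.timeout", "500 s");
        conf.setString("web.timeout", "500000");
        return conf;
    }
}
```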
4、Business problem
Current business scenario:
Scenario 1:
① The upstream Kafka topic has many partitions. Each partition holds data captured by Canal from the MySQL binlog, and records are routed to partitions by the hash of the table name, so the change records of any given table are ordered within a single Kafka partition.
② The requirement is to synchronize each table's incremental data to the downstream MySQL in order (the downstream table has three extra columns: the Canal id, the es timestamp, and the change type). For an insert, the downstream also performs an upsert, storing the data field of the upstream JSON with type insert; for an update, an upsert storing the data field with type update; for a delete, an upsert storing the old field with type delete.
If a DDL change arrives from upstream, the same DDL is executed downstream.
③ Given this scenario, each table's change records must be consumed in order, but this causes a problem: data is unevenly distributed across subtasks. Some tables produce many change messages while others produce few, so the partitions assigned by table-name hash carry very different volumes, and when Flink consumes them the load on each parallel subtask is also uneven.
④ If there were no upstream DDL changes, we could keyBy on table name + primary key to spread the data more evenly (see the sketch after this list).
⑤ Thought: in the current scenario, where DDL changes do occur, how can we make the data on each subtask more even?
d->c->ddl->b->a
keyBy the data that arrives before the DDL,
keyBy the data that arrives after the DDL,
which amounts to processing batch by batch:
first keyBy by table name, collect a batch of data in each window, then check whether the batch contains a DDL; if it does, split the batch at the DDL into several parts, otherwise treat it as one whole ......?
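A rough sketch of the keyBy idea from ④, assuming a simplified Canal-style record that has already been parsed into a POJO with the table name, the change type, and a single primary-key value; the CanalRecord class and its field names are placeholders, not the actual job code:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;

public class RowKeySketch {
    /** Simplified view of one Canal change event (placeholder POJO). */
    public static class CanalRecord {
        public String table;   // source table name
        public String type;    // INSERT / UPDATE / DELETE / DDL
        public String pk;      // primary-key value taken from data (or old for deletes)
        public String payload; // the data/old JSON to upsert downstream
    }

    /**
     * keyBy on table name + primary key: all changes of one row land on the
     * same subtask, so per-row order is preserved, while the rows of a hot
     * table are spread across subtasks. This only works if no DDL has to be
     * ordered against all DML of the table, which is the open question in ⑤.
     */
    public static KeyedStream<CanalRecord, String> keyByRow(DataStream<CanalRecord> changes) {
        return changes.keyBy(r -> r.table + "#" + r.pk);
    }
}
```

For the DDL case from ⑤, the same key selector could in principle be applied batch by batch, with each window's batch split at DDL boundaries, as described above.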

Scenario 2:
The target downstream table has a unique index
