
Flink checkpoint configuration details

2022-07-24 06:19:00 【sf_ www】

If an option is set both in code and in flink-conf.yaml, the value set in code overrides the one in flink-conf.yaml.

Setting it in code

import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Enable checkpointing, triggering a checkpoint every 5000 ms
env.enableCheckpointing(5000);
// Set the checkpointing mode; EXACTLY_ONCE and AT_LEAST_ONCE are currently supported
env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
// Set the checkpoint storage location
env.getCheckpointConfig().setCheckpointStorage("hdfs:///flink/checkpoints");
// Set the default savepoint directory
env.setDefaultSavepointDirectory("hdfs:///flink/checkpoints");
// Set the checkpoint timeout: a checkpoint that does not complete within this time is discarded
env.getCheckpointConfig().setCheckpointTimeout(600000);
// Set the minimum pause between the end of one checkpoint and the start of the next
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(500);
// Set the maximum number of concurrent checkpoints
env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
// Enable externalized checkpoints; here they are retained when the job is cancelled.
// The number of retained checkpoints cannot currently be set in code and defaults to 1.
// To keep 3, set state.checkpoints.num-retained: 3 in flink-conf.yaml.
env.getCheckpointConfig().enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

Setting it in flink-conf.yaml

execution.checkpointing.interval: 5000
execution.checkpointing.mode: EXACTLY_ONCE
state.backend: filesystem
state.checkpoints.dir: hdfs:///flink/checkpoints
state.savepoints.dir: hdfs:///flink/checkpoints
execution.checkpointing.timeout: 600000
execution.checkpointing.min-pause: 500
execution.checkpointing.max-concurrent-checkpoints: 1
state.checkpoints.num-retained: 3
execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION
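
The precedence rule stated at the top (a value set in code overrides the same key in flink-conf.yaml) can be pictured as a simple map merge. This is only a toy model in plain Java, not Flink's actual configuration loader; the class name `ConfigPrecedence` and the value `10000` are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the precedence rule; NOT Flink's real configuration loading.
class ConfigPrecedence {
    // Effective settings: start from the file's values, then let code values replace them.
    static Map<String, String> effective(Map<String, String> fromFile,
                                         Map<String, String> fromCode) {
        Map<String, String> merged = new LinkedHashMap<>(fromFile);
        merged.putAll(fromCode); // a key set in code replaces the file's value
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> fromFile = new LinkedHashMap<>();
        fromFile.put("execution.checkpointing.interval", "5000");
        fromFile.put("state.checkpoints.num-retained", "3");

        Map<String, String> fromCode = new LinkedHashMap<>();
        // Hypothetical: the job calls env.enableCheckpointing(10000)
        fromCode.put("execution.checkpointing.interval", "10000");

        // The interval comes from code; num-retained still comes from the file.
        System.out.println(effective(fromFile, fromCode));
    }
}
```

Keys that are only present in the file, such as state.checkpoints.num-retained here, are unaffected by what the code sets.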

More configuration options are listed in the official documentation:

https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/config/

See the "Checkpoints and State Backends" and "Checkpointing" sections on that page.

Configuration optimization

The checkpoint interval should not be too large. Generally speaking, the longer the interval between checkpoints, the more state each checkpoint has to capture and the more data has to be reprocessed after a failure, so the job takes longer to catch up when it recovers.

The checkpoint interval should not be too small either. If it is too small, the Flink application checkpoints very frequently, which ties up resources that would otherwise go to data processing.
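
The two pressures on the interval can be put into rough numbers. The sketch below is a back-of-the-envelope estimate in plain Java, not something Flink reports: it assumes a failure happens just before the next checkpoint (so one full interval of input is replayed) and treats the share of time spent checkpointing as duration divided by interval. The input rate and checkpoint duration are hypothetical values.

```java
// Back-of-the-envelope estimate; both formulas are simplifying assumptions.
class IntervalTradeoff {
    // Worst case: the job fails right before the next checkpoint,
    // so roughly one full interval of input has to be reprocessed.
    static long worstCaseReplayRecords(long intervalMs, long recordsPerSecond) {
        return intervalMs * recordsPerSecond / 1000;
    }

    // Fraction of wall-clock time during which a checkpoint is in progress (capped at 1.0).
    static double checkpointingShare(long intervalMs, long checkpointDurationMs) {
        return Math.min(1.0, (double) checkpointDurationMs / intervalMs);
    }

    public static void main(String[] args) {
        long recordsPerSecond = 10_000;     // hypothetical input rate
        long checkpointDurationMs = 2_000;  // hypothetical time one checkpoint takes
        for (long intervalMs : new long[] {5_000, 60_000, 600_000}) {
            System.out.printf(
                "interval=%d ms  replay<=%d records  checkpointing %.1f%% of the time%n",
                intervalMs,
                worstCaseReplayRecords(intervalMs, recordsPerSecond),
                100.0 * checkpointingShare(intervalMs, checkpointDurationMs));
        }
    }
}
```

A short interval keeps the replay bound small but raises the checkpointing share; a long interval does the opposite.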

The checkpoint interval should be longer than the time a checkpoint takes to complete. When it is, the next checkpoint does not start immediately after the previous one finishes; the job first runs for a while without checkpointing. Otherwise, each checkpoint is followed immediately by the next one, checkpointing takes up much of the system's resources, and less is left for the actual computation.
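
How the interval, the checkpoint duration, and the minimum pause interact can be sketched with a simplified model. It assumes at most one concurrent checkpoint and that the next checkpoint starts at whichever is later: the next interval tick, or the end of the previous checkpoint plus the minimum pause. This is an approximation for illustration, not Flink's scheduler; `CheckpointSchedule` and the durations are hypothetical.

```java
// Simplified scheduling model (one concurrent checkpoint); NOT Flink's coordinator logic.
class CheckpointSchedule {
    // Start time of the next checkpoint, given when the previous one started and how long it took.
    static long nextStart(long prevStartMs, long durationMs, long intervalMs, long minPauseMs) {
        long prevEndMs = prevStartMs + durationMs;
        return Math.max(prevStartMs + intervalMs, prevEndMs + minPauseMs);
    }

    // Time between the end of one checkpoint and the start of the next.
    static long gapWithoutCheckpoint(long durationMs, long intervalMs, long minPauseMs) {
        return nextStart(0, durationMs, intervalMs, minPauseMs) - durationMs;
    }

    public static void main(String[] args) {
        long intervalMs = 5_000; // as in enableCheckpointing(5000)
        long minPauseMs = 500;   // as in setMinPauseBetweenCheckpoints(500)
        for (long durationMs : new long[] {1_000, 4_800, 7_000}) { // hypothetical durations
            System.out.printf("duration=%d ms  next start at %d ms  gap=%d ms%n",
                durationMs,
                nextStart(0, durationMs, intervalMs, minPauseMs),
                gapWithoutCheckpoint(durationMs, intervalMs, minPauseMs));
        }
    }
}
```

In this model, once a checkpoint takes about as long as the interval or longer, the only time the job runs without a checkpoint in progress is the minimum pause.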

Enable local recovery. If the Flink state is large, recovery has to read the state back from remote storage. With very large state files this can make the job slow to recover, with much of the time spent on network transfer. In that case you can enable local state recovery. It is disabled by default and is activated by setting state.backend.local-recovery to true. In most cases it is not needed.

Set the number of retained checkpoints. By default one checkpoint is retained, that is, only the state files of the latest checkpoint are kept. If that checkpoint turns out to be unusable during recovery (for example because a file is corrupted), state recovery fails. With the number set to 3, even if the latest checkpoint cannot be restored, recovery can fall back to the previous checkpoint's state files. To cover this case, set the number of retained checkpoints with state.checkpoints.num-retained.
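
Why the retained count matters can be shown with a small bookkeeping model. The sketch below is plain Java and only mimics the idea of keeping the newest N checkpoints and picking the newest usable one; it is not Flink's checkpoint store, and whether the fallback happens automatically or through a manual restart from a retained checkpoint path depends on the deployment. `RetainedCheckpoints` and the checkpoint ids are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.Optional;
import java.util.Set;

// Toy model of checkpoint retention; NOT Flink's completed-checkpoint store.
class RetainedCheckpoints {
    private final int numRetained;
    private final Deque<Long> ids = new ArrayDeque<>(); // oldest first

    RetainedCheckpoints(int numRetained) {
        this.numRetained = numRetained;
    }

    // Record a completed checkpoint and drop the oldest ones beyond the limit.
    void addCompleted(long checkpointId) {
        ids.addLast(checkpointId);
        while (ids.size() > numRetained) {
            ids.removeFirst();
        }
    }

    // Newest retained checkpoint that is not in the unusable set, if any.
    Optional<Long> newestUsable(Set<Long> unusable) {
        Iterator<Long> it = ids.descendingIterator();
        while (it.hasNext()) {
            long id = it.next();
            if (!unusable.contains(id)) {
                return Optional.of(id);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Set<Long> corrupted = Collections.singleton(5L); // assume checkpoint 5 is damaged
        for (int n : new int[] {1, 3}) {
            RetainedCheckpoints store = new RetainedCheckpoints(n);
            for (long id = 1; id <= 5; id++) {
                store.addCompleted(id);
            }
            System.out.println("num-retained=" + n + " -> " + store.newestUsable(corrupted));
        }
    }
}
```

With one retained checkpoint nothing is left to restore from when checkpoint 5 is damaged; with three, checkpoint 4 is still available.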