To implement a task scheduling system, it is enough to read this article

2022-06-23 03:05:00 Yongge Java actual combat sharing

While reading the article "Choosing a scheduled-task framework", one reader's comment struck a chord with me:

"I have seen so many so-called tutorials. Most of them teach how to use tools; few teach how to make tools; even those that teach how to imitate tools are rare. What China's software industry lacks is programmers who can build tools; it has no shortage of programmers who use tools! ... What the industry needs least is engineers who can use tool XX; what it needs is creative software engineers! Every job in the industry is, in essence, created by creative software engineers!"

I wrote this article to walk through task scheduling from start to finish, in the hope that after reading it you will understand the core logic of a task scheduling system.

1 Quartz

Quartz is an open-source task scheduling framework for Java, and the place where many Java engineers first encounter task scheduling.

The following figure shows the overall task scheduling flow:

Quartz is built around three core components.

  • Job: represents a scheduled task;
  • Trigger: defines the scheduling time, that is, the time rule by which the task is executed. A Job can be associated with multiple Triggers, but a Trigger can be associated with only one Job;
  • Scheduler: created by a factory class; schedules tasks according to the time rules defined by the triggers.

In the default configuration, Quartz's JobStore is RAMJobStore, so Triggers and Jobs are stored in memory.

The core class that performs task scheduling is QuartzSchedulerThread.

  1. The scheduling thread fetches the list of due triggers from the JobStore and updates their state;
  2. It fires each trigger, updates the trigger's information (its next fire time and its state), and stores it;
  3. Finally, it creates the concrete job execution object and runs the job on the worker thread pool.
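The three steps above can be sketched in plain Java. This is a minimal illustration, not Quartz's implementation: SketchTrigger, SketchJobStore and SketchSchedulerLoop are hypothetical names, the trigger is assumed to fire at a fixed interval, and the store is a simple in-memory list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

// Hypothetical stand-in for a trigger that fires one job at a fixed interval.
final class SketchTrigger {
    enum State { WAITING, ACQUIRED }

    final String jobName;
    final long intervalMillis;
    long nextFireTime;
    State state = State.WAITING;

    SketchTrigger(String jobName, long firstFireTime, long intervalMillis) {
        this.jobName = jobName;
        this.nextFireTime = firstFireTime;
        this.intervalMillis = intervalMillis;
    }
}

// Hypothetical in-memory store, loosely playing the role of the JobStore.
final class SketchJobStore {
    private final List<SketchTrigger> triggers = new ArrayList<>();

    synchronized void add(SketchTrigger trigger) {
        triggers.add(trigger);
    }

    // Step 1: fetch the triggers that are due and mark them as acquired.
    synchronized List<SketchTrigger> acquireDueTriggers(long now) {
        List<SketchTrigger> due = new ArrayList<>();
        for (SketchTrigger t : triggers) {
            if (t.state == SketchTrigger.State.WAITING && t.nextFireTime <= now) {
                t.state = SketchTrigger.State.ACQUIRED;
                due.add(t);
            }
        }
        return due;
    }

    // Step 2: store the next fire time and return the trigger to the waiting state.
    synchronized void triggerFired(SketchTrigger t) {
        t.nextFireTime += t.intervalMillis;
        t.state = SketchTrigger.State.WAITING;
    }
}

final class SketchSchedulerLoop {
    // One pass of the scheduling loop; returns the names of the jobs handed to the workers.
    static List<String> runOnce(SketchJobStore store, long now,
                                Executor workers, Consumer<String> jobBody) {
        List<String> fired = new ArrayList<>();
        for (SketchTrigger t : store.acquireDueTriggers(now)) {
            store.triggerFired(t);
            final String jobName = t.jobName;
            // Step 3: create the execution object and run it on the worker pool.
            workers.execute(() -> jobBody.accept(jobName));
            fired.add(jobName);
        }
        return fired;
    }
}
```

In a real scheduler the loop would run continuously on its own thread and `workers` would be a thread pool such as `Executors.newFixedThreadPool(n)`; passing the current time as a parameter simply keeps the sketch deterministic.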

Next, let's look at Quartz's cluster deployment scheme.

The cluster deployment scheme requires creating the Quartz tables on a database instance, using the script for your database type (MySQL, Oracle). The JobStore is JobStoreSupport.

This scheme is distributed: no node is responsible for centralized management. Instead, database row-level locks provide concurrency control across the cluster.

In cluster mode, a scheduler instance first acquires a row lock in the {0}LOCKS table. The MySQL statement that acquires the row lock is:

{0} is replaced with the table prefix from the configuration file, which defaults to QRTZ_. sched_name is the instance name of the application cluster, and lock_name is the name of the row-level lock. Quartz mainly uses two row-level locks: the trigger access lock (TRIGGER_ACCESS) and the state access lock (STATE_ACCESS).
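The placeholder substitution described above can be illustrated with `java.text.MessageFormat`. This is a hedged sketch: the SQL template is reconstructed from the description ({0}LOCKS, sched_name, lock_name) and is an assumption rather than a statement copied from Quartz, and RowLockSqlSketch and the cluster name are hypothetical.

```java
import java.text.MessageFormat;

final class RowLockSqlSketch {
    // Assumed template: {0} is the table prefix, {1} the scheduler (cluster) name
    // as a SQL literal. The lock name is left as a JDBC bind parameter.
    static final String SELECT_FOR_LOCK =
        "SELECT * FROM {0}LOCKS WHERE SCHED_NAME = {1} AND LOCK_NAME = ? FOR UPDATE";

    // Expands the placeholders; the quoting here is naive and only for illustration.
    static String expand(String tablePrefix, String schedName) {
        return MessageFormat.format(SELECT_FOR_LOCK, tablePrefix, "'" + schedName + "'");
    }
}
```

With the default prefix, `RowLockSqlSketch.expand("QRTZ_", "orderCluster")` yields a `SELECT ... FOR UPDATE` against `QRTZ_LOCKS`. The lock name (TRIGGER_ACCESS or STATE_ACCESS) would be bound to the `?` parameter, and the row stays locked until the surrounding transaction commits or rolls back.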

This architecture solves distributed task scheduling: a given task runs on only one node at a time, and the other nodes do not execute it. With a large number of short tasks, however, the nodes compete for the database lock frequently, and the more nodes there are, the worse the performance.

2 Distributed lock mode

Quartz's cluster mode scales horizontally and supports distributed scheduling, but the business side has to add the corresponding tables to its database, which is fairly invasive.

Many developers who want to avoid this invasiveness have explored the distributed lock mode.

Business scenario: in an e-commerce project, if a user has not paid within a certain time after placing an order, the system closes the order on timeout.

Typically, a scheduled task runs every two minutes, checks the orders from half an hour earlier, queries the ones that are still unpaid, restores the inventory of the goods in those orders, and marks the orders as invalid.

We implement the scheduled task with Spring Schedule.

@Scheduled(cron = "0 */2 * * * ?")
public void doTask() {
    log.info("Scheduled task started");
    // Close orders that have timed out without payment
    orderService.closeExpireUnpayOrders();
    log.info("Scheduled task finished");
}

This works fine on a single server. But once high availability and growing traffic push the architecture to a cluster, multiple service instances run the same scheduled task at the same moment, which can corrupt business data.

The solution is to take a Redis distributed lock when the task executes.

@Scheduled(cron = "0 */2 * * * ?")
public void doTask() {
    log.info("Scheduled task started");
    String lockName = "closeExpireUnpayOrdersLock";
    RedisLock redisLock = redisClient.getLock(lockName);
    // Try to acquire the lock: wait at most 3 seconds; the lock is released automatically after 5 minutes
    boolean locked = redisLock.tryLock(3, 300, TimeUnit.SECONDS);
    if (!locked) {
        log.info("Failed to acquire distributed lock: {}", lockName);
        return;
    }
    try {
        // Close orders that have timed out without payment
        orderService.closeExpireUnpayOrders();
    } finally {
        redisLock.unlock();
    }
    log.info("Scheduled task finished");
}

Redis has excellent read/write performance, and a distributed lock is more lightweight than Quartz's database row-level lock. The Redis lock can of course be replaced with a ZooKeeper lock; the mechanism is the same.

In small projects, combining a scheduled-task framework (Quartz/Spring Schedule) with a distributed lock (Redis/ZooKeeper) works well.

But we can see that this combination has two problems:

  1. In a distributed setting, the scheduled task fires idly on the nodes that fail to get the lock, and the task cannot be sharded;
  2. Triggering a task manually requires writing extra code.

3 ElasticJob-Lite framework

ElasticJob-Lite is positioned as a lightweight, decentralized solution that provides coordination services for distributed tasks in the form of a jar.

The application defines a task class internally that implements the SimpleJob interface and contains the actual business logic of the task.

public class MyElasticJob implements SimpleJob {
    @Override
    public void execute(ShardingContext context) {
        switch (context.getShardingItem()) {
            case 0:
                // do something by sharding item 0
                break;
            case 1:
                // do something by sharding item 1
                break;
            case 2:
                // do something by sharding item 2
                break;
            // case n: ...
        }
    }
}

For example: application A has five tasks to run, namely A, B, C, D and E. Task E needs to be split into four subtasks, and the application is deployed on two machines.

After application A starts, the five tasks are coordinated through ZooKeeper and allocated to the two machines, where each machine's Quartz Scheduler runs its share of the tasks.

In essence, ElasticJob still schedules tasks through Quartz underneath. Compared with a Redis distributed lock or a Quartz cluster deployment, its advantage is that it can rely on the heavyweight ZooKeeper to distribute tasks, via a load-balancing algorithm, to the Quartz Scheduler containers inside the applications.
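To make the allocation in the example concrete, here is a minimal sketch that spreads task E's four shard items over two machines. It uses plain round-robin, which is an assumption chosen for brevity and not necessarily the strategy ElasticJob applies; ShardAllocationSketch and the machine names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ShardAllocationSketch {
    // Assigns shard items 0..shardCount-1 to the given instances in round-robin order.
    // Instance names are assumed to be unique.
    static Map<String, List<Integer>> allocate(List<String> instances, int shardCount) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String instance : instances) {
            result.put(instance, new ArrayList<>());
        }
        if (instances.isEmpty()) {
            return result;
        }
        for (int item = 0; item < shardCount; item++) {
            String owner = instances.get(item % instances.size());
            result.get(owner).add(item);
        }
        return result;
    }
}
```

Each machine then receives its shard items through `ShardingContext.getShardingItem()` and processes only its own slice of the data. When a machine joins or leaves, the allocation has to be recomputed.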

From the user's point of view, it is very easy to use. Architecturally, however, the scheduler and the executor still live in the same application JVM, and load balancing has to happen after the container starts. If the application restarts frequently, the repeated leader elections and shard rebalancing are relatively heavy operations.

In addition, ElasticJob's console is fairly rough: it displays job status by reading registry data, and modifies the global task configuration by updating registry data.

4 The centralized school

The principle of centralization is to split scheduling and task execution into two parts: the scheduling center and the executor. The scheduling center only manages the scheduling attributes of tasks and issues scheduling commands. The executor receives the commands and runs the actual business logic. Both can be scaled out in a distributed fashion.

4.1 MQ mode

First, let me describe the first centralized architecture I came across, on the promotions team at eLong.

The scheduling center relies on Quartz cluster mode. When a task is scheduled, it sends a message to RabbitMQ; the business application receives the task message and consumes it.

This model makes full use of MQ's decoupling: the scheduling center sends tasks, and the application acts as the executor, receiving and executing them.

But this design depends heavily on the message queue: scalability, functionality, and system load are all closely tied to it. The architecture requires the architect to know the message queue very well.

4.2 XXL-JOB

XXL-JOB is a distributed task scheduling platform whose core design goals are rapid development, easy learning, light weight, and easy extension. It is open source, has been adopted in the production lines of many companies, and works out of the box.

xxl-job 2.3.0 architecture diagram

Let's focus on the architecture diagram:

▍ Network communication: the server-worker model

The scheduling center and the executor communicate in a server-worker model. The scheduling center is itself a Spring Boot project that listens on port 8080 when started.

After the executor starts, it launches a built-in service (EmbedServer) that listens on port 9994. Both sides can therefore send commands to each other.

How does the scheduling center learn the executors' addresses? As the figure above shows, the executor sends a registration command periodically, so the scheduling center can maintain a list of online executors.
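The periodic registration can be sketched as a registry that remembers when each executor last reported in and treats silent executors as offline. This is an illustration under assumptions, not XXL-JOB's code: ExecutorRegistrySketch, the timeout value, and the addresses are hypothetical, and the data lives in memory rather than in a database table.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ExecutorRegistrySketch {
    private final long timeoutMillis;
    // executor address -> timestamp of its most recent registration
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    ExecutorRegistrySketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called whenever an executor sends its periodic registration command.
    void register(String address, long nowMillis) {
        lastSeen.put(address, nowMillis);
    }

    // Returns the addresses whose last registration is recent enough, sorted for stable output.
    List<String> onlineExecutors(long nowMillis) {
        List<String> online = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastSeen.entrySet()) {
            if (nowMillis - entry.getValue() <= timeoutMillis) {
                online.add(entry.getKey());
            }
        }
        Collections.sort(online);
        return online;
    }
}
```

An executor that crashes simply stops registering and drops out of the list once the timeout elapses, so no explicit deregistration is required for failure handling.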

With the executor list, a node can be chosen to run the task according to the routing policy configured for that task. Three common routing policies are:

  • Random node execution: pick one available executor node in the cluster to run the task. Typical scenario: offline order settlement.
  • Broadcast execution: dispatch the task to every executor node in the cluster. Typical scenario: batch-updating the applications' local caches.
  • Sharded execution: split the work according to user-defined sharding logic and distribute it to different nodes in the cluster for parallel execution, improving resource utilization. Typical scenario: statistics over massive logs.
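The three policies above can be sketched as a single routing function. This is a simplified illustration, not XXL-JOB's router: RouteStrategySketch, DispatchSketch and RouterSketch are hypothetical names, and a dispatch is reduced to an address plus a shard index and shard total.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

enum RouteStrategySketch { RANDOM, BROADCAST, SHARDING }

// One dispatch: which executor receives the task, and which shard it should process.
final class DispatchSketch {
    final String address;
    final int shardIndex;
    final int shardTotal;

    DispatchSketch(String address, int shardIndex, int shardTotal) {
        this.address = address;
        this.shardIndex = shardIndex;
        this.shardTotal = shardTotal;
    }
}

final class RouterSketch {
    static List<DispatchSketch> route(RouteStrategySketch strategy,
                                      List<String> executors, Random random) {
        List<DispatchSketch> dispatches = new ArrayList<>();
        if (executors.isEmpty()) {
            return dispatches; // nothing online, nothing to dispatch
        }
        switch (strategy) {
            case RANDOM:
                // One executor, chosen at random, handles the whole task.
                String chosen = executors.get(random.nextInt(executors.size()));
                dispatches.add(new DispatchSketch(chosen, 0, 1));
                break;
            case BROADCAST:
                // Every executor runs the whole task.
                for (String address : executors) {
                    dispatches.add(new DispatchSketch(address, 0, 1));
                }
                break;
            case SHARDING:
                // Every executor runs the task, but each gets its own shard index.
                for (int i = 0; i < executors.size(); i++) {
                    dispatches.add(new DispatchSketch(executors.get(i), i, executors.size()));
                }
                break;
        }
        return dispatches;
    }
}
```

With sharding, the task code on each node would typically select its data with something like `id % shardTotal == shardIndex`, so the nodes work on disjoint slices.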

▍ Scheduler

The scheduler is the core component of a task scheduling system. Early versions of XXL-JOB depended on Quartz.

Version v2.1.0, however, removed the Quartz dependency completely, and the Quartz tables that previously had to be created were replaced with self-developed tables.

The core scheduling class is JobScheduleHelper. Calling its start method starts two threads: scheduleThread and ringThread.

First, scheduleThread periodically loads the tasks to be scheduled from the database. In essence, a database row lock ensures that only one scheduling center node triggers task scheduling at any given moment.

Connection conn = XxlJobAdminConfig.getAdminConfig()
                  .getDataSource().getConnection();
boolean connAutoCommit = conn.getAutoCommit();
conn.setAutoCommit(false);
// Take the row lock: only one scheduling center node gets past this point at a time
PreparedStatement preparedStatement = conn.prepareStatement(
    "select * from xxl_job_lock where lock_name = 'schedule_lock' for update");
preparedStatement.execute();
// Trigger task scheduling (pseudocode)
for (XxlJobInfo jobInfo : scheduleList) {
    // code omitted
}
// Commit the transaction, which releases the row lock
conn.commit();

The scheduling thread takes different actions depending on a task's "next trigger time":

Overdue tasks that need to run immediately are put straight into the thread pool to be triggered; tasks that are due within the next five seconds are put into the ringData object.

After ringThread starts, it periodically takes the list of due tasks from the ringData object and puts them into the thread pool to be triggered.
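The interplay between the scheduling thread and ringData can be sketched as a small time ring keyed by second-of-minute. This is a simplified model under assumptions, not XXL-JOB's source: TimeRingSketch is a hypothetical class, a plain list stands in for the trigger thread pool, and checking the previous second as well is just one way to tolerate a late tick.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TimeRingSketch {
    static final long PRE_READ_MS = 5000;

    // second-of-minute (0..59) -> ids of the jobs to trigger in that second
    private final Map<Integer, List<Integer>> ringData = new HashMap<>();
    // jobs triggered immediately (stands in for the trigger thread pool)
    private final List<Integer> triggeredNow = new ArrayList<>();

    // Scheduling-thread side: act on a job according to its next trigger time.
    synchronized void schedule(int jobId, long nextTriggerTime, long now) {
        if (nextTriggerTime <= now) {
            triggeredNow.add(jobId); // overdue: trigger right away
        } else if (nextTriggerTime - now <= PRE_READ_MS) {
            int ringSecond = (int) ((nextTriggerTime / 1000) % 60);
            ringData.computeIfAbsent(ringSecond, k -> new ArrayList<>()).add(jobId);
        }
        // Jobs further in the future are left for a later scheduling pass.
    }

    // Ring-thread side: called roughly once per second; drains the current
    // and the previous second's bucket so a slightly late tick loses nothing.
    synchronized List<Integer> pollDue(long now) {
        int nowSecond = (int) ((now / 1000) % 60);
        List<Integer> due = new ArrayList<>();
        for (int back = 0; back < 2; back++) {
            List<Integer> bucket = ringData.remove((nowSecond + 60 - back) % 60);
            if (bucket != null) {
                due.addAll(bucket);
            }
        }
        return due;
    }

    synchronized List<Integer> triggeredNow() {
        return new ArrayList<>(triggeredNow);
    }
}
```

Because the scheduling thread only pre-reads jobs due within five seconds, a 60-slot ring never has to distinguish between different minutes, which keeps the structure very small.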

5 Building our own on the shoulders of giants

In 2018, I had the experience of building a task scheduling system in-house.

The background: the system had to be compatible with the RPC framework developed by the technical team, so that teams would not need to change any code. RPC-annotated methods could be hosted in the task scheduling system and executed directly as tasks.

While building it, I studied the XXL-JOB source code and also learned a great deal from SchedulerX, Alibaba Cloud's distributed task scheduling product.

SchedulerX 1.0 architecture diagram

  • Schedulerx-console is the task scheduling console, used to create and manage scheduled tasks. It handles creating, modifying, and querying data, and interacts with schedulerx-server inside the product.
  • Schedulerx-server is the task scheduling server and the core component of the scheduler. It schedules and triggers client tasks and monitors task execution status.
  • Schedulerx-client is the task scheduling client. Each application process connected through the client is a Worker. The Worker establishes communication with schedulerx-server so that schedulerx-server can discover the client machine, and it registers the group that the current application belongs to, so that schedulerx-server can trigger tasks on the client on schedule.

We modeled our modules on SchedulerX; the architecture design is as follows:

I chose remoting, the communication module in the RocketMQ source code, as the communication framework for the self-developed scheduling system, for two reasons:

  1. I was not familiar with Dubbo, the well-known framework in the industry, whereas I had already built several wheels on remoting and was confident I could handle it;
  2. While reading the SchedulerX 1.0 client source code, I found that SchedulerX's communication framework resembles RocketMQ remoting in many places. Its source code contains a ready-made engineering implementation, which was a real treasure.

I stripped the name-service code out of the RocketMQ remoting module and customized it to some degree.

In RocketMQ's remoting, the server uses the Processor pattern.

The scheduling center needs to register two processors: the callback result processor CallBackProcessor and the heartbeat processor HeartBeatProcessor. The executor needs to register the trigger-task processor TriggerTaskProcessor.

public void registerProcessor(
             int requestCode,
             NettyRequestProcessor processor,
             ExecutorService executor);

Processor interface :

public interface NettyRequestProcessor {
    RemotingCommand processRequest(
            ChannelHandlerContext ctx,
            RemotingCommand request) throws Exception;

    boolean rejectRequest();
}

With this communication framework, I don't need to care about the communication details; I only need to implement the processor interface.

Take the trigger-task processor TriggerTaskProcessor as an example:
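A minimal sketch of the processor pattern follows. It is not the author's code and does not use the Netty-based RocketMQ classes: CommandSketch, RequestProcessorSketch, TriggerTaskProcessorSketch, RemotingServerSketch and the request codes are all hypothetical stand-ins, and the processor only acknowledges the request instead of running a task.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified stand-in for a remoting command.
final class CommandSketch {
    final int code;
    final String body;

    CommandSketch(int code, String body) {
        this.code = code;
        this.body = body;
    }
}

// Simplified processor interface: one request in, one response out.
interface RequestProcessorSketch {
    CommandSketch processRequest(CommandSketch request);
}

// Executor-side processor for "trigger task" commands.
final class TriggerTaskProcessorSketch implements RequestProcessorSketch {
    static final int TRIGGER_TASK = 100; // hypothetical request code
    static final int SUCCESS = 0;        // hypothetical response code

    @Override
    public CommandSketch processRequest(CommandSketch request) {
        // A real executor would look up the task named in the body and run it
        // on a business thread pool; here the request is only acknowledged.
        return new CommandSketch(SUCCESS, "accepted:" + request.body);
    }
}

// The server keeps a table from request code to processor and dispatches by code.
final class RemotingServerSketch {
    static final int UNSUPPORTED = 1; // hypothetical response code

    private final Map<Integer, RequestProcessorSketch> processors = new HashMap<>();

    void registerProcessor(int requestCode, RequestProcessorSketch processor) {
        processors.put(requestCode, processor);
    }

    CommandSketch dispatch(CommandSketch request) {
        RequestProcessorSketch processor = processors.get(request.code);
        if (processor == null) {
            return new CommandSketch(UNSUPPORTED, "no processor for code " + request.code);
        }
        return processor.processRequest(request);
    }
}
```

The appeal of the pattern is that adding a new command type only means writing one processor and registering it under a new request code; the transport layer stays untouched.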

With network communication done, how should the scheduler be designed? In the end I still chose Quartz cluster mode, mainly for the following reasons:

  1. At low scheduling volumes, Quartz cluster mode is stable enough, and it remains compatible with the existing XXL-JOB tasks;
  2. I did not have enough hands-on experience with time wheels and worried about running into problems. In addition, having tasks triggered by different scheduling services (schedule-server) requires a coordinator. I thought of ZooKeeper, but that would introduce a new component;
  3. The development cycle could not be too long; I wanted results quickly.

The self-developed scheduling service went live after a month and a half of work. The system ran very stably, and onboarding the R&D teams went smoothly. The scheduling volume was not large: over four months it totaled roughly 40 to 50 million schedules.

Frankly, I could always see the bottleneck of the self-developed version in my mind. Large data volumes I could handle with database and table sharding, but Quartz cluster mode is built on row-level locks, which dooms its ceiling to be low.

To settle my doubts, I wrote a reinvented-wheel demo to see whether it would work:

  1. Remove the external registry; the scheduling service (schedule-server) manages sessions itself;
  2. Introduce ZooKeeper and coordinate the scheduling services through zk. The HA mechanism is very crude, though: effectively one scheduling service runs the tasks while the other stays on standby;
  3. Replace Quartz with a time wheel (modeled on the time wheel source code in Dubbo).

This demo ran in the development environment, but many details still needed optimizing. It was just a toy and never had the chance to run in production.

I recently read an Alibaba Cloud article, "How to implement million-rule alerting through task scheduling". The high-availability architecture of SchedulerX 2.0 is shown in the figure below:

The article mentions:

Each application is backed up three ways: the servers compete for a lock through zk, giving one master and two standbys. If one server goes down, failover kicks in and another server takes over the scheduling tasks.

Architecturally, this self-developed task scheduling system is not complicated. It implements the core functions of XXL-JOB and is compatible with the technical team's RPC framework, but it supports neither workflows nor MapReduce sharding.

After the upgrade to 2.0, SchedulerX is based on a new Akka architecture, which is said to implement a high-performance workflow engine, handle inter-process communication, and reduce the amount of network communication code.

Among the open-source task scheduling systems I surveyed, PowerJob is also based on the Akka architecture, and it also implements workflow and MapReduce execution modes.

I am very interested in PowerJob and will write about it after studying and practicing with it. Stay tuned.

6 Technology selection

First, we put the open-source task scheduling products and the commercial product SchedulerX together in a comparison table:

Quartz and ElasticJob are, in essence, at the framework level.

Centralized products have a clearer architecture and more flexible scheduling, and can support more complex scheduling (MapReduce dynamic sharding, workflows).

At the product level, XXL-JOB has achieved minimalism: it works out of the box, and its scheduling modes satisfy most R&D teams. It is easy to use and battle-tested, hence its popularity.

Every technical team has different technical reserves and different scenarios, so technology selection cannot be reduced to a single rule.

Whichever technology you choose, two points deserve attention when writing the task's business code:

  • Idempotency. When a task is executed repeatedly, or when the distributed lock fails, the program should still produce the correct result;
  • If a task stops running, don't panic. Check the scheduling log, use the jstack command at the JVM level to inspect the thread stacks, and add timeouts to network communication. This usually resolves most problems.
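Idempotency can be illustrated with the order-closing scenario from earlier. The sketch below is hypothetical (OrderCloserSketch is not part of any framework) and keeps state in memory: the status change from UNPAID to CLOSED is an atomic compare-and-set, and inventory is restored only by the call that wins it, so running the task twice does not restore inventory twice.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class OrderCloserSketch {
    enum Status { UNPAID, CLOSED }

    private final Map<String, Status> orders = new ConcurrentHashMap<>();
    private final AtomicInteger inventory;

    OrderCloserSketch(int initialInventory) {
        this.inventory = new AtomicInteger(initialInventory);
    }

    // Placing an order reserves one unit of inventory.
    void placeOrder(String orderId) {
        orders.put(orderId, Status.UNPAID);
        inventory.decrementAndGet();
    }

    // Returns true only for the single call that actually closed the order.
    boolean closeExpired(String orderId) {
        // Atomic transition UNPAID -> CLOSED; repeated or concurrent calls lose the race.
        boolean closed = orders.replace(orderId, Status.UNPAID, Status.CLOSED);
        if (closed) {
            inventory.incrementAndGet(); // restore inventory exactly once
        }
        return closed;
    }

    int availableInventory() {
        return inventory.get();
    }
}
```

Against a database, the same idea is usually expressed as a conditional update (set the status to closed only where it is still unpaid) followed by a check of the affected row count, with the inventory restored in the same transaction.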

7 Closing words

2015 was actually a very interesting year: ElasticJob and XXL-JOB, two task scheduling projects from different schools, were both open-sourced.

The XXL-JOB source code still preserves a screenshot of a post by Xu Xueli on OSChina:

Just finished writing a task scheduling framework: dynamic task management from the web, taking effect in real time, piping hot. If nothing goes wrong, it will be pushed to git.osc at noon tomorrow. Haha, heading downstairs for fried noodles with a poached egg to celebrate.

Seeing this screenshot, I feel a kind of empathy, and the corners of my mouth turn up in spite of myself.

I also think of 2016, when Zhang Liang, the author of ElasticJob, open-sourced sharding-jdbc. I created a private project on GitHub and, following the sharding-jdbc source code, implemented database and table sharding myself. The first class was called ShardingDataSource, and the timestamp is fixed at 2016/3/29.

I don't know how to define a "creative software engineer", but I believe that an engineer who is curious, studies hard, is willing to share, and is willing to help others will surely not have bad luck.

Copyright notice
This article was written by [Yongge Java actual combat sharing]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/01/202201241939517976.html