
Introduction to canal deployment, principle and use

2022-06-26 06:07:00 Super code meow

canal introduction, deployment, principles, and usage

canal introduction

What is canal?

In Alibaba's B2B business, sellers were concentrated in China while buyers were mostly abroad, which created the need to synchronize data between the Hangzhou and US data centers. Starting in 2010, Alibaba-affiliated companies gradually began obtaining incremental changes by parsing database logs for synchronization, which gave rise to the incremental subscription & consumption business.

canal is middleware developed in Java that parses database incremental logs and provides incremental data subscription & consumption. Currently, canal mainly supports parsing the MySQL binlog; once parsing is done, a canal client processes the resulting data. (Full database synchronization requires Alibaba's otter middleware, which is built on canal.)

Here we can simply understand canal as a tool for synchronizing incremental data:
canal obtains the changed data from the binlog and then delivers it to a storage destination such as MySQL, Kafka, or Elasticsearch, enabling synchronization to multiple destinations.

canal use scenarios

Scenario 1: the original use case, as part of Alibaba's otter middleware
Scenario 2: cache updates
Scenario 3: capture business data changes into a change table, used to build zipper (slowly changing) tables. (A zipper table records the lifecycle of each piece of information: once the lifecycle of a record ends, a new record is started, with the current date as its effective start date.)

Scenario 4: capture new change data from business tables for real-time statistics.

How canal works

MySQL replication works in three steps:

  1. The master writes change records to its binary log (binlog).

  2. The slave sends a dump request to the MySQL master and copies the master's binary log events into its relay log.

  3. The slave reads and replays the events in the relay log, applying the changed data to its own database.

canal works in a very simple way: it pretends to be a slave and copies data from the master using the same replication protocol.
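As an aside (these commands are illustrative and not part of the deployment steps below), you can look at the replication coordinates a slave, and therefore canal, starts from:

# Current binlog file and position on the master
mysql -uroot -p -e "SHOW MASTER STATUS;"
# Replicas currently registered with the master; a running canal instance may also show up here
mysql -uroot -p -e "SHOW SLAVE HOSTS;"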

Introduction to MySQL binlog

What is binlog?

MySQL's binary log is arguably its most important log. It records all DDL and DML statements (except data query statements) as events and includes the time each statement took to execute. The binary log is transaction-safe.

Enabling it generally costs about 1% in performance. The binary log has two main usage scenarios:

First, MySQL replication: the master enables binlog and passes its binary log to the slaves so that master and slave data stay consistent.

Second, data recovery, using the mysqlbinlog tool.

The binary log consists of two kinds of files: the binary log index file (suffix .index), which lists all binary log files, and the binary log files themselves (suffix .00000*), which record all DDL and DML statement events in the database (except data query statements).
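For example, the mysqlbinlog tool mentioned above can decode a binlog file into readable events; a rough sketch (the file path and name depend on your installation):

# Decode row events into readable pseudo-SQL
mysqlbinlog --base64-output=decode-rows -v /var/lib/mysql/mysql-bin.000001 | less
# Extract a time window for point-in-time recovery
mysqlbinlog --start-datetime="2021-08-17 14:00:00" --stop-datetime="2021-08-17 15:00:00" \
  /var/lib/mysql/mysql-bin.000001 > recovered.sql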

Enabling MySQL binlog

Enable binlog in the MySQL configuration file and restart MySQL for it to take effect. On Linux, the MySQL configuration file is usually /etc/my.cnf; add: log-bin=mysql-bin

This sets the binlog file prefix to mysql-bin, so the generated log files are named like mysql-bin.123456, with the numbers assigned sequentially. Each time MySQL restarts or a single file reaches the size threshold, a new file is created with the next number.
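To confirm the setting and observe the rotation, the following checks can be run (illustrative; FLUSH LOGS forces a rotation to a new file):

mysql -uroot -p -e "SHOW VARIABLES LIKE 'log_bin%';"   # confirm binlog is enabled and see the base name
mysql -uroot -p -e "SHOW BINARY LOGS;"                 # list the mysql-bin.NNNNNN files and their sizes
mysql -uroot -p -e "FLUSH LOGS;"                       # start a new binlog file immediately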

binlog format settings

MySQL's binlog has three formats: STATEMENT, MIXED, and ROW. You specify the format in the configuration file with the option binlog_format= (a quick way to check the current setting is shown after the three format descriptions below).

statement [statement level]

At statement level, the binlog records every statement that performs a write operation.

Compared with row mode this saves space, but it can cause inconsistencies, for example: update table_name set create_date=now();

If you restore data from this binlog, the result may differ because the statement is executed at a different time (on the master, create_date was 2021-08-08 11:10:30, but when the statement is replayed from the binlog on a slave, create_date may become 2021-08-08 11:11:23; the root cause is that the statement is re-executed later).

Advantage: saves space.

Disadvantage: may cause data inconsistency.

row [row level]

At row level, the binlog records the change to every row after each operation.

Advantage: keeps the data absolutely consistent, because it does not care what the SQL was or which functions it referenced; it only records the result of execution.

Disadvantage: takes up a lot of space.

mixed [combines statement level and row level]

An upgraded version of statement mode that solves statement mode's inconsistency problems to a certain extent.

In certain cases, such as:

○ when a statement uses UUID();

○ when a table with an AUTO_INCREMENT column is updated;

○ when an INSERT DELAYED statement is executed;

○ when a UDF is used;

the event is recorded in ROW format instead.

Advantage: saves space while providing a reasonable degree of consistency.

Disadvantage: a few rare cases can still cause inconsistencies; in addition, statement and mixed formats are inconvenient when we need to monitor binlog changes (as canal does).

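Since canal needs the row format (configured later in this article), it is worth verifying the current format before going further; a minimal check and a runtime change look like this (the permanent setting still belongs in my.cnf):

mysql -uroot -p -e "SHOW VARIABLES LIKE 'binlog_format';"   # expect ROW for canal
mysql -uroot -p -e "SET GLOBAL binlog_format = 'ROW';"      # affects new sessions only; persist it in /etc/my.cnf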
Environment preparation

Machine planning

I used 4 machines:

Machine planning: ops01, ops02, and ops03 host the kafka + zookeeper + canal cluster; ops04 hosts the MySQL service (for testing you could also deploy MySQL on one of the 3 cluster nodes).

11.8.37.50 ops01

11.8.36.63 ops02

11.8.36.76 ops03

11.8.36.86 ops04

All 4 machines configure hostname resolution in /etc/hosts.
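Based on the machine planning above, the /etc/hosts entries on every node look like this:

# /etc/hosts (identical on all 4 machines)
11.8.37.50 ops01
11.8.36.63 ops02
11.8.36.76 ops03
11.8.36.86 ops04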

Install and configure MySQL

Create new databases and tables to simulate the business. The installation steps are not covered here; if MySQL is not installed, refer to the earlier article 《MySQL 5.7 Installation tutorial (win10)》, which has detailed installation steps.

After MySQL is installed, apply the basic settings and configuration:

# Log in to mysql
root@ops04:/root #mysql -uroot -p123456
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 442523
Server version: 5.7.29 MySQL Community Server (GPL)

Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
# Create the canal user and grant the required permissions
mysql> GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%' IDENTIFIED BY 'canal';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> quit;
Bye
# Modify the MySQL configuration file and add the binlog-related options
root@ops04:/root #vim /etc/my.cnf
# binlog
server-id=1
log-bin=mysql-bin
binlog_format=row
binlog-do-db=gmall

Create a new gmall database (any database name works, as long as it matches the binlog-do-db entry in the configuration file above), as shown below:
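A minimal way to create it (the utf8mb4 character set is just an example choice matching the table definition used later):

CREATE DATABASE IF NOT EXISTS gmall DEFAULT CHARACTER SET utf8mb4;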

Restart MySQL:

root@ops04:/root #mysql -V
mysql  Ver 14.14 Distrib 5.7.29, for Linux (x86_64) using  EditLine wrapper
root@ops04:/root #systemctl status mysqld
● mysqld.service - MySQL Server
   Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2021-05-26 09:30:25 CST; 2 months 22 days ago
     Docs: man:mysqld(8)
           http://dev.mysql.com/doc/refman/en/using-systemd.html
 Main PID: 32911 (mysqld)
   Memory: 530.6M
   CGroup: /system.slice/mysqld.service
           └─32911 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid

May 26 09:30:18 ops04 systemd[1]: Starting MySQL Server...
May 26 09:30:25 ops04 systemd[1]: Started MySQL Server.
root@ops04:/root #
root@ops04:/root #systemctl restart mysqld
root@ops04:/root #

[Note]: after adding the binlog configuration and restarting the MySQL service, the data directory will contain the binlog files, in the following format:

root@ops04:/var/lib/mysql #ll | grep mysql-bin
-rw-r----- 1 mysql mysql     1741 Aug 17 14:27 mysql-bin.000001
-rw-r----- 1 mysql mysql       19 Aug 17 11:18 mysql-bin.index

Verify that the canal user can log in:

root@ops04:/root #mysql -ucanal -pcanal -e "show databases"
mysql: [Warning] Using a password on the command line interface can be insecure.
+--------------------+
| Database           |
+--------------------+
| information_schema |
| gmall              |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
root@ops04:/root #

Create a new table in the gmall database and insert some sample data for testing:

CREATE TABLE `canal_test` (
  `Temperature` varchar(255) DEFAULT NULL,
  `height` varchar(255) DEFAULT NULL,
  `weight` varchar(255) DEFAULT NULL,
  `article` varchar(255) DEFAULT NULL,
  `date` date DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

INSERT INTO `canal_test`(`Temperature`, `height`, `weight`, `article`, `date`) VALUES ('36.5', '1.70', '180', '4', '2021-06-01');
INSERT INTO `canal_test`(`Temperature`, `height`, `weight`, `article`, `date`) VALUES ('36.4', '1.70', '160', '8', '2021-06-02');
INSERT INTO `canal_test`(`Temperature`, `height`, `weight`, `article`, `date`) VALUES ('36.1', '1.90', '134', '1', '2021-06-03');
INSERT INTO `canal_test`(`Temperature`, `height`, `weight`, `article`, `date`) VALUES ('37.3', '1.70', '110', '14', '2021-06-04');
INSERT INTO `canal_test`(`Temperature`, `height`, `weight`, `article`, `date`) VALUES ('35.7', '1.70', '133', '0', '2021-06-05');
INSERT INTO `canal_test`(`Temperature`, `height`, `weight`, `article`, `date`) VALUES ('36.8', '1.90', '200', '6', '2021-06-06');
INSERT INTO `canal_test`(`Temperature`, `height`, `weight`, `article`, `date`) VALUES ('37.5', '1.70', '132', '25', '2021-06-07');
INSERT INTO `canal_test`(`Temperature`, `height`, `weight`, `article`, `date`) VALUES ('35.7', '1.70', '160', '2', '2021-06-08');
INSERT INTO `canal_test`(`Temperature`, `height`, `weight`, `article`, `date`) VALUES ('36.3', '1.80', '131.4', '9', '2021-06-09');
INSERT INTO `canal_test`(`Temperature`, `height`, `weight`, `article`, `date`) VALUES ('37.3', '1.70', '98.8', '4', '2021-06-10');

Install kafka + zookeeper

Kafka and zookeeper are required for canal high availability and as the message destination. To keep this article short, the installation steps are not covered here; refer to the earlier article 《Kafka Deployment, Principles and Usage Introduction》, which has detailed kafka installation steps.

Check that the kafka and zookeeper ports are up on each cluster node:

jyc@ops03:/opt/module >ssh ops01 'sudo netstat -tnlpu| grep -E "9092|2181"'
tcp6       0      0 :::9092                 :::*                    LISTEN      42305/java          
tcp6       0      0 :::2181                 :::*                    LISTEN      41773/java          
jyc@ops03:/opt/module >ssh ops02 'sudo netstat -tnlpu| grep -E "9092|2181"'
tcp6       0      0 :::9092                 :::*                    LISTEN      33518/java          
tcp6       0      0 :::2181                 :::*                    LISTEN      33012/java          
jyc@ops03:/opt/module >ssh ops03 'sudo netstat -tnlpu| grep -E "9092|2181"'
tcp6       0      0 :::9092                 :::*                    LISTEN      102886/java         
tcp6       0      0 :::2181                 :::*                    LISTEN      102422/java   
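The LEADER_NOT_AVAILABLE warning that appears later happens because the topic is auto-created on first use; if you prefer, the topic can be created up front. A sketch, assuming Kafka 2.x tooling and the partition count used later in the canal instance configuration:

# Optional: pre-create the topic canal will write to (12 partitions to match canal.mq.partitionsNum)
kafka-topics.sh --create --bootstrap-server ops01:9092,ops02:9092,ops03:9092 \
  --topic jyc_test_canal --partitions 12 --replication-factor 3
kafka-topics.sh --describe --bootstrap-server ops01:9092,ops02:9092,ops03:9092 --topic jyc_test_canal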

Install and deploy canal

The canal project is hosted at https://github.com/alibaba/canal. Download links for each version are under the releases section on the right side of the GitHub page. If you have the time, it is also worth browsing Alibaba's other increasingly popular open source projects.

Download installation package

# Download the installation package
jyc@ops03:/opt/software >wget https://github.com/alibaba/canal/releases/download/canal-1.1.5/canal.deployer-1.1.5.tar.gz
jyc@ops03:/opt/software >ll | grep canal
-rw-r--r-- 1 jyc jyc  60205298 Aug 17 11:23 canal.deployer-1.1.5.tar.gz

Unpack and install

# Create a directory to unpack canal into. [Note]: the official tarball has no top-level canal directory, so create one and extract the components into it
jyc@ops03:/opt/software >mkdir -p /opt/module/canal
jyc@ops03:/opt/software >tar -xf canal.deployer-1.1.5.tar.gz -C /opt/module/canal/

Modify the canal main configuration

# Modify the canal main configuration file
jyc@ops03:/opt/module/canal >cd conf/
jyc@ops03:/opt/module/canal/conf >ll
total 28
-rwxrwxr-x 1 jyc jyc  319 Apr 19 15:48 canal_local.properties
-rwxrwxr-x 1 jyc jyc 6277 Apr 19 15:48 canal.properties
drwxrwxr-x 2 jyc jyc 4096 Aug 17 13:49 example
-rwxrwxr-x 1 jyc jyc 3437 Apr 19 15:48 logback.xml
drwxrwxr-x 2 jyc jyc 4096 Aug 17 13:49 metrics
drwxrwxr-x 3 jyc jyc 4096 Aug 17 13:49 spring
# Change the following options: zookeeper servers | serverMode (message destination) | kafka brokers
jyc@ops03:/opt/module/canal/conf >vim canal.properties 
canal.zkServers =ops01:2181,ops02:2181,ops03:2181
canal.serverMode = kafka
kafka.bootstrap.servers = ops01:9092,ops02:9092,ops03:9092

Modify the canal instance configuration (MySQL to Kafka)

# Configure the instance: canal can run multiple instances, and each instance corresponds to one configuration directory. For example, copy the example directory to xxx, change the configuration under xxx, start it, and you have a new instance
jyc@ops03:/opt/module/canal/conf >cd example/
jyc@ops03:/opt/module/canal/conf/example >ll
total 4
-rwxrwxr-x 1 jyc jyc 2106 Apr 19 15:48 instance.properties
# Note: change 11.8.38.86:3306 to the MySQL address and port of your own environment, change the username and password to match your environment, and choose your own topic name
jyc@ops03:/opt/module/canal/conf/example >vim instance.properties 
canal.instance.master.address=11.8.38.86:3306
canal.instance.dbUsername=canal
canal.instance.dbPassword=canal
canal.mq.topic=jyc_test_canal
canal.mq.partitionsNum=12
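One more instance option worth knowing about is the table filter, which limits what canal parses; a sketch (in the shipped example instance the default regex captures every schema and table):

# Only parse changes from the gmall database (default is .*\\..*)
canal.instance.filter.regex=gmall\\..*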

Distribute the installation directory

# Distribute the modified canal directory to the other 2 servers:
jyc@ops03:/opt/module >scp -r /opt/module/canal ops01:/opt/module/
jyc@ops03:/opt/module >scp -r /opt/module/canal ops02:/opt/module/

Start the canal cluster

# Start canal on each server of the cluster in turn
jyc@ops03:/opt/module >cd /opt/module/canal/bin/
jyc@ops03:/opt/module/canal/bin >./startup.sh 

jyc@ops02:/home/jyc >cd /opt/module/canal/bin/
jyc@ops02:/opt/module/canal/bin >./startup.sh 

jyc@ops01:/home/jyc >cd /opt/module/canal/bin/
jyc@ops01:/opt/module/canal/bin >./startup.sh 
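To confirm each node started cleanly, check the canal logs; a quick sketch assuming the default log layout of the canal deployer:

# Server-level log and instance-level log under the installation directory
tail -n 50 /opt/module/canal/logs/canal/canal.log
tail -n 50 /opt/module/canal/logs/example/example.log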

Verify the results

# Consume the kafka topic from one of the servers
jyc@ops03:/opt/module/canal/bin >kafka-console-consumer.sh --bootstrap-server ops01:9092,ops02:9092,ops03:9092 --topic jyc_test_canal
[2021-08-17 14:21:29,924] WARN [Consumer clientId=consumer-console-consumer-17754-1, groupId=console-consumer-17754] Error while fetching metadata with correlation id 2 : {jyc_test_canal=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)

If everything works as expected, the consumer is now monitoring the gmall database of MySQL on ops04: any data change in a table of the gmall database will be printed to the console in real time.

With the sample data already in the table, make the following changes and observe the console output:
1. Change the date 2021-06-10 -> 2021-08-17

2. Insert a new row

3. Change the value 1 -> 1111

jyc@ops03:/opt/module/canal/bin >kafka-console-consumer.sh --bootstrap-server ops01:9092,ops02:9092,ops03:9092 --topic jyc_test_canal
[2021-08-17 14:21:29,924] WARN [Consumer clientId=consumer-console-consumer-17754-1, groupId=console-consumer-17754] Error while fetching metadata with correlation id 2 : {jyc_test_canal=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)

{"data":[{"Temperature":"37.3","height":"1.70","weight":"98.8","article":"4","date":"2021-08-17"}],"database":"gmall","es":1629185045000,"id":6,"isDdl":false,"mysqlType":{"Temperature":"varchar(255)","height":"varchar(255)","weight":"varchar(255)","article":"varchar(255)","date":"date"},"old":[{"date":"2021-06-10"}],"pkNames":null,"sql":"","sqlType":{"Temperature":12,"height":12,"weight":12,"article":12,"date":91},"table":"canal_test","ts":1629185063194,"type":"UPDATE"}

{"data":[{"Temperature":"35.55","height":"1.999","weight":"99.99","article":"999","date":"2021-08-17"}],"database":"gmall","es":1629185086000,"id":7,"isDdl":false,"mysqlType":{"Temperature":"varchar(255)","height":"varchar(255)","weight":"varchar(255)","article":"varchar(255)","date":"date"},"old":null,"pkNames":null,"sql":"","sqlType":{"Temperature":12,"height":12,"weight":12,"article":12,"date":91},"table":"canal_test","ts":1629185104967,"type":"INSERT"}

{"data":[{"Temperature":"36.1","height":"1.90","weight":"134","article":"1111","date":"2021-06-03"}],"database":"gmall","es":1629185104000,"id":8,"isDdl":false,"mysqlType":{"Temperature":"varchar(255)","height":"varchar(255)","weight":"varchar(255)","article":"varchar(255)","date":"date"},"old":[{"article":"1"}],"pkNames":null,"sql":"","sqlType":{"Temperature":12,"height":12,"weight":12,"article":12,"date":91},"table":"canal_test","ts":1629185122499,"type":"UPDATE"}

Each change shows up clearly as a record, and the old values map one-to-one to the new values. At this point the whole canal pipeline is working end to end; synchronizing canal output to other storage destinations follows essentially the same pattern.
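A downstream consumer usually only needs a few fields from each message: type (INSERT/UPDATE/DELETE), database, table, data (the rows after the change) and old (the previous values of the changed columns, null for inserts). As a quick command-line sketch, assuming jq is installed on the consumer host:

# Extract just the operation type, table and changed rows from each canal message
kafka-console-consumer.sh --bootstrap-server ops01:9092,ops02:9092,ops03:9092 \
  --topic jyc_test_canal --from-beginning \
  | jq -c '{type: .type, table: .table, data: .data, old: .old}'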

Going further:

You can view canal's information in zookeeper from the command line:

jyc@ops01:/opt/module/canal/bin >zkCli.sh
Connecting to localhost:2181
[zk: localhost:2181(CONNECTED) 0] ls -w /
[hbase, kafka, otter, jyc, zookeeper]
[zk: localhost:2181(CONNECTED) 1] ls -w /otter
[canal]
[zk: localhost:2181(CONNECTED) 2] ls -w /otter/canal
[cluster, destinations]
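Going one level deeper (a sketch; the exact node layout can differ between canal versions), the destinations subtree records which canal server currently owns each instance in the HA setup:

# Inside zkCli: list registered instances and see the current owner of the example instance
ls /otter/canal/destinations
get /otter/canal/destinations/example/running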

Copyright notice
This article was written by [Super code meow]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/177/202206260554584938.html