Introduction to canal deployment, principle and use
2022-06-26 06:07:00 【Super code meow】
Canal: Introduction, Deployment, Principle, and Usage
Canal introduction
What is canal?
Alibaba's B2B business has its sellers concentrated mainly in China and its buyers concentrated mainly abroad, which created the need to synchronize data between the Hangzhou and US data centers. Starting in 2010, Alibaba teams began to obtain incremental changes by parsing database logs, which gave rise to the incremental subscription & consumption business.
Canal is middleware written in Java that parses database incremental logs and provides incremental data subscription & consumption. At present, canal mainly supports parsing MySQL binlog; once parsing is done, a canal client is used to process the data obtained. (Full database synchronization requires Alibaba's otter middleware, which is built on top of canal.)
Here canal can simply be understood as a tool for synchronizing incremental data:
canal obtains the changed data through the binlog and then delivers it to storage destinations such as MySQL, Kafka, and Elasticsearch for multi-destination synchronization.
Canal use scenarios
Scenario 1: the original scenario, as part of Alibaba's otter middleware
Scenario 2: updating caches
Scenario 3: capturing changes of business data into a change table, used to build zipper tables. (Zipper table: records the life cycle of each piece of information; once the life cycle of a record ends, a new record is started and the current date becomes its effective start date.)
Scenario 4: capturing newly changed data from business tables for real-time statistics.
How canal works

The replication process consists of three steps:
Master: the master writes change records to its binary log (binary log);
Slave: the slave sends a dump request to the master and copies the master's binary log events to its relay log (relay log);
Slave: the slave reads the events in the relay log and replays them, synchronizing the changed data to its own database.
Canal's working principle is simple: it disguises itself as a MySQL slave and pulls data from the master through the same replication protocol.
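To see this in practice, once canal is connected you can look at the processlist on the MySQL master: canal shows up as a client running the Binlog Dump command, just like a normal replica would. A quick check (the host, thread id and State text below are illustrative and will differ in your environment):
# Run on the MySQL master after canal has connected (output abbreviated)
mysql -uroot -p -e "SHOW PROCESSLIST\G"
#      User: canal
#      Host: ops03:45678        <- whichever canal node currently holds the instance
#   Command: Binlog Dump
#     State: Master has sent all binlog to slave; waiting for more updates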
Introduction to MySQL binlog
What is binlog?
MySQL's binary log is arguably its most important log. It records all DDL and DML statements (excluding data query statements) in the form of events, together with the time each statement took to execute, and it is transaction-safe.
Generally speaking, enabling it costs roughly 1% of performance. The binary log has two most important usage scenarios:
First, MySQL replication: binlog is enabled on the master, and the master passes its binary log to the slaves so that master and slave data stay consistent.
Second, data recovery: data can be restored with the mysqlbinlog tool.
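As an illustration of the second scenario, a minimal point-in-time recovery sketch with mysqlbinlog looks like this (the binlog path, time window and credentials are placeholders to adapt to your environment):
# Replay the events recorded between the two timestamps back into MySQL
mysqlbinlog --start-datetime="2021-08-17 10:00:00" \
            --stop-datetime="2021-08-17 11:00:00" \
            /var/lib/mysql/mysql-bin.000001 | mysql -uroot -p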
Binary logging consists of two kinds of files: the binary log index file (suffix .index), which records the names of all binary log files, and the binary log files themselves (suffix .00000*), which record all DDL and DML statement events (excluding data query statements) in the database.
Enabling MySQL binlog
Binlog is enabled in the MySQL configuration file and takes effect after MySQL is restarted. On Linux the configuration file is usually /etc/my.cnf; the relevant setting is log-bin=mysql-bin.
This sets the binlog file prefix to mysql-bin, so subsequently generated log files are named like mysql-bin.123456, with the numeric suffix assigned in order. Each time MySQL restarts, or a single file reaches the size threshold, a new file is created with the next number in sequence.
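Once binlog is enabled and MySQL has been restarted, you can confirm which log files exist and which one is currently being written to:
# SHOW BINARY LOGS lists all binlog files recorded in the index file;
# SHOW MASTER STATUS shows the file and position currently being written
mysql -uroot -p -e "SHOW BINARY LOGS; SHOW MASTER STATUS;"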
Binlog format settings
MySQL binlog has three formats: STATEMENT, MIXED, and ROW. The format is specified in the configuration file with the option binlog_format=.
statement (statement level)
At the statement level, binlog records every SQL statement that performs a write operation.
Compared with row mode this saves space, but it can produce inconsistencies, for example: update table_name set create_date=now();
If data is restored from the binlog, the result may differ because the statement is executed at a different time (on the master, create_date was written as 2021-08-08 11:10:30, but when the statement is replayed on a slave, create_date may become 2021-08-08 11:11:23; the root cause is that the statement is re-executed at a later moment).
Advantage: saves space.
Disadvantage: may cause data inconsistency.
row (row level)
At the row level, binlog records the change of every affected row for each operation.
Advantage: keeps the data absolutely consistent. It does not matter what the SQL statement was or which functions it called; only the effect of execution is recorded.
Disadvantage: takes up a lot of space.
mixed (combines statement level and row level)
An upgraded version of statement mode that, to a certain extent, solves its inconsistency problems.
In certain cases, such as:
○ when the statement uses a function like UUID();
○ when a table containing an AUTO_INCREMENT field is updated;
○ when INSERT DELAYED statements are executed;
○ when UDFs are used;
the statement is logged in ROW format instead.
Advantage: saves space while providing a reasonable degree of consistency.
Disadvantage: a few rare cases can still cause inconsistencies; moreover, for scenarios like ours that need to monitor changes through the binlog, statement and mixed formats are inconvenient.
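Since canal depends on ROW format, it is worth double-checking the effective format and what actually lands in the binlog. A quick sketch (the binlog file name is illustrative; pick one returned by SHOW BINARY LOGS):
# Check the current binlog format
mysql -uroot -p -e "SHOW VARIABLES LIKE 'binlog_format';"
# Inspect the first events of a binlog file; with ROW format, writes appear
# as Table_map + Write_rows/Update_rows/Delete_rows events instead of raw SQL
mysql -uroot -p -e "SHOW BINLOG EVENTS IN 'mysql-bin.000001' LIMIT 10;"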
Environment preparation
Machine planning
Four machines are used here:
Machine planning: ops01, ops02 and ops03 host the kafka + zookeeper + canal cluster; ops04 hosts the MySQL service (for testing, MySQL could also be deployed on one of the three cluster nodes).
11.8.37.50 ops01
11.8.36.63 ops02
11.8.36.76 ops03
11.8.36.86 ops04
All four machines have hostname resolution configured in /etc/hosts.
Installing and configuring MySQL
New databases and tables are created here to simulate business data. The MySQL installation steps are not covered; if MySQL is not installed yet, refer to the earlier article 《MySQL 5.7 Installation tutorial (win10)》, which contains detailed installation steps.
After MySQL is installed, do the basic setup and configuration.
# Log in to mysql
root@ops04:/root #mysql -uroot -p123456
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 442523
Server version: 5.7.29 MySQL Community Server (GPL)
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
# Create the canal user and grant permissions
mysql> GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%' IDENTIFIED BY 'canal';
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> quit;
Bye
# Modify the MySQL configuration file and add the binlog-related settings
root@ops04:/root #vim /etc/my.cnf
# binlog
server-id=1
log-bin=mysql-bin
binlog_format=row
binlog-do-db=gmall
Create a new gmall database (any database works, as long as the name matches the binlog-do-db entry in the configuration above).
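The create statement itself is not shown in the original steps; a minimal one-liner would be (the character set here is just a common default):
mysql -uroot -p123456 -e "CREATE DATABASE IF NOT EXISTS gmall DEFAULT CHARACTER SET utf8mb4;"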
restart MySQL:
root@ops04:/root #mysql -V
mysql Ver 14.14 Distrib 5.7.29, for Linux (x86_64) using EditLine wrapper
root@ops04:/root #systemctl status mysqld
● mysqld.service - MySQL Server
Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2021-05-26 09:30:25 CST; 2 months 22 days ago
Docs: man:mysqld(8)
http://dev.mysql.com/doc/refman/en/using-systemd.html
Main PID: 32911 (mysqld)
Memory: 530.6M
CGroup: /system.slice/mysqld.service
└─32911 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid
May 26 09:30:18 ops04 systemd[1]: Starting MySQL Server...
May 26 09:30:25 ops04 systemd[1]: Started MySQL Server.
root@ops04:/root #
root@ops04:/root #systemctl restart mysqld
root@ops04:/root #
【Note】: after the binlog configuration is added and the MySQL service is restarted, the corresponding binlog files appear in the data directory, in the following format:
root@ops04:/var/lib/mysql #ll | grep mysql-bin
-rw-r----- 1 mysql mysql 1741 Aug 17 14:27 mysql-bin.000001
-rw-r----- 1 mysql mysql 19 Aug 17 11:18 mysql-bin.index
Verify that the canal user can log in:
root@ops04:/root #mysql -ucanal -pcanal -e "show databases"
mysql: [Warning] Using a password on the command line interface can be insecure.
+--------------------+
| Database |
+--------------------+
| information_schema |
| gmall |
| mysql |
| performance_schema |
| sys |
+--------------------+
root@ops04:/root #
Create a new table in the gmall database and insert some sample data for testing:
CREATE TABLE `canal_test` (
  `Temperature` varchar(255) DEFAULT NULL,
  `height` varchar(255) DEFAULT NULL,
  `weight` varchar(255) DEFAULT NULL,
  `article` varchar(255) DEFAULT NULL,
  `date` date DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
INSERT INTO `canal_test`(`Temperature`, `height`, `weight`, `article`, `date`) VALUES ('36.5', '1.70', '180', '4', '2021-06-01');
INSERT INTO `canal_test`(`Temperature`, `height`, `weight`, `article`, `date`) VALUES ('36.4', '1.70', '160', '8', '2021-06-02');
INSERT INTO `canal_test`(`Temperature`, `height`, `weight`, `article`, `date`) VALUES ('36.1', '1.90', '134', '1', '2021-06-03');
INSERT INTO `canal_test`(`Temperature`, `height`, `weight`, `article`, `date`) VALUES ('37.3', '1.70', '110', '14', '2021-06-04');
INSERT INTO `canal_test`(`Temperature`, `height`, `weight`, `article`, `date`) VALUES ('35.7', '1.70', '133', '0', '2021-06-05');
INSERT INTO `canal_test`(`Temperature`, `height`, `weight`, `article`, `date`) VALUES ('36.8', '1.90', '200', '6', '2021-06-06');
INSERT INTO `canal_test`(`Temperature`, `height`, `weight`, `article`, `date`) VALUES ('37.5', '1.70', '132', '25', '2021-06-07');
INSERT INTO `canal_test`(`Temperature`, `height`, `weight`, `article`, `date`) VALUES ('35.7', '1.70', '160', '2', '2021-06-08');
INSERT INTO `canal_test`(`Temperature`, `height`, `weight`, `article`, `date`) VALUES ('36.3', '1.80', '131.4', '9', '2021-06-09');
INSERT INTO `canal_test`(`Temperature`, `height`, `weight`, `article`, `date`) VALUES ('37.3', '1.70', '98.8', '4', '2021-06-10');
Installing kafka + zookeeper
Kafka and zookeeper are used to achieve canal high availability. The installation steps are not covered here to save space; refer to the earlier article 《Kafka Deployment, Principle and Usage Introduction》, which contains detailed kafka installation steps.
Check the listening ports of the kafka and zookeeper clusters on each node:
jyc@ops03:/opt/module >ssh ops01 'sudo netstat -tnlpu| grep -E "9092|2181"'
tcp6 0 0 :::9092 :::* LISTEN 42305/java
tcp6 0 0 :::2181 :::* LISTEN 41773/java
jyc@ops03:/opt/module >ssh ops02 'sudo netstat -tnlpu| grep -E "9092|2181"'
tcp6 0 0 :::9092 :::* LISTEN 33518/java
tcp6 0 0 :::2181 :::* LISTEN 33012/java
jyc@ops03:/opt/module >ssh ops03 'sudo netstat -tnlpu| grep -E "9092|2181"'
tcp6 0 0 :::9092 :::* LISTEN 102886/java
tcp6 0 0 :::2181 :::* LISTEN 102422/java
Installing and deploying canal
The canal project is at https://github.com/alibaba/canal. Download links can be found by clicking Releases on the right side of the GitHub page and choosing a version. If you have time, it is also worth browsing Alibaba's other popular open-source projects on that page.
Download installation package
# Download installation package
jyc@ops03:/opt/software >wget https://github.com/alibaba/canal/releases/download/canal-1.1.5/canal.deployer-1.1.5.tar.gz
jyc@ops03:/opt/software >ll | grep canal
-rw-r--r-- 1 jyc jyc 60205298 Aug 17 11:23 canal.deployer-1.1.5.tar.gz
Unpack and install
# Create a new directory for canal and unpack into it. 【Note】: the official tarball does not contain a top-level canal directory, so create one first and extract the components into it.
jyc@ops03:/opt/software >mkdir -p /opt/module/canal
jyc@ops03:/opt/software >tar -xf canal.deployer-1.1.5.tar.gz -C /opt/module/canal/
Modify the canal main configuration
# Modify the canal main configuration file
jyc@ops03:/opt/module/canal >cd conf/
jyc@ops03:/opt/module/canal/conf >ll
total 28
-rwxrwxr-x 1 jyc jyc 319 Apr 19 15:48 canal_local.properties
-rwxrwxr-x 1 jyc jyc 6277 Apr 19 15:48 canal.properties
drwxrwxr-x 2 jyc jyc 4096 Aug 17 13:49 example
-rwxrwxr-x 1 jyc jyc 3437 Apr 19 15:48 logback.xml
drwxrwxr-x 2 jyc jyc 4096 Aug 17 13:49 metrics
drwxrwxr-x 3 jyc jyc 4096 Aug 17 13:49 spring
# Change the following settings: zookeeper servers | serverMode (synchronization target) | kafka bootstrap servers
jyc@ops03:/opt/module/canal/conf >vim canal.properties
canal.zkServers =ops01:2181,ops02:2181,ops03:2181
canal.serverMode = kafka
kafka.bootstrap.servers = ops01:9092,ops02:9092,ops03:9092
Modify the canal instance configuration (MySQL to kafka)
# Configure the instance settings. canal can run multiple instances; each instance corresponds to one configuration directory. For example, copy the example directory to xxx, change the configuration under xxx and start it, and you have a new instance.
jyc@ops03:/opt/module/canal/conf >cd example/
jyc@ops03:/opt/module/canal/conf/example >ll
total 4
-rwxrwxr-x 1 jyc jyc 2106 Apr 19 15:48 instance.properties
# Note: change canal.instance.master.address below to the MySQL address and port of your own environment (ops04 here), change the username and password to match your environment, and choose your own topic name
jyc@ops03:/opt/module/canal/conf/example >vim instance.properties
canal.instance.master.address=11.8.36.86:3306
canal.instance.dbUsername=canal
canal.instance.dbPassword=canal
canal.mq.topic=jyc_test_canal
canal.mq.partitionsNum=12
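Because canal.mq.partitionsNum is set to 12, it can help to pre-create the topic with a matching partition count instead of letting kafka auto-create it with broker defaults. A sketch, assuming a kafka version that supports --bootstrap-server (the replication factor of 3 simply matches this 3-node cluster, it is not something canal requires):
# Pre-create the target topic with 12 partitions to match canal.mq.partitionsNum
kafka-topics.sh --create --bootstrap-server ops01:9092,ops02:9092,ops03:9092 \
  --topic jyc_test_canal --partitions 12 --replication-factor 3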
Distribute the installation directory
# Distribute the modified canal directory to the other 2 servers:
jyc@ops03:/opt/module >scp -r /opt/module/canal ops01:/opt/module/
jyc@ops03:/opt/module >scp -r /opt/module/canal ops02:/opt/module/
Start the canal cluster
# Start canal on each server in turn
jyc@ops03:/opt/module >cd /opt/module/canal/bin/
jyc@ops03:/opt/module/canal/bin >./startup.sh
jyc@ops02:/home/jyc >cd /opt/module/canal/bin/
jyc@ops02:/opt/module/canal/bin >./startup.sh
jyc@ops01:/home/jyc >cd /opt/module/canal/bin/
jyc@ops01:/opt/module/canal/bin >./startup.sh
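After starting, it is worth confirming that the canal process is up and watching its logs. The commands below assume the default canal.deployer layout, where the server log is logs/canal/canal.log and each instance writes to logs/<instance>/<instance>.log:
# The canal server runs as the CanalLauncher java process
jyc@ops03:/opt/module/canal/bin >jps | grep CanalLauncher
# Server-level log and instance-level log (one per instance directory)
jyc@ops03:/opt/module/canal/bin >tail -f ../logs/canal/canal.log
jyc@ops03:/opt/module/canal/bin >tail -f ../logs/example/example.log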
Verify the results
# Consume the kafka topic on one of the servers
jyc@ops03:/opt/module/canal/bin >kafka-console-consumer.sh --bootstrap-server ops01:9092,ops02:9092,ops03:9092 --topic jyc_test_canal
[2021-08-17 14:21:29,924] WARN [Consumer clientId=consumer-console-consumer-17754-1, groupId=console-consumer-17754] Error while fetching metadata with correlation id 2 : {jyc_test_canal=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
As expected: with monitoring in place, any data change in a table of the gmall database on ops04's MySQL is printed to the console in real time.
Current data in the table:
Now change the data in the table and observe the console output:
1. Change the date 2021-06-10 -> 2021-08-17
2. Insert a new row
3. Change a value from 1 -> 1111
jyc@ops03:/opt/module/canal/bin >kafka-console-consumer.sh --bootstrap-server ops01:9092,ops02:9092,ops03:9092 --topic jyc_test_canal
[2021-08-17 14:21:29,924] WARN [Consumer clientId=consumer-console-consumer-17754-1, groupId=console-consumer-17754] Error while fetching metadata with correlation id 2 : {jyc_test_canal=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
{"data":[{"Temperature":"37.3","height":"1.70","weight":"98.8","article":"4","date":"2021-08-17"}],"database":"gmall","es":1629185045000,"id":6,"isDdl":false,"mysqlType":{"Temperature":"varchar(255)","height":"varchar(255)","weight":"varchar(255)","article":"varchar(255)","date":"date"},"old":[{"date":"2021-06-10"}],"pkNames":null,"sql":"","sqlType":{"Temperature":12,"height":12,"weight":12,"article":12,"date":91},"table":"canal_test","ts":1629185063194,"type":"UPDATE"}
{"data":[{"Temperature":"35.55","height":"1.999","weight":"99.99","article":"999","date":"2021-08-17"}],"database":"gmall","es":1629185086000,"id":7,"isDdl":false,"mysqlType":{"Temperature":"varchar(255)","height":"varchar(255)","weight":"varchar(255)","article":"varchar(255)","date":"date"},"old":null,"pkNames":null,"sql":"","sqlType":{"Temperature":12,"height":12,"weight":12,"article":12,"date":91},"table":"canal_test","ts":1629185104967,"type":"INSERT"}
{"data":[{"Temperature":"36.1","height":"1.90","weight":"134","article":"1111","date":"2021-06-03"}],"database":"gmall","es":1629185104000,"id":8,"isDdl":false,"mysqlType":{"Temperature":"varchar(255)","height":"varchar(255)","weight":"varchar(255)","article":"varchar(255)","date":"date"},"old":[{"article":"1"}],"pkNames":null,"sql":"","sqlType":{"Temperature":12,"height":12,"weight":12,"article":12,"date":91},"table":"canal_test","ts":1629185122499,"type":"UPDATE"}
Each change is clearly reflected in a record, and the old values can be matched one to one against the current values. At this point the whole canal pipeline is working end to end; synchronizing canal to other storage destinations follows essentially the same approach.
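Each message is a single JSON document, which makes post-processing straightforward. For instance, assuming jq is installed on the consuming host, the operation type, table and changed rows can be extracted directly from the console consumer (a sketch, not part of the original setup):
# Extract only the operation type, table and changed rows from each canal message
kafka-console-consumer.sh --bootstrap-server ops01:9092,ops02:9092,ops03:9092 \
  --topic jyc_test_canal --from-beginning | jq '{type, table, data, old}'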
Extension:
Canal information can also be inspected from the zookeeper command line:
jyc@ops01:/opt/module/canal/bin >zkCli.sh
Connecting to localhost:2181
[zk: localhost:2181(CONNECTED) 0] ls -w /
[hbase, kafka, otter, jyc, zookeeper]
[zk: localhost:2181(CONNECTED) 1] ls -w /otter
[canal]
[zk: localhost:2181(CONNECTED) 2] ls -w /otter/canal
[cluster, destinations]
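Going one step further, canal's HA state lives under these nodes as well: the currently active server for an instance is registered under the destination's running node, whose content is a small JSON document with the address of the canal server that currently holds the instance. A quick look (the path assumes the default destination name example):
[zk: localhost:2181(CONNECTED) 3] ls /otter/canal/destinations
[zk: localhost:2181(CONNECTED) 4] get /otter/canal/destinations/example/running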