Introduction to canal deployment, principle and use
2022-06-26 06:07:00 【Super code meow】
canal: introduction, deployment, principles, and usage
Introduction to canal
What is canal
Alibaba's B2B business has sellers concentrated mainly in China and buyers concentrated mainly abroad, which created a need to synchronize data between data centers in Hangzhou and the United States. Starting in 2010, Alibaba's companies gradually moved to parsing database logs to obtain incremental changes for synchronization, and this gave rise to the incremental subscription and consumption business.
canal is middleware written in Java that parses a database's incremental logs and provides incremental data subscription and consumption. At present, canal mainly supports parsing the MySQL binlog; once parsing is complete, a canal client processes the resulting data. (Full database synchronization requires Alibaba's otter middleware, which is built on canal.)
Here, canal can simply be understood as a tool for synchronizing incremental data:
canal obtains changed data through the binlog and then sends it to a storage destination such as MySQL, Kafka, or Elasticsearch, so one source can be synchronized to multiple targets.
canal use cases
Scenario 1: the original scenario, as part of Alibaba's otter middleware.
Scenario 2: updating a cache.
Scenario 3: capturing changes to business data into a change table, which is then used to build a zipper table. (A zipper table records the lifecycle of each record: once a record's lifecycle ends, a new record begins, and the current date is stored as its effective start date.)
Scenario 4: capturing new and changed rows in business tables for real-time statistics.
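The zipper table from scenario 3 can be illustrated with a small sketch. This is not canal code: the field names `start_date` and `end_date`, the open-ended marker `9999-12-31`, and the function `apply_change` are assumptions made up for the example.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # assumed marker meaning "this record is still current"


def apply_change(history, key, value, today):
    """Record a new value for `key`, closing the previous record if there is one."""
    for record in history:
        if record["key"] == key and record["end_date"] == OPEN_END:
            if record["value"] == value:
                return history  # value unchanged: keep the current record open
            record["end_date"] = today  # the old record's lifecycle ends today
    history.append(
        {"key": key, "value": value, "start_date": today, "end_date": OPEN_END}
    )
    return history


history = []
apply_change(history, "order-1", "created", date(2021, 6, 1))
apply_change(history, "order-1", "paid", date(2021, 6, 3))
for record in history:
    print(record)
```

Each change closes the open record and starts a new one whose effective start date is the day of the change, so the full history of a key can be read back from the table.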
How canal works
MySQL master-slave replication consists of three steps:
The master writes its change records to the binary log (binlog).
The slave sends the dump protocol to the MySQL master and copies the master's binary log events into its own relay log.
The slave reads and replays the events in the relay log, applying the changed data to its own database.
canal's working principle is simple: it pretends to be a slave and copies data from the master as if it were one.
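The three replication steps, and the way canal slots into them, can be sketched as an in-memory toy. This does not speak the MySQL protocol; the classes `Master`, `Slave`, and `FakeSlave` and their methods are hypothetical names invented for the illustration.

```python
class Master:
    def __init__(self):
        self.rows = {}
        self.binlog = []  # step 1: every change is appended to the binary log

    def write(self, key, value):
        self.rows[key] = value
        self.binlog.append(("set", key, value))


class Slave:
    def __init__(self):
        self.rows = {}
        self.relay_log = []
        self.position = 0  # number of master events already fetched

    def dump(self, master):
        # Step 2: copy the events after our position into the relay log.
        new_events = master.binlog[self.position:]
        self.relay_log.extend(new_events)
        self.position += len(new_events)

    def replay(self):
        # Step 3: redo the relay-log events against the local data.
        for op, key, value in self.relay_log:
            if op == "set":
                self.rows[key] = value
        self.relay_log.clear()


class FakeSlave(Slave):
    """Fetches events like a slave, but hands them to a consumer instead of replaying them."""

    def drain(self):
        events, self.relay_log = self.relay_log, []
        return events


master = Master()
master.write("a", 1)
master.write("b", 2)

slave = Slave()
slave.dump(master)
slave.replay()

listener = FakeSlave()
listener.dump(master)
events = listener.drain()
print(slave.rows, events)
```

The real slave ends up with a copy of the data, while the fake slave ends up with the list of change events, which is the part canal forwards to its clients.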
Introduction to the MySQL binlog
What is the binlog
MySQL's binary log is arguably its most important log. It records all DDL and DML statements (except data query statements) as events, including the time each statement took to execute. The MySQL binary log is transaction safe.
Generally speaking, enabling the binary log costs about 1% in performance. The binary log has two main uses:
First, MySQL replication: the binlog is enabled on the master, and the master passes its binary log to the slaves so that master and slave data stay consistent.
Second, data recovery using the mysqlbinlog tool.
The binary log consists of two kinds of files: the binary log index file (file name suffix .index), which lists all binary log files, and the binary log files themselves (file name suffix .00000*), which record all DDL and DML statement events in the database (except data query statements).
Enabling the MySQL binlog
Enable it in the MySQL configuration file and restart MySQL for it to take effect. On Linux the MySQL configuration file is usually /etc/my.cnf. Add: log-bin=mysql-bin
This sets the prefix of the binlog files to mysql-bin, so the generated log files are named like mysql-bin.123456, with the number after the prefix assigned in sequence. Each time MySQL restarts or a single file reaches its size threshold, a new file is created with the next number.
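The naming scheme can be sketched as follows. The six-digit zero padding is taken from the `mysql-bin.000001` file that appears in the directory listing later in this article; the helper names `binlog_name` and `next_binlog_name` are hypothetical.

```python
def binlog_name(prefix: str, sequence: int) -> str:
    """Build a binlog file name such as mysql-bin.000001."""
    return f"{prefix}.{sequence:06d}"


def next_binlog_name(prefix: str, existing_files: list) -> str:
    """Return the name the next rotated file would get, given the files already present."""
    numbers = []
    for name in existing_files:
        stem, _, suffix = name.rpartition(".")
        if stem == prefix and suffix.isdigit():  # skips mysql-bin.index
            numbers.append(int(suffix))
    return binlog_name(prefix, max(numbers, default=0) + 1)


files = ["mysql-bin.000001", "mysql-bin.index"]
print(next_binlog_name("mysql-bin", files))
```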
binlog format settings
The MySQL binlog has three formats: STATEMENT, MIXED, and ROW. The format is chosen in the configuration file with the option binlog_format=
statement [statement level]
At statement level, the binlog records each statement that performs a write operation.
Compared with row mode it saves space, but it can cause inconsistencies with functions that are evaluated when the statement actually runs, for example: update table_name set create_date=sysdate();
If this binlog is replayed, the data may differ because the statement executes at a different time (when the master writes the row, create_date is 2021-08-08 11:10:30, but when the slave executes the statement from the binlog, create_date may become 2021-08-08 11:11:23). Note that now() does not have this problem: the binlog stores the statement's original timestamp and now() is evaluated against it, whereas sysdate() returns the time at which it really executes.
Advantage: saves space.
Disadvantage: may cause data inconsistency.
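The difference between logging the statement and logging its result can be seen in a small simulation. It models only a time function that is evaluated whenever the statement runs; the fake clock and the functions `apply_statement` and `apply_row` are assumptions for the illustration, and whether a given MySQL function behaves this way depends on the function and the server settings.

```python
import itertools


def make_clock(start):
    """Return a fake clock whose value advances by one on every call."""
    ticks = itertools.count(start)
    return lambda: next(ticks)


def apply_statement(table, clock):
    # Replaying the statement text evaluates the time function again.
    table["create_date"] = clock()


def apply_row(table, row_image):
    # Replaying a row event copies the values that were logged.
    table.update(row_image)


clock = make_clock(1000)

master = {}
apply_statement(master, clock)  # the master runs the UPDATE
row_image = dict(master)  # what a row-format binlog would carry

replica_statement = {}
apply_statement(replica_statement, clock)  # runs later, sees a later time

replica_row = {}
apply_row(replica_row, row_image)

print(master, replica_statement, replica_row)
```

The replica fed with the statement ends up with a different value from the master, while the replica fed with the row image matches it.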
row [row level]
At row level, the binlog records the changes to each row after every operation.
Advantage: keeps the data consistent, because it does not depend on what the SQL was or which functions it called; it records only the result of execution.
Disadvantage: takes up a lot of space.
mixed [combines statement level and row level]
An upgraded version of statement mode that solves, to some extent, the inconsistency problems of statement mode.
In some cases, for example:
○ when a statement contains the UUID() function;
○ when a table with an AUTO_INCREMENT column is updated;
○ when an INSERT DELAYED statement is executed;
○ when a UDF is used;
the change is logged in ROW format instead.
Advantage: saves space while still providing a reasonable degree of consistency.
Disadvantage: in a few rare cases inconsistencies can still occur. In addition, statement and mixed formats are inconvenient when the binlog needs to be monitored by another program.
Environment preparation
Machine planning
I used 4 machines:
ops01, ops02, and ops03 host the kafka + zookeeper + canal cluster; ops04 hosts the MySQL service. For testing, MySQL can instead be deployed on one of the 3 cluster machines.
11.8.37.50 ops01
11.8.36.63 ops02
11.8.36.76 ops03
11.8.36.86 ops04
All 4 machines have these hostnames configured in /etc/hosts for name resolution.
Installing and configuring MySQL
Create a new database and table to simulate business data. The installation steps are not covered here; if MySQL is not installed, see the earlier article 《MySQL 5.7 Installation tutorial (win10)》, which has detailed MySQL installation steps.
After installing MySQL, apply the basic settings and configuration.
# Log in to mysql
root@ops04:/root #mysql -uroot -p123456
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 442523
Server version: 5.7.29 MySQL Community Server (GPL)
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
# Add a canal user and grant it permissions
mysql> GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%' IDENTIFIED BY 'canal';
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> quit;
Bye
# Edit the MySQL configuration file and add the binlog settings
root@ops04:/root #vim /etc/my.cnf
# binlog
server-id=1
log-bin=mysql-bin
binlog_format=row
binlog-do-db=gmall
Create a new gmall database. Any database will do, as long as its name matches the binlog-do-db value in the configuration file above.
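Before restarting, the binlog settings can be sanity-checked with a short script. This is a minimal sketch, not a MySQL tool: it assumes the options live under a `[mysqld]` section (the fragment above omits the section header), it treats `-` and `_` in option names as equivalent, and the function name `check_binlog_settings` is hypothetical. It checks only the values this article relies on.

```python
import configparser

# Options that must be present; a value of None means "any value is fine".
EXPECTED = {"server_id": None, "log_bin": None, "binlog_format": "row"}


def check_binlog_settings(cnf_text: str) -> list:
    """Return a list of problems found in the [mysqld] section (empty if none)."""
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return ["no [mysqld] section found"]
    settings = {
        key.replace("-", "_"): (value or "")
        for key, value in parser["mysqld"].items()
    }
    problems = []
    for key, wanted in EXPECTED.items():
        if key not in settings:
            problems.append(f"{key} is not set")
        elif wanted is not None and settings[key].lower() != wanted:
            problems.append(f"{key} is {settings[key]!r}, expected {wanted!r}")
    return problems


SAMPLE = """
[mysqld]
# binlog
server-id=1
log-bin=mysql-bin
binlog_format=row
binlog-do-db=gmall
"""

print(check_binlog_settings(SAMPLE))
```

To check a real file, the text could be read with `open("/etc/my.cnf").read()` and passed to the same function.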
Restart MySQL:
root@ops04:/root #mysql -V
mysql Ver 14.14 Distrib 5.7.29, for Linux (x86_64) using EditLine wrapper
root@ops04:/root #systemctl status mysqld
● mysqld.service - MySQL Server
Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2021-05-26 09:30:25 CST; 2 months 22 days ago
Docs: man:mysqld(8)
http://dev.mysql.com/doc/refman/en/using-systemd.html
Main PID: 32911 (mysqld)
Memory: 530.6M
CGroup: /system.slice/mysqld.service
└─32911 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid
May 26 09:30:18 ops04 systemd[1]: Starting MySQL Server...
May 26 09:30:25 ops04 systemd[1]: Started MySQL Server.
root@ops04:/root #
root@ops04:/root #systemctl restart mysqld
root@ops04:/root #
[Note]: After adding the binlog configuration and restarting the MySQL service, binlog files appear in the data directory, in the following format:
root@ops04:/var/lib/mysql #ll | grep mysql-bin
-rw-r----- 1 mysql mysql 1741 Aug 17 14:27 mysql-bin.000001
-rw-r----- 1 mysql mysql 19 Aug 17 11:18 mysql-bin.index
Verify that the canal user can log in:
root@ops04:/root #mysql -ucanal -pcanal -e "show databases"
mysql: [Warning] Using a password on the command line interface can be insecure.
+--------------------+
| Database |
+--------------------+
| information_schema |
| gmall |
| mysql |
| performance_schema |
| sys |
+--------------------+
root@ops04:/root #
Create a new table in the gmall database and insert some sample data for testing:
CREATE TABLE `canal_test` (
`Temperature` varchar(255) DEFAULT NULL,
`height` varchar(255) DEFAULT NULL,
`weight` varchar(255) DEFAULT NULL,
`article` varchar(255) DEFAULT NULL,
`date` date DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
INSERT INTO `canal_test`(`Temperature`, `height`, `weight`, `article`, `date`) VALUES ('36.5', '1.70', '180', '4', '2021-06-01');
INSERT INTO `canal_test`(`Temperature`, `height`, `weight`, `article`, `date`) VALUES ('36.4', '1.70', '160', '8', '2021-06-02');
INSERT INTO `canal_test`(`Temperature`, `height`, `weight`, `article`, `date`) VALUES ('36.1', '1.90', '134', '1', '2021-06-03');
INSERT INTO `canal_test`(`Temperature`, `height`, `weight`, `article`, `date`) VALUES ('37.3', '1.70', '110', '14', '2021-06-04');
INSERT INTO `canal_test`(`Temperature`, `height`, `weight`, `article`, `date`) VALUES ('35.7', '1.70', '133', '0', '2021-06-05');
INSERT INTO `canal_test`(`Temperature`, `height`, `weight`, `article`, `date`) VALUES ('36.8', '1.90', '200', '6', '2021-06-06');
INSERT INTO `canal_test`(`Temperature`, `height`, `weight`, `article`, `date`) VALUES ('37.5', '1.70', '132', '25', '2021-06-07');
INSERT INTO `canal_test`(`Temperature`, `height`, `weight`, `article`, `date`) VALUES ('35.7', '1.70', '160', '2', '2021-06-08');
INSERT INTO `canal_test`(`Temperature`, `height`, `weight`, `article`, `date`) VALUES ('36.3', '1.80', '131.4', '9', '2021-06-09');
INSERT INTO `canal_test`(`Temperature`, `height`, `weight`, `article`, `date`) VALUES ('37.3', '1.70', '98.8', '4', '2021-06-10');
Installing kafka + zookeeper
kafka and zookeeper are used here to make canal highly available. To save space, the installation steps are not covered; see the earlier article 《Kafka Deployment, Principle and Application Introduction》, which has detailed kafka installation steps.
Check that the kafka and zookeeper ports are listening on each cluster node:
jyc@ops03:/opt/module >ssh ops01 'sudo netstat -tnlpu| grep -E "9092|2181"'
tcp6 0 0 :::9092 :::* LISTEN 42305/java
tcp6 0 0 :::2181 :::* LISTEN 41773/java
jyc@ops03:/opt/module >ssh ops02 'sudo netstat -tnlpu| grep -E "9092|2181"'
tcp6 0 0 :::9092 :::* LISTEN 33518/java
tcp6 0 0 :::2181 :::* LISTEN 33012/java
jyc@ops03:/opt/module >ssh ops03 'sudo netstat -tnlpu| grep -E "9092|2181"'
tcp6 0 0 :::9092 :::* LISTEN 102886/java
tcp6 0 0 :::2181 :::* LISTEN 102422/java
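The same check can be made from any machine without ssh by trying to open a TCP connection to each port. The sketch below uses only the Python standard library; the function names `port_is_open` and `check_cluster` are hypothetical, and a successful connection shows only that something is listening, not that kafka or zookeeper is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host name not resolvable
        return False


def check_cluster(hosts, ports):
    """Map every (host, port) pair to whether the port accepted a connection."""
    return {(host, port): port_is_open(host, port) for host in hosts for port in ports}
```

For the cluster in this article the call would be `check_cluster(["ops01", "ops02", "ops03"], [9092, 2181])`, run from a machine that can resolve those hostnames.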
Installing and deploying canal
Alibaba's canal project is at https://github.com/alibaba/canal. The download links for each version are under Releases on the right side of the GitHub page. If you have the time, it is also worth browsing the other popular projects on Alibaba's GitHub homepage; many of them are gaining popularity.
Download the installation package
# Download the installation package
jyc@ops03:/opt/software >wget https://github.com/alibaba/canal/releases/download/canal-1.1.5/canal.deployer-1.1.5.tar.gz
jyc@ops03:/opt/software >ll | grep canal
-rw-r--r-- 1 jyc jyc 60205298 Aug 17 11:23 canal.deployer-1.1.5.tar.gz
Unpack and install
# Create a canal directory to unpack into. [Note]: the official archive has no top-level canal directory, so create one to hold the extracted components
jyc@ops03:/opt/software >mkdir -p /opt/module/canal
jyc@ops03:/opt/software >tar -xf canal.deployer-1.1.5.tar.gz -C /opt/module/canal/
Modify the canal main configuration
# Edit the canal main configuration file
jyc@ops03:/opt/module/canal >cd conf/
jyc@ops03:/opt/module/canal/conf >ll
total 28
-rwxrwxr-x 1 jyc jyc 319 Apr 19 15:48 canal_local.properties
-rwxrwxr-x 1 jyc jyc 6277 Apr 19 15:48 canal.properties
drwxrwxr-x 2 jyc jyc 4096 Aug 17 13:49 example
-rwxrwxr-x 1 jyc jyc 3437 Apr 19 15:48 logback.xml
drwxrwxr-x 2 jyc jyc 4096 Aug 17 13:49 metrics
drwxrwxr-x 3 jyc jyc 4096 Aug 17 13:49 spring
# Change the following settings: zookeeper addresses | server mode (sync target) | kafka addresses
jyc@ops03:/opt/module/canal/conf >vim canal.properties
canal.zkServers =ops01:2181,ops02:2181,ops03:2181
canal.serverMode = kafka
kafka.bootstrap.servers = ops01:9092,ops02:9092,ops03:9092
Modify the canal instance configuration (mysql to kafka)
# Configure the instance. canal can run multiple instances, and each instance corresponds to one configuration directory. For example, copy the example directory to xxx, change the configuration under xxx, and start canal: xxx is then a new instance
jyc@ops03:/opt/module/canal/conf >cd example/
jyc@ops03:/opt/module/canal/conf/example >ll
total 4
-rwxrwxr-x 1 jyc jyc 2106 Apr 19 15:48 instance.properties
# Note: change 11.8.38.86:3306 to the MySQL address and port of your own environment, change the username and password to those of your own environment, and choose a topic name
jyc@ops03:/opt/module/canal/conf/example >vim instance.properties
canal.instance.master.address=11.8.38.86:3306
canal.instance.dbUsername=canal
canal.instance.dbPassword=canal
canal.mq.topic=jyc_test_canal
canal.mq.partitionsNum=12
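To see at a glance what the two files say together, the settings can be read into dictionaries and summarized. This is a simplified sketch: `parse_properties` handles only plain `key = value` lines and `#` comments, not the full Java properties syntax, and both function names are hypothetical. The sample text repeats the values edited above.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, ignoring blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def describe_pipeline(main: dict, instance: dict) -> str:
    """Summarize where canal reads from and where it writes to."""
    return (
        f"{instance['canal.instance.master.address']} -> canal "
        f"({main['canal.serverMode']}) -> topic {instance['canal.mq.topic']} "
        f"via {main['kafka.bootstrap.servers']}"
    )


MAIN = """
canal.zkServers =ops01:2181,ops02:2181,ops03:2181
canal.serverMode = kafka
kafka.bootstrap.servers = ops01:9092,ops02:9092,ops03:9092
"""

INSTANCE = """
canal.instance.master.address=11.8.38.86:3306
canal.instance.dbUsername=canal
canal.instance.dbPassword=canal
canal.mq.topic=jyc_test_canal
canal.mq.partitionsNum=12
"""

main_props = parse_properties(MAIN)
instance_props = parse_properties(INSTANCE)
print(describe_pipeline(main_props, instance_props))
```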
Distribute the installation directory
# Distribute the modified canal directory to the other 2 servers:
jyc@ops03:/opt/module >scp -r /opt/module/canal ops01:/opt/module/
jyc@ops03:/opt/module >scp -r /opt/module/canal ops02:/opt/module/
Start the canal cluster
# Start canal on each server in turn
jyc@ops03:/opt/module >cd /opt/module/canal/bin/
jyc@ops03:/opt/module/canal/bin >./startup.sh
jyc@ops02:/home/jyc >cd /opt/module/canal/bin/
jyc@ops02:/opt/module/canal/bin >./startup.sh
jyc@ops01:/home/jyc >cd /opt/module/canal/bin/
jyc@ops01:/opt/module/canal/bin >./startup.sh
Verify the result
# Consume the kafka topic on one of the servers
jyc@ops03:/opt/module/canal/bin >kafka-console-consumer.sh --bootstrap-server ops01:9092,ops02:9092,ops03:9092 --topic jyc_test_canal
[2021-08-17 14:21:29,924] WARN [Consumer clientId=consumer-console-consumer-17754-1, groupId=console-consumer-17754] Error while fetching metadata with correlation id 2 : {jyc_test_canal=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
As expected: if canal is now successfully monitoring the gmall database in MySQL on ops04, then any data change in a table of the gmall database is printed to the console in real time.
Starting from the sample data inserted earlier, change the data in the table and observe the console output:
1. Change the date 2021-06-10 to 2021-08-17
2. Add a row
3. Change a value from 1 to 1111
jyc@ops03:/opt/module/canal/bin >kafka-console-consumer.sh --bootstrap-server ops01:9092,ops02:9092,ops03:9092 --topic jyc_test_canal
[2021-08-17 14:21:29,924] WARN [Consumer clientId=consumer-console-consumer-17754-1, groupId=console-consumer-17754] Error while fetching metadata with correlation id 2 : {jyc_test_canal=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
{"data":[{"Temperature":"37.3","height":"1.70","weight":"98.8","article":"4","date":"2021-08-17"}],"database":"gmall","es":1629185045000,"id":6,"isDdl":false,"mysqlType":{"Temperature":"varchar(255)","height":"varchar(255)","weight":"varchar(255)","article":"varchar(255)","date":"date"},"old":[{"date":"2021-06-10"}],"pkNames":null,"sql":"","sqlType":{"Temperature":12,"height":12,"weight":12,"article":12,"date":91},"table":"canal_test","ts":1629185063194,"type":"UPDATE"}
{"data":[{"Temperature":"35.55","height":"1.999","weight":"99.99","article":"999","date":"2021-08-17"}],"database":"gmall","es":1629185086000,"id":7,"isDdl":false,"mysqlType":{"Temperature":"varchar(255)","height":"varchar(255)","weight":"varchar(255)","article":"varchar(255)","date":"date"},"old":null,"pkNames":null,"sql":"","sqlType":{"Temperature":12,"height":12,"weight":12,"article":12,"date":91},"table":"canal_test","ts":1629185104967,"type":"INSERT"}
{"data":[{"Temperature":"36.1","height":"1.90","weight":"134","article":"1111","date":"2021-06-03"}],"database":"gmall","es":1629185104000,"id":8,"isDdl":false,"mysqlType":{"Temperature":"varchar(255)","height":"varchar(255)","weight":"varchar(255)","article":"varchar(255)","date":"date"},"old":[{"article":"1"}],"pkNames":null,"sql":"","sqlType":{"Temperature":12,"height":12,"weight":12,"article":12,"date":91},"table":"canal_test","ts":1629185122499,"type":"UPDATE"}
Each change clearly appears as a record, and the old values map one to one onto the current data. At this point the whole canal pipeline is working end to end; synchronizing to other storage targets works in much the same way.
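A consumer can use the `data` and `old` fields to rebuild the row as it was before an update. The following is a minimal sketch based only on the fields visible in the output above; the function name `row_changes` is hypothetical, the sample message is an abbreviated copy of the last record, and message types other than UPDATE and INSERT are not covered.

```python
import json


def row_changes(message: str) -> list:
    """Turn one canal JSON message into a list of per-row before/after pairs."""
    event = json.loads(message)
    rows = event.get("data") or []
    # `old` is null for inserts; for updates it holds only the changed columns.
    olds = event.get("old") or [{} for _ in rows]
    changes = []
    for after, old in zip(rows, olds):
        is_update = event["type"] == "UPDATE"
        changes.append(
            {
                "table": f'{event["database"]}.{event["table"]}',
                "type": event["type"],
                "changed_columns": sorted(old),
                # Overlay the old values on the new row to get the previous row.
                "before": {**after, **old} if is_update else None,
                "after": after,
            }
        )
    return changes


sample = json.dumps(
    {
        "data": [{"height": "1.90", "weight": "134", "article": "1111", "date": "2021-06-03"}],
        "database": "gmall",
        "old": [{"article": "1"}],
        "table": "canal_test",
        "type": "UPDATE",
    }
)

for change in row_changes(sample):
    print(change)
```

In a real consumer the `message` argument would be the value of each record read from the kafka topic.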
Extension:
canal's information can be viewed from the zookeeper command line:
jyc@ops01:/opt/module/canal/bin >zkCli.sh
Connecting to localhost:2181
[zk: localhost:2181(CONNECTED) 0] ls -w /
[hbase, kafka, otter, jyc, zookeeper]
[zk: localhost:2181(CONNECTED) 1] ls -w /otter
[canal]
[zk: localhost:2181(CONNECTED) 2] ls -w /otter/canal
[cluster, destinations]