当前位置:网站首页>Indicator statistics: real time uvpv statistics based on flow computing Oceanus (Flink)
Indicator statistics: real time uvpv statistics based on flow computing Oceanus (Flink)
2022-06-24 04:10:00 【Wuyuntao】
Recently, I sorted out how to use Flink To achieve real-time UV、PV Statistics of indicators , And communicate with colleagues in the micro vision Department of the company . Then the scene is simplified , And found that Flink SQL Come on The statistics of these indicators will be more convenient .
1 Solution description
1.1 summary
This scheme is combined with local self construction Kafka colony 、 Tencent Cloud Computing Oceanus(Flink)、 Cloud database Redis On blogs 、 Shopping and other websites UV、PV Real time visual analysis of indicators . The analysis indicator includes the number of independent visitors to the website (UV )、 Product hits (PV)、 Conversion rate ( Conversion rate = Number of transactions / Clicks ) etc. .
Introduction to related concepts : UV(Unique Visitor): Number of unique visitors . A client visiting your website is a visitor , If the user visits the same page 5 Time , Then the of the page UV Only add 1, because UV It counts the number of users after de duplication, not the number of visits .PV(Page View): Hits or page views . If the user visits the same page 5 Time , Then the of the page PV Will add 5.
1.2 Scheme architecture and advantages
According to the above real-time indicators, make statistics on the scene , The following architecture diagram is designed :
List of products involved :
- Local data center (IDC) Self built Kafka colony
- Private networks VPC
- Private line access / Cloud networking /VPN Connect / Peer to peer connection
- Flow calculation Oceanus
- Cloud database Redis
2 Lead to
Purchase the required Tencent cloud resources , And get through the network . self-built Kafka The cluster shall adopt... According to the region where the cluster is located VPN Connect 、 Special line connection or peer-to-peer connection to realize network interconnection .
2.1 Create a private network VPC
Private networks (VPC) It is a logically isolated cyberspace customized on Tencent cloud , In the build Oceanus colony 、Redis It is recommended to select the same network for services such as components VPC, The network can communicate with each other . Otherwise, you need to use peer-to-peer connection 、NAT gateway 、VPN And other ways to get through the network . Please refer to... For private network creation steps Help document .
2.2 establish Oceanus colony
Flow calculation Oceanus It is a powerful tool for real-time analysis of big data product ecosystem , Is based on Apache Flink Built with one-stop development 、 Seamless connection 、 Sub second delay 、 Low cost 、 Enterprise class real-time big data analysis platform with the characteristics of security and stability . Flow calculation Oceanus The goal is to maximize the value of enterprise data , Accelerate the construction process of real-time digitization of enterprises .
stay Oceanus Console 【 Cluster management 】->【 New cluster 】 Page create cluster , Choose the region 、 Availability zone 、VPC、 journal 、 Storage , Set the initial password, etc .VPC And subnets use the network just created . After you create Flink The clusters are as follows :
2.3 establish Redis colony
stay Redis Console Of 【 New instance 】 Page create cluster , Select the same region as other components , The same private network in the same region VPC, The same subnet is also selected here .
2.4 Configure self built kafka colony
2.4.1 Modify self built Kafka Cluster configuration
build by oneself Kafka When the cluster is connected bootstrap-servers Parameters are often used hostname instead of ip To connect .
Mode one : Use Oceanus platform Set customization DNS Function settings , To map hostname To ip.
Square twelve : Modify self built in the following way Kafka Clustered hostname by ip:
But with self built Kafka The cluster is connected to... On Tencent cloud Oceanus The cluster is a fully managed cluster , Oceanus The self built cluster cannot be resolved on the cluster node hostname And ip The mapping relation of , So you need to change the listener address from hostname by ip Form of address connection .
take config/server.properties In profile advertised.listeners Parameter is configured as IP Address . Example :
# 0.10.X And later versions advertised.listeners=PLAINTEXT://10.1.0.10:9092 # 0.10.X The previous version advertised.host.name=PLAINTEXT://10.1.0.10:9092
Restart after modification Kafka colony .
! If you use self built on the cloud zookeeper Address , We also need to zk The configuration of the hostname modify IP Address form .
2.4.2 Simulate sending data to topic
Use in this case topic by uvpv-demo
1)Kafka client
Enter self built Kafka Cluster nodes , start-up Kafka client , Simulate sending data .
./bin/kafka-console-producer.sh --broker-list 10.1.0.10:9092 --topic uvpv-demo
>{"record_type":0, "user_id": 2, "client_ip": "100.0.0.2", "product_id": 101, "create_time": "2021-09-08 16:20:00"}
>{"record_type":0, "user_id": 3, "client_ip": "100.0.0.3", "product_id": 101, "create_time": "2021-09-08 16:20:00"}
>{"record_type":1, "user_id": 2, "client_ip": "100.0.0.1", "product_id": 101, "create_time": "2021-09-08 16:20:00"}2) Use script to send
A script :Java Code reference :https://cloud.tencent.com/document/product/597/54834
The second script :Python Script . Refer to the previous case Python The script can be modified properly : Based on Tencent cloud Oceanus Realize real-time large screen analysis of live video scenes
2.5 Get through self built IDC Cluster to Tencent cloud network communication
build by oneself Kafka Cluster Unicom Tencent cloud network , This can be done by 3 Two ways to get through self built IDC Network communication to Tencent cloud .
- Private line access Applicable to local data center IDC Connect with Tencent cloud network .
- Cloud networking Applicable to local data center IDC Connect with Tencent cloud network , It can also be used for private networks between different regions on the cloud VPC Get through .
- VPN Connect Applicable to local data center IDC Connect with Tencent cloud network .
- Peer to peer connection + NAT gateway Suitable for private networks between different regions on the cloud VPC Get through .
In this scheme VPN How to connect , Realize local IDC Communication with cloud networks . Reference link : establish VPC To IDC The connection of ( Routing table )
According to this scheme, the following network architecture diagram is drawn :
3 Scheme realization
Next, I will introduce you how to use flow computing through a case Oceanus Implement website UV、PV、 Real time statistics of conversion index .
3.1 Business objectives
Only the following are listed here 3 Two statistical indicators :
- Number of unique visitors to the website UV.Oceanus After treatment in Redis Pass through set Store the number of unique visitors , At the same time, it also achieves the purpose of de duplication of the data of the same visitor .
- The number of hits on the website's product page PV.Oceanus After treatment in Redis Use in list Store page hits .
- Conversion rate ( Conversion rate = Number of transactions / Clicks ).Oceanus After treatment in Redis of use String Just store .
3.2 Source data format
Browse record / Purchase record Kafka topic:uvpv-demo
Field | type | meaning |
|---|---|---|
record_type | int | Customer number |
user_id | varchar | Customer ip Address |
client_ip | varchar | Your room number, |
product_id | Int | Time to enter the room |
create_time | timestamp | Creation time |
Kafka Internal use json Format store , The displayed data format is as follows :
# Browse record
{
"record_type":0,
"user_id": 6,
"client_ip": "100.0.0.6",
"product_id": 101,
"create_time": "2021-09-08 16:20:00"
}
# Purchase record
{
"record_type":1,
"user_id": 6,
"client_ip": "100.0.0.6",
"product_id": 101,
"create_time": "2021-09-08 16:20:00"
}3.3 To write Flink SQL Homework
The example implements UV、PV And conversion 3 Acquisition logic of indicators , And write Sink End .
1、 Definition Source
CREATE TABLE `input_web_record` (
`record_type` INT,
`user_id` INT,
`client_ip` VARCHAR,
`product_id` INT,
`create_time` TIMESTAMP,
`times` AS create_time,
WATERMARK FOR times AS times - INTERVAL '10' MINUTE
) WITH (
'connector' = 'kafka', -- Optional 'kafka','kafka-0.11'. Pay attention to select the corresponding built-in Connector
'topic' = 'uvpv-demo',
'scan.startup.mode' = 'earliest-offset',
'properties.bootstrap.servers' = '10.1.0.10:9092',
'properties.group.id' = 'WebRecordGroup', -- Required parameters , Be sure to designate Group ID
-- Define the data format (JSON Format )
'format' = 'json',
'json.ignore-parse-errors' = 'true', -- Ignore JSON Structure Parsing exception
'json.fail-on-missing-field' = 'false' -- If set to true, An error will be reported if a missing field is encountered Set to false The missing field is set to null
);2、 Definition Sink
-- UV sink CREATE TABLE `output_uv` ( `userids` STRING, `user_id` STRING ) WITH ( 'connector' = 'redis', -- Output to Redis 'command' = 'sadd', -- Use a collection to save uv( Support command :set、lpush、sadd、hset、zadd) 'nodes' = '192.28.28.217:6379', -- redis Connection address , Cluster mode is used by multiple nodes '','' Separate . -- 'additional-key' = '<key>', -- Is used to specify the hset and zadd Of key.hset、zadd You have to set . 'password' = 'yourpassword' -- Optional parameters , password ); -- PV sink CREATE TABLE `output_pv` ( `pagevisits` STRING, `product_id` STRING, `hour_count` BIGINT ) WITH ( 'connector' = 'redis', -- Output to Redis 'command' = 'lpush', -- Save with a list pv( Support command :set、lpush、sadd、hset、zadd) 'nodes' = '192.28.28.217:6379', -- redis Connection address , Cluster mode is used by multiple nodes '','' Separate . -- 'additional-key' = '<key>', -- Is used to specify the hset and zadd Of key.hset、zadd You have to set . 'password' = 'yourpassword' -- Optional parameters ); -- Conversion rate sink CREATE TABLE `output_conversion_rate` ( `conversion_rate` STRING, `rate` STRING ) WITH ( 'connector' = 'redis', -- Output to Redis 'command' = 'set', -- Save with a list pv( Support command :set、lpush、sadd、hset、zadd) 'nodes' = '192.28.28.217:6379', -- redis Connection address , Cluster mode is used by multiple nodes '','' Separate . -- 'additional-key' = '<key>', -- Is used to specify the hset and zadd Of key.hset、zadd You have to set . 'password' = 'yourpassword' -- Optional parameters );
3、 Business logic
-- To produce UV indicators , Count all the time UV INSERT INTO output_uv SELECT 'userids' AS `userids`, CAST(user_id AS string) AS user_id FROM input_web_record ; -- Process and get PV indicators , Count every 10 Minutes of the PV INSERT INTO output_pv SELECT 'pagevisits' AS pagevisits, CAST(product_id AS string) AS product_id, SUM(product_id) AS hour_count FROM input_web_record WHERE record_type = 0 GROUP BY HOP(times, INTERVAL '5' MINUTE, INTERVAL '10' MINUTE), product_id, user_id; -- Process and obtain the conversion index , Count every 10 Conversion rate in minutes INSERT INTO output_conversion_rate SELECT 'conversion_rate' AS conversion_rate, CAST( (((SELECT COUNT(1) FROM input_web_record WHERE record_type=0)*1.0)/SUM(a.product_id)) as string) FROM (SELECT * FROM input_web_record where record_type = 1) AS a GROUP BY HOP(times, INTERVAL '5' MINUTE, INTERVAL '10' MINUTE), product_id;
3.4 The results verify that
General situation , Will pass Web Website to show the statistics UV/PV indicators , And here for simplicity . Directly in Redis Console Log in to query :
userids: Storage UV
pagevisits: Storage PV
conversion_rate: Storage conversion rate , That is, the number of times to buy goods / Total page hits .
4 summary
By self building Kafka Clusters collect data , In stream computing Oceanus (Flink) Field accumulation in real time 、 Window aggregation and other operations , Store the processed data in the cloud database Redis, Statistics to real-time refresh UV、PV Equal index . This program is in Kafka json In order to be easy to understand, the format design is simplified , Put the browsing records and product purchase records in the same topic in , Focus on opening up self built IDC And Tencent cloud products to show the whole scheme . For very large-scale UV duplicate removal , Colleagues with micro vision adopted Redis hyperloglog To achieve UV Statistics . Compared with direct use set Type mode has the advantage of minimal memory space , See the link for details :https://cloud.tencent.com/developer/article/1889162.
边栏推荐
- Received status code 502 from server: Bad Gateway
- How to select the application of the server?
- 3. go deep into tidb: perform optimization explanation
- Web penetration test - 5. Brute force cracking vulnerability - (7) MySQL password cracking
- 多任务视频推荐方案,百度工程师实战经验分享
- [code Capriccio - dynamic planning] t392 Judgement subsequence
- Student information management system user manual
- The practice of tidb slow log in accompanying fish
- Live broadcast Reservation: Micro build practice - quickly build a catering reservation applet
- How to select a telemedicine program system? These four points are the key!
猜你喜欢

On game safety (I)

618 promotion: mobile phone brand "immortal fight", high-end market "who dominates the ups and downs"?

Black hat SEO practice: General 301 weight PR hijacking

Clickhouse (02) Clickhouse architecture design introduction overview and Clickhouse data slicing design

Brief ideas and simple cases of JVM tuning - how to tune

多任务视频推荐方案,百度工程师实战经验分享

Changjiang Dayong, director of openeuler community: jointly promote the new open source model of Euler and jointly build a new open source system

抢先报名丨新一代 HTAP 数据库如何在云上重塑?TiDB V6 线上发布会即将揭晓!

英特尔 XTU 官方超频工具已支持 Win11 22H2 和 13 代酷睿 Raptor Lake 处理器

黑帽SEO实战之目录轮链批量生成百万页面
随机推荐
Easyplayer consumes traffic but does not play video and reports an error libdecoder Wasm404 troubleshooting
TCP three handshakes and four waves
Oceanbase community OBD deployment example primary replica
How to monitor multiple platforms simultaneously when easydss/easygbs platform runs real-time monitoring?
Through the fog: location notes of Flink crash with a multi component timeout
Can the video streams of devices connected to easygbs from the intranet and the public network go through their respective networks?
黑帽SEO实战之通用301权重pr劫持
golang clean a slice
Black hat actual combat SEO: never be found hijacking
"." in the structure of C language And "- & gt;" Differences between
The results of the 2022 open source summer were announced, and 449 college students will contribute to open source projects
[hot promotion] Tencent cloud enterprise cloud disk solution
Tencent cloud console work order submission Guide
The collection method of penetration test, and which methods can be used to find the real IP
How should the server be placed?
Unable to access the CVM self built container outside the TKE cluster pod
How to set up a web server what is the price of the server
The practice of tidb slow log in accompanying fish
[code Capriccio - dynamic planning] t392 Judgement subsequence
High quality travel on national day, visual start of smart Tourism