Alibaba Cloud Full-Link Data Governance

2022-06-24 07:40:00 · Concise programming

A complete walkthrough of the Alibaba Cloud full-link data governance experiment

Experiment URL

https://developer.aliyun.com/adc/scenario/eed3362c8a0943a9b7ab7ec774aed335?spm=a2c6h.13788135.J_2488678810.32.1b1124448p2AbX

Procedure

Date: 2022-06-22

Collect log data into MaxCompute through DataWorks

Note: This scenario already provides an OSS data source and an RDS data source for you.

Create the OSS data source.

On the virtual desktop, double-click to open the Chromium web browser.

In the RAM user sign-in dialog, click Next, copy the sub-user password shown at the upper left of the page into the password field, and click Log In.

Copy the address below, open a new tab in Chromium, and paste the address to access the DataWorks console.

https://workbench.data.aliyun.com/

Note: If a big data basic service introduction page appears when you open the DataWorks console, refresh the page.

On the Overview page, locate the region where your resources reside, find your workspace, and click Data Integration.

In the left navigation bar, click Data Source.

On the Data Source management page, click Add Data Source at the top right.

In the Add Data Source dialog, select OSS as the data source type.

In the Add OSS Data Source dialog, configure the following parameters, then click More Options.

Parameters:

Data Source Name: enter oss_workshop_log.
    Endpoint: enter http://oss-cn-shanghai-internal.aliyuncs.com.
    Bucket: enter new-dataworks-workshop.
    AccessKey ID: enter LTAI4FvGT3iU4xjKotpUMAjS.
    AccessKey Secret: enter 9RSUoRmNxpRC9EhC4m9PjuG7Jzy7px.

In the resource group list, click Test Connectivity next to the common resource group.

In the Add OSS Data Source dialog, click Complete.

In the Add Data Source dialog, select MySQL as the data source type.

In the Add MySQL Data Source dialog, configure the following parameters, then click More Options.

Parameters:

Data Source Type: select Alibaba Cloud instance mode.
    Data Source Name: enter rds_workshop_log.
    Region: select East China 2 (Shanghai).
    RDS Instance ID: enter rm-bp1z69dodhh85z9qa.
    RDS Instance Owner Account ID: enter 1156529087455811.
    Database Name: enter workshop.
    Username: enter workshop.
    Password: enter workshop#2017.

In the Add MySQL Data Source dialog, click Complete.

Create a business process

On the Data Source management page, click the icon at the top left and choose DataStudio (Data Development).

On the DataStudio page, right-click Business Process and select Create Business Process.

In the Create Business Process dialog, enter a business name such as test, then click Create.

In the business process development panel, click Virtual Node and drag it onto the edit canvas on the right.

In the Create Node dialog, enter workshop_start as the node name and click Commit.

In the business process development panel, click Offline Sync and drag it onto the edit canvas on the right.

In the Create Node dialog, enter OSS_数据同步 (OSS data sync) as the node name and click Commit.

Note: The virtual desktop in this lab environment cannot paste Chinese text directly (pasted Chinese turns into garbled characters). Click the icon at the lower right of the virtual desktop to switch its input method, then type the Chinese characters manually.

In the business process development panel, click Offline Sync and drag it onto the edit canvas on the right.

In the Create Node dialog, enter rds_数据同步 (RDS data sync) as the node name and click Commit.

Note: As above, type the Chinese characters manually if pasting produces garbled text.

On the edit canvas, drag connections so that workshop_start becomes the upstream node of both offline sync nodes.

Configure the workshop_start node

On the left side of the DataStudio page, choose Business Process > your business process > General, then double-click the virtual node workshop_start.

On the node's edit page, click Scheduling Configuration on the right.

In the Time Attributes section of the Scheduling Configuration panel, set the rerun property to allow reruns regardless of success or failure. In the Dependencies section, click Use Root Node of Workspace to set the workspace root node as the upstream node of workshop_start.

Notes:

    Because the new version assigns input and output to every node, workshop_start needs an input. Set its upstream node to the workspace root node, which is usually named <workspace name>_root.
    If the options on the Scheduling Configuration page appear misaligned due to a display issue of the virtual desktop browser, reduce the browser's zoom level.

At the top left of the node's edit page, click the save icon to save the configuration.

Create tables

On the DataStudio page, choose Business Process > MaxCompute, right-click Table, and click Create Table.

In the Create Table dialog, enter ods_raw_log_d as the table name and click Create.

On the edit page of table ods_raw_log_d, click DDL Mode.

In the DDL Mode dialog, enter the following statement, which creates the target table for the OSS logs, then click Generate Table Schema.

CREATE TABLE IF NOT EXISTS ods_raw_log_d (
    col STRING
)
PARTITIONED BY (
    dt STRING
);

On the edit page of ods_raw_log_d, enter 'target table for OSS logs' as the table's Chinese name, then click Commit to Production Environment.

On the DataStudio page, choose Business Process > MaxCompute, right-click Table, and click Create Table.

In the Create Table dialog, enter ods_user_info_d as the table name and click Create.

On the edit page of table ods_user_info_d, click DDL Mode.

In the DDL Mode dialog, enter the following statement, which creates the target table for the RDS data, then click Generate Table Schema.

CREATE TABLE IF NOT EXISTS ods_user_info_d (
    uid STRING COMMENT 'uid',
    gender STRING COMMENT 'gender',
    age_range STRING COMMENT 'age_range',
    zodiac STRING COMMENT 'zodiac'
)
PARTITIONED BY (
    dt STRING
);

On the edit page of ods_user_info_d, enter 'target table for RDS data' as the table's Chinese name, then click Commit to Production Environment.

Configure the offline sync nodes

On the left side of the DataStudio page, choose Business Process > your business process > Data Integration, then double-click oss_di (shown as OSS_数据同步 in the figures).

In the source section of the oss_di page, configure the following parameters and leave the rest at their defaults.

Parameters:

Data source: select OSS > oss_workshop_log.
    File path (including name): enter user_log.txt.
    File type: select text.
    Column delimiter: enter |.

In the destination section of the oss_di page, configure the following parameters and leave the rest at their defaults.

Parameters:

Data source: select ODPS > odps_first.
    Table: select the ods_raw_log_d table in the data source.

Notes:

    The odps_first data source is generated automatically by the system when the workspace is bound to a MaxCompute instance.
    The odps_first data source writes into the MaxCompute project of the current workspace.

In the Time Attributes section of the Scheduling Configuration panel, set the rerun property to allow reruns regardless of success or failure. In the Output section of the Dependencies panel, enter <workspace name>.ods_raw_log_d as the node's output name and click Add.

Note: You can find the workspace name in the cloud product resource list.

In the Data Integration resource group configuration panel, click More Options, then select the debug resource group.

On the left side of the DataStudio page, choose Business Process > Data Integration, then double-click rds_di (shown as rds_数据同步 in the figures).

In the source section of the rds_di page, configure the following parameters and leave the rest at their defaults.

Parameters:

Data source: select MySQL > rds_workshop_log.
    Table: select the ods_user_info_d table in the data source.

In the destination section of the rds_di page, configure the following parameters and leave the rest at their defaults.

Parameters:

Data source: select ODPS > odps_first.
    Table: select the ods_user_info_d table in the data source.

In the Time Attributes section of the Scheduling Configuration panel, set the rerun property to allow reruns regardless of success or failure. In the Output section of the Dependencies panel, enter <workspace name>.ods_user_info_d as the node's output name and click Add.

Note: You can find the workspace name in the cloud product resource list.

In the Data Integration resource group configuration panel, click More Options, then select the debug resource group as before.

Submit the business process

On the left side of the DataStudio page, double-click your business process.

On the page that returns, wait until all nodes are submitted successfully, then click the icon.

Run the business process

In the top menu bar, click the run icon.

On the business process edit page, right-click rds_di (rds_数据同步 in the figures) and check that it ran successfully.

Confirm that the data was imported into MaxCompute.

In the left navigation bar of the DataStudio page, click the icon to open the temporary query panel.

On the left side of the temporary query page, right-click Temporary Query and choose Create Node > ODPS SQL.

On the SQL query tab, enter the following SQL statements and click the run icon to check how many records were imported into ods_raw_log_d and ods_user_info_d.

Note: The dt value in the SQL statements must be the business date, i.e. the day before the task's run date. For example, if the task ran on 20180717, the business date is 20180716.

select count(*) from ods_raw_log_d where dt=${bdp.system.bizdate};
select count(*) from ods_user_info_d where dt=${bdp.system.bizdate};

Process and analyze the collected data through DataWorks

Create three data tables

These are the data operation layer table (ods_log_info_d), the data warehouse layer table (dw_user_info_all_d), and the data product layer table (rpt_user_info_d).

In the left navigation of the temporary query page, click the icon.

In the Create Table dialog, enter ods_log_info_d as the table name and click Create.

On the edit page of table ods_log_info_d, click DDL Mode.

In the DDL Mode dialog, enter the following statement, which creates the data operation layer table, then click Generate Table Schema.

CREATE TABLE IF NOT EXISTS ods_log_info_d (
  ip STRING COMMENT 'ip',
  uid STRING COMMENT 'uid',
  time STRING COMMENT 'time, format yyyymmdd hh:mi:ss',
  status STRING COMMENT 'status',
  bytes STRING COMMENT 'bytes',
  region STRING COMMENT 'region',
  method STRING COMMENT 'method',
  url STRING COMMENT 'url',
  protocol STRING COMMENT 'protocol',
  referer STRING COMMENT 'referer',
  device STRING COMMENT 'device',
  identity STRING COMMENT 'identity'
)
PARTITIONED BY (
  dt STRING
);

On the edit page of ods_log_info_d, enter 'data operation layer table' as the table's Chinese name, then click Commit to Production Environment.

Repeat the steps above to create the dw_user_info_all_d and rpt_user_info_d tables with the following statements, enter 'data warehouse layer table' and 'data product layer table' as their Chinese names respectively, and click Commit to Production Environment.

CREATE TABLE IF NOT EXISTS dw_user_info_all_d (
  uid STRING COMMENT 'uid',
  gender STRING COMMENT 'gender',
  age_range STRING COMMENT 'age_range',
  zodiac STRING COMMENT 'zodiac',
  region STRING COMMENT 'region',
  device STRING COMMENT 'device',
  identity STRING COMMENT 'identity',
  method STRING COMMENT 'method',
  url STRING COMMENT 'url',
  referer STRING COMMENT 'referer',
  time STRING COMMENT 'time, format yyyymmdd hh:mi:ss'
)
PARTITIONED BY (
  dt STRING
);
CREATE TABLE IF NOT EXISTS rpt_user_info_d (
  uid STRING COMMENT 'uid',
  region STRING COMMENT 'region',
  device STRING COMMENT 'device',
  pv BIGINT COMMENT 'pv',
  gender STRING COMMENT 'gender',
  age_range STRING COMMENT 'age_range',
  zodiac STRING COMMENT 'zodiac'
)
PARTITIONED BY (
  dt STRING
);                         

Design the business process

On the left side of the DataStudio page, double-click your business process.

In the business process development panel, click ODPS SQL and drag it onto the edit canvas on the right. In the Create Node dialog, enter ods_log_info_d as the node name and click Commit.

Repeat to create two more ODPS SQL nodes named dw_user_info_all_d and rpt_user_info_d.

On the edit canvas, drag connections to configure the dependencies so that data flows from ods_log_info_d through dw_user_info_all_d to rpt_user_info_d, following the layer design above.

Create a user-defined function

Copy the address below, open a new tab in Chromium, and paste the address to download ip2region.jar.

https://docs-aliyun.cn-hangzhou.oss.aliyun-inc.com/assets/attach/85298/cn_zh/1532163718650/ip2region.jar?spm=a2c4g.11186623.0.0.43df4d0dwSRLzd&file=ip2region.jar

On the DataStudio page, choose Business Process > your business process > MaxCompute, right-click Resource, and choose Create > JAR. Upload the ip2region.jar file you downloaded, then commit the resource.

On the DataStudio page, choose Business Process > your business process > MaxCompute, right-click Function, and click Create Function.

In the Create Function dialog, enter getregion as the function name and click Create.

On the Register Function tab, configure the following parameters, leave the rest at their defaults, and click the save icon.

Parameters:

Class Name: enter org.alidata.odps.udf.Ip2Region.
    Resource List: enter ip2region.jar.
    Description: enter 'converts an IP address to a region'.
    Command Format: enter getregion('ip').
    Parameter Description: enter 'IP address'.

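Once the JAR resource and the function are committed, a quick ad-hoc query can confirm that the UDF resolves before it is used in the ods_log_info_d node below. This is a minimal sketch built from the tutorial's own parsing logic, not a step from the original post:

-- Spot-check the UDF against a few IPs parsed from the raw log table.
SELECT ip, getregion(ip) AS region
FROM (
  SELECT SPLIT(col, '##@@')[0] AS ip
  FROM ods_raw_log_d
  WHERE dt = ${bdp.system.bizdate}
) t
LIMIT 10;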

Configure the ODPS SQL nodes

On the DataStudio page, choose Business Process > your business process > MaxCompute > Data Development, then double-click ods_log_info_d.

On the ods_log_info_d node's edit page, enter the following SQL statement and click the save icon.

INSERT OVERWRITE TABLE ods_log_info_d PARTITION (dt=${bdp.system.bizdate})
SELECT ip
  , uid
  , time
  , status
  , bytes
  , getregion(ip) AS region -- Use the custom UDF to derive the region from the IP address.
  , regexp_substr(request, '(^[^ ]+ )') AS method -- Use regular expressions to split request into three fields.
  , regexp_extract(request, '^[^ ]+ (.*) [^ ]+$') AS url
  , regexp_substr(request, '([^ ]+$)') AS protocol
  , regexp_extract(referer, '^[^/]+://([^/]+){1}') AS referer -- Clean up the referer with a regular expression to get a more accurate URL.
  , CASE
    WHEN TOLOWER(agent) RLIKE 'android' THEN 'android' -- Derive the device type and access mode from the agent.
    WHEN TOLOWER(agent) RLIKE 'iphone' THEN 'iphone'
    WHEN TOLOWER(agent) RLIKE 'ipad' THEN 'ipad'
    WHEN TOLOWER(agent) RLIKE 'macintosh' THEN 'macintosh'
    WHEN TOLOWER(agent) RLIKE 'windows phone' THEN 'windows_phone'
    WHEN TOLOWER(agent) RLIKE 'windows' THEN 'windows_pc'
    ELSE 'unknown'
  END AS device
  , CASE
    WHEN TOLOWER(agent) RLIKE '(bot|spider|crawler|slurp)' THEN 'crawler'
    WHEN TOLOWER(agent) RLIKE 'feed'
    OR regexp_extract(request, '^[^ ]+ (.*) [^ ]+$') RLIKE 'feed' THEN 'feed'
    WHEN TOLOWER(agent) NOT RLIKE '(bot|spider|crawler|feed|slurp)'
    AND agent RLIKE '^[Mozilla|Opera]'
    AND regexp_extract(request, '^[^ ]+ (.*) [^ ]+$') NOT RLIKE 'feed' THEN 'user'
    ELSE 'unknown'
  END AS identity
  FROM (
    SELECT SPLIT(col, '##@@')[0] AS ip
    , SPLIT(col, '##@@')[1] AS uid
    , SPLIT(col, '##@@')[2] AS time
    , SPLIT(col, '##@@')[3] AS request
    , SPLIT(col, '##@@')[4] AS status
    , SPLIT(col, '##@@')[5] AS bytes
    , SPLIT(col, '##@@')[6] AS referer
    , SPLIT(col, '##@@')[7] AS agent
  FROM ods_raw_log_d
  WHERE dt = ${bdp.system.bizdate}
) a;

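Each raw row in ods_raw_log_d is a single string whose eight fields are joined by the literal separator ##@@, which is why the inner query SPLITs on that token. A quick sanity check of the parse, sketched from the inner query above rather than taken from the original post, is:

-- Verify that the first and last fields split out as expected.
SELECT SPLIT(col, '##@@')[0] AS ip
  , SPLIT(col, '##@@')[7] AS agent
FROM ods_raw_log_d
WHERE dt = ${bdp.system.bizdate}
LIMIT 10;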

On the DataStudio page, choose Business Process > your business process > MaxCompute > Data Development, then double-click dw_user_info_all_d.

On the dw_user_info_all_d node's edit page, enter the following SQL statement and click the save icon.

INSERT OVERWRITE TABLE dw_user_info_all_d PARTITION (dt='${bdp.system.bizdate}')
SELECT COALESCE(a.uid, b.uid) AS uid
  , b.gender
  , b.age_range
  , b.zodiac
  , a.region
  , a.device
  , a.identity
  , a.method
  , a.url
  , a.referer
  , a.time
FROM (
  SELECT *
  FROM ods_log_info_d
  WHERE dt = ${bdp.system.bizdate}
) a
LEFT OUTER JOIN (
  SELECT *
  FROM ods_user_info_d
  WHERE dt = ${bdp.system.bizdate}
) b
ON a.uid = b.uid;

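After this node runs, a quick row count (a sketch mirroring the earlier verification queries, not a step from the original post) can confirm the join produced output:

select count(*) from dw_user_info_all_d where dt=${bdp.system.bizdate};
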
On the DataStudio page, choose Business Process > your business process > MaxCompute > Data Development, then double-click rpt_user_info_d.

On the rpt_user_info_d node's edit page, enter the aggregation SQL statement and click the save icon.

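The original post shows this statement only as a screenshot. A plausible reconstruction, assuming the usual workshop logic of aggregating dw_user_info_all_d per uid into the rpt_user_info_d schema defined earlier, is:

INSERT OVERWRITE TABLE rpt_user_info_d PARTITION (dt='${bdp.system.bizdate}')
SELECT uid
  , MAX(region) AS region
  , MAX(device) AS device
  , COUNT(0) AS pv -- Page views per user.
  , MAX(gender) AS gender
  , MAX(age_range) AS age_range
  , MAX(zodiac) AS zodiac
FROM dw_user_info_all_d
WHERE dt = ${bdp.system.bizdate}
GROUP BY uid;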

Submit the business process

On the left side of the DataStudio page, double-click your business process.

In the menu bar above the business process edit page, click the commit icon to submit the configured nodes.

In the Commit dialog, select all nodes, enter a remark such as 'submit business process', select Ignore alerts about inconsistent input and output, and click Commit.

Run the business process.

In the menu bar above the business process edit page, click the run icon.

In the left navigation bar, click the icon to open the temporary query panel.

In the temporary query panel, right-click Temporary Query and choose Create Node > ODPS SQL.

In the Create Node dialog, click Commit.

On the SQL query tab, enter the following SQL statement and click the run icon to view the rpt_user_info_d data and confirm that it was produced.

Note: As before, the dt value must be the business date, i.e. the day before the task's run date.

select * from rpt_user_info_d where dt=${bdp.system.bizdate} limit 10;

If rows like the following are returned, the data has been produced.

Run the tasks in the production environment.

At the top of the publish list page, click Operation Center.

On the Operation Center page, choose Cycle Task Maintenance > Cycle Task.

On the Cycle Task page, click the workshop_start business process.

In the DAG view, right-click the workshop_start node and choose Backfill Data > Current and Downstream Nodes.

In the Backfill Data dialog, select all tasks and click OK.

On the page that appears, click Refresh and wait 3 to 5 minutes until all tasks have run successfully.

Monitor data quality: set table monitoring rules and alerts

Open the monitoring rules page of table ods_raw_log_d.

In Chromium, switch to the DataStudio tab, click the icon at the top left, and choose All Products > Data Governance > Data Quality.

In the left navigation bar, choose Rule Configuration > Configure by Table.

On the Configure by Table page, click Configure Monitoring Rules next to the ods_raw_log_d table.

Configure monitoring rules for table ods_raw_log_d.

In the partition expression section, click +.

In the Add Partition dialog, select the partition expression dt=$[yyyymmdd-1] and click OK.

On the monitoring rules page of ods_raw_log_d, click Create Rule.

In the Create Rule panel, choose Template Rule > Add Monitoring Rule.

In the Create Rule panel, configure the following parameters, leave the rest at their defaults, and click Batch Add.

Note: This rule mainly guards against an empty partition, which would leave downstream tasks with no source data.

Parameters:

Rule Name: enter ods_raw_log_d table rule.
    Strength: select Strong.
    Rule Template: select table row count, fixed value.
    Comparison: select greater than.

On the monitoring rules page of table ods_raw_log_d, click Trial Run.

Associate the rules with scheduling.

Data Quality supports association with scheduled tasks: once a table rule is bound to a scheduled task, every run of a task instance triggers the data quality check.

On the monitoring rules page of table ods_raw_log_d, click Associate Scheduling.

In the Associate Scheduling dialog, enter the task node name OSS_数据同步 (the OSS sync node) and click Add.

Configure rules for table ods_user_info_d.

Note: ods_user_info_d is the user information table. When configuring its rules, you need both a row-count check and a primary-key uniqueness check to prevent duplicate data.
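
A manual spot check equivalent to the uniqueness rule, sketched here against the table's uid key (any returned rows would indicate duplicates):

SELECT uid, COUNT(*) AS cnt
FROM ods_user_info_d
WHERE dt = ${bdp.system.bizdate}
GROUP BY uid
HAVING COUNT(*) > 1;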

In Chromium, switch to the Configure by Table tab. On the Configure by Table page, click Configure Monitoring Rules next to the ods_user_info_d table.

In the partition expression section, click +. In the Add Partition dialog, select the partition expression dt=$[yyyymmdd-1] and click OK.

On the monitoring rules page of ods_user_info_d, click Create Rule.

In the Create Rule panel, configure the following parameters, leave the rest at their defaults, and click Add Monitoring Rule.

Parameters:

Rule Name: enter table-level rule.
    Strength: select Strong.
    Rule Template: select table row count, fixed value.
    Comparison: select greater than.
    Expected Value: enter 0.

In the Create Rule panel, configure the following parameters, leave the rest at their defaults, and click Batch Add.

Parameters:

Rule Name: enter column rule.
    Strength: select Weak.
    Rule Field: select uid (string).
    Rule Template: select number of duplicate values, fixed value.
    Comparison: select less than.
    Expected Value: enter 1.

Configure rules for table ods_log_info_d.

The data in ods_log_info_d comes mainly from parsing the ods_raw_log_d table. Since log data does not lend itself to extensive monitoring, you only need a rule verifying that the table data is not empty.

In Chromium, switch to the Configure by Table tab. On the Configure by Table page, click Configure Monitoring Rules next to the ods_log_info_d table.

In the Add Partition dialog, select the partition expression dt=$[yyyymmdd-1] and click OK.

In the Create Rule panel, configure the following parameters, leave the rest at their defaults, and click Batch Add.

Parameters:

Rule Name: enter ods_log_info_d table-level rule.
    Strength: select Strong.
    Rule Template: select table row count, fixed value.
    Comparison: select not equal to.
    Expected Value: enter 0.

Configure rules for table rpt_user_info_d.

In Chromium, switch to the Configure by Table tab. On the Configure by Table page, click Configure Monitoring Rules next to the rpt_user_info_d table.

In the Add Partition dialog, select the partition expression dt=$[yyyymmdd-1] and click OK.

In the Create Rule panel, configure the following parameters, leave the rest at their defaults, and click Add Monitoring Rule.

Parameters:

Rule Name: enter rpt_user_info_d column-level rule.
    Strength: select Weak.
    Rule Field: select uid (string).
    Rule Template: select number of duplicate values, fixed value.
    Comparison: select less than.
    Expected Value: enter 1.

In the Create Rule panel, configure the following parameters, leave the rest at their defaults, and click Batch Add.

Parameters:

Rule Name: enter rpt_user_info_d table-level rule.
    Strength: select Weak.
    Rule Template: select table row count, 7-day fluctuation.
    Orange Threshold: enter 1.

Create a website user profile dashboard with Quick BI to visualize the rpt_user_info_d table

Copy the following address, open a new tab in Chromium, and paste the address to access the Quick BI console.

http://das.base.shuju.aliyun.com/console.htm

On the Data Source page, click Create Data Source at the top right.

In the Add Data Source dialog, choose Cloud Database > MaxCompute.

In the Add MaxCompute Data Source dialog, configure the following parameters and click Test Connection; once connectivity is confirmed, click OK.

Parameters:

Display Name: a custom name, for example test.
Database Address: use the default address; no change needed.
Project: enter the MaxCompute project name.
AccessKey ID: enter the sub-account's AccessKey ID.
AccessKey Secret: enter the sub-account's AccessKey Secret.

Note: The MaxCompute project name, AccessKey ID, and AccessKey Secret are listed in the cloud product resource list.

On the Data Source page, select the data source you just added, click Synchronize, and refresh the page.

On the Data Source page, find the rpt_user_info_d table and click the create-dataset icon in the Actions column.

Convert field dimension types

Convert the dimension type of the date field.

In the Dimensions area of the dataset edit page, right-click the dt field and choose Change Dimension Type > Date > yyyyMMdd.

Convert the dimension type of the geographic field (skip this step).

Note: In my test of this lab, the conversion below reported an error, so you may want to skip it.

In the Dimensions area of the dataset edit page, right-click the region field and choose Change Dimension Type > Geography > Province/Municipality.

Save the dataset.

In the Save Dataset dialog, enter rpt_user as the name, select the root directory as the location, and click OK.

Build the dashboard.

A dashboard keeps the report visually up to date as the data is refreshed. Building one involves determining the content, the layout and style, creating the charts, and wiring up dynamic linked queries.

On the dataset edit page, choose Start Analysis > Create Dashboard at the top right.

On the right side of the dashboard edit page, click the line chart control and choose Indicators > Indicator Kanban.

In the Data panel, drag the dt(year) dimension to Filters and the pv measure to Kanban Indicator/Measure.

In the indicator Kanban panel, click the filter icon next to dt(year).

In the Set Filter dialog, enter 0 as both the start and the end of the filter condition, then click OK.

In the indicator Kanban panel, click Update.

In the top menu bar, drag a line chart onto the canvas below.

In the Data panel, drag the dt(day) dimension to Category Axis/Dimension, the pv measure to Value Axis/Measure, the age_range dimension to Color Legend/Dimension, and the dt(year) dimension to Filters.

In the line chart panel, click the filter icon next to dt(year) and configure the filter as before.

In the line chart panel, click Update.

The dashboard is now complete.

Original article

Copyright notice: This article was created by [Concise programming]. Please include a link to the original when reposting:
https://yzsam.com/2022/175/202206240216122278.html